Oct 31, 2014

In penetration testing, as in life, there’s no substitute for reconnaissance.

Simply put, the better prepared man has a much better chance of success.

Reconnaissance, a.k.a. information gathering or research, is a crucial first step in the penetration testing process.

Unlike other phases of penetration testing, there’s no clear, defined structure or path to reconnaissance.

Depending on the complexity of your task, the reconnaissance exercise may take hours, days or in some instances even weeks (well, nobody ever said Rome was built in a day).

Most reconnaissance tasks can be accomplished with any Linux distro, but for quick learning purposes I recommend you install Kali Linux (successor to Backtrack Linux): the necessary tools are already built into this distro, which will save you an enormous amount of time.

In this post, I have no intention of writing an encyclopedia on reconnaissance techniques for penetration testing. There are far too many options to cover in a single blog post.

Instead, I will provide you with enough information to whet your appetite to explore the topic in greater depth on your own.

Chapter 2 of The Basics of Hacking and Penetration Testing by Patrick Engebretson provides a brief overview of the reconnaissance concept.

Without further ado, I suggest you go ahead and install Kali Linux on your computer.

Reconnaissance – Two Kinds

Not all reconnaissance in penetration testing is the same.

Specifically, there are two types of reconnaissance in penetration testing:

* Active Reconnaissance – Activities that involve interacting directly with the target computer or network
* Passive Reconnaissance – Activities that do not directly touch the target

Both kinds of reconnaissance are indispensable to penetration testing.

As a penetration tester, you must NOT embark on active reconnaissance unless you’re authorized to do so. With active reconnaissance, you are bound to leave digital footprints and your activities will be logged by the target.

Since stealth is not an option in active reconnaissance, doing it without authorization also exposes you to legal jeopardy.

Passive reconnaissance is non-intrusive by nature and carries far less legal risk. Here you will not be sending a single data packet to the target. Instead, you gather information by tapping into publicly available data sources.

During passive reconnaissance, the target in 99.99% of cases is unaware of your activities: because you never touch its systems, nothing you do shows up in the target's logs. You operate essentially in stealth mode.

Here are a few useful tools for passive reconnaissance. If you’re using Kali Linux, these tools are built into the system. With other distros, you’ll have to install them.

Some reconnaissance tasks are accomplished on the command line and others via graphical tools or search engines and web services like Google, Bing and LinkedIn.

Passive Reconnaissance

Let’s consider a few key passive reconnaissance techniques.


WHOIS

whois is a command line tool available on most Linux distributions that provides some basic information on the target domain.

* whois example.com
The above command typically returns the domain registrar, the nameservers (which often hint at where the target web site is hosted), and the name and address of the domain's owner (the registrant).

* dmitry -w example.com
Provides target domain’s WHOIS information plus IP address of the web site. Dmitry is built into Kali Linux. Users on other Linux distros have to download it.

Some Top Level Domains (TLDs) may not allow queries via the whois command. In such instances, try the WHOIS lookup pages of major domain name registrars like GoDaddy, Network Solutions or eNom. Private and anonymous registrations also pose a hurdle in collecting WHOIS information.
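
Under the hood, the whois command speaks a very simple protocol (RFC 3912): open a TCP connection to port 43 of a WHOIS server, send the query followed by a carriage return and line feed, and read until the server closes the connection. The Python sketch below illustrates that exchange plus a naive parser for the "Key: Value" lines most servers return. The default server, whois.verisign-grs.com (the registry server for .com domains), is an assumption for illustration; other TLDs use other servers and field names vary between registries, so treat the parser as a starting point rather than a complete implementation.

```python
import socket
from collections import defaultdict
from typing import Dict, List


def whois_query(domain: str, server: str = "whois.verisign-grs.com",
                timeout: float = 10.0) -> str:
    """Send a raw WHOIS query (RFC 3912): the query plus CRLF on TCP port 43."""
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall(domain.encode("idna") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_whois(text: str) -> Dict[str, List[str]]:
    """Naively collect 'Key: Value' lines; keys may repeat (e.g. Name Server)."""
    fields: Dict[str, List[str]] = defaultdict(list)
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("%", "#", ">>>")):
            continue  # skip comments and the '>>> Last update ...' banner
        key, sep, value = line.partition(":")  # split on the FIRST colon only
        if sep and value.strip():
            fields[key.strip().lower()].append(value.strip())
    return dict(fields)
```

For example, parse_whois(whois_query("example.com")).get("name server") would list the domain's nameservers, provided the server reports them under that label.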

IP Address

* host example.com
The above command will spit out the IP address of the target domain. Getting the IP address of the target is a huge step in the penetration testing process because the IP address is the foundation of the target’s online presence.

* host -a example.com
The -a ("all") option requests every DNS record type for the domain (equivalent to -v -t ANY), so besides the IP address you may see nameservers, mail servers and other records that hint at the hosting provider.

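* dmitry -i example.com
Performs a WHOIS lookup on the IP address of the target (the output also shows the resolved IP address)

* dmitry -i [IP Address]
Also reports the host name that the IP address resolves back to, along with WHOIS information on the address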

Dmitry is built into Kali Linux.
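
The same forward and reverse lookups can be scripted with Python's standard socket module. This is a minimal sketch with hypothetical function names, not a replacement for host or DMitry: it returns IPv4 addresses only, and a reverse lookup succeeds only if a PTR record exists, which on shared hosting often names the hosting provider rather than the target's own domain.

```python
import socket
from typing import List, Optional


def forward_lookup(hostname: str) -> List[str]:
    """Return the IPv4 addresses a host name resolves to (roughly like `host`)."""
    _name, _aliases, addresses = socket.gethostbyname_ex(hostname)
    return addresses


def reverse_lookup(ip_address: str) -> Optional[str]:
    """Return the PTR host name for an IP address, or None if there is none."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip_address)
    except (socket.herror, socket.gaierror):
        return None
    return hostname
```

Calling forward_lookup("example.com") and then reverse_lookup on each returned address shows whether the IP maps back to the target's domain or to someone else's.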

Find e-mail Addresses

Getting e-mail addresses of employees is a huge step in penetration testing.

If you lay your hands on one or two e-mail addresses of the target, it's easy to guess the e-mail addresses of other employees, because most companies follow a single naming convention (for example, firstname.lastname@company.com) when allotting e-mail addresses to their staff.

With e-mail addresses in hand, penetration testers can now look for security holes to 'drop' malware into the target company's computer systems or network.
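
To see why one or two known addresses go such a long way, here is a minimal Python sketch that infers a company's naming convention from a single known address and applies it to other employees' names. The pattern list, the function names and the sample names are illustrative assumptions; real conventions also involve middle initials, numbers for duplicate names and non-ASCII characters, which this sketch ignores.

```python
from typing import List, Optional

# Common corporate naming conventions (an illustrative, non-exhaustive list).
PATTERNS = [
    "{first}.{last}",
    "{first}{last}",
    "{f}{last}",
    "{first}",
    "{first}_{last}",
    "{last}.{first}",
]


def apply_pattern(pattern: str, first: str, last: str, domain: str) -> str:
    """Build one address from a pattern; {f} is the first initial."""
    first, last = first.strip().lower(), last.strip().lower()
    return pattern.format(first=first, last=last, f=first[:1]) + "@" + domain


def infer_pattern(known_email: str, first: str, last: str) -> Optional[str]:
    """Return the pattern that reproduces a known address, or None."""
    local, _, domain = known_email.strip().lower().partition("@")
    for pattern in PATTERNS:
        if apply_pattern(pattern, first, last, domain) == local + "@" + domain:
            return pattern
    return None


def candidate_emails(first: str, last: str, domain: str) -> List[str]:
    """When no address is known yet, list one guess per pattern."""
    return [apply_pattern(p, first, last, domain) for p in PATTERNS]
```

For instance, if jane.doe@example.com is known, infer_pattern returns "{first}.{last}", and apply_pattern with "John" and "Smith" then yields john.smith@example.com.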

Developed by Christian Martorella, theHarvester is built into Kali Linux.

* theharvester -d example.com -l 10 -b google
The above command lets you search Google for e-mail addresses, subdomains and hosts of example.com. In the command, -d stands for the target domain, -b for the data source (google, bing, linkedin etc.) and -l (lower case letter l, not the number 1) for the number of results.

To access theHarvester on Kali Linux, click on Applications on top left of your screen, scroll down to Kali Linux, highlight Information Gathering, next highlight OSINT Analysis and finally scroll down and click on theharvester.

* dmitry -e example.com
Searches Google for possible e-mail addresses of the target domain. I found theHarvester to be better at locating e-mail addresses of target domains.
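
At its core, harvesting e-mail addresses from search results comes down to pattern matching on the returned pages. The Python sketch below shows only that extraction step, assuming you already have the page text; it is not theHarvester's actual code, which also handles querying each data source and paging through results. The function name is hypothetical and the regular expression is deliberately simple, so it will miss some unusual but valid addresses.

```python
import re
from typing import List


def extract_emails(text: str, domain: str) -> List[str]:
    """Return unique addresses at `domain` or its subdomains, in first-seen order."""
    pattern = re.compile(
        r"[A-Za-z0-9._%+-]+@(?:[A-Za-z0-9-]+\.)*" + re.escape(domain) + r"\b",
        re.IGNORECASE,
    )
    found: List[str] = []
    for match in pattern.findall(text):
        address = match.lower()  # normalise case before de-duplicating
        if address not in found:
            found.append(address)
    return found
```

The trailing \b keeps look-alike domains such as example.community from matching when the target is example.com.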

Typosquatting for Theft

The more popular a target domain is, the more typosquatting domains you’ll find similar to the target. In many instances, these typosquatting domains are registered for purely malicious purposes like e-mail collection, URL hijacking, phishing and general corporate espionage.

One tool for locating typosquatting domains is urlcrazy (developed by Andrew Horton).

* /usr/bin/urlcrazy -p example.com
The above command will pull up domains with names very similar to the target domain.

Domain typos often allow criminals to receive e-mail meant for the target business, which makes them both a security hole and a cyberespionage tool.

Results from urlcrazy are often the basis for further research since you may get access to IP addresses and names of mail servers of some typosquatting domains.

To access urlcrazy on your Kali Linux distro, click on Applications on top left of your screen, scroll down to Kali Linux, then highlight Information Gathering, next highlight OSINT Analysis and finally scroll down and click on urlcrazy.
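
To get a feel for what urlcrazy does, the Python sketch below generates three of the simplest kinds of variants it covers: character omission, character repetition and adjacent character swaps. urlcrazy itself covers many more (wrong TLDs, homoglyphs, bit flipping and others). The function name is hypothetical, and the sketch assumes the part to mutate is the label before the first dot, so pass example.com rather than www.example.com.

```python
from typing import Set


def typo_variants(domain: str) -> Set[str]:
    """Generate simple typo variants of the first label of `domain`."""
    name, dot, rest = domain.lower().partition(".")
    variants: Set[str] = set()
    for i in range(len(name)):
        variants.add(name[:i] + name[i + 1:])          # character omission
        variants.add(name[:i] + name[i] + name[i:])    # character repetition
        if i + 1 < len(name):                          # adjacent character swap
            variants.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])
    variants.discard(name)  # swapping two identical letters gives the original back
    variants.discard("")    # omitting the only letter of a one-letter name
    return {variant + dot + rest for variant in variants}
```

Checking which of the generated names actually resolve, for example with a DNS lookup, then tells you which variants are in use.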

Underlying Technologies

If you’re doing penetration testing, the importance of knowing the basic technologies of the target site cannot be overemphasized.

Netcraft provides a wealth of information on the target domain’s web site, particularly the core technologies powering the site.

The results often reveal the server-side technologies deployed on the target site (what OS it's running, whether it uses PHP, XML, a WordPress blog, a custom search engine, HTTP compression etc.) in addition to the IP address, domain registrar and physical location of the target's business.
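
Some of these clues are visible in the HTTP response headers a web server sends: Server and X-Powered-By often name the web server and language runtime, and Content-Encoding reveals the use of HTTP compression. The Python sketch below picks those headers out of a response you already have (fetching pages yourself is active reconnaissance, so only do it on sites you're authorized to test). The header list and function name are illustrative assumptions, and administrators often remove or falsify these headers, so treat the results as hints.

```python
from typing import Dict, Mapping

# Header names that commonly leak technology details (illustrative, not exhaustive).
TECH_HEADERS = ("server", "x-powered-by", "x-aspnet-version", "x-generator")


def technology_hints(headers: Mapping[str, str]) -> Dict[str, str]:
    """Pick out technology-revealing headers, matching names case-insensitively."""
    lowered = {name.lower(): value for name, value in headers.items()}
    hints = {name: lowered[name] for name in TECH_HEADERS if name in lowered}
    encoding = lowered.get("content-encoding", "").lower()
    if "gzip" in encoding or "deflate" in encoding or "br" in encoding.split(", "):
        hints["http-compression"] = encoding
    return hints
```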

Active Reconnaissance Tools

Unless you have authorization for penetration testing purposes, do not attempt active reconnaissance on third party web sites.

The chances of your getting caught and into serious legal trouble are high.

If you’re bent on penetration testing reconnaissance on a live site, I suggest you set up your own web site and practice your skills using it as a target.

Copying Web Sites

With copying or mirroring of target sites, we’re treading into the dangerous terrain of copyright and intellectual property theft.

Once you’re on the target site, your activities will be logged and often actively monitored.

But there are powerful incentives for penetration testers to keep a copy of the target's site on their own computer, because it lets them do in-depth research without leaving a continuous trail of digital footprints over extended periods of time.

There are multiple tools available for copying web site content to your local computer.

httrack creates a mirror of your target web site on your local computer so you can analyze it in depth without lingering for long on the target site.

There are two versions of httrack: the command line version (httrack) and the GUI version (WebHttrack). Neither version was installed on my Kali Linux, but you can easily install them. To install httrack or WebHttrack on your Kali Linux distro, click on Applications on top left of your screen, scroll down and highlight System Tools, then highlight and click Add/Remove Software. You can then install httrack or WebHttrack by searching for it in the search box in the top left of the Add/Remove Software page. If you're running Linux Mint, httrack and WebHttrack are available via the Software Manager.

* httrack -w www.example.com
The above command will mirror site www.example.com on your local computer.

Use httrack --help to dig deep into the capabilities of this powerful tool.

wget is a powerful command line tool that lets you download your target domain's content to your local computer for thorough analysis.

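* wget --wait=20 --limit-rate=20K -r -p -U Mozilla http://www.example.com/private-news.html
This command is useful if you want to avoid being blacklisted by the site owner. --wait=20 pauses 20 seconds between retrievals, and --limit-rate=20K caps the download speed at 20 KB/s (the value is in bytes per second unless you add a k or m suffix for kilobytes or megabytes). -r downloads recursively and -p fetches everything needed to display each page, such as images and stylesheets.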

* wget -r -p -U Mozilla http://www.example.com/private-news.html
Some web sites refuse requests that don't appear to come from a browser. In such situations, the above command is a huge help: -U Mozilla makes wget identify itself as a browser via the User-Agent header.
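
The polite-crawling idea behind --wait and -U can be sketched in Python with the standard urllib module. This minimal sketch, with hypothetical helper names, only fetches a given list of URLs with a browser-like User-Agent and a fixed pause between retrievals; it does not implement wget's recursion (-r), page requisites (-p) or bandwidth limiting (--limit-rate).

```python
import time
import urllib.request
from typing import Callable, Iterable, List

Fetch = Callable[[urllib.request.Request], bytes]


def build_request(url: str, user_agent: str = "Mozilla") -> urllib.request.Request:
    """Attach a browser-like User-Agent, the idea behind wget's -U option."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def default_fetch(request: urllib.request.Request) -> bytes:
    """Perform the actual HTTP request and return the body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()


def polite_fetch(urls: Iterable[str], wait_seconds: float = 20.0,
                 fetch: Fetch = default_fetch,
                 sleep: Callable[[float], None] = time.sleep) -> List[bytes]:
    """Fetch URLs one at a time, pausing between retrievals like wget --wait."""
    pages: List[bytes] = []
    for index, url in enumerate(urls):
        if index:  # no pause before the very first request
            sleep(wait_seconds)
        pages.append(fetch(build_request(url)))
    return pages
```

Passing fetch and sleep as parameters makes it easy to dry-run the loop without touching the network.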

Curl is a command line utility similar to Wget.

But curl suffers from one major disadvantage: unlike wget, it cannot handle recursive downloads.

On the plus side, it supports more protocols than wget.

* curl http://www.example.com
Prints the page's HTML to the terminal (standard output)

* curl http://www.example.com > example.html
Downloads the page and saves it to example.html using shell redirection

* curl -o new-test.html example.com/test.html
Downloads test.html and saves it under the filename you choose (new-test.html)

* curl -O http://www.example.com/test.html
Downloads the file and saves it under its remote name (test.html)
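
The difference between -o and -O comes down to where the output filename comes from. The Python sketch below mirrors that choice with the standard library; the function names are hypothetical, and unlike curl, urllib requires the scheme (http://) in the URL.

```python
import posixpath
import urllib.request
from typing import Optional
from urllib.parse import urlsplit


def remote_name(url: str) -> str:
    """Mimic curl -O: use the last path segment of the URL as the filename."""
    name = posixpath.basename(urlsplit(url).path)  # query string is ignored
    if not name:
        raise ValueError("URL has no file name; choose one explicitly (like curl -o)")
    return name


def download(url: str, output: Optional[str] = None) -> str:
    """Save `url` to `output` (like curl -o) or to its remote name (like curl -O)."""
    filename = output or remote_name(url)
    with urllib.request.urlopen(url, timeout=30) as response, open(filename, "wb") as fh:
        fh.write(response.read())
    return filename
```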

Google Directives

Should anyone be surprised that Google is a solid source for reconnaissance?

Google’s ‘directives’ are a more effective and manageable way of getting information from the search engine than throwing a bunch of key words into the search box and hitting search.

By using Google 'directives', you greatly reduce the results overload that comes with a general search.

A directive consists of the directive's name, a colon and the term you wish to apply it to, with no space after the colon.

Here are a bunch of Google “directives” that you can use in gathering information about your target:

1. site:domain name(s)
Example: site:microsoft.com satya nadella

Site-specific search, i.e. restricting your search to a single web site, is often a better way of extracting information during penetration testing of a target site.

In the above example, you get a manageable set of results (3,830) compared to the 6.88 million results you get by just running the name of Microsoft's new CEO satya nadella on Google without directives.

2. allintitle:index of
Your Google search will include only those results containing all of the key words in the web page title

3. intitle:index of
intitle: applies only to the word immediately following it, so here only "index" must appear in the web page title; to require a whole phrase in the title, quote it: intitle:"index of"

4. inurl:admin
Search results will include only pages whose URL contains the term

5. cache:sitename
You can limit your search results to Google’s cache of target web site via the above command.

6. filetype:pdf
Limit your search to specific file types like .pdf, .docx, .ppt, .txt etc.

7. Multiple directives
You can also combine multiple directives in the same search.
Example: site:whitehouse.gov barack obama filetype:pdf
(The above example produced 1,050 results from the White House web site including the President’s long-form birth certificate).
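
Because directives are just text, composing them programmatically is straightforward. The Python sketch below (hypothetical helper names) formats directives, quotes multi-word terms and builds a Google search URL; it only constructs the query string and does not send it.

```python
from urllib.parse import urlencode


def directive(name: str, value: str) -> str:
    """Format one directive: name, colon, term (quoted if it contains spaces)."""
    if " " in value:
        value = '"' + value + '"'
    return name + ":" + value


def build_query(*terms: str, **directives: str) -> str:
    """Plain key words first, then directives in the order given."""
    parts = list(terms) + [directive(name, value) for name, value in directives.items()]
    return " ".join(parts)


def search_url(query: str) -> str:
    """URL-encode the query into a Google search URL."""
    return "https://www.google.com/search?" + urlencode({"q": query})
```

For example, build_query("barack obama", site="whitehouse.gov", filetype="pdf") produces barack obama site:whitehouse.gov filetype:pdf, and directive("intitle", "index of") produces intitle:"index of", which requires the whole phrase in the page title.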

Subdomain Information

Dnsmap (short for DNS Network Mapper) has been around since 2006 and was conceived as a tool for use by pentesters during their information gathering process.

It’s included in Kali Linux (and was part of its earlier avatar Backtrack Linux too).

This sub-domain brute force utility provides a way for penetration testers to obtain sub-domains and IP addresses of the target domain.

Here’s an example of a dnsmap command:

* dnsmap example.com -r /testing/bf-results.txt
Obtains sub-domains and IP addresses of example.com and exports them in text format to bf-results.txt

* dnsmap example.com -w bobby.txt -r /testing/bf-results.txt
Here we're supplying our own wordlist (bobby.txt) for the brute-force attempt

Usage: dnsmap <target-domain> [options]
-i <ips-to-ignore> (useful if you're obtaining false positives)
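
The brute-force approach dnsmap uses is simple: prepend each word from a wordlist to the target domain and keep the names that resolve. The Python sketch below shows that loop using the system resolver; the function names are hypothetical, it handles IPv4 only, and the ignore_ips parameter plays the role of dnsmap's -i option for filtering false positives such as wildcard DNS answers. Run it only against domains you're authorized to test.

```python
import socket
from typing import Callable, Dict, Iterable, List

Resolver = Callable[[str], List[str]]


def system_resolver(hostname: str) -> List[str]:
    """Resolve a name to its IPv4 addresses; empty list if it does not exist."""
    try:
        return socket.gethostbyname_ex(hostname)[2]
    except (socket.herror, socket.gaierror):
        return []


def brute_force_subdomains(domain: str, words: Iterable[str],
                           resolve: Resolver = system_resolver,
                           ignore_ips: Iterable[str] = ()) -> Dict[str, List[str]]:
    """Try word.domain for each word and keep names that resolve (cf. dnsmap -w, -i)."""
    ignored = set(ignore_ips)
    found: Dict[str, List[str]] = {}
    for word in words:
        word = word.strip()
        if not word or word.startswith("#"):  # skip blank lines and comments
            continue
        name = f"{word}.{domain}"
        addresses = [ip for ip in resolve(name) if ip not in ignored]
        if addresses:
            found[name] = addresses
    return found
```

Injecting the resolver keeps the loop testable offline, and a wordlist file can be passed directly as `words`, since iterating over an open file yields its lines.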

I hope this post has intrigued you into digging deep into the subject of reconnaissance.

What you have read above is the tip of the reconnaissance iceberg.

If you’re intent on becoming a penetration testing expert, I recommend you download Kali Linux and check out all the information gathering tools available there and the various options for each of them.

