Intro
Web reconnaissance is the foundation of a strong security test. It maps assets and technology choices before deeper testing begins and reduces blind spots.
Types of Recon
Recon uses two approaches: active and passive. Active touches systems directly, while passive relies on public sources.
Active Recon
Active recon interacts with the target to gather accurate data. It is effective but increases detection risk because requests are logged by servers and security tools.
| Technique | Description | Example | Tools | Detection Risk |
|---|---|---|---|---|
| Port Scanning | Identify open ports and exposed services. | Scan HTTP and HTTPS. | Nmap | High |
| Vulnerability Scanning | Probe for known weaknesses. | Check outdated servers. | Nikto | High |
| Banner Grabbing | Read service banners for versions. | Capture HTTP headers. | curl | Low |
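A minimal sketch of the port scanning row above, using the same placeholder target as the rest of these notes; adjust ports and flags to your scope. The -sV flag also reads service banners during the scan, so it overlaps with banner grabbing:
nmap -p 80,443 -sV inlanefreight.com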
Passive Recon
Passive recon gathers information without touching the target. It relies on public sources, which lowers detection risk.
| Technique | Description | Example | Tools | Detection Risk |
|---|---|---|---|---|
| Search Queries | Discover public assets and docs. | Find exposed PDFs. | Google | Very Low |
| WHOIS Lookups | Retrieve domain registration data. | Query registrant info. | whois | Very Low |
| DNS Analysis | Inspect records and subdomains. | Check MX and NS. | dig | Very Low |
WHOIS
WHOIS is a query and response protocol that returns registration data for domain names. It can also reveal IP block ownership, which helps confirm scope.
whois inlanefreight.com
Common fields include registrar, contacts, dates, and name servers, which provide ownership context for OSINT. This data helps confirm scope quickly.
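To pull just those fields from the raw output, a simple filter works; field labels vary by registrar, so treat the pattern below as a starting point rather than a fixed recipe:
whois inlanefreight.com | grep -iE "registrar:|name server|creation date|registrant"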
Why WHOIS Matters for Web Recon
WHOIS can reveal contacts and infrastructure ownership clues. It also shows name servers that hint at hosting providers.
Historical records show how ownership or hosting changed. That context explains legacy systems and exposure.
DNS and Subdomains
DNS maps human-friendly names to numeric IP addresses. It reveals service dependencies and hidden public hosts, which makes it a core recon surface.
DNS Overview
DNS translates names to IPs and guides traffic routing. Understanding its flow is essential for safe enumeration.
How DNS Works
A recursive resolver queries the root servers, then the TLD servers, then the domain's authoritative servers. Each step returns a referral to the next authority until the authoritative answer (the IP) is returned to the client.
DNS Files
DNS records live in zone files and configs on servers. Common types include A, AAAA, CNAME, MX, NS, TXT, SOA, and PTR, each mapping a specific role.
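For context, here is a minimal BIND-style zone file sketch covering several of these record types; all names, addresses, and timers are hypothetical documentation values:
$TTL 3600
; hypothetical zone for example.com (illustrative values only)
@     IN SOA   ns1.example.com. admin.example.com. ( 2024010101 3600 900 604800 86400 )
@     IN NS    ns1.example.com.
@     IN MX    10 mail.example.com.
@     IN TXT   "v=spf1 mx -all"
@     IN A     192.0.2.10
@     IN AAAA  2001:db8::10
www   IN CNAME example.com.
mail  IN A     192.0.2.20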
Digging DNS
Several tools query DNS, but dig is the most flexible. It supports custom flags and detailed output for analysis.
Dig
dig allows precise resolver control and record-specific queries, which makes it ideal for targeted enumeration and repeatable results across multiple servers.
Common Dig Commands
| Command | Description |
|---|---|
| dig domain.com | Default A record lookup. |
| dig domain.com MX | Mail exchanger lookup. |
| dig domain.com NS | Authoritative name servers. |
| dig +trace domain.com | Full resolution path from the root. |
| dig -x 192.168.1.1 | Reverse (PTR) lookup for an IP. |
Subdomains
Subdomain discovery expands the visible surface of a target. Each subdomain can host a separate app or service stack.
Subdomain Enumeration
Enumeration mixes passive sources and active validation. Passive sources give coverage, while active probing confirms live hosts and hidden names.
Active Subdomain Enumeration
Active methods query resolvers or brute force names to find valid hosts. Rate control helps avoid detection and noisy logs on DNS infrastructure.
Passive Subdomain Enumeration
Passive methods rely on CT logs, archives, and OSINT feeds. They are quieter but may miss new or private names.
DNS Brute Force
Brute forcing subdomains uses wordlists to discover hidden hosts by querying resolvers and checking responses.
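A minimal manual version of the idea, before reaching for a dedicated tool (subdomains.txt is a hypothetical small wordlist):
while read sub; do dig +short "$sub.inlanefreight.com" | grep -q . && echo "$sub.inlanefreight.com"; done < subdomains.txt
Wildcard DNS makes every name resolve, so confirm hits before trusting them.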
DNSenum
dnsenum automates subdomain discovery and DNS checks. It performs dictionary attacks and reverse lookups quickly.
dnsenum --enum inlanefreight.com -f /usr/share/seclists/Discovery/DNS/subdomains-top1million-20000.txt -r
DNS Zone Transfer
Zone transfers (AXFR) copy full DNS zones between servers. If open, they reveal every hostname in the domain.
dig axfr inlanefreight.com @10.129.14.128
Virtual Hosts
Virtual hosting lets multiple sites share a single IP address. If you miss the correct host header, you miss the application.
Server VHost Discovery
VHosts rely on host headers, so the right name matters. Add discovered names to /etc/hosts when needed, especially for custom TLDs like .htb or nonstandard domains.
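As a sketch with hypothetical names and IP, test a candidate with a Host header first, then pin it in /etc/hosts once it looks real:
curl -s http://10.129.14.128 -H "Host: app.inlanefreight.htb" | head
echo "10.129.14.128 app.inlanefreight.htb" | sudo tee -a /etc/hosts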
Virtual Hosting Types
Name-based hosting uses the host header to route requests. IP-based hosting assigns a different IP per site. Port-based hosting serves each site on a different port.
VHost Detection Tools
Tools like gobuster and ffuf can enumerate VHosts. Use options to ignore TLS errors and tune thread counts for speed.
Gobuster
gobuster vhost -u http://inlanefreight.htb:81 -w /usr/share/seclists/Discovery/DNS/subdomains-top1million-110000.txt --append-domain
Certificate Transparency Logs
CT logs record issued certificates and their hostnames. They are strong passive sources for subdomain discovery.
CT Log Search
curl -s "https://crt.sh/?q=facebook.com&output=json" | jq -r '.[].name_value' | sort -u
Fingerprinting
Fingerprinting identifies technologies, versions, and defenses. It combines headers, server responses, and scanning tools.
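The quickest header check is a plain HEAD request; Server and X-Powered-By are the usual giveaways when present, though hardened sites often strip or fake them:
curl -sI http://inlanefreight.com | grep -iE "server:|x-powered-by:"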
Fingerprinting Techniques
wafw00f detects web application firewalls by analyzing response patterns and signatures, while nikto scans for common web server issues and outdated components.
wafw00f inlanefreight.com
nikto -h inlanefreight.com -Tuning b
Crawling
Crawling maps the structure of a web target by following links. It helps find hidden paths, parameters, and orphaned resources.
Crawling Overview
Crawling can be manual or automated depending on scope. The goal is to map reachable pages and assets precisely.
Crawl Types
Authenticated crawling targets user-only areas and APIs, while unauthenticated crawling focuses on public surfaces.
Valuable Information
Crawlers often discover backup files, docs, and endpoints. They can reveal parameters, versions, and internal paths.
Context
Crawler configuration matters for accuracy and stealth. Tune depth, rate, and scope to avoid blocking and noise.
Robots.txt
robots.txt signals which paths should be crawled or avoided. It often lists sensitive paths developers did not intend for public discovery.
Why robots.txt Matters
Many sites list admin, test, or private paths there. These entries can expose hidden interfaces quickly.
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /public/
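Fetching and filtering the file is a one-liner; the Disallow entries are the interesting part for recon:
curl -s http://inlanefreight.com/robots.txt | grep -i "disallow"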
.well-known URLs
The .well-known directory can expose standard metadata and often includes security or configuration files.
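Two standardized entries worth trying are security.txt and openid-configuration; many sites will not serve them, so empty or 404 responses are normal:
curl -s https://inlanefreight.com/.well-known/security.txt
curl -s https://inlanefreight.com/.well-known/openid-configuration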
Crawlies (Crawl Tools)
Crawl tools automate discovery and export results. They vary in speed, depth control, and auth handling.
Popular Web Crawlers
ReconSpider uses Scrapy and exports results to JSON. It is lightweight and good for fast mapping on small sites.
pip3 install scrapy
wget -O ReconSpider.zip https://academy.hackthebox.com/storage/modules/144/ReconSpider.v1.2.zip
unzip ReconSpider.zip
python3 ReconSpider.py http://inlanefreight.com
Search Engine Recon
Search engines provide passive discovery at scale. Use search operators to focus on exposed files and paths.
Search Operators Overview
Operators narrow results to domains and patterns. They help identify logins, files, and admin panels.
Google Dorking Operators
| Operator | Purpose | Example |
|---|---|---|
| site: | Limit results to a domain. | site:example.com |
| inurl: | Match terms in URLs. | inurl:login |
| filetype: | Match file types. | filetype:pdf |
| intitle: | Match terms in page titles. | intitle:"confidential" |
| intext: | Match terms in body text. | intext:"password reset" |
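Operators combine freely; a sample query for exposed documents on a single placeholder domain would be:
site:example.com filetype:pdf intitle:"confidential"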
Wayback Machine
Wayback Machine archives historic snapshots of websites. It reveals old endpoints, parameters, and content that are no longer visible on the live site.
Recon Use Cases
Use the calendar view to inspect older versions of pages. Some dates return redirects before resolving correctly.
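Beyond the calendar UI, the archive's CDX API can list captured URLs in bulk; the parameters below follow the public API documentation, but verify the exact query syntax before relying on it:
curl -s "https://web.archive.org/cdx/search/cdx?url=inlanefreight.com/*&fl=original&collapse=urlkey" | head -20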
Recon Automation
Automation frameworks combine multiple recon tasks in one run. They speed up data collection but still need manual review.
Recon Frameworks
Frameworks combine DNS, crawling, and metadata checks. They often integrate CT logs and subdomain sources.
Final Recon
FinalRecon aggregates header, SSL, DNS, and crawler data. It can enumerate subdomains, directories, and Wayback URLs.
git clone https://github.com/thewhiteh4t/FinalRecon.git
cd FinalRecon
pip3 install -r requirements.txt
./finalrecon.py --headers --whois --url http://inlanefreight.com
Reference
This article is based on my personal study notes from the Information Security Foundations track.
Full repository: https://github.com/lameiro0x/pentesting-path-htb