Intro

Web reconnaissance is the foundation of a strong security test. It maps assets and technology choices before deeper testing begins and reduces blind spots.

Types of Recon

Recon uses two approaches: active and passive. Active touches systems directly, while passive relies on public sources.

Active Recon

Active recon interacts with the target to gather accurate data. It is effective but increases detection risk because requests are logged by servers and security tools.

Technique | Description | Example | Tools | Detection Risk
Port Scanning | Identify open ports and exposed services. | Scan HTTP and HTTPS ports. | Nmap | High
Vulnerability Scanning | Probe for known weaknesses. | Check for outdated servers. | Nikto | High
Banner Grabbing | Read service banners for versions. | Capture HTTP headers. | curl | Low
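
For example, a narrow port and service scan with Nmap might look like this (target and ports are illustrative):

nmap -p 80,443 -sV inlanefreight.com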

Passive Recon

Passive recon gathers information without touching the target. It relies on public sources, which lowers detection risk.

Technique | Description | Example | Tools | Detection Risk
Search Queries | Discover public assets and docs. | Find exposed PDFs. | Google | Very Low
WHOIS Lookups | Retrieve domain registration data. | Query registrant info. | whois | Very Low
DNS Analysis | Inspect records and subdomains. | Check MX and NS records. | dig | Very Low

WHOIS

WHOIS is a query and response protocol for retrieving registration data about domains. It can also provide IP block ownership, which helps with scoping.

whois inlanefreight.com

Common fields include registrar, contacts, dates, and name servers, which provide ownership context for OSINT. This data helps confirm scope quickly.
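
As a quick sketch, the most useful fields can be filtered straight from the output; exact field names vary by registrar, so the patterns below are illustrative:

whois inlanefreight.com | grep -iE 'Registrar:|Creation Date|Registry Expiry|Name Server'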

Why WHOIS Matters for Web Recon

WHOIS can reveal contacts and infrastructure ownership clues. It also shows name servers that hint at hosting providers.

Historical records show how ownership or hosting changed. That context explains legacy systems and exposure.


DNS and Subdomains

DNS maps human-friendly names to numeric IP addresses. It reveals service dependencies and hidden public hosts, which makes it a core recon surface.

DNS Overview

DNS translates names to IPs and guides traffic routing. Understanding its flow is essential for safe enumeration.

How DNS Works

A resolver queries root, then TLD, then authoritative servers. Each step returns the next authority until the IP answer is returned to the client.
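
The full chain can be observed with dig's trace option (dig is covered below; the domain is illustrative):

dig +trace inlanefreight.com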

DNS Files

DNS records live in zone files and configs on servers. Common types include A, AAAA, CNAME, MX, NS, TXT, SOA, and PTR, each mapping a specific role.
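
An abbreviated zone file excerpt might look like this (all values are hypothetical):

inlanefreight.com.        IN  SOA    ns1.inlanefreight.com. admin.inlanefreight.com. ( 2024010101 3600 900 604800 300 )
inlanefreight.com.        IN  NS     ns1.inlanefreight.com.
inlanefreight.com.        IN  A      203.0.113.10
www.inlanefreight.com.    IN  CNAME  inlanefreight.com.
inlanefreight.com.        IN  MX     10 mail.inlanefreight.com.
inlanefreight.com.        IN  TXT    "v=spf1 mx -all"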

Digging DNS

Several tools query DNS, but dig is the most flexible. It supports custom flags and detailed output for analysis.

Dig

dig allows precise resolver control and record queries. This makes it ideal for targeted enumeration and repeatable results across multiple servers.

Common Dig Commands

Command | Description
dig domain.com | Default A record lookup.
dig domain.com MX | Mail exchanger lookup.
dig domain.com NS | Authoritative name servers.
dig +trace domain.com | Full resolution path from the root down.
dig -x 192.168.1.1 | Reverse lookup for an IP.
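
Because the resolver can be set explicitly, the same query can be repeated against different servers to compare answers; 1.1.1.1 below is just an example public resolver:

dig @1.1.1.1 inlanefreight.com A +short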

Subdomains

Subdomain discovery expands the visible surface of a target. Each subdomain can host a separate app or service stack.

Subdomain Enumeration

Enumeration mixes passive sources and active validation. Passive sources give coverage, while active probing confirms live hosts and hidden names.

Active Subdomain Enumeration

Active methods query resolvers or brute force names to find valid hosts. Rate control helps avoid detection and noisy logs on DNS infrastructure.

Passive Subdomain Enumeration

Passive methods rely on CT logs, archives, and OSINT feeds. They are quieter but may miss new or private names.

DNS Brute Force

Brute forcing subdomains uses wordlists to discover hidden hosts by querying resolvers and checking responses.
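
A minimal sketch of the idea, assuming a small wordlist saved as subs.txt:

# Resolve each candidate name and print only those that return an A record
while read -r sub; do
  ip=$(dig +short "${sub}.inlanefreight.com" A)
  [ -n "$ip" ] && echo "${sub}.inlanefreight.com -> ${ip}"
done < subs.txt

Dedicated tools automate this with better performance and reporting.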

DNSenum

dnsenum automates subdomain discovery and DNS checks. It performs dictionary attacks and reverse lookups quickly.

dnsenum --enum inlanefreight.com -f /usr/share/seclists/Discovery/DNS/subdomains-top1million-20000.txt -r

DNS Zone Transfer

Zone transfers (AXFR) copy a full DNS zone between servers. If a server allows transfers to any requester, a single query reveals every hostname in the zone.

dig axfr inlanefreight.com @10.129.14.128

Virtual Hosts

Virtual hosting lets multiple sites share a single IP address. If you miss the correct Host header, you miss the application.

Server VHost Discovery

VHosts rely on host headers, so the right name matters. Add discovered names into /etc/hosts when needed, especially for custom TLDs like .htb or nonstandard domains.
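
For example, a discovered name can be mapped to the target IP locally (IP and hostname here are hypothetical):

echo "10.129.14.128 app.inlanefreight.htb" | sudo tee -a /etc/hosts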

Virtual Hosting Types

Name-based hosting uses the Host header to route requests. IP-based hosting assigns a different IP address per site. Port-based hosting serves each site on a different port.
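
Name-based routing is easy to verify by hand by overriding the Host header; the IP and hostname are hypothetical:

curl -s http://10.129.14.128 -H "Host: app.inlanefreight.htb" | head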

VHost Detection Tools

Tools like gobuster and ffuf can enumerate VHosts. Options to ignore TLS errors and tune thread counts help keep scans fast.

Gobuster

gobuster vhost -u http://inlanefreight.htb:81 -w /usr/share/seclists/Discovery/DNS/subdomains-top1million-110000.txt --append-domain
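
Ffuf

ffuf can enumerate the same VHosts by fuzzing the Host header; the port and wordlist mirror the gobuster example, and -ac auto-calibrates away the catch-all response:

ffuf -w /usr/share/seclists/Discovery/DNS/subdomains-top1million-110000.txt -u http://inlanefreight.htb:81 -H "Host: FUZZ.inlanefreight.htb" -ac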

Certificate Transparency Logs

CT logs record issued certificates and their hostnames. They are strong passive sources for subdomain discovery.

curl -s "https://crt.sh/?q=facebook.com&output=json" | jq -r '.[].name_value' | sort -u

Fingerprinting

Fingerprinting identifies technologies, versions, and defenses. It combines headers, server responses, and scanning tools.

Fingerprinting Techniques

wafw00f detects web application firewalls by analyzing response patterns and signatures. nikto scans for common web server issues and outdated components.

wafw00f inlanefreight.com
nikto -h inlanefreight.com -Tuning b
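
A quick manual look at the response headers complements these scanners:

curl -sI http://inlanefreight.com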

Crawling

Crawling maps the structure of a web target by following links. It helps find hidden paths, parameters, and orphaned resources.

Crawling Overview

Crawling can be manual or automated depending on scope. The goal is to map reachable pages and assets precisely.

Crawl Types

Authenticated crawling targets user-only areas and APIs. Unauthenticated crawling focuses on public surfaces.

Valuable Information

Crawlers often discover backup files, docs, and endpoints. They can reveal parameters, versions, and internal paths.

Context

Crawler configuration matters for accuracy and stealth. Tune depth, rate, and scope to avoid blocking and noise.

Robots.txt

robots.txt signals which paths should be crawled or avoided. It often lists sensitive paths developers did not intend for public discovery.

Why robots.txt Matters

Many sites list admin, test, or private paths there. These entries can expose hidden interfaces quickly.

User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /public/
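
Fetching the file is a single unauthenticated request:

curl -s http://inlanefreight.com/robots.txt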

.well-known URLs

The .well-known directory can expose standard metadata. It often includes security or configuration files.
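
For example, checking for a published security contact takes one request; whether the file exists depends on the site:

curl -s https://inlanefreight.com/.well-known/security.txt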

Crawlies (Crawl Tools)

Crawl tools automate discovery and export results. They vary in speed, depth control, and auth handling.

ReconSpider uses Scrapy and exports results to JSON. It is lightweight and good for fast mapping on small sites.

pip3 install scrapy
wget -O ReconSpider.zip https://academy.hackthebox.com/storage/modules/144/ReconSpider.v1.2.zip
unzip ReconSpider.zip
python3 ReconSpider.py http://inlanefreight.com

Search Engine Recon

Search engines provide passive discovery at scale. Use search operators to focus on exposed files and paths.

Search Operators Overview

Operators narrow results to domains and patterns. They help identify logins, files, and admin panels.

Google Dorking Operators

Operator | Purpose | Example
site: | Limit results to a domain. | site:example.com
inurl: | Match terms in URLs. | inurl:login
filetype: | Match file types. | filetype:pdf
intitle: | Match terms in titles. | intitle:"confidential"
intext: | Match terms in body text. | intext:"password reset"
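
Operators can also be combined to tighten a query; the example below is illustrative:

site:inlanefreight.com filetype:pdf intext:"confidential"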

Wayback Machine

Wayback Machine archives historic snapshots of websites. It reveals old endpoints, parameters, and content that are no longer visible on the live site.

Recon Use Cases

Use the calendar view to inspect older versions of pages. Some dates return redirects before resolving correctly.
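
Beyond the calendar view, the CDX API can list archived URLs for a domain programmatically; the query below is an illustrative sketch:

curl -s "https://web.archive.org/cdx/search/cdx?url=inlanefreight.com/*&fl=original&collapse=urlkey" | head -n 20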

Recon Automation

Automation frameworks combine multiple recon tasks in one run. They speed up data collection but still need manual review.

Recon Frameworks

Frameworks combine DNS, crawling, and metadata checks. They often integrate CT logs and subdomain sources.

Final Recon

FinalRecon aggregates header, SSL, DNS, and crawler data. It can enumerate subdomains, directories, and Wayback URLs.

git clone https://github.com/thewhiteh4t/FinalRecon.git
cd FinalRecon
pip3 install -r requirements.txt
./finalrecon.py --headers --whois --url http://inlanefreight.com

Reference

This article is based on my personal study notes from the Information Security Foundations track.

Full repository: https://github.com/lameiro0x/pentesting-path-htb