Reconnaissance

OSINT, passive recon, Google dorks, Shodan, theHarvester, Maltego — gathering intelligence without touching the target.

Tags: OSINT, reconnaissance, Google dorks, Shodan, theHarvester, passive recon, active recon

Real-World Analogy

A detective doesn’t kick in doors before reading the case file. Recon is reading the case file — finding everything publicly available before you make a move that could be logged.

Passive vs Active Recon

Passive recon — you never touch the target's systems
  → Public records, Google, Shodan, WHOIS, social media
  → Target cannot detect you
  → Always do this first

Active recon — you send packets to the target
  → Direct queries to the target's DNS servers, port scans, web crawling
  → Target may detect and log your activity
  → Do after you've exhausted passive sources

WHOIS and Domain Intelligence

# Domain registration info
whois example.com

# Look for:
# Registrant email      → direct contact point
# Name servers          → hosting provider
# Creation date         → how old is this domain?
# Registrar             → where to report abuse

# Historical WHOIS (privacy guard may hide current data)
# whoisfreaks.com, domaintools.com

# Find all domains registered by same email
# reversewhois.io
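
The fields listed above can be pulled out of raw WHOIS text with a few regular expressions. This is a minimal sketch, assuming the output uses labels such as `Registrar:` and `Name Server:`; real WHOIS output differs between registries, so the patterns and the `parse_whois` helper are illustrative rather than a general parser.

```python
import re

# Assumed field labels; real WHOIS output varies by registry and registrar.
FIELDS = {
    "registrar": r"^[ \t]*Registrar:[ \t]*(.+)$",
    "created": r"^[ \t]*Creation Date:[ \t]*(.+)$",
    "registrant_email": r"^[ \t]*Registrant Email:[ \t]*(.+)$",
}
NAME_SERVER = r"^[ \t]*Name Server:[ \t]*(.+)$"
FLAGS = re.IGNORECASE | re.MULTILINE


def parse_whois(text: str) -> dict:
    """Extract the first match for each field, plus all name servers."""
    record = {}
    for key, pattern in FIELDS.items():
        match = re.search(pattern, text, flags=FLAGS)
        record[key] = match.group(1).strip() if match else None
    servers = re.findall(NAME_SERVER, text, flags=FLAGS)
    record["name_servers"] = sorted({s.strip().lower() for s in servers})
    return record


# Hand-written sample, not real WHOIS output.
sample = """\
Domain Name: EXAMPLE.COM
Registrar: Example Registrar, Inc.
Creation Date: 1995-08-14T04:00:00Z
Name Server: A.IANA-SERVERS.NET
Name Server: B.IANA-SERVERS.NET
"""
print(parse_whois(sample))
```

A missing field (here the registrant email, as happens behind a privacy guard) comes back as `None`, which is a cue to try the historical WHOIS sources mentioned above.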

DNS Enumeration

# Basic DNS records
dig example.com A       # IPv4 address
dig example.com AAAA    # IPv6 address
dig example.com MX      # mail servers
dig example.com NS      # name servers
dig example.com TXT     # SPF, DKIM, verification records
dig www.example.com CNAME   # aliases (query a subdomain; a zone apex cannot hold a CNAME)

# TXT records often reveal:
# - Email providers (Google Workspace, Office 365)
# - Third-party services (Stripe, Salesforce)
# - Verification tokens (sometimes expose internal project names)

# Zone transfer (dumps all DNS records — misconfigured servers only)
dig axfr @ns1.example.com example.com

# Subdomain brute force
gobuster dns -d example.com -w /usr/share/wordlists/subdomains-top1million-5000.txt -t 50
# or fuzz virtual hosts on a known web server (can find names with no public DNS record)
ffuf -w /usr/share/wordlists/subdomains-top1million-5000.txt -u http://example.com -H "Host: FUZZ.example.com"

# Certificate transparency — subdomains listed in SSL certs
# crt.sh: search for %.example.com
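# %25 is the URL-encoded % wildcard; -r prints raw strings so multi-name entries split into lines
curl -s "https://crt.sh/?q=%25.example.com&output=json" | jq -r '.[].name_value' | sort -u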
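
Certificate transparency results need some cleanup before they are useful: one entry can hold several names, wildcards appear as `*.`, and duplicates are common. The sketch below assumes the JSON is a list of objects with a `name_value` field, as in the query above; `subdomains_from_crtsh` is a hypothetical helper name and the sample data is made up.

```python
import json


def subdomains_from_crtsh(raw_json: str, domain: str) -> list[str]:
    """Return unique hostnames under `domain` from a crt.sh-style JSON response."""
    names = set()
    for entry in json.loads(raw_json):
        # name_value may hold several names separated by newlines
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name.startswith("*."):
                name = name[2:]  # drop the wildcard label
            if name == domain or name.endswith("." + domain):
                names.add(name)
    return sorted(names)


# Hand-written sample shaped like the crt.sh response.
sample = json.dumps([
    {"name_value": "api.example.com\n*.example.com"},
    {"name_value": "Staging.example.com"},
    {"name_value": "api.example.com"},
    {"name_value": "notexample.com"},
])
print(subdomains_from_crtsh(sample, "example.com"))
```

The suffix check uses `"." + domain` so that look-alike names such as `notexample.com` are not counted as subdomains.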

Google Dorks

Google’s search operators double as a recon tool:

site:example.com                         → all indexed pages
site:example.com filetype:pdf            → PDF documents
site:example.com inurl:admin             → admin panels
site:example.com inurl:login             → login pages
site:example.com "index of"              → directory listings
site:example.com ext:sql OR ext:db       → exposed databases
"example.com" ext:env OR ext:config      → config files indexed
intitle:"index of" site:example.com      → open directories
"DB_PASSWORD" site:github.com            → secrets in GitHub
"PRIVATE KEY" site:github.com            → private keys leaked to GitHub
inurl:/wp-content/uploads filetype:txt   → WordPress upload dirs

Tools such as GoogleDorker or DorkSearch can automate this, but manual searching is usually the safer choice:

# Manual — just use Google, don't hammer it with automation
# Use site:example.com with different operators, build a picture

# High-value dorks for any target:
site:pastebin.com "example.com"           # pastes mentioning target
site:github.com "example.com" password    # leaked creds in repos
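
To keep manual searching consistent across targets, the dorks can be generated from templates and then pasted into the browser one at a time. This is a small sketch using a subset of the dorks listed above; `build_dorks` and `DORK_TEMPLATES` are illustrative names, and nothing here sends queries to Google.

```python
# Templates taken from the dork list above; {d} is the target domain.
DORK_TEMPLATES = [
    "site:{d} filetype:pdf",
    "site:{d} inurl:admin",
    "site:{d} inurl:login",
    'site:{d} intitle:"index of"',
    "site:{d} ext:sql OR ext:db",
    '"{d}" ext:env OR ext:config',
    'site:pastebin.com "{d}"',
    'site:github.com "{d}" password',
]


def build_dorks(domain: str) -> list[str]:
    """Fill each template with the target domain."""
    return [template.format(d=domain) for template in DORK_TEMPLATES]


for dork in build_dorks("example.com"):
    print(dork)
```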

Shodan — The Internet-Connected Device Search Engine

Shodan indexes banners from internet-exposed services, so it can often tell you which software versions your target’s servers run without you ever sending a packet to them.

# Install CLI
pip install shodan
shodan init YOUR_API_KEY

# Search for target
shodan search hostname:example.com

# Lookup specific IP
shodan host 93.184.216.34

# Look for:
# - Software versions (match against known CVEs)
# - Open ports that shouldn't be public (Redis, Elasticsearch, MongoDB)
# - SSL certificate info
# - Server banners with version numbers

# Useful Shodan filters:
# hostname:example.com
# org:"Example Company"
# ssl.cert.subject.cn:example.com
# http.title:"Example App"
# product:Redis
# product:elasticsearch

Common Shodan findings:

  • port:6379 — Redis with no auth
  • port:9200 — Elasticsearch with no auth
  • port:27017 — MongoDB with no auth
  • port:5432 postgresql — exposed PostgreSQL
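
Once host data is saved locally, the ports listed above can be flagged automatically. The sketch below assumes a host record shaped like a dictionary with an `ip_str` key and a `data` list whose items carry `port` and `version`; that shape is an assumption modelled on Shodan host lookups, and the sample record is hand-written.

```python
# Ports from the list above that should rarely be reachable from the internet.
RISKY_PORTS = {
    6379: "Redis",
    9200: "Elasticsearch",
    27017: "MongoDB",
    5432: "PostgreSQL",
}


def flag_exposed_services(host: dict) -> list[str]:
    """Return one finding line per risky port in a Shodan-style host record."""
    findings = []
    for service in host.get("data", []):
        port = service.get("port")
        if port in RISKY_PORTS:
            version = service.get("version") or "unknown version"
            findings.append(
                f"{host.get('ip_str', '?')}:{port} {RISKY_PORTS[port]} ({version})"
            )
    return findings


# Hand-written sample shaped like a host lookup result, not real Shodan output.
sample_host = {
    "ip_str": "93.184.216.15",
    "data": [
        {"port": 443, "product": "nginx", "version": "1.18.0"},
        {"port": 6379, "product": "Redis", "version": "5.0.7"},
    ],
}
print(flag_exposed_services(sample_host))
```

An open port alone does not prove missing authentication; treat each line as something to confirm during the authorized scanning phase.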

theHarvester — Email and Subdomain Intel

# Gather emails, subdomains, IPs from public sources
theHarvester -d example.com -b google,bing,linkedin,shodan -l 500 -f output.html

# Sources: google, bing, linkedin, twitter, hunter, shodan, crtsh
# (available sources differ between versions; some, such as hunter and shodan, need API keys)
# -l 500: limit to 500 results per source

# Output includes:
# - Email addresses (for phishing simulations or credential stuffing)
# - Subdomains
# - Hosts and IP addresses
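
Harvested results from several sources usually overlap, so it helps to reduce them to a unique list of addresses for the target domain. A minimal sketch, assuming the results are available as plain text; the regular expression is a simplification of real email syntax and `emails_for_domain` is a hypothetical helper.

```python
import re

# Simplified pattern; it does not cover every valid address form.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def emails_for_domain(text: str, domain: str) -> list[str]:
    """Return unique, lowercased addresses at `domain` or its subdomains."""
    found = set()
    for address in EMAIL_RE.findall(text):
        address = address.lower()
        host = address.split("@", 1)[1]
        if host == domain or host.endswith("." + domain):
            found.add(address)
    return sorted(found)


# Made-up sample text standing in for saved tool output.
sample = """
Contact John.Smith@example.com or jane.doe@mail.example.com.
Unrelated: someone@other.org, john.smith@example.com
"""
print(emails_for_domain(sample, "example.com"))
```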

LinkedIn / Social Media OSINT

LinkedIn reveals:

  • Employee names → generate username lists (j.smith, john.smith, jsmith)
  • Job titles → understand tech stack (“Senior Kubernetes Engineer” = K8s in prod)
  • Recent job postings → “AWS Lambda experience required” = they use Lambda
  • Tech stack from profiles → Python, Go, React, Terraform

# Generate candidate email addresses from an employee name
first = "John"
last  = "Smith"
domain = "example.com"

fn, ln = first.lower(), last.lower()
variants = [
  f"{fn}.{ln}@{domain}",      # john.smith
  f"{fn[0]}.{ln}@{domain}",   # j.smith
  f"{fn[0]}{ln}@{domain}",    # jsmith
  f"{fn}{ln[0]}@{domain}",    # johns
  f"{fn}_{ln}@{domain}",      # john_smith
]
print("\n".join(variants))

Tools: LinkedInt, linkedin2username, osintgram (Instagram)
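
Reading job titles and postings for stack hints can be partly automated with a keyword scan. This is a rough sketch: the keyword list is illustrative, `technologies_mentioned` is a hypothetical helper, and matching is case-sensitive so that a name like Go is not confused with the ordinary word.

```python
import re

# Illustrative keyword list; extend it for the stack you expect to see.
TECH_KEYWORDS = ["Kubernetes", "AWS Lambda", "Terraform", "Python", "Go", "React"]


def technologies_mentioned(text: str) -> list[str]:
    """Return the keywords that appear in the text as whole words."""
    hits = []
    for keyword in TECH_KEYWORDS:
        # Lookarounds stop a keyword matching inside a longer word.
        if re.search(rf"(?<!\w){re.escape(keyword)}(?!\w)", text):
            hits.append(keyword)
    return hits


# Made-up posting text.
sample = "Senior Kubernetes Engineer. AWS Lambda experience required; Terraform a plus."
print(technologies_mentioned(sample))
```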

Wayback Machine & Historical Data

# Archive.org API — what did the site look like before?
curl "https://archive.org/wayback/available?url=example.com/admin"

# Useful for:
# - Finding removed pages that exposed data
# - Old login panels with known vulnerabilities
# - API keys committed to public JS then removed

# waybackurls — extract all URLs from Wayback Machine
go install github.com/tomnomnom/waybackurls@latest
echo "example.com" | waybackurls
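
The URL list from the Wayback Machine can run to thousands of lines, so a first pass usually filters for paths worth a closer look. The sketch below assumes one URL per line, as produced by the command above; the extension and keyword lists are illustrative choices, not a standard.

```python
from urllib.parse import urlparse

# Illustrative filters; tune them per target.
INTERESTING_EXTENSIONS = (".env", ".sql", ".bak", ".config", ".json", ".js")
INTERESTING_WORDS = ("admin", "login", "backup", "api")


def interesting_urls(urls):
    """Keep URLs whose path has a notable extension or keyword."""
    kept = []
    for url in urls:
        url = url.strip()
        path = urlparse(url).path.lower()  # query string is ignored
        if path.endswith(INTERESTING_EXTENSIONS) or any(
            word in path for word in INTERESTING_WORDS
        ):
            kept.append(url)
    return kept


# Made-up sample standing in for saved waybackurls output.
sample = [
    "https://example.com/",
    "https://example.com/admin/old-login.php",
    "https://example.com/static/app.js?v=3",
    "https://example.com/about",
    "https://example.com/backup/db.sql",
]
print(interesting_urls(sample))
```

In practice the input could come from a saved file, for example by reading the lines of a `urls.txt` written by the command above.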

GitHub / GitLab Recon

Source code repositories are often the richest recon source:

# Search GitHub for target
# github.com/search?q=example.com&type=code

# Look for:
# API keys, AWS credentials, database passwords in:
# - .env files accidentally committed
# - config files
# - commit history (key added and then removed is still in git history)

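# Tools:
# TruffleHog v3 (the pip package is the legacy v2 CLI, which has no `github` subcommand)
docker run --rm -it trufflesecurity/trufflehog:latest github --repo https://github.com/example/repo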

# GitLeaks — scan for secrets
docker run -v /path/to/repo:/path zricethezav/gitleaks:latest detect --source /path

# GitHub dorks (search in github.com):
# org:example-org "password"
# org:example-org "api_key"
# org:example-org "BEGIN RSA PRIVATE KEY"
# org:example-org ".env"
# org:example-org "DB_PASSWORD"
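
To show what secret scanners look for, here is a minimal sketch that checks text for two well-known shapes: an AWS access key ID and a PEM private key header. The patterns are simplified assumptions and `find_secrets` is a hypothetical helper; dedicated tools such as those above ship far more rules and also verify findings.

```python
import re

# Two well-known shapes only; dedicated scanners cover many more.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
}


def find_secrets(text: str):
    """Return (rule_name, line_number, matched_text) tuples."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            for match in pattern.finditer(line):
                hits.append((name, lineno, match.group(0)))
    return hits


# Sample uses the AWS documentation example key, not a live credential.
sample = """\
DB_HOST=localhost
AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
-----BEGIN RSA PRIVATE KEY-----
"""
print(find_secrets(sample))
```

Because removed secrets stay in history, the same function can be run over the text of `git log -p` rather than only the current files.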

Email Verification

# hunter.io — find emails for a domain (freemium)
curl "https://api.hunter.io/v2/domain-search?domain=example.com&api_key=YOUR_KEY"

# Verify if email exists (SMTP check without sending mail)
# Note: this connects to the target's mail server, so it counts as active recon
# tools: email-verify, verify-email
pip install verify-email
python -c "from verify_email import verify_email; print(verify_email('user@example.com'))"

Building the Recon Report

After passive recon, document:

## Target: example.com

### Infrastructure
- IP ranges: 93.184.216.0/24, 198.51.100.0/24
- Hosting: AWS us-east-1 (from TXT records + Shodan)
- CDN: Cloudflare (IP resolves to CF range)
- Name servers: ns1.cloudflare.com, ns2.cloudflare.com

### Subdomains Found (47 total)
- api.example.com         → 93.184.216.10
- staging.example.com     → 93.184.216.11  ← interesting
- dev.example.com         → 10.0.1.5       ← private IP leaked!
- mail.example.com        → 198.51.100.5
- vpn.example.com         → 198.51.100.6   ← VPN exposed

### Technology Stack
- Web: nginx/1.18.0 (from Shodan banner)
- App: Node.js (from X-Powered-By header)
- DB: likely PostgreSQL (job postings mention it)
- Email: Google Workspace (MX: aspmx.l.google.com)

### Employees
- 47 LinkedIn profiles
- 23 unique email addresses (format: firstname.lastname@example.com)
- CTO: john.smith@example.com (from conference speaker bio)

### Exposed Services (Shodan)
- staging.example.com:8080 — Apache Tomcat 8.5.23 (CVE-2020-1938 — check)
- 93.184.216.15:6379 — Redis, no auth detected

### GitHub Findings
- 3 repos mention "example.com"
- AWS key in commit abc123 (now deleted, still in history)
  Key: AKIAIOSFODNN7EXAMPLE

This document becomes the input for your scanning phase.
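
Some report annotations, such as the leaked private IP above, can be derived mechanically from the subdomain list. This sketch checks each address against the RFC 1918 ranges explicitly, because Python's `is_private` attribute also covers documentation ranges such as 198.51.100.0/24 and would flag the sample addresses used here. `leaked_internal_hosts` is a hypothetical helper.

```python
import ipaddress

# RFC 1918 private ranges, listed explicitly (see the note above on is_private).
RFC1918 = [
    ipaddress.ip_network(net)
    for net in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def leaked_internal_hosts(records: dict) -> list[str]:
    """Return hostnames whose address falls in an RFC 1918 range."""
    leaked = []
    for host, ip in records.items():
        address = ipaddress.ip_address(ip)
        if any(address in net for net in RFC1918):
            leaked.append(host)
    return leaked


# Values copied from the sample report above.
subdomains = {
    "api.example.com": "93.184.216.10",
    "staging.example.com": "93.184.216.11",
    "dev.example.com": "10.0.1.5",
    "mail.example.com": "198.51.100.5",
}
print(leaked_internal_hosts(subdomains))
```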