Deparos is an intelligent content discovery engine that performs directory enumeration, directory fuzzing, and endpoint discovery against web applications. It goes beyond static wordlist brute-forcing by learning from every response - adapting its strategy, growing its wordlists dynamically, and filtering false positives through fingerprint-based soft-404 detection.

How It Works

Target URL


┌──────────────────────────────────────────────────────┐
│  Initialization                                      │
│  1. Probe target, extract host components            │
│  2. Fetch robots.txt                                 │
│  3. Learn baseline fingerprints (3-sample soft-404)  │
│  4. Load prior session data (if resuming)            │
│  5. Generate initial tasks from wordlists + observed │
└──────────────────┬───────────────────────────────────┘

┌──────────────────────────────────────────────────────┐
│  Priority Queue                                      │
│  ┌────┬────┬────┬────┬─────┬──────┬──────┬────────┐ │
│  │ P0 │ P1 │ P2 │ P4 │ P5  │ P7   │ P11  │ P12   │ │
│  │Spdr│Obs │Obs │Obs │Short│ExtVar│Long  │Fuzz   │ │
│  │ JS │Name│File│Dir │Word │Numric│Word  │       │ │
│  └────┴────┴────┴────┴─────┴──────┴──────┴────────┘ │
└──────────────────┬───────────────────────────────────┘

┌──────────────────────────────────────────────────────┐
│  Payload Coordinator                                 │
│  Expander pulls tasks → Expand() yields payloads     │
│  N workers execute payloads concurrently             │
│                                                      │
│  For each response:                                  │
│    Fingerprint check (soft-404?) ──→ discard         │
│    WAF detection ──→ track/backoff                   │
│    Real discovery ──→ callbacks                      │
└──────────────────┬───────────────────────────────────┘

┌──────────────────────────────────────────────────────┐
│  Discovery Callbacks                                 │
│  OnDirectoryDiscovered():                            │
│    • Learn new fingerprints for directory            │
│    • Create recursive tasks (wordlists + observed)   │
│    • Extract breadcrumb directories                  │
│  OnFileDiscovered():                                 │
│    • Extract extension → trigger extension tasks     │
│    • Numeric segment → fuzz ±10 variations           │
│    • Queue extension variant probes (.bak, .old, …)  │
└──────────────────┬───────────────────────────────────┘

        ┌──── loop back to Priority Queue ────┐
        │  (new tasks from discoveries)       │
        └─────────────────────────────────────┘

What Makes It Adaptive

1. Fingerprint-Based Soft-404 Detection

Before scanning, the engine requests 3 random non-existent paths and extracts response attributes (status code, content-type, headers, body hash, content-length ranges). Only attributes stable across all 3 samples become the baseline signature. During scanning, responses matching this signature are discarded as false positives. When an unknown response pattern appears, a 4-strategy wildcard validation (prefix, suffix, extension, middle) confirms whether the discovery is real or a new soft-404 variant - and learns the new pattern.
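The baseline-learning rule described above — keep only the attributes that are identical across all probe samples — can be sketched as follows. The struct fields and function names here are illustrative, not the engine's actual API; the real baseline also tracks headers and content-length ranges.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// sample captures the response attributes compared across probes.
type sample struct {
	Status      int
	ContentType string
	BodyHash    uint64
}

// fingerprint keeps only attributes stable across every sample;
// zero values mean "unstable, ignore when matching".
type fingerprint struct {
	Status      int
	ContentType string
	BodyHash    uint64
}

func hashBody(b []byte) uint64 {
	h := fnv.New64a()
	h.Write(b)
	return h.Sum64()
}

// learnBaseline drops any attribute the samples disagree on.
func learnBaseline(samples []sample) fingerprint {
	fp := fingerprint{samples[0].Status, samples[0].ContentType, samples[0].BodyHash}
	for _, s := range samples[1:] {
		if s.Status != fp.Status {
			fp.Status = 0
		}
		if s.ContentType != fp.ContentType {
			fp.ContentType = ""
		}
		if s.BodyHash != fp.BodyHash {
			fp.BodyHash = 0
		}
	}
	return fp
}

// matches reports whether a response fits the learned soft-404 signature.
func (fp fingerprint) matches(s sample) bool {
	if fp.Status != 0 && s.Status != fp.Status {
		return false
	}
	if fp.ContentType != "" && s.ContentType != fp.ContentType {
		return false
	}
	if fp.BodyHash != 0 && s.BodyHash != fp.BodyHash {
		return false
	}
	return true
}

func main() {
	// Three probes to random non-existent paths: status and content-type are
	// stable, the body echoes the path, so BodyHash drops out of the signature.
	probes := []sample{
		{200, "text/html", hashBody([]byte("not found: /xk2f"))},
		{200, "text/html", hashBody([]byte("not found: /q9zz"))},
		{200, "text/html", hashBody([]byte("not found: /m3ab"))},
	}
	fp := learnBaseline(probes)
	fmt.Println(fp.matches(sample{200, "text/html", 123}))       // true: soft-404
	fmt.Println(fp.matches(sample{200, "application/json", 99})) // false: real discovery
}
```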

2. Observed Collection System

Four data pools grow continuously during the scan:
Pool                 Source                                                Priority
Observed Names       Spider links, JS parsing, response body tokenization  P1
Observed Files       Complete filenames from discoveries                   P2
Observed Extensions  File extensions from discoveries                      P5
Observed Paths       Full path segments from URLs                          P4
Every newly discovered directory is probed with ALL observed values as high-priority tasks. When a new extension is found for the first time, it triggers tasks across ALL known directories.
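The replay rule is a cross-product: a newly observed value on one axis is combined with every known item on the other. A minimal sketch, with illustrative function names:

```go
package main

import "fmt"

// onNewExtension generates one task per (directory, name) pair for a
// freshly observed extension — the "replay against ALL known directories"
// rule described above. Hypothetical helper, not the engine's API.
func onNewExtension(ext string, dirs []string, names []string) []string {
	var tasks []string
	for _, d := range dirs {
		for _, n := range names {
			tasks = append(tasks, d+n+"."+ext)
		}
	}
	return tasks
}

func main() {
	dirs := []string{"/", "/admin/"}
	names := []string{"config", "backup"}
	for _, t := range onNewExtension("bak", dirs, names) {
		fmt.Println(t) // /config.bak, /backup.bak, /admin/config.bak, /admin/backup.bak
	}
}
```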

3. JavaScript Intelligence

Two layers of JS analysis feed endpoints back into the discovery queue:
  • JSScan (embedded binary): Deobfuscates bundled JS, resolves string concatenation, traces variable assignments, and extracts fetch() / XMLHttpRequest / $.ajax call sites into full HTTP request specs.
  • Spider extractors: Parse inline <script> tags and JS string literals for URL patterns.
Extracted endpoints become priority-0 tasks - tested before any wordlist fuzzing.
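The spider-extractor layer can be approximated with a regex over path-like string literals. This is a simplification — the JSScan layer additionally resolves concatenation and variable assignments, which a single pattern cannot do:

```go
package main

import (
	"fmt"
	"regexp"
)

// urlLiteral is a simplified pattern for path-like string literals in JS.
var urlLiteral = regexp.MustCompile(`["'](/[A-Za-z0-9_\-./]+)["']`)

// extractEndpoints returns unique path literals found in a script body.
func extractEndpoints(js string) []string {
	seen := map[string]bool{}
	var out []string
	for _, m := range urlLiteral.FindAllStringSubmatch(js, -1) {
		if !seen[m[1]] {
			seen[m[1]] = true
			out = append(out, m[1])
		}
	}
	return out
}

func main() {
	src := `fetch("/api/v1/users"); $.ajax({url: '/api/v1/orders'});`
	fmt.Println(extractEndpoints(src)) // [/api/v1/users /api/v1/orders]
}
```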

4. Dynamic Wordlist Growth

Response bodies are tokenized (content-type-aware for HTML, JSON, JS, CSS) to extract candidate words. These feed into the observed name pool and are replayed against every directory.
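A minimal content-type-aware tokenizer might look like this — HTML has its tags stripped before tokenizing, while JSON/JS bodies are tokenized as-is. The real engine has dedicated handling per content type; this sketch only shows the shape of the idea:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// wordRe matches candidate wordlist entries: a letter followed by at least
// two word characters.
var wordRe = regexp.MustCompile(`[A-Za-z][A-Za-z0-9_\-]{2,}`)

var tagRe = regexp.MustCompile(`<[^>]*>`)

// tokenize extracts unique lowercase candidate words from a response body.
func tokenize(body, contentType string) []string {
	if strings.Contains(contentType, "html") {
		body = tagRe.ReplaceAllString(body, " ") // strip markup first
	}
	seen := map[string]bool{}
	var words []string
	for _, w := range wordRe.FindAllString(body, -1) {
		w = strings.ToLower(w)
		if !seen[w] {
			seen[w] = true
			words = append(words, w)
		}
	}
	return words
}

func main() {
	fmt.Println(tokenize(`<a href="/reports">Reports</a>`, "text/html")) // [reports]
}
```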

5. Recursive Directory Expansion

When a file is found at /a/b/c/file.txt, the engine extracts /a/, /a/b/, /a/b/c/ as directories to test. Each new directory triggers its own full task set (wordlists + observed + modules).
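The breadcrumb expansion is a simple prefix walk over the path segments; a sketch (illustrative helper, not the engine's API):

```go
package main

import (
	"fmt"
	"strings"
)

// breadcrumbDirs returns every parent directory of a discovered file path,
// from shallowest to deepest.
func breadcrumbDirs(filePath string) []string {
	var dirs []string
	parts := strings.Split(strings.Trim(filePath, "/"), "/")
	prefix := "/"
	for _, p := range parts[:len(parts)-1] { // drop the filename itself
		prefix += p + "/"
		dirs = append(dirs, prefix)
	}
	return dirs
}

func main() {
	fmt.Println(breadcrumbDirs("/a/b/c/file.txt")) // [/a/ /a/b/ /a/b/c/]
}
```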

Task Types

Task                    Priority  Description
Spider/JS Extracted     0         URLs from link extraction and JS analysis
Observed Names          1         Filenames seen during scan, replayed per directory
Observed Files          2         Complete name+extension pairs
Observed Paths          4         Full path segments from URLs
Short Wordlist (files)  5         Common filenames from short wordlist x extensions
Short Wordlist (dirs)   6         Common directory names from short wordlist
Extension Variants      7         Backup/alternate extensions (.bak, .old, .zip, .tar.gz)
Numeric Fuzz            7         +/-10 variations of numeric path segments
Long Wordlist (files)   9         Extended filename dictionary x extensions
Long Wordlist (dirs)    11        Extended directory dictionary
FUZZ                    12        Template-based fuzzing (FUZZ marker replacement)

Deduplication

Multiple layers prevent redundant work:
  • Task-level: FNV-1a hash prevents duplicate task enqueueing
  • Request-level: Cache prevents sending the same HTTP request twice
  • URL-level: DiskSet tracks processed URLs
  • Body-level: Hash prevents re-analyzing identical responses with JSScan
  • Directory/file trackers: Prevent re-processing the same discovery
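Task-level deduplication with FNV-1a reduces to hashing the task's identifying fields into a set key. A sketch — field choice and names are illustrative:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// taskKey hashes the fields that identify a task.
func taskKey(dir, name, ext string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(dir))
	h.Write([]byte{0}) // separator so ("a","bc") != ("ab","c")
	h.Write([]byte(name))
	h.Write([]byte{0})
	h.Write([]byte(ext))
	return h.Sum64()
}

type dedup map[uint64]struct{}

// tryEnqueue returns true only the first time a given task is seen.
func (d dedup) tryEnqueue(dir, name, ext string) bool {
	k := taskKey(dir, name, ext)
	if _, ok := d[k]; ok {
		return false
	}
	d[k] = struct{}{}
	return true
}

func main() {
	d := dedup{}
	fmt.Println(d.tryEnqueue("/admin/", "config", "bak")) // true
	fmt.Println(d.tryEnqueue("/admin/", "config", "bak")) // false: duplicate
}
```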

Built-In Modules

YAML-configured modules trigger specialized tasks when matching directories are found:
Module  Triggers On         What It Does
backup  Any directory       Tests backup extensions (.bak, .old, .zip, .tar.gz)
js      Any directory       Tests .js, .mjs, .map extensions
api     /api/, /v1/, etc.   REST/GraphQL/SOAP endpoint wordlists
admin   /admin/, /manage/   Admin panel paths
docs    /docs/, /api-docs/  Swagger, OpenAPI, GraphQL playground
static  /static/, /assets/  Blocks recursion to avoid noise

Supporting Systems

Component                   Purpose
WAF Detection               Identifies Cloudflare, Akamai, AWS WAF, F5, Imperva, Sucuri,
                            ModSecurity; tracks consecutive blocks for backoff/early exit
Scope Enforcement           Three modes: any (no check), subdomain (same eTLD+1),
                            exact (same host); checked on every discovery and redirect
Case Sensitivity Detection  Auto-detected on first file discovery by re-requesting
                            with altered casing
Storage                     SQLite-backed sitemap with semantic dedup (FNV-1a-64);
                            supports session comparison for differential scanning across runs
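The three scope modes can be sketched as a single predicate. Note that a proper subdomain mode compares eTLD+1, which requires a public-suffix list; the dot-suffix check below is a deliberate simplification:

```go
package main

import (
	"fmt"
	"strings"
)

// inScope sketches the three scope modes against a target host.
// The real subdomain mode compares eTLD+1 via a public-suffix list.
func inScope(mode, targetHost, host string) bool {
	switch mode {
	case "any":
		return true // no check
	case "exact":
		return host == targetHost
	case "subdomain":
		return host == targetHost || strings.HasSuffix(host, "."+targetHost)
	}
	return false
}

func main() {
	fmt.Println(inScope("subdomain", "example.com", "api.example.com")) // true
	fmt.Println(inScope("exact", "example.com", "api.example.com"))    // false
}
```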

Integration with Vigolium

Deparos runs as an input source (DeparosDiscoverySource) in the scanning pipeline. Each discovery is converted to an httpmsg.HttpRequestResponse and fed to the executor as a work item - where it flows through active and passive vulnerability scanning modules.
DeparosDiscoverySource.Next()
  → Engine.Start() → discoveries stream out
  → Convert to httpmsg.HttpRequestResponse
  → Save to DB (optional)
  → Return as WorkItem → Executor → Scanner Modules