How It Works
What Makes It Adaptive
1. Fingerprint-Based Soft-404 Detection
Before scanning, the engine requests 3 random non-existent paths and extracts response attributes (status code, content type, headers, body hash, content-length range). Only attributes stable across all 3 samples become the baseline signature. During scanning, responses matching this signature are discarded as false positives. When an unknown response pattern appears, a 4-strategy wildcard validation (prefix, suffix, extension, middle) confirms whether the discovery is real or a new soft-404 variant, and learns the new pattern.

2. Observed Collection System
Four data pools grow continuously during the scan:

| Pool | Source | Priority |
|---|---|---|
| Observed Names | Spider links, JS parsing, response body tokenization | P1 |
| Observed Files | Complete filenames from discoveries | P2 |
| Observed Extensions | File extensions from discoveries | P5 |
| Observed Paths | Full path segments from URLs | P4 |
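The baseline-fingerprint idea from step 1 can be sketched as follows. This is a minimal illustration, not Deparos internals: the attribute names, the SHA-256 body hash, and the length bucketing are all assumptions standing in for the real signature format.

```python
import hashlib
import random
import string

def fingerprint(resp):
    """Reduce a response to comparable attributes (hypothetical shape)."""
    return {
        "status": resp["status"],
        "content_type": resp["headers"].get("Content-Type", ""),
        "body_hash": hashlib.sha256(resp["body"].encode()).hexdigest(),
        "length_bucket": len(resp["body"]) // 64,  # coarse content-length range
    }

def build_baseline(fetch):
    """Probe 3 random non-existent paths; keep only attributes stable across all samples."""
    paths = ["/" + "".join(random.choices(string.ascii_lowercase, k=12)) for _ in range(3)]
    prints = [fingerprint(fetch(p)) for p in paths]
    return {k: v for k, v in prints[0].items() if all(p[k] == v for p in prints[1:])}

def is_soft_404(resp, baseline):
    """A response matching every stable baseline attribute is treated as a false positive."""
    fp = fingerprint(resp)
    return bool(baseline) and all(fp[k] == v for k, v in baseline.items())
```

If the server varies some attribute across the three probes (say, a timestamp changes the body hash), that attribute simply drops out of the baseline, so the comparison only ever uses attributes the server keeps stable.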
3. JavaScript Intelligence
Two layers of JS analysis feed endpoints back into the discovery queue:

- JSScan (embedded binary): Deobfuscates bundled JS, resolves string concatenation, traces variable assignments, and extracts `fetch()`/`XMLHttpRequest`/`$.ajax` call sites into full HTTP request specs.
- Spider extractors: Parse inline `<script>` tags and JS string literals for URL patterns.
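The spider-extractor layer can be approximated with a string-literal scan like the one below. The regex is an illustrative assumption, far simpler than what the tool ships; it only picks up quoted literals that look like paths or absolute URLs.

```python
import re

# Hedged sketch: pull URL-like string literals out of JS source.
# The pattern is an assumption, not the tool's own extractor.
URL_LITERAL = re.compile(r"""["'](/[A-Za-z0-9_\-./]+|https?://[^"']+)["']""")

def extract_js_endpoints(js_source):
    """Return unique URL-ish string literals, in order of first appearance."""
    seen, out = set(), []
    for m in URL_LITERAL.finditer(js_source):
        u = m.group(1)
        if u not in seen:
            seen.add(u)
            out.append(u)
    return out
```

A literal-only scan like this misses concatenated or computed URLs, which is exactly the gap the deobfuscating JSScan layer exists to cover.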
4. Dynamic Wordlist Growth
Response bodies are tokenized (content-type-aware for HTML, JSON, JS, and CSS) to extract candidate words. These feed into the observed name pool and are replayed against every directory.

5. Recursive Directory Expansion
When a file is found at `/a/b/c/file.txt`, the engine extracts `/a/`, `/a/b/`, and `/a/b/c/` as directories to test. Each new directory triggers its own full task set (wordlists + observed + modules).
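The ancestor-extraction step is straightforward; a minimal sketch (function name is an assumption):

```python
def parent_directories(path):
    """Given a discovered file path, return each ancestor directory to enqueue.
    For /a/b/c/file.txt this produces /a/, /a/b/, /a/b/c/."""
    segments = path.strip("/").split("/")[:-1]  # drop the filename
    dirs = []
    for i in range(1, len(segments) + 1):
        dirs.append("/" + "/".join(segments[:i]) + "/")
    return dirs
```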
Task Types
| Task | Priority | Description |
|---|---|---|
| Spider/JS Extracted | 0 | URLs from link extraction and JS analysis |
| Observed Names | 1 | Filenames seen during scan, replayed per directory |
| Observed Files | 2 | Complete name+extension pairs |
| Observed Paths | 4 | Full path segments from URLs |
| Short Wordlist (files) | 5 | Common filenames from short wordlist x extensions |
| Short Wordlist (dirs) | 6 | Common directory names from short wordlist |
| Extension Variants | 7 | Backup/alternate extensions (.bak, .old, .zip, .tar.gz) |
| Numeric Fuzz | 7 | +/-10 variations of numeric path segments |
| Long Wordlist (files) | 9 | Extended filename dictionary x extensions |
| Long Wordlist (dirs) | 11 | Extended directory dictionary |
| FUZZ | 12 | Template-based fuzzing (FUZZ marker replacement) |
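The priority column above implies an ordered work queue in which lower numbers are served first. A minimal sketch of that ordering with a binary heap (the function names are assumptions; the tie-breaking counter keeps insertion order stable within one priority level):

```python
import heapq
import itertools

# Lower priority number = dequeued first, matching the task table.
_counter = itertools.count()

def make_queue():
    return []

def enqueue(queue, priority, task):
    # (priority, counter, task): the counter breaks ties without comparing tasks.
    heapq.heappush(queue, (priority, next(_counter), task))

def dequeue(queue):
    return heapq.heappop(queue)[2]
```

Under this scheme, spider/JS-extracted URLs (priority 0) always preempt long-wordlist brute force (priority 9+), which is what makes the observed pools worth collecting at all.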
Deduplication
Multiple layers prevent redundant work:

- Task-level: FNV-1a hash prevents duplicate task enqueueing
- Request-level: Cache prevents sending the same HTTP request twice
- URL-level: DiskSet tracks processed URLs
- Body-level: Hash prevents re-analyzing identical responses with JSScan
- Directory/file trackers: Prevent re-processing the same discovery
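FNV-1a, named above for task-level dedup, is small enough to show in full. The constants are the standard 64-bit FNV-1a parameters; the dedup-set wrapper around it is a sketch, not the tool's actual structure.

```python
# Standard FNV-1a 64-bit parameters.
FNV_OFFSET = 0xcbf29ce484222325
FNV_PRIME = 0x100000001b3

def fnv1a_64(data: bytes) -> int:
    h = FNV_OFFSET
    for b in data:
        h ^= b                                    # xor first (the "1a" variant),
        h = (h * FNV_PRIME) & 0xFFFFFFFFFFFFFFFF  # then multiply, mod 2^64
    return h

class DedupSet:
    """Enqueue-guard sketch: admit a task only the first time its hash is seen."""
    def __init__(self):
        self._seen = set()

    def admit(self, task: str) -> bool:
        h = fnv1a_64(task.encode())
        if h in self._seen:
            return False
        self._seen.add(h)
        return True
```

Storing 64-bit hashes instead of full task strings keeps the seen-set compact across millions of enqueue attempts, at the cost of a vanishingly small collision risk.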
Built-In Modules
YAML-configured modules trigger specialized tasks when matching directories are found:

| Module | Triggers On | What It Does |
|---|---|---|
| backup | Any directory | Tests backup extensions (.bak, .old, .zip, .tar.gz) |
| js | Any directory | Tests .js, .mjs, .map extensions |
| api | /api/, /v1/, etc. | REST/GraphQL/SOAP endpoint wordlists |
| admin | /admin/, /manage/ | Admin panel paths |
| docs | /docs/, /api-docs/ | Swagger, OpenAPI, GraphQL playground |
| static | /static/, /assets/ | Blocks recursion to avoid noise |
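Since the document only says modules are YAML-configured, a definition for the backup module might look roughly like this. Every field name here is an illustrative assumption, not the tool's actual schema:

```yaml
# Hypothetical module definition; field names are assumptions.
name: backup
trigger:
  directories: "*"          # fires on any discovered directory
tasks:
  - type: extension-variants
    extensions: [.bak, .old, .zip, .tar.gz]
```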
Supporting Systems
| Component | Purpose |
|---|---|
| WAF Detection | Identifies Cloudflare, Akamai, AWS WAF, F5, Imperva, Sucuri, ModSecurity. Tracks consecutive blocks for backoff/early exit |
| Scope Enforcement | Three modes: any (no check), subdomain (same eTLD+1), exact (same host). Checked on every discovery and redirect |
| Case Sensitivity Detection | Auto-detected on first file discovery by re-requesting with altered casing |
| Storage | SQLite-backed sitemap with semantic dedup (FNV-1a-64). Supports session comparison for differential scanning across runs |
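The three scope modes reduce to a small host comparison. A sketch, with one loud caveat: the eTLD+1 helper below naively takes the last two labels, whereas a real implementation needs the Public Suffix List (e.g. example.co.uk breaks this simplification).

```python
from urllib.parse import urlsplit

def _etld1(host):
    # Naive eTLD+1: last two labels. Real code should consult the
    # Public Suffix List instead of this simplification.
    return ".".join(host.split(".")[-2:])

def in_scope(mode, seed_url, candidate_url):
    """Sketch of the three modes: any (no check), subdomain (same eTLD+1), exact (same host)."""
    seed, cand = urlsplit(seed_url).hostname, urlsplit(candidate_url).hostname
    if mode == "any":
        return True
    if mode == "subdomain":
        return _etld1(seed) == _etld1(cand)
    if mode == "exact":
        return seed == cand
    raise ValueError(f"unknown scope mode: {mode}")
```

Running this check on every discovery and redirect, as the table states, is what keeps a spidered off-site link from dragging the scan onto a third-party host.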
Integration with Vigolium
Deparos runs as an input source (DeparosDiscoverySource) in the scanning pipeline. Each discovery is converted to an httpmsg.HttpRequestResponse and fed to the executor as a work item, where it flows through the active and passive vulnerability scanning modules.
