vigolium scan -t https://example.com on the command line to a vulnerability finding written to the terminal. It is an architecture deep-dive intended for contributors who want to understand the scanning pipeline end-to-end.
High-Level Pipeline
Stage 1: CLI Entry and Configuration
Entry Point
cmd/vigolium/main.go prints the banner (unless --json or certain subcommands suppress it), then calls cli.Execute() which invokes the Cobra root command.
Root Command — pkg/cli/root.go
rootCmd.PersistentPreRunE fires before every subcommand and:
- Initializes the global zap.Logger via initLogger().
- Falls back to the VIGOLIUM_PROXY environment variable if --proxy is empty.
- Runs first-time setup via ensureInitialized(), which creates ~/.vigolium/ and writes the default config, profiles, SAST rules, and prompt templates if they don't exist.
- Handles early-exit flags: --list-modules, --list-input-mode, --full-example.
Scan Command — pkg/cli/scan.go
runScanCmd() is the heart of the scan flow. It performs these steps in order:
- Copy global flags into scanOpts (*types.Options): targets, concurrency, timeout, modules, proxy, format, phases, etc.
- Reconcile --json and --format: if --json is set and format is still the default "console", switch to "jsonl".
- Load config: config.LoadSettings(configPath) reads ~/.vigolium/vigolium-configs.yaml. CLI overrides are applied for origin mode, OAST URL, and database settings. Validates database, extensions, and strategy configs.
- Resolve scanning profile: precedence is --scanning-profile flag > settings.ScanningStrategy.ScanningProfile. Profiles are loaded from ~/.vigolium/profiles/ or embedded presets, and applied via config.ApplyProfile().
- Resolve scanning strategy: precedence is --strategy flag > settings.ScanningStrategy.DefaultStrategy. Strategy determines which phases are enabled (discovery, spidering, KnownIssueScan, etc.).
- Resolve heuristics check level: --skip-heuristics > --heuristics-check > config > default "basic".
- Phase isolation: --only and --skip are mutually exclusive. --only <phase> enables a single phase and disables all others. --skip <phase> disables specific phases. Phase aliases are normalized: deparos/discover → discovery, spitolas → spidering. The dynamic-assessment alias is accepted as a backward-compatible alias for audit.
- Validate HTML output: --format html requires --output and is only allowed with --only discovery or --only spidering.
- Apply scanning pace: concurrency and max-per-host from config are applied unless explicitly set on CLI.
- Initialize database: database.NewDB() → CreateSchema() → database.NewRepository().
- Handle --source: clone git URLs or resolve local paths, link source repos to targets in DB.
- Branch into one of three execution paths:
Stage 2: Input Parsing
The InputSource Interface — pkg/input/source/source.go
Input sources provide a pull-based stream of work items:
(*WorkItem, nil) = next item, (nil, io.EOF) = source exhausted, (nil, context.Canceled) = cancelled.
The optional Countable interface adds Count() int64 for progress tracking.
InputSource Implementations
| Type | File | Description | Countable |
|---|---|---|---|
| TargetSource | source.go | Iterates CLI -t targets, builds GET requests via GetRawRequestFromURL() | Yes |
| FileSource | file.go | Parses input files (OpenAPI, Burp, HAR, cURL, etc.) via format-specific parsers | Yes |
| StdinSource | stdin.go | Reads URLs line-by-line from stdin | No |
| SingleSource | single_source.go | Returns a single item, then EOF. Used by scan-url/scan-request | Yes (1) |
| MultiSource | multi.go | Drains sub-sources sequentially in order | Yes (sum) |
| ConcurrentMultiSource | concurrent.go | Reads all sub-sources concurrently. Used for queue-based sources | No |
| ExternalHarvesterInputSource | external_harvester_source.go | Runs external harvesting (Wayback, CommonCrawl, etc.) | No |
| DeparosDiscoverySource | deparos_discovery.go | Runs content discovery engine per target | No |
NewInputSource(cfg SourceConfig) is the factory function. Based on the config fields, it creates TargetSource, FileSource, and/or StdinSource, wrapping multiple sources in a MultiSource.
Supported Input Formats
Resolved by resolveFormat() in file.go:
| Format names | Parser |
|---|---|
| urls, url, list | Line-delimited URLs |
| openapi, swagger | OpenAPI/Swagger spec |
| postman | Postman collection |
| curl | cURL commands |
| burpraw, burp-raw, raw | Burp raw request files |
| burpxml, burp-xml, burp | Burp XML export |
| nuclei, nuclei-output | Nuclei JSONL output |
| deparos, deparos-output | Deparos discovery output |
The WorkItem — pkg/work/item.go
Complete() is called after processing to acknowledge queue-based sources.
Stage 3: HTTP Types
HttpRequestResponse — pkg/httpmsg/http_request_response.go
The central data type flowing through the entire pipeline. It pairs an HTTP request with an optional response:
Request(), Response(), HasResponse(), Service(), URL(), Target(), ID() (FNV-1a hash of host:port:method), Clone(), WithResponse(), CreateInsertionPoints(), BuildRetryableRequest().
Factory functions:
- GetRawRequestFromURL(url): builds a minimal GET request from a URL string (used by TargetSource and StdinSource)
- ParseRawRequest(raw): parses raw HTTP text
- FromStdRequest(req): converts a stdlib http.Request
HttpRequest — pkg/httpmsg/http_request.go
Stores the raw HTTP request bytes as the source of truth, with lazy-parsed accessors:
ensureParsed() is thread-safe via a double-checked RW mutex. It extracts headers, method, path, and body offset from the raw bytes.
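A double-checked RW mutex of the kind described above might look like the sketch below. The lazyRequest type, its fields, and the request-line-only parsing are assumptions made for illustration; the real ensureParsed() also extracts headers and the body offset.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// lazyRequest is a hypothetical, reduced model of a raw-bytes-first request.
type lazyRequest struct {
	raw []byte

	mu     sync.RWMutex
	parsed bool
	method string
	path   string
	parses int // counts real parses, for demonstration only
}

// ensureParsed parses at most once, even under concurrent callers.
func (r *lazyRequest) ensureParsed() {
	// First check: cheap read lock for the common already-parsed case.
	r.mu.RLock()
	done := r.parsed
	r.mu.RUnlock()
	if done {
		return
	}
	// Second check: another goroutine may have parsed while we waited.
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.parsed {
		return
	}
	line := r.raw
	if i := bytes.IndexByte(line, '\n'); i >= 0 {
		line = line[:i]
	}
	if fields := bytes.Fields(line); len(fields) >= 2 {
		r.method, r.path = string(fields[0]), string(fields[1])
	}
	r.parses++
	r.parsed = true
}

func (r *lazyRequest) Method() string {
	r.ensureParsed()
	r.mu.RLock()
	defer r.mu.RUnlock()
	return r.method
}

func (r *lazyRequest) Path() string {
	r.ensureParsed()
	r.mu.RLock()
	defer r.mu.RUnlock()
	return r.path
}

func main() {
	req := &lazyRequest{raw: []byte("GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n")}
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = req.Method()
		}()
	}
	wg.Wait()
	fmt.Println(req.Method(), req.Path(), "parses:", req.parses)
}
```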
Immutable builder methods (WithMethod(), WithPath(), WithHeader(), WithBody(), etc.) return new *HttpRequest instances with rebuilt raw bytes. The RequestOption / Apply() batch builder pattern rebuilds raw bytes only once for multiple changes.
HttpResponse — pkg/httpmsg/http_response.go
Same lazy-parsing pattern as HttpRequest:
Service — pkg/httpmsg/service.go
A host/port/protocol triple:
Stage 4: Runner Orchestration
The Runner — internal/runner/runner.go
The Runner is the high-level orchestrator. It builds shared infrastructure and executes the multi-phase scan pipeline.
buildInfrastructure()
Called once at the top of RunNativeScan(). Creates all shared services in the phaseInfra container:
- Notifier — Telegram and/or Discord backends (from config or env vars).
- Services — wraps Options, Notifier, DedupManager, and HostErrors (circuit breaker for unresponsive hosts).
- HostRateLimiter — per-host concurrency control (default: 2 concurrent per host, 1000 max tracked hosts, 30s idle eviction).
- HTTP Requester — HTTP client with retry, proxy, redirect, and middleware support.
- ScopeMatcher — host/path/status/content-type/body-string filtering from config.
- JS Engine — Grafana Sobek engine for JavaScript extensions, including pre/post hook chains.
RunNativeScan() — The 7-Phase Pipeline
Phase 5 Detail: KnownIssueScan
- Queries distinct paths from DB via GetDistinctPaths().
- Builds target URLs: either path-enriched (default, enrich_targets: true) or host-level only.
- Runs Nuclei templates + Kingfisher secret scanning against targets.
- Post-phase dedup: calls DeduplicateFindings() to group findings with identical (module_id, severity, matched_at URL).
Phase 6 Detail: Audit
- Creates a database.Scan record with cursor tracking.
- Resolves DA concurrency from config (separate from discovery concurrency).
- Optionally starts the OAST (out-of-band) service.
- Runs a feedback loop (up to maxFeedbackRounds = 3):
  - Creates a OneShotDBInputSource that reads records after the scan cursor.
  - Builds an Executor with all active + passive modules, SkipBaseline: true (responses already in DB).
  - The Executor enforces a per-module finding cap (MaxFindingsPerModule, default 10): once a module emits this many findings, further results from that module are suppressed.
  - After each round, checks for newly created records. Breaks early if none.
- Post-phase dedup: calls DeduplicateFindings() to merge findings where the same module fired on the same URL with different payloads.
- Marks the scan as completed.
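The control flow of the feedback loop can be reduced to a few lines. In this sketch, runFeedbackLoop and the scanRound callback are hypothetical; only the constant maxFeedbackRounds = 3 and the "break when a round creates no new records" rule come from the description above.

```go
package main

import "fmt"

const maxFeedbackRounds = 3

// runFeedbackLoop runs scan rounds until a round creates no new records or
// the round limit is reached. scanRound is a hypothetical callback that
// returns the number of records the round added to the database.
func runFeedbackLoop(scanRound func(round int) (newRecords int)) (rounds int) {
	for round := 1; round <= maxFeedbackRounds; round++ {
		rounds++
		if scanRound(round) == 0 {
			break // nothing new to scan, so another round would be a no-op
		}
	}
	return rounds
}

func main() {
	// Simulated scan: round 1 discovers 4 new records, round 2 discovers none.
	discovered := []int{4, 0, 0}
	rounds := runFeedbackLoop(func(round int) int { return discovered[round-1] })
	fmt.Println("rounds executed:", rounds)
}
```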
Stage 5: The Executor
Executor Struct — pkg/core/executor.go
The Executor is the central dispatch engine. It receives work items, distributes them to a worker pool, and dispatches modules.
Module Pre-Grouping
At construction time, NewExecutor() pre-groups all modules by their ScanScope bitmask into five slices. A module declaring ScanScopeInsertionPoint | ScanScopeRequest appears in both perIPActive and perRequestActive. This avoids per-item scope-check iteration.
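As a rough illustration of the pre-grouping idea, the sketch below buckets modules by scope bits once, up front. It models only three active buckets rather than the five slices mentioned above; the bit values, ScanScopeHost, the module struct, and groupByScope are assumptions for the example.

```go
package main

import "fmt"

// ScanScope is a bitmask of the granularities a module wants to run at.
// The concrete bit assignments here are assumed.
type ScanScope uint8

const (
	ScanScopeHost ScanScope = 1 << iota
	ScanScopeRequest
	ScanScopeInsertionPoint
)

// Has reports whether any bit of scope is set in s.
func (s ScanScope) Has(scope ScanScope) bool { return s&scope != 0 }

// module is a hypothetical stand-in for a registered scanner module.
type module struct {
	id     string
	scopes ScanScope
}

type groups struct {
	perHost, perRequest, perIP []module
}

// groupByScope walks the module list once; a module with several scope bits
// lands in several buckets.
func groupByScope(mods []module) groups {
	var g groups
	for _, m := range mods {
		if m.scopes.Has(ScanScopeHost) {
			g.perHost = append(g.perHost, m)
		}
		if m.scopes.Has(ScanScopeRequest) {
			g.perRequest = append(g.perRequest, m)
		}
		if m.scopes.Has(ScanScopeInsertionPoint) {
			g.perIP = append(g.perIP, m)
		}
	}
	return g
}

func main() {
	g := groupByScope([]module{
		{id: "active-example-a", scopes: ScanScopeInsertionPoint | ScanScopeRequest},
		{id: "active-example-b", scopes: ScanScopeHost},
	})
	fmt.Println(len(g.perHost), len(g.perRequest), len(g.perIP))
}
```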
Execute() — Worker Pool
- Spawns Workers goroutines reading from a buffered channel (cap = Workers * 2).
- Calls feedItems() on the calling goroutine (producer loop).
- Closes the channel, waits for all workers to drain.
- Flushes passive modules (Flusher interface) and OAST service.
- Returns (foundResults, nil).
feedItems() — The Producer
For each item from source.Next():
- Static file filter: if path matches a static file extension (.jpg, .css, etc.), skip.
- Pre-request scope check: ScopeMatcher.InScopeRequest(host, path, "", "") uses host + path only, with no HTTP round-trip. Rejects obviously out-of-scope items early.
- Host error check: if HostErrors.Check(hostID) returns true (host has been circuit-broken), skip.
- Send item to the worker channel.
worker() — The Consumer
Each worker goroutine loops on the channel:
Stage 6: Processing an Item
processItem() is the per-item hot path. Every item that passes feedItems() goes through these steps:
Step 1: Baseline HTTP Fetch
sync.Pool of recycled buffers (32 KiB initial, max 1 MiB for pool return) to reduce GC pressure.
Step 2: Traffic Callback
If configured, calls OnTraffic(method, url, statusCode, contentType), an observer hook for printing traffic lines to stderr.
Step 3: Pre-Hooks
Step 4: Body Size Enforcement
If ScopeMatcher is set, checks request and response body sizes:
- BodySizeDrop → drop item entirely.
- BodySizeTruncate → truncate bodies to limits, continue scanning.
- BodySizeSkipScan → truncate, save to DB, but skip scanning.
Step 5: Scope Check + Database Save
saveToDatabase() calls repo.SaveRecord() and stores the returned UUID in the requestUUIDs sharded map (keyed by request SHA-256 hash) for later finding linkage.
Step 6: Eligibility Pre-Computation
computeEligibility() runs once per item (not per module):
- Request nil check
- URL parse check
- Media/JS URL check (utils.IsMediaAndJSURL)
- HTTP method check (skip OPTIONS, CONNECT, HEAD, TRACE)
baseEligible result lets the executor skip calling CanProcess() on modules that embed the standard base checks when the base would reject.
Step 7: Module Filter
If item.EnableModules is non-empty, builds a map-based O(1) filter. Otherwise uses the allModulesFilter sentinel.
Step 8: Passive Module Execution (Sequential)
CanProcess() → call scan method → process results. No goroutines — passive modules do not perform network I/O.
Step 9: Active Module Execution (Parallel)
Three categories run in parallel via conc.WaitGroup:
For the insertion-point category specifically, insertion points are iterated serially (one at a time), but all eligible modules for a given point run concurrently:
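A minimal sketch of that dispatch shape, using the standard library's sync.WaitGroup in place of conc.WaitGroup. The module struct, its scan callback, and scanPoints are hypothetical names, and insertion points are modeled as plain strings.

```go
package main

import (
	"fmt"
	"sync"
)

// module is a hypothetical stand-in; scan returns a finding description,
// or "" when the module finds nothing at the given insertion point.
type module struct {
	id   string
	scan func(point string) string
}

// scanPoints walks insertion points one at a time, but runs every module
// concurrently for the current point and waits before moving on.
func scanPoints(points []string, mods []module) []string {
	var findings []string
	for _, p := range points { // serial over insertion points
		out := make([]string, len(mods)) // one slot per module, no lock needed
		var wg sync.WaitGroup
		for i, m := range mods { // concurrent over modules
			wg.Add(1)
			go func(i int, m module, p string) {
				defer wg.Done()
				out[i] = m.scan(p)
			}(i, m, p)
		}
		wg.Wait()
		for _, f := range out {
			if f != "" {
				findings = append(findings, f)
			}
		}
	}
	return findings
}

func main() {
	mods := []module{
		{id: "active-example-a", scan: func(p string) string { return "a@" + p }},
		{id: "active-example-b", scan: func(p string) string { return "" }},
	}
	fmt.Println(scanPoints([]string{"q", "id"}, mods))
}
```

Giving each goroutine its own slot in a pre-sized slice keeps result order deterministic without a mutex.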
Concurrency Model Summary
Stage 7: Insertion Points
The InsertionPoint Interface — pkg/httpmsg/insertion_point.go
InsertionPointType Constants
| Constant | Value | Description |
|---|---|---|
| INS_PARAM_URL | 0 | URL query parameter value |
| INS_PARAM_BODY | 1 | POST body parameter value |
| INS_PARAM_COOKIE | 2 | Cookie value |
| INS_PARAM_XML | 3 | XML element value |
| INS_PARAM_XML_ATTR | 4 | XML attribute value |
| INS_PARAM_MULTIPART_ATTR | 5 | Multipart attribute value |
| INS_PARAM_JSON | 6 | JSON value |
| INS_HEADER | 32 | HTTP header value |
| INS_URL_PATH_FOLDER | 33 | REST URL path folder |
| INS_PARAM_NAME_URL | 34 | URL parameter name |
| INS_PARAM_NAME_BODY | 35 | Body parameter name |
| INS_ENTIRE_BODY | 36 | Entire request body |
| INS_URL_PATH_FILENAME | 37 | REST URL path filename |
| INS_USER_PROVIDED | 64 | User-defined position |
| INS_EXTENSION_PROVIDED | 65 | Extension-provided position |
InsertionPoint Implementations
| Type | Description |
|---|---|
| ParameterInsertionPoint | Standard parameter replacement. Uses offset-based splicing with type-aware payload encoding (URL-encode for URL/body/cookie, JSON-aware for JSON params, raw for XML). |
| HeaderInsertionPoint | Header value replacement. Uses AddOrReplaceHeader() instead of offset splicing. Created for existing injectable headers + synthetic headers (X-Forwarded-For, X-Forwarded-Host, Referer, True-Client-IP, X-Real-IP). |
| NestedInsertionPoint | Multi-level encoding chains (e.g., URL-encoded JSON inside a body parameter). BuildRequest() applies inner-to-outer: child builds first, then parent encodes the result. |
| EncodedInsertionPoint | Custom encoder chain. Applies prefix + payload → encoder.Encode() → splice. Used for complex encoding scenarios. |
LRU Cache
The Executor maintains a 4096-entry LRU cache (ipCache) keyed by request SHA-256 hash. CreateAllInsertionPoints() is called once per unique request, and the results are reused for all modules scanning that request.
Shared Base Request
CreateAllInsertionPoints() creates a single sharedBaseRequest clone of the raw bytes, shared across all ParameterInsertionPoint instances from that call. This is safe because BuildRequest() never mutates the shared bytes — it always allocates a new result slice.
Stage 8: Module Dispatch
Module Interface Hierarchy — pkg/modules/
ScanScope Bitmask — pkg/modules/modkit/types.go
ScanScopes().Has(scope) to pre-group modules at startup.
InsertionPointTypeSet — pkg/modules/modkit/types.go
A uint32 bitmask where each bit corresponds to an InsertionPointType. Checked by the executor before calling ScanPerInsertionPoint():
URLParamTypes, BodyParamTypes, CookieTypes, HeaderTypes, AllParamTypes.
CanProcess Semantics
Active modules (via BaseActiveModule): reject nil requests, unparseable URLs, media/JS URLs, and non-testable HTTP methods (OPTIONS, CONNECT, HEAD, TRACE). The executor pre-computes these checks in computeEligibility() and skips calling CanProcess() when the base would reject.
Passive modules (via BasePassiveModule): only check that the required HTTP transaction parts (request and/or response) are present. They process all content types including media — no method filtering.
Execution Pattern
conc.WaitGroup.
ScanContext — pkg/modules/modkit/context.go
Shared resources available to all modules during scanning:
- DedupManager — request-level deduplication.
- OASTProvider — generates out-of-band callback URLs for blind vulnerability detection.
- MutationGenerator — classifies parameter values and generates test mutations.
- baselineCache — caches baseline responses for diff-based scanning.
Flusher Interface
Passive modules that buffer state across many requests (e.g., anomaly_ranking) implement Flusher:
Module Development Defaults — pkg/modules/modkit/
Module authors embed BaseActiveModule or BasePassiveModule to get default implementations of all interface methods. Module IDs must be lowercase kebab-case with prefix active- or passive- (validated at construction, panics on violation). The modkit package also provides NewBaseModule(), NewBaseActiveModule(), and NewBasePassiveModule() constructors.
Stage 9: Result Emission
ResultEvent — pkg/output/output.go
ResultEvent.ID() computes a SHA-1 hash over ModuleID | Description | Severity | Matched — this becomes finding_hash in the database for deduplication.
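To make the dedup key concrete, here is a sketch of a SHA-1 identifier over the four fields. The reduced ResultEvent struct, the literal "|" separator, and the hex encoding are assumptions; the real method may join or encode the fields differently.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"strings"
)

// ResultEvent is reduced here to the four fields that feed the hash.
type ResultEvent struct {
	ModuleID    string
	Description string
	Severity    string
	Matched     string
}

// ID hashes the identifying fields so that two events describing the same
// finding map to the same finding_hash. Separator and encoding are assumed.
func (e ResultEvent) ID() string {
	joined := strings.Join([]string{e.ModuleID, e.Description, e.Severity, e.Matched}, "|")
	sum := sha1.Sum([]byte(joined))
	return hex.EncodeToString(sum[:])
}

func main() {
	a := ResultEvent{ModuleID: "active-example", Description: "demo", Severity: "high", Matched: "https://example.com/a"}
	b := a
	b.Matched = "https://example.com/b"
	fmt.Println(a.ID() == a.ID(), a.ID() == b.ID())
}
```

Because the payload is not part of the hash input, the same module firing on the same URL with different payloads collapses to one row at insert time.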
processResults() and emitResult()
When a module returns results, the executor processes them:
Stage 10: Output
Writer Interface — pkg/output/output.go
StandardWriter
The default Writer implementation:
- Sets Timestamp = time.Now(), defaults Type = "http", forces MatcherStatus = true.
- Serializes to JSON via jsoniter.Marshal().
- Under mutex:
  - Stdout: writes JSON (if --json) or formatted console output (if not --silent).
  - File: appends JSON line to output file (JSONL format).
Console Format — pkg/output/format_screen.go
- Module ID split into type (active/passive) and name, colored accordingly.
- Severity shown with symbol and ANSI color (Critical=magenta, High=red, Medium=yellow, Low=green).
- Output truncated to terminal width.
JSON Format — pkg/output/format_json.go
Serializes ResultEvent via jsoniter.Marshal(). Response body is stripped unless --include-response is set.
HTML Format — pkg/output/format_html.go
Uses a streaming approach: splits the embedded HTML template at {{.ResultsJSON}}, writes the before-portion with simple string replacement (avoids text/template because bundled JS contains {{ sequences), then streams JSON array items one at a time, then writes the after-portion.
File Output Writer — pkg/output/file_output_writer.go
O_APPEND|O_CREATE|O_WRONLY for safe resume across invocations.
Stage 11: Database Persistence
Data Models — pkg/database/models.go
HTTPRecord (table: http_records)
Fully denormalized — no separate hosts or parameters tables. Key fields:
- Identity: UUID (primary key), RequestHash (SHA-256 of raw request)
- Host info: Scheme, Hostname, Port, IP
- Request: Method, Path, URL, RequestHeaders (JSONB), RawRequest (bytea), RequestBody (bytea)
- Response: StatusCode, ResponseHeaders (JSONB), RawResponse (bytea), ResponseBody (bytea), ResponseTitle, ResponseWords
- Parameters: Parameters (JSONB array of EmbeddedParam)
- Risk: RiskScore, Remarks (JSONB array)
- Metadata: Source, SentAt, ReceivedAt, CreatedAt
Finding (table: findings)
- Identity: ID (auto-increment), FindingHash (unique constraint for dedup)
- Module info: ModuleID, ModuleName, Description, Severity, Confidence
- Match data: MatchedAt (JSONB array), ExtractedResults, Request, Response
- Relations: HTTPRecordUUIDs (JSONB array), ScanUUID
- Grouped evidence: AdditionalEvidence (JSONB array of strings) holds request/response pairs from duplicate findings that were merged into this survivor (capped at 10 entries)
finding_records junction table links findings to HTTP records (many-to-many).
Converters — pkg/database/converters.go
- HTTPRecord.FromHttpRequestResponse(): converts the in-memory type to the DB model. Generates UUID, parses URL, copies headers/body, computes hashes, extracts HTML title, counts response words.
- Finding.FromResultEvent(): maps ResultEvent fields to Finding. Sets FindingHash = event.ID() (the SHA-1 dedup hash).
Repository — pkg/database/repository.go
Key methods:
| Method | Description |
|---|---|
| SaveRecord() | Single INSERT, returns UUID |
| SaveRecordsBatch() | Bulk INSERT in one transaction |
| SaveFinding() | INSERT ON CONFLICT (finding_hash) DO NOTHING + evidence append + junction table |
| DeduplicateFindings() | Post-phase grouping: merge findings sharing (module_id, severity, matched_at URL) |
| CreateScanWithCursor() | Creates scan record, copies cursor from last completed scan |
| CountRecordsAfterCursor() | Counts new records since cursor (used for feedback loop) |
| GetRecordsWithResponseBody() | UUID-cursor pagination for batch scanning (Kingfisher) |
| UpdateRiskScores() | Batch CASE/WHEN UPDATE, 500 UUIDs per statement |
RecordWriter — pkg/database/record_writer.go
Batched asynchronous persistence for high-throughput ingestion:
- Write() converts to HTTPRecord, sends to buffered channel, blocks until flushed.
- flushLoop() runs as a single background goroutine: accumulates batch, flushes on batch-full or ticker-fire via repo.SaveRecordsBatch().
- Each caller gets a WriteResult{UUID, Err} back on a per-request result channel.
Stage 12: Supporting Systems
Scope Matching — internal/config/scope_matcher.go
ScopeMatcher evaluates items against configurable rules across multiple dimensions (all AND-ed):
- Host: glob match + origin mode filtering (cached per host)
- Path: filepath.Match glob patterns
- Static file extension: configurable extension set
- Status code: exact, wildcard (2xx), or range (400-499)
- Content type: glob patterns for request and response
- Body strings: case-insensitive substring matching on request/response bodies
| Mode | Matching Rule |
|---|---|
| all | No restriction |
| strict | Exact hostname match |
| balanced | eTLD+1 must match (e.g., *.example.com) |
| relaxed (default) | Host contains target keyword |
Rate Limiting — pkg/core/ratelimit/host_limiter.go
HostRateLimiter provides per-host concurrency control:
- 32 fixed shards with inline FNV-1a hashing for shard selection.
- Each host gets a buffered channel semaphore (capacity = MaxPerHost, default 2).
- Acquire(ctx, host) blocks until a slot is free; Release(host) frees a slot.
- Background eviction goroutine removes idle entries (default: 30s idle, checked every 10s).
- Per-shard capacity cap with oldest-entry eviction when exceeded.
Host Error Circuit Breaker — pkg/core/hosterrors/
hosterrors.Cache tracks consecutive errors per host:
- MarkFailed() increments the error counter (with regex-based error matching).
- Check() returns true when the counter reaches MaxHostError (default 30).
- MarkSuccess() resets the counter (but not if already at threshold).
- The executor's feedItems() pre-checks this to skip items for quarantined hosts.
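The counter semantics above fit in a small type. In this sketch the Cache layout and NewCache constructor are assumptions, and the regex-based error matching is left out so every MarkFailed call counts.

```go
package main

import (
	"fmt"
	"sync"
)

// Cache is a hypothetical reduction of the per-host error tracker.
type Cache struct {
	mu     sync.Mutex
	max    int
	counts map[string]int
}

func NewCache(maxHostError int) *Cache {
	return &Cache{max: maxHostError, counts: make(map[string]int)}
}

// MarkFailed records one more error for host, saturating at the threshold.
func (c *Cache) MarkFailed(host string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.counts[host] < c.max {
		c.counts[host]++
	}
}

// MarkSuccess resets the counter unless the host is already quarantined.
func (c *Cache) MarkSuccess(host string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.counts[host] >= c.max {
		return // once tripped, a stray success does not reopen the host
	}
	delete(c.counts, host)
}

// Check reports whether host has reached the error threshold.
func (c *Cache) Check(host string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[host] >= c.max
}

func main() {
	c := NewCache(3)
	for i := 0; i < 3; i++ {
		c.MarkFailed("example.com")
	}
	c.MarkSuccess("example.com")
	fmt.Println("quarantined:", c.Check("example.com"))
}
```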
JS Extension Hooks — pkg/jsext/hooks.go
Pre-hooks (PreHookExecutor): transform or filter requests before module dispatch. Return nil to skip the item.
Post-hooks (PostHookExecutor): transform or filter results before output. Return nil to drop the result.
HookChain executes hooks sequentially, passing each hook’s output to the next. On error, the hook is skipped (non-fatal). On nil return, the chain is aborted immediately.
Each hook uses a VMPool (sync.Pool of Sobek VMs) — VMs are reused across concurrent invocations with no shared mutable state.
OAST (Out-of-Band)
Out-of-band callback detection for blind vulnerabilities (SSRF, XXE, etc.). The OAST service generates unique callback URLs per module/parameter/request, and is flushed at the end of the scan with a grace period to catch late callbacks.
Deduplication and Finding Grouping
Three levels of deduplication prevent noise and redundancy:
- Request-level: DedupManager prevents scanning duplicate requests (checked before module dispatch).
- Finding-level (inline): the finding_hash unique constraint in the database uses INSERT ON CONFLICT DO NOTHING. When a duplicate hash is detected at insert time, appendRecordsToFinding() appends the new HTTP record UUIDs and request/response pair (as AdditionalEvidence) to the existing finding instead of creating a new row.
- Finding-level (post-phase grouping): DeduplicateFindings() runs after the KnownIssueScan and audit phases. It groups findings that share the same (module_id, severity, matched_at[0] URL) within a project. This catches cases where the same module fires multiple times on the same URL with different payloads (e.g., an injection probe producing dozens of results per endpoint). The grouping process:
  - Partitions findings by module_id || severity || matched_at[0] and orders by created_at ASC
  - Keeps the earliest finding per group as the survivor
  - Collects request/response pairs from duplicates into the survivor's AdditionalEvidence field (capped at 10 entries to bound storage)
  - Deletes all duplicate findings and their finding_records junction rows
  - Returns counts of deleted findings and merged groups for user feedback
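The post-phase grouping can be illustrated in memory. The real DeduplicateFindings() operates on database rows; the sketch below is an assumed equivalent over a slice, with a reduced Finding struct, an integer CreatedAt, and a hypothetical groupFindings helper.

```go
package main

import (
	"fmt"
	"sort"
)

const maxEvidence = 10 // cap on merged evidence entries per survivor

// Finding is reduced to the fields the grouping needs.
type Finding struct {
	ID                 int
	ModuleID, Severity string
	MatchedAt          []string
	CreatedAt          int64  // ordering key; a timestamp in the real model
	Evidence           string // this finding's own request/response pair
	AdditionalEvidence []string
}

// groupFindings keeps the earliest finding per (module, severity, first
// matched URL) and folds later duplicates into its AdditionalEvidence.
func groupFindings(in []Finding) (survivors []Finding, deleted, mergedGroups int) {
	sorted := append([]Finding(nil), in...)
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].CreatedAt < sorted[j].CreatedAt })

	index := map[string]int{}   // group key -> position in survivors
	merged := map[string]bool{} // groups that absorbed at least one duplicate
	for _, f := range sorted {
		first := ""
		if len(f.MatchedAt) > 0 {
			first = f.MatchedAt[0]
		}
		key := f.ModuleID + "||" + f.Severity + "||" + first
		pos, seen := index[key]
		if !seen {
			f.AdditionalEvidence = append([]string(nil), f.AdditionalEvidence...)
			index[key] = len(survivors)
			survivors = append(survivors, f)
			continue
		}
		s := &survivors[pos]
		if len(s.AdditionalEvidence) < maxEvidence {
			s.AdditionalEvidence = append(s.AdditionalEvidence, f.Evidence)
		}
		deleted++
		merged[key] = true
	}
	return survivors, deleted, len(merged)
}

func main() {
	url := []string{"https://example.com/search"}
	s, deleted, groups := groupFindings([]Finding{
		{ID: 1, ModuleID: "active-example", Severity: "high", MatchedAt: url, CreatedAt: 2, Evidence: "payload B"},
		{ID: 2, ModuleID: "active-example", Severity: "high", MatchedAt: url, CreatedAt: 1, Evidence: "payload A"},
	})
	fmt.Println(len(s), s[0].ID, s[0].AdditionalEvidence, deleted, groups)
}
```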
Putting It All Together
End-to-End Flow
Summary Table
| Stage | Key File | Key Function | Data In | Data Out |
|---|---|---|---|---|
| CLI Entry | cmd/vigolium/main.go | main() → cli.Execute() | CLI args | — |
| Config | pkg/cli/scan.go | runScanCmd() | Flags + YAML | *types.Options, *config.Settings |
| Input | pkg/input/source/ | InputSource.Next() | URLs/files/stdin | *work.WorkItem |
| HTTP Types | pkg/httpmsg/ | GetRawRequestFromURL() | URL string | *HttpRequestResponse |
| Runner | internal/runner/runner.go | RunNativeScan() | Options + Settings | Phase results |
| Executor | pkg/core/executor.go | Execute() → processItem() | InputSource + modules | bool (found results) |
| Insertion Points | pkg/httpmsg/insertion_point.go | CreateAllInsertionPoints() | Raw request bytes | []InsertionPoint |
| Module Dispatch | pkg/modules/ | ScanPer{Host,Request,InsertionPoint}() | *HttpRequestResponse | []*ResultEvent |
| Result Emission | pkg/core/executor.go | emitResult() | *ResultEvent | DB write + output |
| Output | pkg/output/output.go | StandardWriter.Write() | *ResultEvent | Console/JSON/HTML/file |
| DB Persistence | pkg/database/ | SaveRecord(), SaveFinding() | HTTP types / ResultEvent | HTTPRecord, Finding |
