Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.vigolium.com/llms.txt

Use this file to discover all available pages before exploring further.

Architecture series: Overview · Native Scan · Agentic Scan · Data & Storage · Server & API
Every scan path in Vigolium, native, agentic, ingestion, converges on the same persistence layer. This document explains how data is scoped (multi-tenancy), modeled (the database schema), written (the repository pattern and async writer), and moved between machines (cloud storage). It is the architectural companion to the task-oriented Projects API and Storage API guides; reach for those for CLI/API recipes.

1. Multi-Tenancy: the project_uuid spine

All scan data is partitioned by project, a named container with a UUID, optional config overlay, and optional access-control lists. There is no separate database per project; isolation is a project_uuid column on every major table, filtered on every read and stamped on every write.
Built-in defaults
  → ~/.vigolium/vigolium-configs.yaml          (global config)
    → ~/.vigolium/projects/<uuid>/config.yaml  (per-project overlay)
      → --scanning-profile                     (scanning profile)
        → CLI flags                            (highest precedence)
  • Default project: 00000000-0000-0000-0000-000000000001, created during vigolium init. Used whenever no project is selected.
  • Selection precedence: --project-uuid > --project-name > VIGOLIUM_PROJECT_UUID > VIGOLIUM_PROJECT (legacy) > default. On the server, the X-Project-UUID request header plays the same role.
  • Project config is a partial YAML overlay (same shape as a scanning profile) at ~/.vigolium/projects/<uuid>/config.yaml; only the keys it sets are overridden.
  • Access control: allowed_emails / allowed_domains on the project row gate server requests that carry X-User-Email, exact-email list wins, else domain list, else open; a missing email header skips the check; denial is 403. VIGOLIUM_PROJECT_READONLY=true disables all mutating project CLI subcommands.

Tables carrying project_uuid

scans · http_records · findings · scopes · source_repos · oast_interactions · scan_logs Existing databases are migrated in place, the column is added with the default-project UUID as its default, so pre-multi-tenancy data lands in the default project.

2. The database backend

Vigolium uses the repository pattern over Bun ORM, with two interchangeable backends:
BackendDriverUse
SQLite (default)sqliteshim → moderncSingle-binary, zero-config, local scans
PostgreSQLpgdriverShared/server deployments, concurrent writers
The schema is intentionally denormalized, there are no separate hosts or parameters tables; JSONB columns carry structured sub-data. This keeps a single http_records row self-contained and avoids join fan-out on the hot ingestion path.

Core models (pkg/database/models.go)

HTTPRecord (table http_records), the unit of ingested traffic:
  • Identity: UUID (PK), RequestHash (SHA-256 of the raw request, used for per-source dedup)
  • Host: Scheme, Hostname, Port, IP
  • Request: Method, Path, URL, RequestHeaders (JSONB), RawRequest, RequestBody
  • Response: StatusCode, ResponseHeaders (JSONB), RawResponse, ResponseBody, ResponseTitle, ResponseWords
  • Derived: Parameters (JSONB array of EmbeddedParam), RiskScore, Remarks
  • Metadata: Source, SentAt, ReceivedAt, CreatedAt
Finding (table findings), a detected issue:
  • Identity: ID (auto-increment), FindingHash (unique constraint → dedup key, set from the ResultEvent ID)
  • Module: ModuleID, ModuleName, Description, Severity, Confidence
  • Evidence: MatchedAt (JSONB), ExtractedResults, Request, Response, AdditionalEvidence (merged-duplicate request/response pairs, capped at 10)
  • Relations: HTTPRecordUUIDs (JSONB), ScanUUID; the finding_records junction table is the many-to-many link to HTTP records.
Agent-produced findings (autopilot/swarm/vigolium-audit/piolium) flow into the same findings table, tagged by source so native and AI results coexist and dedup together.

Converters (pkg/database/converters.go)

The in-memory scan types never touch the DB directly. HTTPRecord.FromHttpRequestResponse() and Finding.FromResultEvent() are the only seam, they generate UUIDs, compute hashes, parse URLs, extract titles, and count words, keeping persistence concerns out of the executor and modules.

3. Write paths

Method (pkg/database/repository.go)Role
SaveRecord() / SaveRecordsBatch()Single vs. bulk INSERT (batch = one transaction)
SaveFinding()INSERT … ON CONFLICT (finding_hash) DO NOTHING + evidence append + junction rows
DeduplicateFindings()Post-phase grouping: merge findings sharing (module_id, severity, matched_at URL)
CreateScanWithCursor() / CountRecordsAfterCursor()Cursor bookkeeping for incremental rescans
GetRecordsWithResponseBody()UUID-cursor pagination for batch scanners (e.g. Kingfisher)
UpdateRiskScores()Batched CASE/WHEN UPDATE, 500 UUIDs per statement

Async batched ingestion, RecordWriter

High-throughput ingestion (proxy capture, bulk import, spidering) does not call the repository synchronously. pkg/database/record_writer.go fronts it with a buffered channel:
Write() ──► buffered chan (cap 4096) ──► single flushLoop goroutine
                                            │  batch of 128, or 50ms tick

                                   repo.SaveRecordsBatch()  (one txn)
Each caller blocks until its row is flushed and gets a WriteResult{UUID, Err} back on a per-request result channel, backpressure is the channel capacity, ordering is preserved, and the DB sees large batched transactions instead of a write per request.
SQLite DSN note: the modernc driver needs pragmas in _pragma=name(value) form; the mattn-style _busy_timeout= is silently ignored. Relevant when tuning concurrent-writer behavior.

4. Deduplication

Two layers, by design:
  1. Per-source HTTP-record dedup: RequestHash (SHA-256 of the raw request) plus DeduplicateRecordsBySource collapses re-ingested identical requests within a source.
  2. Finding dedup: the finding_hash unique constraint prevents exact duplicates at insert time; DeduplicateFindings() runs after a phase to group near-duplicates (same module/severity/URL), folding the extra request/response pairs into the survivor’s AdditionalEvidence (capped). The multi-driver audit command runs an additional project-wide findings dedup pass once its drivers exit.

5. Cloud storage (optional)

Storage is disabled by default. When enabled (storage.enabled: true), a single minio-go S3 client talks to GCS (HMAC), S3, or self-hosted MinIO, the driver differs, the rest is identical.
gs://<project-uuid>/<key>   ⇒   s3://<storage.bucket>/<project-uuid>/<key>
The key architectural point: the project UUID is the in-bucket prefix, not the bucket name. Every key is validated (storage.ValidateKey rejects .., backslashes, absolute paths) and project-prefixed server-side, so one bucket safely holds many projects and clients cannot reach outside their own.
Conventional prefixProducer
ugc/<file>vigolium storage upload (default)
imports/<base>-<ts>.<ext>vigolium import --upload
native-scans/<scan-uuid>/results.tar.gzvigolium scan --upload-results
agentic-scans/<run-uuid>/results.tar.gzvigolium agent … --upload-results
gs:// URLs are first-class inputs/outputs: vigolium import gs://… downloads-then-imports (detecting archon folders or JSONL inside .tar.gz/.zip), and any export -o gs://… writes locally then uploads on success. The {ts} and {project-uuid} placeholders expand in any -o path. The bundle export format round-trips a full snapshot (JSONL + HTML report + manifest + agent session dirs) that another machine can re-import.

Projects API

Project CLI/API recipes and access-control management.

Storage API

Full vigolium storage command and gs:// reference.

Native Scan

Stage 11 traces a finding from ResultEvent to row.

Configuration

The storage: and project config YAML blocks.