This document covers the benchmark testing system for validating Vigolium’s detection capabilities across all scanner modules. It explains the architecture, target applications, how to run benchmarks, how to add new test cases, and how to interpret the coverage report.

Overview

The benchmark system validates that Vigolium’s active and passive scanner modules detect known vulnerabilities in controlled environments. It uses a data-driven approach: YAML files define target applications, endpoints, expected modules, and assertions. A shared Go test harness loads these definitions and drives execution.
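For orientation, a definition file looks roughly like the following. This is an abridged, illustrative sketch (the endpoint and values are examples, not copied from the shipped dvwa.yaml); the full schema is documented later in this page:

```yaml
# Abridged benchmark definition: one app, one test case.
app:
  name: dvwa
  type: docker
  image: "vulnerables/web-dvwa:latest"
  port: 80

test_cases:
  - id: "dvwa-sqli-error"                               # illustrative ID
    endpoint: "/vulnerabilities/sqli/?id=1&Submit=Submit"
    modules: ["active-sqli-error-based"]                # resolved from DefaultRegistry
    assertion: strict
    scan_mode: active
```

The harness loads every such file, starts the target app, resolves the listed modules, and applies the assertion to each module's findings.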

Test Categories

| Category | Build Tag | Targets | Assertions | Requirements |
|----------|-----------|---------|------------|--------------|
| Whitebox | `canary` | Docker containers (DVWA, VAmPI, Juice Shop, OopsSec Store, NextJS VulnExamples, etc.) | Strict + Soft | Docker |
| crAPI | `canary` | Docker Compose (crAPI — 10 services) | Soft | Docker + `make crapi-up` |
| XBOW | `xbow` | CTF-style benchmarks built from source (XSS, SSTI, SQLi, LFI, CmdI, SSRF, XXE) | Strict + Soft | Docker + `XBOW_SOURCE_DIR` |
| Blackbox | `blackbox` | External demo sites (Acunetix, PortSwigger, IBM) | Soft only | Internet |
| SAST | `sast` | Static analysis pipeline (route extraction, SARIF parsing, handoff) | Strict + Soft | ast-grep binary (Layer 1 only) |
| Coverage | `canary` | None (analyzes YAML definitions) | N/A | None |

Relationship to Existing Tests

The benchmark system complements the existing test tiers:
| Tier | Location | Purpose |
|------|----------|---------|
| Unit tests | `pkg/*/` | Fast, isolated function-level tests |
| E2E tests | `test/e2e/` | HTTP client, server, and pipeline integration |
| Canary tests | `test/e2e/` | Original per-app vulnerability detection tests |
| Benchmark tests | `test/benchmark/` | Data-driven module coverage validation (whitebox, blackbox, xbow, SAST) |
| Integration tests | `test/benchmark/xss_scanner/` | Brutelogic XSS gym (external) |
The benchmark system is designed to eventually supersede the per-app canary tests in test/e2e/ (dvwa_test.go, vampi_test.go, juiceshop_test.go) by providing the same coverage through YAML definitions with less boilerplate.

Target Applications

Vigolium benchmarks against a diverse set of intentionally vulnerable applications. Each application covers different vulnerability categories, tech stacks, and scanning approaches (DAST, SAST, or both).

DAST Targets (Docker-based)

| Application | Tech Stack | Vulnerability Categories | Docker Source | Port |
|-------------|------------|--------------------------|---------------|------|
| DVWA | PHP / MySQL | SQLi, XSS (reflected + DOM), LFI, Command Injection, CRLF, CSRF | `vulnerables/web-dvwa:latest` | 80 |
| VAmPI | Python / Flask | SQLi, NoSQLi, CORS, JWT, Mass Assignment, Info Disclosure | `erev0s/vampi:latest` | 5000 |
| Juice Shop | Node.js / Angular | SQLi, XSS, Swagger exposure, JWT, CSRF, Info Disclosure | `bkimminich/juice-shop:latest` | 3000 |
| crAPI | Go + Python + Node.js microservices | OWASP API Top 10 (BOLA, BFLA, Mass Assignment, SSRF, SQLi, NoSQLi) | Docker Compose (10 services) | 8888 |
| OopsSec Store | Next.js / SQLite | SQLi, XSS, SSRF, LFI, XXE, IDOR, CORS, CSRF, Open Redirect, File Upload | Built from source | 3000 |
| NextJS VulnExamples | Next.js / PostgreSQL | Missing Authentication, Missing Authorization, Secrets Exposure, Stored XSS | Built from source | 3000 |
| Vulnerable Java | Java / Spring | SQLi, XSS, SSRF, Path Traversal | Docker image | 8080 |
| Vulnerable Nginx | Nginx | Misconfigurations, Path Traversal, CRLF, Header Injection | Docker image | 80 |

NextJS VulnExamples — Detailed Breakdown

Source: upleveled/security-vulnerability-examples-next-js-postgres. This application is an educational project demonstrating six categories of security flaws in a Next.js + PostgreSQL stack. It provides both vulnerable implementations and secure solutions for each category, making it valuable for both positive detection and negative (false positive) testing.
| Example | Vulnerability | Vulnerable Route | Type | What's Wrong |
|---------|---------------|------------------|------|--------------|
| 1 | Missing Authentication | GET /api/example-1-.../vulnerable | Route Handler | No session token check — returns blog posts to anyone |
| 2 | Missing Authentication | GET /example-2-.../vulnerable | Server Component | No session check — queries DB and renders directly |
| 3 | Missing Authorization | GET /api/example-3-.../vulnerable | Route Handler | Checks auth but returns ALL users' unpublished posts |
| 4 | Missing Authorization | GET /example-4-.../vulnerable | Server Component | No auth + returns all users' data |
| 5 | Secrets Exposure | GET /example-5-.../vulnerable | Server Component | Leaks `process.env.API_KEY` and password hashes to client |
| 6 | Stored XSS | GET /example-6-.../vulnerable | Server Component | `dangerouslySetInnerHTML` with `<img onerror="alert('pwned')">` |
Seed data includes two users (alice/abc, bob/def) and 7 blog posts, including two with XSS payloads. Each example also has 1-3 solution variants that fix the vulnerability. These serve as negative test cases — the scanner should NOT flag them.
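A solution variant can be wired up as a negative test case. The snippet below is illustrative only — the test case ID, endpoint path, and module ID are hypothetical, not copied from the shipped nextjs-vulnexamples.yaml:

```yaml
test_cases:
  - id: "vulnexamples-xss-solution-negative"   # hypothetical ID
    endpoint: "/example-6-.../solution"         # a secure solution variant
    method: GET
    modules: ["xss-light-url-params"]
    assertion: negative    # test FAILS if the module reports any finding
    scan_mode: active
    description: "Sanitized variant must not be flagged (false positive check)"
```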

SAST Targets (Static Analysis)

The SAST benchmark suite validates the source-aware scanning pipeline using source stubs — minimal, syntactically valid framework code that exercises key patterns. See whitebox-sast for full details.
| Framework | Source Stub | Routes | Key Patterns |
|-----------|-------------|--------|--------------|
| Gin (Go) | `sast-stubs/gin/` | ~12 routes | CRUD, groups, Any, path params |
| FastAPI (Python) | `sast-stubs/fastapi/` | ~11 routes | Path, Query, Body params |
| Express (JS) | `sast-stubs/express/` | ~8 routes | Router, groups, all |
| Django (Python) | `sast-stubs/django/` | ~9 routes | URL patterns, class views |
| Flask (Python) | `sast-stubs/flask/` | ~7 routes | Decorators, add_url_rule |
| Next.js (TS) | `sast-stubs/nextjs/` | 3+ handlers | App Router, Pages Router |
| Next.js OopsSec (TS) | `sast-stubs/nextjs-oopssec/` | 15+ handlers | Dynamic routes, middleware, body parsing, header extraction |
| Next.js VulnExamples (TS) | `sast-stubs/nextjs-vulnexamples/` | 9+ handlers | Missing auth/authz, secrets exposure, XSS, solution variants |
| Go HTTP (Go) | `sast-stubs/gohttp/` | 3 routes | HandleFunc |
The Next.js VulnExamples SAST stub is unique because it includes both vulnerable and secure code paths. Its SARIF fixture (semgrep-nextjs-vulnexamples.sarif) covers 6 findings across 4 vulnerability categories:
| Finding | Severity | File | Category |
|---------|----------|------|----------|
| dangerouslySetInnerHTML with unsanitized DB content | Medium | example-6-cross-site-scripting/vulnerable/page.tsx | XSS |
| Route handler missing authentication check | High | example-1-missing-authentication/vulnerable/route.ts | Missing AuthN |
| Data access without authorization scoping | High | example-3-missing-authorization/vulnerable/route.ts | Missing AuthZ |
| process.env.API_KEY passed to client component | High | example-5-secrets-exposure/vulnerable/page.tsx | Secrets Exposure |
| getUsersWithPasswordHash() result sent to client | High | example-5-secrets-exposure/vulnerable/page.tsx | Secrets Exposure |
| SELECT * on users table exposes password hashes | Medium | database/users.ts | Data Exposure |

Blackbox Targets (External Sites)

| Site | URL | Vulnerability Categories |
|------|-----|--------------------------|
| Acunetix TestPHP | testphp.vulnweb.com | SQLi, XSS, LFI, Directory Traversal |
| Gin & Juice Shop | ginandjuice.shop (PortSwigger) | SQLi, XSS, SSTI, SSRF, Access Control |
| IBM Testfire | demo.testfire.net | SQLi, XSS, Authentication Bypass |

XBOW Targets (CTF-style)

13 self-contained vulnerable applications from the validation-benchmarks repository:
| Vuln Type | Count | Benchmarks |
|-----------|-------|------------|
| XSS | 2 | XBEN-013-24, XBEN-047-24 |
| SSTI | 3 | XBEN-009-24, XBEN-053-24, XBEN-076-24 |
| SQLi | 2 | XBEN-083-24, XBEN-071-24 |
| LFI | 2 | XBEN-019-24, XBEN-061-24 |
| Command Injection | 1 | XBEN-073-24 |
| SSRF | 1 | XBEN-020-24 |
| XXE | 2 | XBEN-006-24, XBEN-096-24 |

Architecture

                    YAML Definitions
                    (dvwa.yaml, vampi.yaml, nextjs-vulnexamples.yaml,
                     xbow/*.yaml, blackbox/*.yaml, whitebox/*.yaml)


                   ┌───────────────┐
                   │  Go Harness   │
                   │  (harness/)   │
                   │               │
                   │ LoadDefinition│  ← expands $XBOW_SOURCE_DIR
                   │ SetupTestInfra│
                   │ StartApp...   │  ← routes by app type
                   └───────┬───────┘

           ┌───────┬───────┼───────┬───────┐
           ▼       ▼       ▼       ▼       ▼
       ┌───────┐┌──────┐┌──────┐┌──────┐┌──────┐
       │docker ││compo-││ xbow ││exter-││Cover-│
       │       ││se    ││      ││nal   ││age   │
       │testcon││wait  ││build ││check ││      │
       │tainers││for   ││start ││avail ││scan  │
       │-go    ││base  ││port  ││      ││YAMLs │
       │       ││url   ││stop  ││      ││      │
       └───┬───┘└──┬───┘└──┬───┘└──┬───┘└──────┘
           └───────┴───────┴───────┘

              ┌─────┴─────┐
              ▼           ▼
        ┌──────────┐┌──────────┐
        │  Active  ││ Passive  │
        │  Runner  ││  Runner  │
        │          ││          │
        │ Resolve  ││ Fetch URL│
        │ module   ││ Attach   │
        │ Build RR ││ response │
        │ (GET or  ││          │
        │  POST)   ││ Call     │
        │ Call     ││ ScanPer* │
        │ ScanPer* ││          │
        └──────────┘└──────────┘
              │            │
              ▼            ▼
        Apply Assertion (strict/soft/negative)

Key Design Decisions

  1. YAML-driven: Test cases are defined in YAML, not Go code. Adding a new test case is a one-line YAML addition.
  2. Module resolution by ID: Test cases reference modules by their registry ID (e.g., active-sqli-error-based). The harness resolves them from modules.DefaultRegistry.
  3. Scan type dispatch: The harness checks module.ScanScopes() to dispatch to the correct method:
    • ScanScopeInsertionPoint: Creates insertion points via httpmsg.CreateAllInsertionPoints(), filters by AllowedInsertionPointTypes(), calls ScanPerInsertionPoint() for each.
    • ScanScopeRequest: Calls ScanPerRequest() once with the full request.
    • ScanScopeHost: Calls ScanPerHost() once.
  4. Passive fetch-then-scan: Passive tests fetch the URL first using the HTTP client, attach the raw response to the HttpRequestResponse, then pass it to the passive module.
  5. App-specific auth: Some apps (e.g., DVWA) require authentication before vulnerability pages work. The SetupAppAuth() function dispatches per-app setup (DB init, CSRF token extraction, login) and returns headers (cookies) to inject into all test cases.
  6. Network init safety: network.Init() is called once per process via sync.Once to avoid LevelDB close/reopen issues when running multiple test functions sequentially.

Directory Structure

test/benchmark/
├── harness/                        # Shared Go library (not a test package)
│   ├── types.go                    # BenchmarkDefinition, TestCase, AppConfig structs
│   ├── sast_types.go               # SAST definition types
│   ├── sast_loader.go              # SAST YAML loaders
│   ├── harness.go                  # TestInfra, YAML loader, module resolver, assertions
│   ├── container.go                # Docker container lifecycle (testcontainers-go + compose)
│   ├── compose.go                  # Docker Compose CLI lifecycle (build/start/port/stop)
│   ├── external.go                 # External site availability checks
│   ├── passive_helper.go           # Fetch-then-scan helper for passive modules
│   ├── oast_helper.go              # OAST mocking for active scanning
│   ├── report.go                   # Coverage report generator
│   └── harness_test.go             # Unit tests for the harness itself

├── definitions/                    # YAML benchmark definitions (DAST)
│   ├── dvwa.yaml                   # DVWA: XSS, SQLi, LFI, cmd injection, passive checks
│   ├── vampi.yaml                  # VAmPI: SQLi, NoSQLi, CORS, passive checks
│   ├── juiceshop.yaml              # Juice Shop: SQLi, XSS, Swagger, JWT, passive checks
│   ├── crapi.yaml                  # crAPI: OWASP API Top 10 with auth flow
│   ├── oopssec-store.yaml          # OopsSec Store: SQLi, XSS, SSRF, LFI, XXE, IDOR, CORS
│   ├── nextjs-vulnexamples.yaml    # NextJS VulnExamples: missing auth/authz, secrets, XSS
│   ├── vulnerable-java.yaml        # DataDog vulnerable-java
│   ├── vulnerable-nginx.yaml       # Detectify vulnerable-nginx
│   ├── blackbox/
│   │   ├── acunetix.yaml           # testphp.vulnweb.com
│   │   ├── ginandjuice.yaml        # ginandjuice.shop (PortSwigger)
│   │   └── testfire.yaml           # demo.testfire.net (IBM AppScan)
│   ├── whitebox/                   # SAST benchmark definitions
│   │   ├── extraction/             #   Layer 1: route extraction
│   │   │   ├── gin-extraction.yaml
│   │   │   ├── fastapi-extraction.yaml
│   │   │   ├── express-extraction.yaml
│   │   │   ├── django-extraction.yaml
│   │   │   ├── flask-extraction.yaml
│   │   │   ├── nextjs-extraction.yaml
│   │   │   ├── nextjs-oopssec-extraction.yaml
│   │   │   ├── nextjs-vulnexamples-extraction.yaml
│   │   │   └── gohttp-extraction.yaml
│   │   ├── sarif/                  #   Layer 2: SARIF parsing
│   │   │   ├── semgrep-normal.yaml
│   │   │   ├── semgrep-multirule.yaml
│   │   │   ├── semgrep-nextjs-vulnexamples.yaml
│   │   │   ├── trivy-normal.yaml
│   │   │   ├── trivy-multirule.yaml
│   │   │   └── sarif-edge-cases.yaml
│   │   └── handoff/                #   Layer 3: route-to-HRR conversion
│   │       ├── gin-handoff.yaml
│   │       ├── fastapi-handoff.yaml
│   │       ├── express-handoff.yaml
│   │       ├── nextjs-oopssec-handoff.yaml
│   │       └── nextjs-vulnexamples-handoff.yaml
│   └── xbow/                       # XBOW CTF-style validation benchmarks
│       ├── xbow-xss-013.yaml
│       ├── xbow-ssti-009.yaml
│       ├── ...                      # 13 XBOW definitions total
│       └── xbow-xxe-096.yaml

├── whitebox/                       # Docker-based tests (build tag: canary)
│   ├── active_test.go              # Data-driven active module test runner
│   ├── passive_test.go             # Data-driven passive module test runner
│   ├── crapi_test.go               # crAPI with auth flow handling
│   └── debug_test.go               # Debug helpers for direct module invocation

├── blackbox/                       # External site tests (build tag: blackbox)
│   ├── active_test.go              # Active scanning with rate limiting
│   └── passive_test.go             # Passive analysis

├── sast/                           # SAST pipeline tests (build tag: sast / sast_e2e)
│   ├── helpers.go                  # Shared utilities (no build tag)
│   ├── extraction_test.go          # Layer 1: route extraction
│   ├── sarif_test.go               # Layer 2: SARIF parsing
│   ├── handoff_test.go             # Layer 3: route-to-HRR conversion
│   └── e2e_test.go                 # Layer 4: full pipeline

├── xbow/                           # XBOW validation tests (build tag: xbow)
│   └── xbow_test.go                # Data-driven runner with per-vuln-type functions

├── coverage/
│   └── report_test.go              # Module coverage matrix generator

└── xss_scanner/                    # Pre-existing Brutelogic XSS gym
    └── brutelogic_test.go

test/testdata/
├── sast-stubs/                     # Minimal framework source code for SAST
│   ├── gin/
│   ├── fastapi/
│   ├── express/
│   ├── django/
│   ├── flask/
│   ├── nextjs/
│   ├── nextjs-oopssec/             # 15 API routes + middleware
│   ├── nextjs-vulnexamples/        # 9 routes + database layer + vulnerable/solution variants
│   └── gohttp/

├── sast-sarif/                     # SARIF fixture JSON files
│   ├── semgrep-normal.sarif
│   ├── semgrep-nextjs-vulnexamples.sarif
│   ├── ...                          # 10 fixtures total
│   └── sarif-severity-mapping.sarif

└── vulnerable-apps/                # Docker Compose configs for vulnerable apps
    ├── crapi/
    ├── oopssec-store/
    ├── nextjs-vulnexamples/         # Next.js + PostgreSQL (built from GitHub)
    ├── vulnerable-java/
    └── vulnerable-nginx/

Prerequisites

Whitebox Tests

  • Docker: Required for testcontainers-go to start vulnerable app containers
  • Docker images: Pulled automatically on first run
    • vulnerables/web-dvwa:latest
    • erev0s/vampi:latest
    • bkimminich/juice-shop:latest
  • Docker Compose: Required for apps that build from source (OopsSec Store, NextJS VulnExamples)

crAPI Tests

  • Docker Compose: crAPI requires 10 services (PostgreSQL, MongoDB, multiple microservices)
  • Manual startup: crAPI must be started before running tests:
    make crapi-up          # Start crAPI (takes ~2 minutes)
    make crapi-status      # Verify all services are healthy
    

SAST Tests

  • ast-grep binary: Required for Layer 1 (route extraction). Auto-downloaded on first run, or install manually:
    brew install ast-grep   # macOS
    
  • No Docker needed: Layers 2-3 use static fixture data only

XBOW Validation Benchmarks

  • Docker with Compose: Required to build and run benchmark containers
  • XBOW source directory: The validation-benchmarks repository checked out locally
  • XBOW_SOURCE_DIR environment variable: Must point to the root of the validation-benchmarks checkout (e.g., /path/to/validation-benchmarks). Set via environment or passed through the Makefile.
  • Disk space: Each benchmark builds a Docker image from source. Pre-build with make xbow-build to cache layers.

Blackbox Tests

  • Internet connectivity: Required to reach external demo sites
  • No Docker needed: Tests run against public websites

Running Benchmarks

Make Targets

| Command | What it runs | Requirements |
|---------|--------------|--------------|
| `make test-benchmark-whitebox` | DVWA, VAmPI, Juice Shop, OopsSec, VulnExamples (Docker) | Docker |
| `make test-benchmark-blackbox` | Acunetix, Gin&Juice, Testfire (external) | Internet |
| `make test-benchmark-all` | All whitebox + blackbox | Docker + Internet |
| `make test-benchmark-crapi` | crAPI only | Docker + `make crapi-up` |
| `make test-benchmark-coverage` | Generate coverage report | None |
| `make test-sast` | SAST Layers 1-3 (extraction + SARIF + handoff) | ast-grep binary |
| `make test-sast-extraction` | Layer 1 only | ast-grep binary |
| `make test-sast-sarif` | Layer 2 only | None |
| `make test-sast-handoff` | Layer 3 only | None |
| `make test-sast-e2e` | Layer 4 (full pipeline) | ast-grep binary |
| `make test-xbow` | All 13 XBOW validation benchmarks | Docker + `XBOW_SOURCE_DIR` |
| `make test-xbow-ssti` | 3 SSTI benchmarks | Docker + `XBOW_SOURCE_DIR` |
| `make test-xbow-xss` | 2 XSS benchmarks | Docker + `XBOW_SOURCE_DIR` |
| `make test-xbow-sqli` | 2 SQLi benchmarks | Docker + `XBOW_SOURCE_DIR` |
| `make test-xbow-lfi` | 2 LFI benchmarks | Docker + `XBOW_SOURCE_DIR` |
| `make test-xbow-cmdi` | 1 CmdI benchmark | Docker + `XBOW_SOURCE_DIR` |
| `make test-xbow-ssrf` | 1 SSRF benchmark | Docker + `XBOW_SOURCE_DIR` |
| `make test-xbow-xxe` | 2 XXE benchmarks | Docker + `XBOW_SOURCE_DIR` |
| `make xbow-build` | Pre-build all XBOW Docker images | Docker + `XBOW_SOURCE_DIR` |

Running Individual App Benchmarks

# DVWA active modules only
go test -v -tags=canary -run TestWhitebox_DVWA_Active ./test/benchmark/whitebox/...

# VAmPI passive modules only
go test -v -tags=canary -run TestWhitebox_VAmPI_Passive ./test/benchmark/whitebox/...

# Juice Shop all (active + passive)
go test -v -tags=canary -run "TestWhitebox_JuiceShop" ./test/benchmark/whitebox/...

# NextJS VulnExamples active modules
go test -v -tags=canary -run TestWhitebox_NextJSVulnExamples_Active ./test/benchmark/whitebox/...

# NextJS VulnExamples passive modules
go test -v -tags=canary -run TestWhitebox_NextJSVulnExamples_Passive ./test/benchmark/whitebox/...

# OopsSec Store all
go test -v -tags=canary -run "TestWhitebox_OopssecStore" ./test/benchmark/whitebox/...

# crAPI (requires `make crapi-up`)
go test -v -tags=canary -run TestWhitebox_CrAPI ./test/benchmark/whitebox/...

Running SAST Benchmarks

# All SAST layers
make test-sast

# Layer 2 only (no external deps)
make test-sast-sarif

# NextJS VulnExamples extraction
go test -tags=sast -v -run TestExtraction_NextJS_VulnExamples ./test/benchmark/sast/...

# NextJS VulnExamples handoff
go test -tags=sast -v -run TestHandoff_NextJS_VulnExamples ./test/benchmark/sast/...

# NextJS VulnExamples SARIF fixture
go test -tags=sast -v -run "TestSARIF_All/semgrep-nextjs-vulnexamples" ./test/benchmark/sast/...

# Full E2E pipeline
go test -tags=sast_e2e -v -run TestSAST_E2E ./test/benchmark/sast/...

Running XBOW Validation Benchmarks

# Run all xbow benchmarks
make test-xbow

# Run by vulnerability type
make test-xbow-ssti
make test-xbow-xss
make test-xbow-sqli

# Run a single benchmark by name
XBOW_SOURCE_DIR=/path/to/validation-benchmarks \
  go test -v -tags=xbow -timeout 15m -run "TestXbow_All/xbow-ssti-053" ./test/benchmark/xbow/...

# Override the source directory
make test-xbow XBOW_SOURCE_DIR=/custom/path/to/validation-benchmarks

# Pre-build all containers (recommended before first run)
make xbow-build

Running Individual Blackbox Benchmarks

# Acunetix testphp.vulnweb.com
go test -v -tags=blackbox -run TestBlackbox_Acunetix ./test/benchmark/blackbox/...

# PortSwigger ginandjuice.shop
go test -v -tags=blackbox -run TestBlackbox_GinAndJuice ./test/benchmark/blackbox/...

# IBM Testfire
go test -v -tags=blackbox -run TestBlackbox_Testfire ./test/benchmark/blackbox/...

Running All Benchmarks for a Specific Module

To check a specific module’s detection across all apps, filter by test case ID:
# All SQLi error-based tests across all whitebox apps
go test -v -tags=canary -run "sqli-error" ./test/benchmark/whitebox/...

# All security headers tests
go test -v -tags=canary -run "security-headers" ./test/benchmark/whitebox/...

# All missing-authentication tests (NextJS VulnExamples)
go test -v -tags=canary -run "missing-authn" ./test/benchmark/whitebox/...

YAML Definition Format

Each YAML file describes one target application and its test cases.

Full Schema

# Application configuration
app:
  name: dvwa                          # Unique app identifier
  type: docker                        # docker | compose | external | xbow
  image: "vulnerables/web-dvwa:latest"  # Docker image (type: docker)
  port: 80                            # Container port
  exposed_port: "80/tcp"              # Override port format (optional)
  wait_endpoint: "/"                  # Endpoint to poll for readiness
  startup_timeout: 120s               # Max wait for container startup
  base_url: "http://127.0.0.1:8888"  # Base URL (type: compose | external)
  compose_file: "path/to/compose.yaml"  # Docker Compose file (type: compose)
  build_context: "${XBOW_SOURCE_DIR}/benchmarks/XBEN-053-24"  # Path to docker-compose.yml dir (type: xbow)
  service_name: app                   # Docker Compose service to get port from (type: xbow)
  internal_port: 80                   # Port inside the container (type: xbow)
  env:                                # Environment variables (type: docker)
    vulnerable: "1"
  rate_limit: 2                       # Requests per second (type: external)

# Optional authentication flow (executed before test cases)
setup:
  auth_flow:
    - name: login                     # Step name (for logging)
      method: POST
      path: "/api/auth/login"
      headers:
        Content-Type: "application/json"
      body: '{"email":"[email protected]","password":"Admin!123"}'
      extract:
        token: "$.token"              # JSONPath to extract from response

# Test cases
test_cases:
  - id: "dvwa-xss-reflected"         # Unique test case ID
    endpoint: "/vuln?param=test"      # URL path (appended to base URL)
    method: GET                       # HTTP method (default: GET)
    headers:                          # Additional headers (optional)
      Authorization: "Bearer {{token}}"
    body: ""                          # Request body (optional)
    modules:                          # Module IDs to test (from DefaultRegistry)
      - "xss-light-url-params"
    vuln_types:                       # Expected vulnerability types (informational)
      - "xss-reflected"
    assertion: strict                 # strict | soft | negative
    min_findings: 1                   # Minimum expected findings (default: 1)
    scan_mode: active                 # active | passive
    timeout: 30s                      # Per-test timeout (blackbox only)
    description: "Reflected XSS in name parameter"

App Types

| Type | Container Management | Base URL |
|------|----------------------|----------|
| `docker` | Testcontainers-go starts/stops container automatically | Auto-assigned (mapped port) |
| `compose` | Must be started externally (`make crapi-up`) | Specified in `base_url` |
| `xbow` | Docker Compose CLI builds from source, starts/stops automatically | Auto-discovered via `docker compose port` |
| `external` | No containers — uses public websites | Specified in `base_url` |

Scan Modes

| Mode | What Happens |
|------|--------------|
| `active` | Harness creates HttpRequestResponse from the URL, resolves the active module, calls ScanPerRequest/ScanPerHost |
| `passive` | Harness fetches the URL with the HTTP client first (to get the actual response), then passes the full request+response to the passive module's ScanPerRequest/ScanPerHost |

Adding New Test Cases

The simplest way to expand coverage is to add test cases to existing YAML definitions.

Example: Add a new SQLi test to DVWA

Edit test/benchmark/definitions/dvwa.yaml:
test_cases:
  # ... existing cases ...

  - id: "dvwa-sqli-blind"
    endpoint: "/vulnerabilities/sqli_blind/?id=1&Submit=Submit"
    method: GET
    modules: ["sqli-time-based-params"]
    vuln_types: ["sqli-time-based"]
    assertion: soft
    min_findings: 1
    scan_mode: active
    description: "Blind SQL injection in id parameter"

Example: Add a passive module check

  - id: "dvwa-mixed-content"
    endpoint: "/"
    method: GET
    modules: ["mixed-content-detect"]
    assertion: soft
    min_findings: 0
    scan_mode: passive
    description: "Mixed content detection on main page"

Guidelines

  • Use strict assertion only when you’re confident the module will detect the vulnerability (e.g., DVWA SQLi with error-based detection).
  • Use soft assertion for new/experimental test cases or apps with protections that may block detection.
  • Use negative assertion for endpoints that should NOT trigger findings (false positive testing).
  • Module IDs must match exactly what’s registered in pkg/modules/default_registry.go. Run go test -v -run TestResolveActiveModules ./test/benchmark/harness/... to verify module resolution.

Finding Available Module IDs

# List all active module IDs
go test -v -run TestGenerateCoverageReport ./test/benchmark/harness/...

# Or check the registry directly
grep 'Register' pkg/modules/default_registry.go

Adding a New Vulnerable App

Docker-based (Whitebox)

  1. Create a YAML definition at test/benchmark/definitions/<app>.yaml:
app:
  name: webgoat
  type: docker
  image: "webgoat/webgoat:latest"
  port: 8080
  wait_endpoint: "/WebGoat"
  startup_timeout: 120s

test_cases:
  - id: "webgoat-sqli"
    endpoint: "/WebGoat/SqlInjection/attack5a?account=test&operator=test&injection=test"
    method: GET
    modules: ["sqli-error-based"]
    assertion: soft
    min_findings: 1
    scan_mode: active
  2. No Go code changes needed — the existing TestWhitebox_Active runner automatically picks up new YAML files from the definitions directory.
  3. Run it:
go test -v -tags=canary -run "TestWhitebox_Active/webgoat" ./test/benchmark/whitebox/...

Docker Compose-based (Built from Source)

For apps that need to be built from source or require multiple services (like NextJS VulnExamples):
  1. Place the Docker Compose file in test/testdata/vulnerable-apps/<app>/docker-compose.yaml. Example (nextjs-vulnexamples/docker-compose.yaml):
    services:
      db:
        image: postgres:16-alpine
        environment:
          - POSTGRES_DB=myapp
          - POSTGRES_USER=myapp
          - POSTGRES_PASSWORD=myapp
        healthcheck:
          test: ["CMD-SHELL", "pg_isready -U myapp"]
          interval: 5s
          timeout: 5s
          retries: 5
    
      app:
        build:
          context: https://github.com/org/repo.git
        ports:
          - "3000:3000"
        environment:
          - PGHOST=db
          - PGDATABASE=myapp
        depends_on:
          db:
            condition: service_healthy
    
  2. Create a YAML definition with type: xbow:
    app:
      name: my-nextjs-app
      type: xbow
      build_context: test/testdata/vulnerable-apps/my-nextjs-app
      service_name: app
      internal_port: 3000
      port: 3000
      wait_endpoint: "/"
      startup_timeout: 180s
    
  3. Add a dedicated test function in test/benchmark/whitebox/active_test.go:
    func TestWhitebox_MyApp_Active(t *testing.T) {
        if testing.Short() {
            t.Skip("Skipping benchmark test in short mode")
        }
        defPath := filepath.Join(harness.DefinitionsDir(), "my-app.yaml")
        def, err := harness.LoadDefinition(defPath)
        require.NoError(t, err, "Failed to load my-app definition")
        runActiveDefinition(t, def)
    }
    
  4. Optionally add SAST coverage — create a source stub and YAML definitions for the whitebox SAST pipeline (see whitebox-sast).

Adding a New Blackbox Site

  1. Create a YAML definition at test/benchmark/definitions/blackbox/<site>.yaml:
app:
  name: hackazon
  type: external
  base_url: "http://hackazon.webscantest.com"
  rate_limit: 2    # Requests per second (be conservative with external sites)

test_cases:
  - id: "hackazon-xss"
    endpoint: "/search?searchString=test"
    method: GET
    modules: ["xss-light-url-params"]
    assertion: soft    # Always soft for blackbox
    min_findings: 1
    scan_mode: active
  2. Important rules for blackbox definitions:
    • All assertions must be soft — external sites may change, go down, or add protections.
    • Set rate_limit to be respectful (2 req/sec is a good default).
    • The test runner automatically skips if the site is unreachable.
  3. Run it:
go test -v -tags=blackbox -run "TestBlackbox_Active/hackazon" ./test/benchmark/blackbox/...

Coverage Report

The coverage report compares all module IDs referenced in YAML definitions against the full DefaultRegistry.

Generate the Report

make test-benchmark-coverage
Or directly:
go test -v -tags=canary -run TestBenchmark_CoverageReport ./test/benchmark/coverage/...
This outputs a markdown table to stdout and writes test/benchmark/coverage-report.md.

Sample Output

# Vigolium Module Benchmark Coverage

**Total test cases:** 85+

**Active modules:** 20/36 (56%)

**Passive modules:** 14/17 (82%)

## Coverage Matrix

| Module ID | Type | Covered | Apps |
|-----------|------|---------|------|
| active-authn-bypass | active | Yes | nextjs-vulnexamples |
| active-code-exec | active | Yes | dvwa, crapi |
| active-cors-misconfiguration | active | Yes | vampi, juiceshop, crapi, oopssec-store, nextjs-vulnexamples |
| active-sqli-error-based | active | Yes | dvwa, vampi, juiceshop, oopssec-store, ... |
| passive-security-headers-missing | passive | Yes | dvwa, vampi, juiceshop, crapi, oopssec-store, nextjs-vulnexamples |
| passive-ssr-data-exposure | passive | Yes | oopssec-store, nextjs-vulnexamples |
| ...

## Uncovered Modules

- `active-http-request-smuggling` (active)
- `active-race-interference` (active)
- `passive-anomaly-ranking` (passive)
- ...

Understanding Coverage

  • Covered means at least one YAML test case references the module ID.
  • Coverage does NOT mean detection — a soft-asserted test case counts as covered even if the module finds nothing.
  • Some modules are intentionally deferred (see Hard-to-Benchmark Modules).

Assertion Modes

| Mode | Behavior | Use Case |
|------|----------|----------|
| `strict` | Test fails if `len(findings) < min_findings` | Known vulnerabilities in controlled Docker apps |
| `soft` | Logs a warning but the test passes regardless | Experimental tests, blackbox sites, modern apps with protections |
| `negative` | Test fails if `len(findings) > 0` | False positive testing — endpoints that should NOT trigger |

Choosing the Right Assertion

Is this a Docker app with a known, reliably detectable vuln?
├─ Yes → strict
└─ No
    ├─ Is this an external site or might detection fail?
    │   └─ Yes → soft
    └─ Should this endpoint have NO findings?
        └─ Yes → negative

Harness Package Reference

The test/benchmark/harness/ package is a shared Go library (not a test package) that provides the core benchmark infrastructure.

Key Types

| Type | Description |
|------|-------------|
| `BenchmarkDefinition` | Root struct parsed from YAML — contains AppConfig and []TestCase |
| `AppConfig` | Target application configuration (image, port, type, env, build_context, service_name, internal_port) |
| `ComposeApp` | Running Docker Compose project (project name, directory, base URL) |
| `TestCase` | Single test case (endpoint, modules, assertion, scan mode) |
| `TestInfra` | Test infrastructure (HTTP client, host errors, rate limiter, scan context) |
| `TestResult` | Outcome of a single test case execution |
| `CoverageReport` | Module coverage matrix |
| `SASTExtractionDefinition` | Route extraction test: framework, source_dir, expected routes, bounds |
| `SASTSARIFDefinition` | SARIF parsing test: fixture path, tool name, format, expectations |
| `SASTHandoffDefinition` | Handoff test: framework, base URL, routes with expected requests |

### Key Functions

| Function | Description |
|----------|-------------|
| `LoadDefinition(path)` | Load a single YAML definition file |
| `LoadDefinitionsFromDir(dir)` | Load all YAML files from a directory |
| `SetupTestInfra()` | Initialize HTTP client, rate limiter, scan context |
| `SetupTestInfraWithOAST()` | Initialize with OAST mock provider |
| `StartContainer(ctx, config)` | Start a Docker container via testcontainers-go |
| `StartAppFromDefinition(ctx, app)` | Start an app based on its type (docker/compose/xbow/external) |
| `StartComposeApp(ctx, app)` | Build and start an xbow Docker Compose project from source |
| `RunActiveTestCase(t, tc, baseURL, infra)` | Execute an active test case |
| `RunPassiveTestCase(t, tc, baseURL, infra)` | Execute a passive test case (fetch + scan) |
| `FetchForPassiveScan(url, headers, infra)` | Fetch a URL and return HRR with response attached |
| `SetupAppAuth(t, appName, baseURL)` | Perform app-specific auth/setup, return headers to inject |
| `MergeHeaders(authHeaders, tcHeaders)` | Merge auth headers into test case headers |
| `SetupDVWA(t, baseURL)` | Initialize DVWA DB, login, return session cookies |
| `ResolveActiveModules(ids)` | Look up active modules from DefaultRegistry |
| `ResolvePassiveModules(ids)` | Look up passive modules from DefaultRegistry |
| `ApplyAssertion(t, tc, moduleID, findings)` | Check findings against assertion mode |
| `GenerateCoverageReport(dirs...)` | Generate module coverage matrix |
| `FormatCoverageMarkdown(report)` | Render coverage report as markdown |
| `CheckExternalAvailability(t, url)` | Skip test if external site is unreachable |
| `LoadSASTExtractionDefinitionsFromDir(dir)` | Load all extraction YAMLs |
| `LoadSASTSARIFDefinitionsFromDir(dir)` | Load all SARIF YAMLs |
| `LoadSASTHandoffDefinitionsFromDir(dir)` | Load all handoff YAMLs |

### Active Test Case Flow

```go
// 1. Build HttpRequestResponse from URL (GET or POST with optional headers/body)
if tc.Method == "POST" && tc.Body != "" {
    rr, err = buildPOSTRequest(baseURL + tc.Endpoint, tc.Body, tc.Headers)
} else {
    rr, err = buildRequestWithHeaders(baseURL + tc.Endpoint, tc.Headers)
}

// 2. Resolve module by ID
mods := modules.GetActiveModulesByIDs(tc.Modules)

// 3. Dispatch based on module's ScanScope
switch {
case mod.ScanScopes().Has(modkit.ScanScopeInsertionPoint):
    // Create insertion points and scan each one
    points, _ := httpmsg.CreateAllInsertionPoints(rr.Request().Raw(), true)
    for _, ip := range points {
        if mod.AllowedInsertionPointTypes().Contains(ip.Type()) {
            findings, _ = mod.ScanPerInsertionPoint(rr, ip, httpClient, scanCtx)
        }
    }
case mod.ScanScopes().Has(modkit.ScanScopeRequest):
    findings, err = mod.ScanPerRequest(rr, httpClient, scanCtx)
case mod.ScanScopes().Has(modkit.ScanScopeHost):
    findings, err = mod.ScanPerHost(rr, httpClient, scanCtx)
}

// 4. Apply assertion
ApplyAssertion(t, tc, mod.ID(), findings)
```

### Passive Test Case Flow

```go
// 1. Fetch URL to get actual response (with optional auth headers)
rr, err := FetchForPassiveScan(baseURL + tc.Endpoint, tc.Headers, infra)
//    Internally: Execute request → respChain.FullResponse() → rr.WithResponse(httpResp)

// 2. Resolve passive module
mods := modules.GetPassiveModulesByIDs(tc.Modules)

// 3. Pass full request+response to passive module
findings, err = mod.ScanPerRequest(rr, scanCtx)

// 4. Apply assertion
ApplyAssertion(t, tc, mod.ID(), findings)
```

## CI Integration

| Trigger | What to Run | Timeout |
|---------|-------------|---------|
| On every PR | `make test-sast-sarif && make test-sast-handoff` | 5 min |
| On every PR | `make test-benchmark-whitebox` (DVWA + VAmPI only) | 15 min |
| Nightly | `make test-benchmark-all` (all whitebox + blackbox) | 30 min |
| Nightly | `make test-sast` (all SAST layers) | 10 min |
| Weekly | `make test-benchmark-crapi` (requires crAPI up) | 20 min |
| Weekly | `make test-xbow` (requires `XBOW_SOURCE_DIR`) | 30 min |
| On release | `make test-benchmark-coverage` | 1 min |

### Example GitHub Actions Workflow

```yaml
- name: Run SAST benchmarks (no external deps)
  run: |
    make test-sast-sarif
    make test-sast-handoff
  timeout-minutes: 5

- name: Run whitebox benchmarks
  run: |
    make test-benchmark-whitebox
  timeout-minutes: 15

- name: Run blackbox benchmarks (nightly only)
  if: github.event_name == 'schedule'
  run: |
    make test-benchmark-blackbox
  timeout-minutes: 20
  continue-on-error: true  # Blackbox may fail due to external site issues

- name: Run XBOW validation benchmarks (weekly)
  if: github.event_name == 'schedule'
  run: |
    make xbow-build
    make test-xbow
  timeout-minutes: 30
  env:
    XBOW_SOURCE_DIR: ${{ github.workspace }}/validation-benchmarks
```

## XBOW Validation Benchmarks

XBOW benchmarks are CTF-style, self-contained vulnerable applications sourced from the validation-benchmarks repository. Each benchmark is a Docker Compose project with its own source code, Dockerfile, and `docker-compose.yml`. Unlike the whitebox benchmarks (which pull pre-built images from Docker Hub), XBOW apps are built from source via `docker compose build`.

### How It Works

The `xbow` app type in the harness triggers a Docker Compose CLI lifecycle managed by `harness/compose.go`:

1. **Build**: `docker compose -f <dir>/docker-compose.yml -p xbow-<name> build --build-arg FLAG=test`
2. **Start**: `docker compose ... up -d --wait`
3. **Port discovery**: `docker compose ... port <service> <internal_port>` — parses the ephemeral host port
4. **Health check**: polls `wait_endpoint` until it responds with status < 500
5. **Test execution**: runs active test cases (GET or POST) through the standard harness
6. **Teardown**: `docker compose ... down -v --remove-orphans`

Each benchmark uses a unique project name (`xbow-<name>`) to prevent container name collisions when running in parallel.

### YAML Definition Format (XBOW)

XBOW definitions use the `build_context`, `service_name`, and `internal_port` fields instead of `image` and `port`:

```yaml
app:
  name: xbow-ssti-053
  type: xbow
  build_context: "${XBOW_SOURCE_DIR}/benchmarks/XBEN-053-24"
  service_name: app             # which docker-compose service to get port from
  internal_port: 80             # port inside the container
  wait_endpoint: "/ping"
  startup_timeout: 180s

test_cases:
  - id: "xbow-053-ssti-jinja"
    endpoint: "/?username=test"
    method: GET
    modules: ["reflected-ssti"]
    vuln_types: ["ssti"]
    assertion: strict
    min_findings: 1
    scan_mode: active
    description: "Jinja2 SSTI via username query parameter (XBEN-053-24)"
```
POST-based test cases include `body` and `headers`:

```yaml
test_cases:
  - id: "xbow-073-cmdi-ping"
    endpoint: "/ping"
    method: POST
    body: "ip_address=127.0.0.1"
    headers:
      Content-Type: "application/x-www-form-urlencoded"
    modules: ["code-exec"]
    vuln_types: ["code-exec"]
    assertion: strict
    min_findings: 1
    scan_mode: active
```

The `${XBOW_SOURCE_DIR}` environment variable is expanded via `os.ExpandEnv()` when the YAML is loaded. The Makefile sets it automatically, or it can be set in your shell.

### Phase 1 Benchmarks (13 total)

| ID | XBOW | Vuln Type | Endpoint | Service:Port | Module | Assertion |
|----|------|-----------|----------|--------------|--------|-----------|
| xbow-xss-013 | XBEN-013-24 | XSS | GET /page?name=test | web:5000 | active-xss-light-url-params | soft |
| xbow-xss-047 | XBEN-047-24 | XSS | GET /page?url=test | web:5000 | active-xss-light-url-params | soft |
| xbow-ssti-009 | XBEN-009-24 | SSTI | GET /greet/?name=test | web:8000 | active-reflected-ssti | strict |
| xbow-ssti-053 | XBEN-053-24 | SSTI | GET /?username=test | app:80 | active-reflected-ssti | strict |
| xbow-ssti-076 | XBEN-076-24 | SSTI | GET /greet?name=test | web:5000 | active-reflected-ssti | strict |
| xbow-sqli-083 | XBEN-083-24 | SQLi | GET /?category=test | web:80 | active-sqli-error-based | strict |
| xbow-sqli-071 | XBEN-071-24 | SQLi | POST /send.php | app:80 | active-sqli-error-based | soft |
| xbow-cmdi-073 | XBEN-073-24 | CmdI | POST /ping | web:80 | active-code-exec | strict |
| xbow-lfi-019 | XBEN-019-24 | LFI | GET /resource?filename=test.txt | lfi_static_resource:80 | active-lfi-generic | strict |
| xbow-lfi-061 | XBEN-061-24 | LFI | GET /post.php?id=test | web:80 | active-lfi-generic | strict |
| xbow-ssrf-020 | XBEN-020-24 | SSRF | GET /redirect.php?url=... | ssrf-demo-app:80 | active-ssrf-detection | soft |
| xbow-xxe-006 | XBEN-006-24 | XXE | POST /addbook (XML) | app:80 | active-xxe-generic | soft |
| xbow-xxe-096 | XBEN-096-24 | SQLi | POST /jobs (JSON) | libraryv1:5000 | active-sqli-error-based | soft |

### Test Runner Structure

The test runner at `test/benchmark/xbow/xbow_test.go` (build tag `xbow`) provides:

| Test Function | What it runs |
|---------------|--------------|
| `TestXbow_All` | All definitions in `definitions/xbow/` |
| `TestXbow_XSS` | Only `xbow-xss-*.yaml` |
| `TestXbow_SSTI` | Only `xbow-ssti-*.yaml` |
| `TestXbow_SQLi` | Only `xbow-sqli-*.yaml` |
| `TestXbow_CmdI` | Only `xbow-cmdi-*.yaml` |
| `TestXbow_LFI` | Only `xbow-lfi-*.yaml` |
| `TestXbow_SSRF` | Only `xbow-ssrf-*.yaml` |
| `TestXbow_XXE` | Only `xbow-xxe-*.yaml` |

Tests are skipped automatically if `XBOW_SOURCE_DIR` is not set or the directory is inaccessible.

### Adding a New XBOW Benchmark

1. **Identify the XBEN benchmark** in the validation-benchmarks repo. Read its `docker-compose.yml` to find the service name, internal port, and health check endpoint.
2. **Create a YAML definition** at `test/benchmark/definitions/xbow/xbow-<type>-<num>.yaml`:

   ```yaml
   app:
     name: xbow-<type>-<num>
     type: xbow
     build_context: "${XBOW_SOURCE_DIR}/benchmarks/XBEN-<num>-24"
     service_name: <service>        # from docker-compose.yml
     internal_port: <port>          # from docker-compose.yml ports section
     wait_endpoint: "/"             # from healthcheck or "/" as default
     startup_timeout: 180s

   test_cases:
     - id: "xbow-<num>-<type>-<param>"
       endpoint: "/vulnerable?param=test"
       method: GET                  # or POST
       modules: ["<module-id>"]
       vuln_types: ["<vuln-type>"]
       assertion: strict            # or soft for uncertain detections
       min_findings: 1
       scan_mode: active
       description: "Description (XBEN-<num>-24)"
   ```

3. **Run it**:

   ```sh
   make test-xbow XBOW_SOURCE_DIR=/path/to/validation-benchmarks
   ```

No Go code changes are needed — the test runner automatically picks up new YAML files.

## Troubleshooting

### Container startup fails

```
Failed to start container vulnerables/web-dvwa:latest: ...
```

- Ensure Docker is running: `docker info`
- Pull the image manually: `docker pull vulnerables/web-dvwa:latest`
- Check available disk space and memory

### Module not found

```
active modules not found: [active-nonexistent-module]
```

- Verify the module ID in `pkg/modules/default_registry.go`
- Module IDs are case-sensitive and use kebab-case (e.g., `active-sqli-error-based`)

### crAPI tests skip

```
crAPI not available (run 'make crapi-up' first)
```

- Start crAPI manually: `make crapi-up`
- Wait for all services: `make crapi-status` (all should show "healthy")
- crAPI takes 2-3 minutes to fully start

### XBOW tests skip

```
XBOW_SOURCE_DIR not set; skipping xbow benchmarks
```

- Set the environment variable: `export XBOW_SOURCE_DIR=/path/to/validation-benchmarks`
- Or pass it via make: `make test-xbow XBOW_SOURCE_DIR=/path/to/validation-benchmarks`

### XBOW build fails

```
xbow app xbow-ssti-053: build failed: ...
```

- Ensure Docker is running and has sufficient resources (CPU, memory, disk)
- Try building the image directly: `cd $XBOW_SOURCE_DIR/benchmarks/XBEN-053-24 && docker compose build --build-arg FLAG=test`
- Pre-build all images with `make xbow-build` to isolate build issues from test failures

### XBOW port discovery fails

```
xbow app xbow-ssti-053: port discovery failed: no port mapping found
```

- The docker-compose service may not have started. Check: `docker compose -p xbow-xbow-ssti-053 ps`
- Verify that `service_name` and `internal_port` in the YAML match the `docker-compose.yml`
- Some benchmarks (XBEN-083-24, XBEN-071-24) have database services that need time to initialize. Increase `startup_timeout` if needed.

### Blackbox tests skip

```
External site http://testphp.vulnweb.com is unreachable
```

- Check internet connectivity
- The site may be temporarily down — blackbox tests are designed to skip gracefully

### ast-grep compatibility issue

```
Skipping: ast-grep ast-grep-config.yaml compatibility issue (known)
```

- Some ast-grep versions (v0.41.0+) treat `ast-grep-config.yaml` as a rule file, causing a parse error. Extraction tests skip gracefully. Other SAST layers (SARIF, handoff) are unaffected.

### No findings from a strict-asserted test

If a test that should find vulnerabilities returns 0 findings:

1. Run the test with verbose logging: `go test -v -tags=canary -run TestName ...`
2. Check if the endpoint is accessible from the container
3. For DVWA: ensure auth setup completed successfully (look for "DVWA setup: verified access to vulnerability pages" in the logs). Without auth, all `/vulnerabilities/` endpoints redirect to `/login.php`.
4. Check that the module's ScanScope matches how the endpoint should be scanned:
   - PerInsertionPoint modules (SQLi, LFI) require URL parameters to create insertion points
   - PerRequest modules (XSS, code-exec) handle insertion points internally
   - PerHost modules (CORS) are called once per unique host
5. Try the `TestDebug_DirectVsHarness` test to compare direct module invocation:

   ```sh
   go test -v -tags=canary -run TestDebug ./test/benchmark/whitebox/...
   ```

6. Consider changing the assertion to `soft` if the detection is unreliable

## Hard-to-Benchmark Modules

Some modules require specialized infrastructure that is impractical for automated benchmarks:

| Module | Reason | Workaround |
|--------|--------|------------|
| `active-http-request-smuggling` | Requires specific server configurations (CL.TE, TE.CL) | Manual testing with custom servers |
| `active-race-interference` | Needs concurrent request handling with precise timing | Dedicated race condition test harness |
| `active-xml-saml-security` | Needs SAML IdP setup | Test against SAML-vulnerable test apps |
| `passive-anomaly-ranking` | Needs large traffic corpus for statistical analysis | Replay captured traffic |
| `passive-oauth-facebook-detect` | Needs Facebook OAuth flow | Mock OAuth server |
| `passive-serialized-object-detect` | Needs apps with Java/.NET serialization | Custom test server |

These modules should be tested through dedicated test files rather than the YAML-driven benchmark system.