Table of Contents
- Overview
- Target Applications
- Architecture
- Directory Structure
- Prerequisites
- Running Benchmarks
- YAML Definition Format
- Adding New Test Cases
- Adding a New Vulnerable App
- Adding a New Blackbox Site
- Coverage Report
- Assertion Modes
- Harness Package Reference
- CI Integration
- XBOW Validation Benchmarks
- Troubleshooting
Overview
The benchmark system validates that Vigolium’s active and passive scanner modules detect known vulnerabilities in controlled environments. It uses a data-driven approach: YAML files define target applications, endpoints, expected modules, and assertions. A shared Go test harness loads these definitions and drives execution.
Test Categories
| Category | Build Tag | Targets | Assertions | Requirements |
|---|---|---|---|---|
| Whitebox | canary | Docker containers (DVWA, VAmPI, Juice Shop, OopsSec Store, NextJS VulnExamples, etc.) | Strict + Soft | Docker |
| crAPI | canary | Docker Compose (crAPI — 10 services) | Soft | Docker + make crapi-up |
| XBOW | xbow | CTF-style benchmarks built from source (XSS, SSTI, SQLi, LFI, CmdI, SSRF, XXE) | Strict + Soft | Docker + XBOW_SOURCE_DIR |
| Blackbox | blackbox | External demo sites (Acunetix, PortSwigger, IBM) | Soft only | Internet |
| SAST | sast | Static analysis pipeline (route extraction, SARIF parsing, handoff) | Strict + Soft | ast-grep binary (Layer 1 only) |
| Coverage | canary | None (analyzes YAML definitions) | N/A | None |
Relationship to Existing Tests
The benchmark system complements the existing test tiers:
| Tier | Location | Purpose |
|---|---|---|
| Unit tests | pkg/*/ | Fast, isolated function-level tests |
| E2E tests | test/e2e/ | HTTP client, server, and pipeline integration |
| Canary tests | test/e2e/ | Original per-app vulnerability detection tests |
| Benchmark tests | test/benchmark/ | Data-driven module coverage validation (whitebox, blackbox, xbow, SAST) |
| Integration tests | test/benchmark/xss_scanner/ | Brutelogic XSS gym (external) |
In particular, the benchmark tier subsumes the original canary tests in test/e2e/ (dvwa_test.go, vampi_test.go, juiceshop_test.go) by providing the same coverage through YAML definitions with less boilerplate.
Target Applications
Vigolium benchmarks against a diverse set of intentionally vulnerable applications. Each application covers different vulnerability categories, tech stacks, and scanning approaches (DAST, SAST, or both).
DAST Targets (Docker-based)
| Application | Tech Stack | Vulnerability Categories | Docker Source | Port |
|---|---|---|---|---|
| DVWA | PHP / MySQL | SQLi, XSS (reflected + DOM), LFI, Command Injection, CRLF, CSRF | vulnerables/web-dvwa:latest | 80 |
| VAmPI | Python / Flask | SQLi, NoSQLi, CORS, JWT, Mass Assignment, Info Disclosure | erev0s/vampi:latest | 5000 |
| Juice Shop | Node.js / Angular | SQLi, XSS, Swagger exposure, JWT, CSRF, Info Disclosure | bkimminich/juice-shop:latest | 3000 |
| crAPI | Go + Python + Node.js microservices | OWASP API Top 10 (BOLA, BFLA, Mass Assignment, SSRF, SQLi, NoSQLi) | Docker Compose (10 services) | 8888 |
| OopsSec Store | Next.js / SQLite | SQLi, XSS, SSRF, LFI, XXE, IDOR, CORS, CSRF, Open Redirect, File Upload | Built from source | 3000 |
| NextJS VulnExamples | Next.js / PostgreSQL | Missing Authentication, Missing Authorization, Secrets Exposure, Stored XSS | Built from source | 3000 |
| Vulnerable Java | Java / Spring | SQLi, XSS, SSRF, Path Traversal | Docker image | 8080 |
| Vulnerable Nginx | Nginx | Misconfigurations, Path Traversal, CRLF, Header Injection | Docker image | 80 |
NextJS VulnExamples — Detailed Breakdown
Source: upleveled/security-vulnerability-examples-next-js-postgres
This application is an educational project demonstrating six categories of security flaws in a Next.js + PostgreSQL stack. It provides both vulnerable implementations and secure solutions for each category, making it valuable for both positive detection and negative (false positive) testing.
| Example | Vulnerability | Vulnerable Route | Type | What’s Wrong |
|---|---|---|---|---|
| 1 | Missing Authentication | GET /api/example-1-.../vulnerable | Route Handler | No session token check — returns blog posts to anyone |
| 2 | Missing Authentication | GET /example-2-.../vulnerable | Server Component | No session check — queries DB and renders directly |
| 3 | Missing Authorization | GET /api/example-3-.../vulnerable | Route Handler | Checks auth but returns ALL users’ unpublished posts |
| 4 | Missing Authorization | GET /example-4-.../vulnerable | Server Component | No auth + returns all users’ data |
| 5 | Secrets Exposure | GET /example-5-.../vulnerable | Server Component | Leaks process.env.API_KEY and password hashes to client |
| 6 | Stored XSS | GET /example-6-.../vulnerable | Server Component | dangerouslySetInnerHTML with <img onerror="alert('pwned')"> |
SAST Targets (Static Analysis)
The SAST benchmark suite validates the source-aware scanning pipeline using source stubs — minimal, syntactically valid framework code that exercises key patterns. See whitebox-sast for full details.
| Framework | Source Stub | Routes | Key Patterns |
|---|---|---|---|
| Gin (Go) | sast-stubs/gin/ | ~12 routes | CRUD, groups, Any, path params |
| FastAPI (Python) | sast-stubs/fastapi/ | ~11 routes | Path, Query, Body params |
| Express (JS) | sast-stubs/express/ | ~8 routes | Router, groups, all |
| Django (Python) | sast-stubs/django/ | ~9 routes | URL patterns, class views |
| Flask (Python) | sast-stubs/flask/ | ~7 routes | Decorators, add_url_rule |
| Next.js (TS) | sast-stubs/nextjs/ | 3+ handlers | App Router, Pages Router |
| Next.js OopsSec (TS) | sast-stubs/nextjs-oopssec/ | 15+ handlers | Dynamic routes, middleware, body parsing, header extraction |
| Next.js VulnExamples (TS) | sast-stubs/nextjs-vulnexamples/ | 9+ handlers | Missing auth/authz, secrets exposure, XSS, solution variants |
| Go HTTP (Go) | sast-stubs/gohttp/ | 3 routes | HandleFunc |
The bundled SARIF fixture (`semgrep-nextjs-vulnexamples.sarif`) covers 6 findings across 4 vulnerability categories:
| Finding | Severity | File | Category |
|---|---|---|---|
| `dangerouslySetInnerHTML` with unsanitized DB content | Medium | example-6-cross-site-scripting/vulnerable/page.tsx | XSS |
| Route handler missing authentication check | High | example-1-missing-authentication/vulnerable/route.ts | Missing AuthN |
| Data access without authorization scoping | High | example-3-missing-authorization/vulnerable/route.ts | Missing AuthZ |
| `process.env.API_KEY` passed to client component | High | example-5-secrets-exposure/vulnerable/page.tsx | Secrets Exposure |
| `getUsersWithPasswordHash()` result sent to client | High | example-5-secrets-exposure/vulnerable/page.tsx | Secrets Exposure |
| `SELECT *` on users table exposes password hashes | Medium | database/users.ts | Data Exposure |
Blackbox Targets (External Sites)
| Site | URL | Vulnerability Categories |
|---|---|---|
| Acunetix TestPHP | testphp.vulnweb.com | SQLi, XSS, LFI, Directory Traversal |
| Gin & Juice Shop | ginandjuice.shop (PortSwigger) | SQLi, XSS, SSTI, SSRF, Access Control |
| IBM Testfire | demo.testfire.net | SQLi, XSS, Authentication Bypass |
XBOW Targets (CTF-style)
13 self-contained vulnerable applications from the validation-benchmarks repository:
| Vuln Type | Count | Benchmarks |
|---|---|---|
| XSS | 2 | XBEN-013-24, XBEN-047-24 |
| SSTI | 3 | XBEN-009-24, XBEN-053-24, XBEN-076-24 |
| SQLi | 2 | XBEN-083-24, XBEN-071-24 |
| LFI | 2 | XBEN-019-24, XBEN-061-24 |
| Command Injection | 1 | XBEN-073-24 |
| SSRF | 1 | XBEN-020-24 |
| XXE | 2 | XBEN-006-24, XBEN-096-24 |
Architecture
Key Design Decisions
- YAML-driven: Test cases are defined in YAML, not Go code. Adding a new test case is a one-line YAML addition.
- Module resolution by ID: Test cases reference modules by their registry ID (e.g., `active-sqli-error-based`). The harness resolves them from `modules.DefaultRegistry`.
- Scan type dispatch: The harness checks `module.ScanScopes()` to dispatch to the correct method:
  - `ScanScopeInsertionPoint`: Creates insertion points via `httpmsg.CreateAllInsertionPoints()`, filters by `AllowedInsertionPointTypes()`, calls `ScanPerInsertionPoint()` for each.
  - `ScanScopeRequest`: Calls `ScanPerRequest()` once with the full request.
  - `ScanScopeHost`: Calls `ScanPerHost()` once.
- Passive fetch-then-scan: Passive tests fetch the URL first using the HTTP client, attach the raw response to the `HttpRequestResponse`, then pass it to the passive module.
- App-specific auth: Some apps (e.g., DVWA) require authentication before vulnerability pages work. The `SetupAppAuth()` function dispatches per-app setup (DB init, CSRF token extraction, login) and returns headers (cookies) to inject into all test cases.
- Network init safety: `network.Init()` is called once per process via `sync.Once` to avoid LevelDB close/reopen issues when running multiple test functions sequentially.
Directory Structure
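The layout below is reconstructed from the paths referenced elsewhere in this document; treat it as an approximation rather than an exact listing:

```
test/benchmark/
├── definitions/              # whitebox YAML definitions (dvwa.yaml, vampi.yaml, ...)
│   ├── blackbox/             # external-site definitions
│   └── xbow/                 # xbow-<type>-<num>.yaml definitions
├── harness/                  # shared Go library (loaders, runners, compose.go)
├── whitebox/                 # active_test.go (build tag: canary)
├── xbow/                     # xbow_test.go (build tag: xbow)
├── xss_scanner/              # Brutelogic XSS gym integration tests
└── coverage-report.md        # generated coverage report
sast-stubs/                   # SAST source stubs (gin/, fastapi/, express/, ...)
test/testdata/vulnerable-apps/<app>/docker-compose.yaml
```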
Prerequisites
Whitebox Tests
- Docker: Required for testcontainers-go to start vulnerable app containers
- Docker images: Pulled automatically on first run:
  - `vulnerables/web-dvwa:latest`
  - `erev0s/vampi:latest`
  - `bkimminich/juice-shop:latest`
- Docker Compose: Required for apps that build from source (OopsSec Store, NextJS VulnExamples)
crAPI Tests
- Docker Compose: crAPI requires 10 services (PostgreSQL, MongoDB, multiple microservices)
- Manual startup: crAPI must be started before running tests:
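The startup sequence uses the make targets referenced in the Troubleshooting section:

```sh
make crapi-up       # bring up all 10 crAPI services
make crapi-status   # poll until every service reports "healthy" (takes 2-3 minutes)
```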
SAST Tests
- ast-grep binary: Required for Layer 1 (route extraction). Auto-downloaded on first run, or install manually:
- No Docker needed: Layers 2-3 use static fixture data only
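The manual install mentioned above can use any of ast-grep's official distribution channels, for example:

```sh
# any one of these:
npm install --global @ast-grep/cli
cargo install ast-grep --locked
brew install ast-grep
```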
XBOW Validation Benchmarks
- Docker with Compose: Required to build and run benchmark containers
- XBOW source directory: The validation-benchmarks repository checked out locally
- `XBOW_SOURCE_DIR` environment variable: Must point to the root of the validation-benchmarks checkout (e.g., /path/to/validation-benchmarks). Set via environment or passed through the Makefile.
- Disk space: Each benchmark builds a Docker image from source. Pre-build with `make xbow-build` to cache layers.
Blackbox Tests
- Internet connectivity: Required to reach external demo sites
- No Docker needed: Tests run against public websites
Running Benchmarks
Make Targets
| Command | What it runs | Requirements |
|---|---|---|
| `make test-benchmark-whitebox` | DVWA, VAmPI, Juice Shop, OopsSec, VulnExamples (Docker) | Docker |
| `make test-benchmark-blackbox` | Acunetix, Gin&Juice, Testfire (external) | Internet |
| `make test-benchmark-all` | All whitebox + blackbox | Docker + Internet |
| `make test-benchmark-crapi` | crAPI only | Docker + `make crapi-up` |
| `make test-benchmark-coverage` | Generate coverage report | None |
| `make test-sast` | SAST Layers 1-3 (extraction + SARIF + handoff) | ast-grep binary |
| `make test-sast-extraction` | Layer 1 only | ast-grep binary |
| `make test-sast-sarif` | Layer 2 only | None |
| `make test-sast-handoff` | Layer 3 only | None |
| `make test-sast-e2e` | Layer 4 (full pipeline) | ast-grep binary |
| `make test-xbow` | All 13 XBOW validation benchmarks | Docker + `XBOW_SOURCE_DIR` |
| `make test-xbow-ssti` | 3 SSTI benchmarks | Docker + `XBOW_SOURCE_DIR` |
| `make test-xbow-xss` | 2 XSS benchmarks | Docker + `XBOW_SOURCE_DIR` |
| `make test-xbow-sqli` | 2 SQLi benchmarks | Docker + `XBOW_SOURCE_DIR` |
| `make test-xbow-lfi` | 2 LFI benchmarks | Docker + `XBOW_SOURCE_DIR` |
| `make test-xbow-cmdi` | 1 CmdI benchmark | Docker + `XBOW_SOURCE_DIR` |
| `make test-xbow-ssrf` | 1 SSRF benchmark | Docker + `XBOW_SOURCE_DIR` |
| `make test-xbow-xxe` | 2 XXE benchmarks | Docker + `XBOW_SOURCE_DIR` |
| `make xbow-build` | Pre-build all XBOW Docker images | Docker + `XBOW_SOURCE_DIR` |
Running Individual App Benchmarks
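A single app can be targeted by filtering on its test name. This is a sketch that assumes the runner creates per-app subtests named after the YAML file; the exact subtest naming may differ:

```sh
go test -v -tags=canary -run 'TestWhitebox_Active/dvwa' ./test/benchmark/whitebox/...
```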
Running SAST Benchmarks
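The SAST layers map directly to the make targets listed above:

```sh
make test-sast             # layers 1-3: extraction + SARIF + handoff
make test-sast-extraction  # layer 1 only (needs ast-grep)
make test-sast-e2e         # layer 4: full pipeline
```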
Running XBOW Validation Benchmarks
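XBOW runs require the source checkout location to be exported first:

```sh
export XBOW_SOURCE_DIR=/path/to/validation-benchmarks
make test-xbow        # all 13 benchmarks
make test-xbow-ssti   # just the 3 SSTI benchmarks
```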
Running Individual Blackbox Benchmarks
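A single external site can be targeted the same way as a whitebox app. The subtest name here is a hypothetical example, assumed to follow the YAML file name:

```sh
go test -v -tags=blackbox -run 'TestBlackbox/testphp' ./test/benchmark/blackbox/...
```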
Running All Benchmarks for a Specific Module
To check a specific module’s detection across all apps, filter by test case ID.
YAML Definition Format
Each YAML file describes one target application and its test cases.
Full Schema
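A sketch of the schema, assembled from the fields referenced throughout this document. Field names follow the tables and prose here, but exact spellings and defaults may differ from the real schema:

```yaml
app:
  name: dvwa                                  # used for SetupAppAuth dispatch
  type: docker                                # docker | compose | xbow | external
  image: vulnerables/web-dvwa:latest          # docker type only
  port: 80                                    # container port (docker type)
  base_url: http://localhost:8888             # compose / external types
  build_context: ${XBOW_SOURCE_DIR}/benchmarks/XBEN-053-24   # xbow type
  service_name: app                           # xbow type
  internal_port: 80                           # xbow type
  rate_limit: 2                               # requests per second

test_cases:
  - id: dvwa-sqli-get-id
    endpoint: /vulnerabilities/sqli/?id=1&Submit=Submit
    method: GET
    scan: active                              # active | passive
    modules:
      - active-sqli-error-based
    assertion: strict                         # strict | soft | negative
    min_findings: 1
```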
App Types
| Type | Container Management | Base URL |
|---|---|---|
| `docker` | Testcontainers-go starts/stops container automatically | Auto-assigned (mapped port) |
| `compose` | Must be started externally (`make crapi-up`) | Specified in `base_url` |
| `xbow` | Docker Compose CLI builds from source, starts/stops automatically | Auto-discovered via `docker compose port` |
| `external` | No containers — uses public websites | Specified in `base_url` |
Scan Modes
| Mode | What Happens |
|---|---|
| `active` | Harness creates `HttpRequestResponse` from URL, resolves active module, calls `ScanPerRequest`/`ScanPerHost` |
| `passive` | Harness fetches URL with HTTP client first (to get actual response), then passes full request+response to passive module’s `ScanPerRequest`/`ScanPerHost` |
Adding New Test Cases
The simplest way to expand coverage is to add test cases to existing YAML definitions.
Example: Add a new SQLi test to DVWA
Edit `test/benchmark/definitions/dvwa.yaml`:
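A hypothetical entry: the endpoint path and module ID come from this document's DVWA coverage, while the surrounding field names are illustrative and may differ from the real schema:

```yaml
  - id: dvwa-sqli-union
    endpoint: /vulnerabilities/sqli/?id=1&Submit=Submit
    method: GET
    scan: active
    modules:
      - active-sqli-error-based
    assertion: strict
    min_findings: 1
```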
Example: Add a passive module check
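Passive checks follow the same shape but use `scan: passive`, which makes the harness fetch the URL first and hand the real response to the module. The module ID below is hypothetical:

```yaml
  - id: dvwa-login-headers
    endpoint: /login.php
    method: GET
    scan: passive
    modules:
      - passive-security-headers   # hypothetical module ID; verify against the registry
    assertion: soft
```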
Guidelines
- Use `strict` assertion only when you’re confident the module will detect the vulnerability (e.g., DVWA SQLi with error-based detection).
- Use `soft` assertion for new/experimental test cases or apps with protections that may block detection.
- Use `negative` assertion for endpoints that should NOT trigger findings (false positive testing).
- Module IDs must match exactly what’s registered in `pkg/modules/default_registry.go`. Run `go test -v -run TestResolveActiveModules ./test/benchmark/harness/...` to verify module resolution.
Finding Available Module IDs
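One rough way to list registered IDs is to search the registry source directly. This assumes IDs appear as quoted kebab-case strings in that file, so the pattern may need adjusting:

```sh
grep -oE '"(active|passive)-[a-z0-9-]+"' pkg/modules/default_registry.go | sort -u
```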
Adding a New Vulnerable App
Docker-based (Whitebox)
- Create a YAML definition at `test/benchmark/definitions/<app>.yaml`.
- No Go code changes needed — the existing `TestWhitebox_Active` runner automatically picks up new YAML files from the definitions directory.
- Run it:
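A hedged sketch of such a definition; the app name and image here are invented placeholders, and the field names are illustrative:

```yaml
app:
  name: myapp                                # hypothetical
  type: docker
  image: example/vulnerable-app:latest       # hypothetical image
  port: 8080

test_cases:
  - id: myapp-xss-search
    endpoint: /search?q=test
    scan: active
    modules:
      - active-xss-light-url-params
    assertion: soft     # start soft; promote to strict once detection is proven
```

Then `make test-benchmark-whitebox` picks it up along with the existing definitions.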
Docker Compose-based (Built from Source)
For apps that need to be built from source or require multiple services (like NextJS VulnExamples):
- Place the Docker Compose file in `test/testdata/vulnerable-apps/<app>/docker-compose.yaml`. Example (`nextjs-vulnexamples/docker-compose.yaml`):
- Create a YAML definition with `type: xbow`.
- Add a dedicated test function in `test/benchmark/whitebox/active_test.go`.
- Optionally add SAST coverage — create a source stub and YAML definitions for the whitebox SAST pipeline (see whitebox-sast).
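A hypothetical sketch of such a compose file for an app with a database dependency (service names, image tags, and credentials are assumptions, not the project's real file):

```yaml
# sketch of test/testdata/vulnerable-apps/nextjs-vulnexamples/docker-compose.yaml
services:
  web:
    build: .
    ports:
      - "3000"          # ephemeral host port, discovered via `docker compose port`
    depends_on:
      - db
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: postgres
```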
Adding a New Blackbox Site
- Create a YAML definition at `test/benchmark/definitions/blackbox/<site>.yaml`.
- Important rules for blackbox definitions:
  - All assertions must be `soft` — external sites may change, go down, or add protections.
  - Set `rate_limit` to be respectful (2 req/sec is a good default).
  - The test runner automatically skips if the site is unreachable.
- Run it:
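A hedged sketch of a blackbox definition; the site and endpoint come from the Blackbox Targets table, and the field names are illustrative:

```yaml
app:
  name: testphp
  type: external
  base_url: http://testphp.vulnweb.com
  rate_limit: 2          # req/sec, be respectful to the public site

test_cases:
  - id: testphp-sqli-artists
    endpoint: /artists.php?artist=1
    scan: active
    modules:
      - active-sqli-error-based
    assertion: soft      # always soft for external sites
```

Then `make test-benchmark-blackbox` runs it alongside the other external sites.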
Coverage Report
The coverage report compares all module IDs referenced in YAML definitions against the full `DefaultRegistry`.
Generate the Report
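The report is produced by the make target listed earlier:

```sh
make test-benchmark-coverage
```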
The report is written to `test/benchmark/coverage-report.md`.
Sample Output
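The report is rendered as markdown; its shape is roughly as follows (module names and coverage values here are invented for illustration only):

```
| Module ID                     | Covered | Referenced By              |
|-------------------------------|---------|----------------------------|
| active-sqli-error-based       | yes     | dvwa, vampi, xbow-sqli-083 |
| active-lfi-generic            | yes     | dvwa, xbow-lfi-019         |
| active-http-request-smuggling | no      | (deferred)                 |
```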
Understanding Coverage
- Covered means at least one YAML test case references the module ID.
- Coverage does NOT mean detection — a soft-asserted test case counts as covered even if the module finds nothing.
- Some modules are intentionally deferred (see Hard-to-Benchmark Modules).
Assertion Modes
| Mode | Behavior | Use Case |
|---|---|---|
| `strict` | Test fails if `len(findings) < min_findings` | Known vulnerabilities in controlled Docker apps |
| `soft` | Logs a warning but test passes regardless | Experimental tests, blackbox sites, modern apps with protections |
| `negative` | Test fails if `len(findings) > 0` | False positive testing — endpoints that should NOT trigger |
Choosing the Right Assertion
Harness Package Reference
The `test/benchmark/harness/` package is a shared Go library (not a test package) that provides the core benchmark infrastructure.
Key Types
| Type | Description |
|---|---|
| `BenchmarkDefinition` | Root struct parsed from YAML — contains AppConfig and []TestCase |
| `AppConfig` | Target application configuration (image, port, type, env, build_context, service_name, internal_port) |
| `ComposeApp` | Running Docker Compose project (project name, directory, base URL) |
| `TestCase` | Single test case (endpoint, modules, assertion, scan mode) |
| `TestInfra` | Test infrastructure (HTTP client, host errors, rate limiter, scan context) |
| `TestResult` | Outcome of a single test case execution |
| `CoverageReport` | Module coverage matrix |
| `SASTExtractionDefinition` | Route extraction test: framework, source_dir, expected routes, bounds |
| `SASTSARIFDefinition` | SARIF parsing test: fixture path, tool name, format, expectations |
| `SASTHandoffDefinition` | Handoff test: framework, base URL, routes with expected requests |
Key Functions
| Function | Description |
|---|---|
| `LoadDefinition(path)` | Load a single YAML definition file |
| `LoadDefinitionsFromDir(dir)` | Load all YAML files from a directory |
| `SetupTestInfra()` | Initialize HTTP client, rate limiter, scan context |
| `SetupTestInfraWithOAST()` | Initialize with OAST mock provider |
| `StartContainer(ctx, config)` | Start a Docker container via testcontainers-go |
| `StartAppFromDefinition(ctx, app)` | Start an app based on its type (docker/compose/xbow/external) |
| `StartComposeApp(ctx, app)` | Build and start an xbow Docker Compose project from source |
| `RunActiveTestCase(t, tc, baseURL, infra)` | Execute an active test case |
| `RunPassiveTestCase(t, tc, baseURL, infra)` | Execute a passive test case (fetch + scan) |
| `FetchForPassiveScan(url, headers, infra)` | Fetch a URL and return HRR with response attached |
| `SetupAppAuth(t, appName, baseURL)` | Perform app-specific auth/setup, return headers to inject |
| `MergeHeaders(authHeaders, tcHeaders)` | Merge auth headers into test case headers |
| `SetupDVWA(t, baseURL)` | Initialize DVWA DB, login, return session cookies |
| `ResolveActiveModules(ids)` | Look up active modules from DefaultRegistry |
| `ResolvePassiveModules(ids)` | Look up passive modules from DefaultRegistry |
| `ApplyAssertion(t, tc, moduleID, findings)` | Check findings against assertion mode |
| `GenerateCoverageReport(dirs...)` | Generate module coverage matrix |
| `FormatCoverageMarkdown(report)` | Render coverage report as markdown |
| `CheckExternalAvailability(t, url)` | Skip test if external site is unreachable |
| `LoadSASTExtractionDefinitionsFromDir(dir)` | Load all extraction YAMLs |
| `LoadSASTSARIFDefinitionsFromDir(dir)` | Load all SARIF YAMLs |
| `LoadSASTHandoffDefinitionsFromDir(dir)` | Load all handoff YAMLs |
Active Test Case Flow
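The active flow can be summarized from the harness functions and design decisions above; this is an approximate sketch, not generated output:

```
LoadDefinition → StartAppFromDefinition → SetupAppAuth
  → for each test case:
      ResolveActiveModules
      build HttpRequestResponse from base URL + endpoint
      dispatch by ScanScopes():
        insertion-point → CreateAllInsertionPoints → ScanPerInsertionPoint
        request         → ScanPerRequest
        host            → ScanPerHost
      ApplyAssertion (strict | soft | negative)
```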
Passive Test Case Flow
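The passive flow differs only in the fetch-then-scan step; again an approximate sketch:

```
FetchForPassiveScan (HTTP client GET, attach raw response to HttpRequestResponse)
  → ResolvePassiveModules
  → ScanPerRequest / ScanPerHost on the fetched request+response pair
  → ApplyAssertion
```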
CI Integration
Recommended CI Strategy
| Trigger | What to Run | Timeout |
|---|---|---|
| On every PR | make test-sast-sarif && make test-sast-handoff | 5 min |
| On every PR | make test-benchmark-whitebox (DVWA + VAmPI only) | 15 min |
| Nightly | make test-benchmark-all (all whitebox + blackbox) | 30 min |
| Nightly | make test-sast (all SAST layers) | 10 min |
| Weekly | make test-benchmark-crapi (requires crAPI up) | 20 min |
| Weekly | make test-xbow (requires XBOW_SOURCE_DIR) | 30 min |
| Weekly | make test-sast-e2e (full pipeline) | 15 min |
| On release | make test-benchmark-coverage | 1 min |
Example GitHub Actions Workflow
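A hypothetical workflow mapping the per-PR and nightly rows of the strategy table to GitHub Actions (job names, action versions, and the cron schedule are assumptions):

```yaml
name: benchmarks
on:
  pull_request:
  schedule:
    - cron: "0 3 * * *"   # nightly
jobs:
  sast-fast:
    runs-on: ubuntu-latest
    timeout-minutes: 5
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
      - run: make test-sast-sarif && make test-sast-handoff
  whitebox:
    if: github.event_name == 'pull_request'
    runs-on: ubuntu-latest
    timeout-minutes: 15
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
      - run: make test-benchmark-whitebox
  nightly-all:
    if: github.event_name == 'schedule'
    runs-on: ubuntu-latest
    timeout-minutes: 30
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
      - run: make test-benchmark-all
```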
XBOW Validation Benchmarks
XBOW benchmarks are CTF-style, self-contained vulnerable applications sourced from the validation-benchmarks repository. Each benchmark is a Docker Compose project with its own source code, Dockerfile, and docker-compose.yml. Unlike the whitebox benchmarks (which pull pre-built images from Docker Hub), XBOW apps are built from source via `docker compose build`.
How It Works
The `xbow` app type in the harness triggers a Docker Compose CLI lifecycle managed by `harness/compose.go`:
- Build: `docker compose -f <dir>/docker-compose.yml -p xbow-<name> build --build-arg FLAG=test`
- Start: `docker compose ... up -d --wait`
- Port discovery: `docker compose ... port <service> <internal_port>` — parses the ephemeral host port
- Health check: Polls `wait_endpoint` until it responds with status < 500
- Test execution: Runs active test cases (GET or POST) through the standard harness
- Teardown: `docker compose ... down -v --remove-orphans`
Each run uses a unique Compose project name (`xbow-<name>`) to prevent container name collisions when running in parallel.
YAML Definition Format (XBOW)
XBOW definitions use the `build_context`, `service_name`, and `internal_port` fields instead of `image` and `port`:
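A hedged sketch of an XBOW definition. The build context, service, port, endpoint, and module come from the Phase 1 table for XBEN-053-24; the `wait_endpoint` value and exact field spellings are assumptions:

```yaml
app:
  name: xbow-ssti-053
  type: xbow
  build_context: ${XBOW_SOURCE_DIR}/benchmarks/XBEN-053-24
  service_name: app
  internal_port: 80
  wait_endpoint: /          # assumed health-check path

test_cases:
  - id: xbow-ssti-053
    endpoint: /?username=test
    scan: active
    modules:
      - active-reflected-ssti
    assertion: strict
```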
POST test cases can additionally specify `body` and `headers`:
The `${XBOW_SOURCE_DIR}` environment variable is expanded via `os.ExpandEnv()` when the YAML is loaded. The Makefile sets it automatically, or it can be set in your shell.
Phase 1 Benchmarks (13 total)
| ID | XBOW | Vuln Type | Endpoint | Service:Port | Module | Assertion |
|---|---|---|---|---|---|---|
| xbow-xss-013 | XBEN-013-24 | XSS | GET /page?name=test | web:5000 | active-xss-light-url-params | soft |
| xbow-xss-047 | XBEN-047-24 | XSS | GET /page?url=test | web:5000 | active-xss-light-url-params | soft |
| xbow-ssti-009 | XBEN-009-24 | SSTI | GET /greet/?name=test | web:8000 | active-reflected-ssti | strict |
| xbow-ssti-053 | XBEN-053-24 | SSTI | GET /?username=test | app:80 | active-reflected-ssti | strict |
| xbow-ssti-076 | XBEN-076-24 | SSTI | GET /greet?name=test | web:5000 | active-reflected-ssti | strict |
| xbow-sqli-083 | XBEN-083-24 | SQLi | GET /?category=test | web:80 | active-sqli-error-based | strict |
| xbow-sqli-071 | XBEN-071-24 | SQLi | POST /send.php | app:80 | active-sqli-error-based | soft |
| xbow-cmdi-073 | XBEN-073-24 | CmdI | POST /ping | web:80 | active-code-exec | strict |
| xbow-lfi-019 | XBEN-019-24 | LFI | GET /resource?filename=test.txt | lfi_static_resource:80 | active-lfi-generic | strict |
| xbow-lfi-061 | XBEN-061-24 | LFI | GET /post.php?id=test | web:80 | active-lfi-generic | strict |
| xbow-ssrf-020 | XBEN-020-24 | SSRF | GET /redirect.php?url=... | ssrf-demo-app:80 | active-ssrf-detection | soft |
| xbow-xxe-006 | XBEN-006-24 | XXE | POST /addbook (XML) | app:80 | active-xxe-generic | soft |
| xbow-xxe-096 | XBEN-096-24 | SQLi | POST /jobs (JSON) | libraryv1:5000 | active-sqli-error-based | soft |
Test Runner Structure
The test runner at `test/benchmark/xbow/xbow_test.go` (build tag `xbow`) provides:
| Test Function | What it runs |
|---|---|
| `TestXbow_All` | All definitions in `definitions/xbow/` |
| `TestXbow_XSS` | Only `xbow-xss-*.yaml` |
| `TestXbow_SSTI` | Only `xbow-ssti-*.yaml` |
| `TestXbow_SQLi` | Only `xbow-sqli-*.yaml` |
| `TestXbow_CmdI` | Only `xbow-cmdi-*.yaml` |
| `TestXbow_LFI` | Only `xbow-lfi-*.yaml` |
| `TestXbow_SSRF` | Only `xbow-ssrf-*.yaml` |
| `TestXbow_XXE` | Only `xbow-xxe-*.yaml` |
All of these test functions skip automatically if `XBOW_SOURCE_DIR` is not set or the directory is inaccessible.
Adding a New XBOW Benchmark
- Identify the XBEN benchmark in the validation-benchmarks repo. Read its `docker-compose.yml` to find the service name, internal port, and health check endpoint.
- Create a YAML definition at `test/benchmark/definitions/xbow/xbow-<type>-<num>.yaml`.
- Run it:
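Using the runner path and build tag described above:

```sh
XBOW_SOURCE_DIR=/path/to/validation-benchmarks \
  go test -v -tags=xbow -run 'TestXbow_All' ./test/benchmark/xbow/...
```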
Troubleshooting
Container startup fails
- Ensure Docker is running: `docker info`
- Pull the image manually: `docker pull vulnerables/web-dvwa:latest`
- Check available disk space and memory
Module not found
- Verify the module ID in `pkg/modules/default_registry.go`
- Module IDs are case-sensitive and use kebab-case (e.g., `active-sqli-error-based`)
crAPI tests skip
- Start crAPI manually: `make crapi-up`
- Wait for all services: `make crapi-status` (all should show “healthy”)
- crAPI takes 2-3 minutes to fully start
XBOW tests skip
- Set the environment variable: `export XBOW_SOURCE_DIR=/path/to/validation-benchmarks`
- Or pass it via make: `make test-xbow XBOW_SOURCE_DIR=/path/to/validation-benchmarks`
XBOW build fails
- Ensure Docker is running and has sufficient resources (CPU, memory, disk)
- Try building the image directly: `cd $XBOW_SOURCE_DIR/benchmarks/XBEN-053-24 && docker compose build --build-arg FLAG=test`
- Pre-build all images with `make xbow-build` to isolate build issues from test failures
XBOW port discovery fails
- The docker-compose service may not have started. Check: `docker compose -p xbow-xbow-ssti-053 ps`
- Verify the `service_name` and `internal_port` in the YAML match the docker-compose.yml
- Some benchmarks (XBEN-083-24, XBEN-071-24) have database services that need time to initialize. Increase `startup_timeout` if needed.
Blackbox tests skip
- Check internet connectivity
- The site may be temporarily down — blackbox tests are designed to gracefully skip
ast-grep compatibility issue
- Some ast-grep versions (v0.41.0+) treat `ast-grep-config.yaml` as a rule file, causing a parse error. Extraction tests skip gracefully. Other SAST layers (SARIF, handoff) are unaffected.
No findings from a strict-asserted test
If a test that should find vulnerabilities returns 0 findings:
- Run the test with verbose logging: `go test -v -tags=canary -run TestName ...`
- Check if the endpoint is accessible from the container
- For DVWA: ensure auth setup completed successfully (look for “DVWA setup: verified access to vulnerability pages” in logs). Without auth, all `/vulnerabilities/` endpoints redirect to `/login.php`.
- Check that the module’s `ScanScope` matches how the endpoint should be scanned:
  - `PerInsertionPoint` modules (SQLi, LFI) require URL parameters to create insertion points
  - `PerRequest` modules (XSS, code-exec) handle insertion points internally
  - `PerHost` modules (CORS) are called once per unique host
- Try the `TestDebug_DirectVsHarness` test to compare direct module invocation
- Consider changing the assertion to `soft` if the detection is unreliable
Hard-to-Benchmark Modules
Some modules require specialized infrastructure that is impractical for automated benchmarks:
| Module | Reason | Workaround |
|---|---|---|
| `active-http-request-smuggling` | Requires specific server configurations (CL.TE, TE.CL) | Manual testing with custom servers |
| `active-race-interference` | Needs concurrent request handling with precise timing | Dedicated race condition test harness |
| `active-xml-saml-security` | Needs SAML IdP setup | Test against SAML-vulnerable test apps |
| `passive-anomaly-ranking` | Needs large traffic corpus for statistical analysis | Replay captured traffic |
| `passive-oauth-facebook-detect` | Needs Facebook OAuth flow | Mock OAuth server |
| `passive-serialized-object-detect` | Needs apps with Java/.NET serialization | Custom test server |
