Table of Contents
- Overview
- Architecture
- Directory Structure
- Test Layers
- Running Benchmarks
- YAML Definition Format
- Source Stubs
- SARIF Fixtures
- Adding New Benchmarks
- Harness Reference
- CI Integration
- Troubleshooting
Overview
The whitebox SAST benchmarks validate the three layers of the source-aware scanning pipeline, plus a fourth end-to-end layer that chains them together:

| Layer | What It Tests | Key Package |
|---|---|---|
| 1. Route Extraction | ast-grep scans framework source code and extracts routes | pkg/toolexec/astgrep/ |
| 2. SARIF Parsing | Semgrep, OSV-Scanner, and generic SARIF outputs parse into findings | pkg/toolexec/sourcetools/ |
| 3. SAST→DAST Handoff | Extracted routes convert into HttpRequestResponse with insertion points | pkg/httpmsg/ |
| 4. E2E Pipeline | Full chain from source stub to scannable insertion points | All of the above |
Relationship to Other Benchmarks
| Benchmark | Build Tag | Tests | Requirements |
|---|---|---|---|
| Whitebox (DAST) | canary | Active/passive modules against Docker apps | Docker |
| XBOW | xbow | DAST modules against CTF-style apps | Docker + XBOW_SOURCE_DIR |
| Blackbox | blackbox | DAST modules against external sites | Internet |
| SAST (Layers 1-3) | sast | Route extraction, SARIF parsing, handoff | ast-grep binary (Layer 1 only) |
| SAST E2E (Layer 4) | sast_e2e | Full source-to-scan pipeline | ast-grep binary |
Architecture
Key Design Decisions
- YAML-driven: All test expectations are declared in YAML files. Adding a new framework or fixture is a YAML addition, not a code change.
- No external dependencies for Layers 2-3: SARIF fixtures are static JSON files; handoff tests build raw HTTP requests from route definitions. Only Layer 1 requires the ast-grep binary.
- Graceful degradation: If ast-grep has compatibility issues (e.g., newer versions changing config format), extraction tests skip with a clear message rather than failing the suite.
- Fixture-based SARIF testing: Static `.sarif` files enable deterministic testing without running semgrep or osv-scanner.
- Separation of concerns: Each layer tests exactly one component. Layer 4 (E2E) ties them together.
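
The graceful-degradation decision can be illustrated with a minimal sketch. It assumes the binary is named `ast-grep` and is looked up on `PATH`; the helper name `skipReason` and the message wording are hypothetical and not taken from the harness.

```go
package main

import (
	"fmt"
	"os/exec"
)

// skipReason returns a non-empty message when the named binary cannot be
// found on PATH, so a test can call t.Skip(msg) instead of failing.
// Hypothetical helper: the real harness may detect problems differently.
func skipReason(binary string) string {
	if _, err := exec.LookPath(binary); err != nil {
		return fmt.Sprintf("skipping: %s not available: %v", binary, err)
	}
	return ""
}

func main() {
	if msg := skipReason("ast-grep"); msg != "" {
		fmt.Println(msg)
		return
	}
	fmt.Println("ast-grep found; extraction tests can run")
}
```

Inside a test, the same check would typically guard the scan with `t.Skip(msg)`, which keeps Layers 2 and 3 green on machines without the binary.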
Directory Structure
Test Layers
Layer 1: Route Extraction
Tests the ast-grep scanner’s ability to extract HTTP routes from framework source code.
What it validates:
- `astgrep.NewScanner()` and `ScanDirWithFramework()` produce matches
- `astgrep.MatchesToRoutes()` converts matches to structured `Route` structs
- Match counts fall within expected bounds
- Specific routes appear with correct method, path, file, and params
- Negative routes (that should NOT appear) are absent
- `astgrep.DetectFramework()` correctly identifies frameworks from manifest files
| Framework | Source Stub | Routes | Detection |
|---|---|---|---|
| Gin (Go) | sast-stubs/gin/ | 12 routes (CRUD, groups, Any) | go.mod → gin-gonic/gin |
| FastAPI (Python) | sast-stubs/fastapi/ | 11 routes (Path, Query, Body) | requirements.txt → fastapi |
| Express (JS) | sast-stubs/express/ | 8 routes (router, groups, all) | package.json → express |
| Django (Python) | sast-stubs/django/ | 9 URL patterns + class views | manage.py → django |
| Flask (Python) | sast-stubs/flask/ | 7 routes (decorators, add_url_rule) | requirements.txt → flask |
| Next.js (TS) | sast-stubs/nextjs/ | 3+ handlers (App/Pages Router) | package.json → next |
| Next.js OopsSec (TS) | sast-stubs/nextjs-oopssec/ | 15+ handlers (dynamic routes, middleware) | package.json → next |
| Next.js VulnExamples (TS) | sast-stubs/nextjs-vulnexamples/ | 9+ handlers (auth, authz, XSS, secrets) | package.json → next |
| Go HTTP (Go) | sast-stubs/gohttp/ | 3 routes (HandleFunc) | No manifest marker |
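
The Detection column above can be read as a list of (manifest file, marker string) pairs. The following is a rough sketch of that idea only, not the actual `astgrep.DetectFramework()` logic; the `marker` type, the returned framework identifiers, and the substring matching are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// marker pairs a manifest file with a substring that hints at a framework.
// The pairs mirror the Detection column; the matching rule is assumed.
type marker struct {
	file      string
	needle    string
	framework string
}

var markers = []marker{
	{"go.mod", "gin-gonic/gin", "gin"},
	{"requirements.txt", "fastapi", "fastapi"},
	{"requirements.txt", "flask", "flask"},
	{"package.json", `"express"`, "express"},
	{"package.json", `"next"`, "nextjs"},
	{"manage.py", "django", "django"},
}

// detectFromManifests checks manifest contents (keyed by file name) against
// the marker list and returns the first matching framework, or "".
func detectFromManifests(manifests map[string]string) string {
	for _, m := range markers {
		content, ok := manifests[m.file]
		if ok && strings.Contains(strings.ToLower(content), m.needle) {
			return m.framework
		}
	}
	return ""
}

// detectFramework reads whichever manifests exist in dir and delegates.
func detectFramework(dir string) string {
	manifests := map[string]string{}
	for _, m := range markers {
		if _, seen := manifests[m.file]; seen {
			continue
		}
		data, err := os.ReadFile(filepath.Join(dir, m.file))
		if err != nil {
			continue // manifest absent in this stub
		}
		manifests[m.file] = string(data)
	}
	return detectFromManifests(manifests)
}

func main() {
	fmt.Printf("detected: %q\n", detectFramework("."))
}
```

A plain `go.mod` with no Gin dependency yields an empty result in this sketch, which corresponds to the "No manifest marker" case for the Go HTTP stub.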
| Function | Description |
|---|---|
| `TestExtraction_All` | Runs all extraction definitions from YAML |
| `TestExtraction_Gin_Routes` | Gin-specific extraction |
| `TestExtraction_FastAPI_Routes` | FastAPI-specific extraction |
| `TestExtraction_Express_Routes` | Express-specific extraction |
| `TestExtraction_Django_Routes` | Django-specific extraction |
| `TestExtraction_Flask_Routes` | Flask-specific extraction |
| `TestExtraction_NextJS_Routes` | Next.js-specific extraction |
| `TestExtraction_NextJS_Oopssec_Routes` | Next.js OopsSec extraction |
| `TestExtraction_NextJS_VulnExamples_Routes` | Next.js VulnExamples extraction |
| `TestExtraction_GoHTTP_Routes` | Go net/http-specific extraction |
| `TestExtraction_DetectFramework` | Framework detection from manifest files |
Layer 2: SARIF Parsing
Tests the SARIF/JSON parser’s ability to correctly extract findings from tool output.
What it validates:
- `sourcetools.ParseSARIF()` parses standard SARIF format
- `sourcetools.ParseSemgrepOutput()` parses semgrep’s native JSON
- `sourcetools.ParseTrivyOutput()` parses osv-scanner’s native JSON
- Finding counts match expectations
- Individual findings have correct rule_id, severity, file_path, start_line
- Severity distributions match (e.g., 2 high, 3 medium, 1 low)
- Empty fixtures return 0 findings without error
- Malformed JSON returns appropriate errors
- SARIF level mapping: error→high, warning→medium, note→low, none→info
- `sourcetools.ToFinding()` produces correct ModuleID, Tags, MatchedAt, FindingHash
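
The empty and malformed cases in the list above can be sketched with the standard library alone. This is not `sourcetools.ParseSARIF()`; the `sarifLog` struct and `countResults` function are hypothetical and model only the `runs[].results[]` fields of SARIF that matter for counting.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sarifLog models just enough of a SARIF document to count results.
type sarifLog struct {
	Runs []struct {
		Results []struct {
			RuleID string `json:"ruleId"`
			Level  string `json:"level"`
		} `json:"results"`
	} `json:"runs"`
}

// countResults returns the number of results across all runs.
// Valid JSON without a "runs" key yields 0 and no error; invalid JSON
// yields an error.
func countResults(data []byte) (int, error) {
	var log sarifLog
	if err := json.Unmarshal(data, &log); err != nil {
		return 0, fmt.Errorf("parse SARIF: %w", err)
	}
	n := 0
	for _, run := range log.Runs {
		n += len(run.Results)
	}
	return n, nil
}

func main() {
	inputs := []string{
		`{"runs":[{"results":[{"ruleId":"a","level":"error"},{"ruleId":"b","level":"note"}]}]}`,
		`{"version":"2.1.0"}`,
		`{not json`,
	}
	for _, in := range inputs {
		n, err := countResults([]byte(in))
		fmt.Printf("findings=%d err=%v\n", n, err)
	}
}
```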
| Fixture | Tool | Findings | What It Tests |
|---|---|---|---|
| `semgrep-normal.sarif` | semgrep | 4 | Standard parsing with mixed severity |
| `semgrep-empty.sarif` | semgrep | 0 | Empty result handling |
| `semgrep-multirule.sarif` | semgrep | 11 | Multiple rules (sqli, xss, ssrf, idor, crypto) |
| `semgrep-nextjs-vulnexamples.sarif` | semgrep | 6 | Next.js auth/authz/secrets/XSS (4 high, 2 medium) |
| `trivy-normal.sarif` | osv-scanner | 3 | Vuln + secret + misconfig |
| `trivy-empty.sarif` | osv-scanner | 0 | Empty result handling |
| `trivy-multirule.sarif` | osv-scanner | 5 | All osv-scanner categories |
| `sarif-malformed-1.json` | any | N/A | Missing “runs” key → 0 findings |
| `sarif-malformed-2.json` | any | N/A | Invalid JSON → parse error |
| `sarif-severity-mapping.sarif` | test | 4 | Level mapping validation |
| Function | Description |
|---|---|
| `TestSARIF_All` | Runs all SARIF definitions from YAML |
| `TestSARIF_Semgrep_Normal` | Standard semgrep parsing |
| `TestSARIF_Semgrep_Multirule` | Multi-rule diversity |
| `TestSARIF_Trivy_Normal` | Standard osv-scanner parsing |
| `TestSARIF_Trivy_Multirule` | All osv-scanner categories |
| `TestSARIF_EdgeCases` | Severity mapping edge cases |
| `TestSARIF_Empty` | Zero-finding fixtures |
| `TestSARIF_Malformed` | Invalid/incomplete input |
| `TestSARIF_SeverityMapping` | error→high, warning→medium, note→low, none→info |
| `TestSARIF_ToFinding` | RawFinding → database.Finding conversion |
Layer 3: SAST→DAST Handoff
Tests the conversion of extracted routes into scannable HTTP requests with insertion points.
What it validates:
- Routes convert to valid `HttpRequestResponse` via `httpmsg.ParseRawRequest()`
- Method normalization: `ANY`/`HANDLE`/`ALL`/empty → `GET`
- Empty-path routes are correctly skipped
- URL parameters produce `InsertionPoint` objects via `httpmsg.CreateAllInsertionPoints()`
- HTTP method, URI path, and Host header match expectations
| Function | Description |
|---|---|
| `TestHandoff_All` | Runs all handoff definitions (gin, fastapi, express, nextjs-oopssec, nextjs-vulnexamples) |
| `TestHandoff_MethodNormalization` | ANY/HANDLE/ALL/"" → GET, standard methods unchanged |
| `TestHandoff_EmptyPathSkipped` | Empty path routes produce skip |
| `TestHandoff_InsertionPointCreation` | URL params → insertion points with correct names and types |
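
The handoff steps can be approximated as follows. This sketch uses `net/http` as a stand-in for `httpmsg.ParseRawRequest()`, whose signature is not shown in this document; `buildRawRequest` and `parseRaw` are hypothetical names, and the normalization follows the rule stated above.

```go
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"strings"
)

// normalizeMethod maps catch-all route methods to GET and leaves
// standard methods unchanged.
func normalizeMethod(method string) string {
	switch method {
	case "", "ANY", "HANDLE", "ALL":
		return "GET"
	default:
		return method
	}
}

// buildRawRequest renders a route as a raw HTTP/1.1 request.
// It reports false for empty paths, which the handoff skips.
func buildRawRequest(method, path, host string) (string, bool) {
	if path == "" {
		return "", false
	}
	raw := fmt.Sprintf("%s %s HTTP/1.1\r\nHost: %s\r\n\r\n",
		normalizeMethod(method), path, host)
	return raw, true
}

// parseRaw parses the raw request with the standard library
// (stand-in for the project's own parser).
func parseRaw(raw string) (*http.Request, error) {
	return http.ReadRequest(bufio.NewReader(strings.NewReader(raw)))
}

func main() {
	raw, ok := buildRawRequest("ANY", "/users/1?q=x", "localhost:8080")
	if !ok {
		fmt.Println("route skipped")
		return
	}
	req, err := parseRaw(raw)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(req.Method, req.URL.Path, req.Host)
}
```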
Layer 4: E2E Pipeline
Tests the complete chain from source code to scannable insertion points.
What it validates:
- Source stubs → ast-grep scan → route extraction → HRR construction → insertion point creation
- At least one route per framework successfully produces insertion points
| Function | Description |
|---|---|
| `TestSAST_E2E_Extraction_To_Scan` | Full pipeline for gin, fastapi, express |
Running Benchmarks
Make Targets
| Command | Layers | Timeout | Requirements |
|---|---|---|---|
| `make test-sast` | 1 + 2 + 3 | 10 min | ast-grep binary (Layer 1) |
| `make test-sast-extraction` | 1 only | 10 min | ast-grep binary |
| `make test-sast-sarif` | 2 only | 5 min | None |
| `make test-sast-handoff` | 3 only | 5 min | None |
| `make test-sast-e2e` | 4 | 15 min | ast-grep binary |
Running Individual Tests
Example Output
YAML Definition Format
Extraction Definition
Each file in `definitions/whitebox/extraction/` describes expected routes for one framework.
SARIF Definition
Each file in `definitions/whitebox/sarif/` declares expectations for a fixture.
Handoff Definition
Each file in `definitions/whitebox/handoff/` describes route-to-request conversion.
Source Stubs
Source stubs are minimal, syntactically valid framework code in `test/testdata/sast-stubs/`. Each stub exercises key patterns that ast-grep rules should detect:
| Pattern | Example | Why It Matters |
|---|---|---|
| Basic CRUD routes | r.GET("/users", h) | Baseline extraction |
| Path parameters | /users/:id, /users/{user_id} | Param type detection |
| Route groups/prefixes | r.Group("/api/v2") | Path concatenation |
| Multiple methods | r.Any("/health") | Method expansion |
| Query/body params | c.Query("q"), Body(...) | Parameter binding |
| Framework detection | go.mod, requirements.txt, package.json | DetectFramework() validation |
| Class-based views | Django ViewSet | Method extraction from classes |
| Decorator patterns | @app.get("/path") | Python route decorators |
| Export patterns | export async function GET() | Next.js App Router |
Each stub includes a manifest file (`go.mod`, `requirements.txt`, or `package.json`) that `DetectFramework()` uses to identify the framework. The exception is `gohttp/`: Go’s `net/http` has no framework-specific dependency marker, so `detect_framework: false` is set in its YAML.
SARIF Fixtures
SARIF fixtures in `test/testdata/sast-sarif/` are static JSON files that simulate real tool output. They test the parser without requiring semgrep or osv-scanner to be installed.
Fixture Design
Normal fixtures contain realistic findings with proper SARIF structure.
Malformed fixtures:
- `sarif-malformed-1.json`: Valid JSON but missing the `"runs"` key
- `sarif-malformed-2.json`: Invalid JSON (parse should error)

Severity mapping fixture (`sarif-severity-mapping.sarif`): One finding per SARIF level, validating:
| SARIF Level | Vigolium Severity |
|---|---|
| `error` | high |
| `warning` | medium |
| `note` | low |
| `none` | info |
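
The mapping table translates directly into a lookup. The sketch below is illustrative only: `severityForLevel` is a hypothetical name, and how the real parser treats unknown or missing levels is not stated here, so the function simply reports them as unmapped.

```go
package main

import "fmt"

// severityForLevel maps a SARIF result level to a severity string,
// following the table above. The second return value is false for
// levels outside the table.
func severityForLevel(level string) (string, bool) {
	switch level {
	case "error":
		return "high", true
	case "warning":
		return "medium", true
	case "note":
		return "low", true
	case "none":
		return "info", true
	default:
		return "", false
	}
}

func main() {
	for _, level := range []string{"error", "warning", "note", "none"} {
		sev, _ := severityForLevel(level)
		fmt.Printf("%s -> %s\n", level, sev)
	}
}
```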
Adding New Benchmarks
Adding a New Framework (Layer 1)
- Create a source stub at `test/testdata/sast-stubs/<framework>/`. Include a manifest file (e.g., `pom.xml`, `build.gradle`) for `DetectFramework()`.
- Create a YAML definition at `test/benchmark/definitions/whitebox/extraction/<framework>-extraction.yaml`.
- No Go code changes are needed: `TestExtraction_All` automatically picks up new YAML files.
- Run it, for example with `make test-sast-extraction`.
Adding a New SARIF Fixture (Layer 2)
- Create the fixture at `test/testdata/sast-sarif/<name>.sarif`.
- Create a definition at `test/benchmark/definitions/whitebox/sarif/<name>.yaml`.
- Run it, for example with `make test-sast-sarif`.
Adding a New Handoff Test (Layer 3)
- Create a definition at `test/benchmark/definitions/whitebox/handoff/<framework>-handoff.yaml`.
- Run it, for example with `make test-sast-handoff`.
Harness Reference
SAST Types
| Type | Description |
|---|---|
| `SASTExtractionDefinition` | Route extraction test: framework, source_dir, expected routes, bounds |
| `ExpectedRoute` | Route expectation: method, path, file, params, assertion mode |
| `MatchCountBounds` | Min/max bounds for ast-grep match counts |
| `SASTSARIFDefinition` | SARIF parsing test: fixture path, tool name, format, expectations |
| `SARIFExpectation` | Expected results: finding count, error flag, specific findings, severity distribution |
| `ExpectedFinding` | Finding expectation: rule_id, severity, file_path, start_line |
| `SASTHandoffDefinition` | Handoff test: framework, base URL, routes with expected requests |
| `HandoffRoute` | Route conversion expectation: method, path, params, expected request, skip flag |
| `ExpectedRequest` | Expected HTTP request properties: method, URI, host |
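
To make the table more concrete, here is a rough sketch of how two of these types might look. The field names, YAML tags, and both methods are guesses derived from the descriptions above, not the harness's actual definitions; only the "strict" default is taken from the loader description in this document.

```go
package main

import "fmt"

// MatchCountBounds is a guessed shape for the min/max match-count bounds.
type MatchCountBounds struct {
	Min int `yaml:"min"`
	Max int `yaml:"max"`
}

// Contains reports whether n lies within the inclusive bounds.
func (b MatchCountBounds) Contains(n int) bool {
	return n >= b.Min && n <= b.Max
}

// ExpectedRoute is a guessed shape for a single route expectation.
type ExpectedRoute struct {
	Method    string   `yaml:"method"`
	Path      string   `yaml:"path"`
	File      string   `yaml:"file"`
	Params    []string `yaml:"params"`
	Assertion string   `yaml:"assertion"`
}

// AssertionMode returns the assertion mode, falling back to "strict"
// when the field is empty.
func (r ExpectedRoute) AssertionMode() string {
	if r.Assertion == "" {
		return "strict"
	}
	return r.Assertion
}

func main() {
	bounds := MatchCountBounds{Min: 10, Max: 14}
	route := ExpectedRoute{Method: "GET", Path: "/users/:id", Params: []string{"id"}}
	fmt.Println(bounds.Contains(12), route.AssertionMode())
}
```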
Loader Functions
| Function | Description |
|---|---|
| `LoadSASTExtractionDefinition(path)` | Load one extraction YAML (defaults assertion to “strict”) |
| `LoadSASTExtractionDefinitionsFromDir(dir)` | Load all extraction YAMLs from directory |
| `LoadSASTSARIFDefinition(path)` | Load one SARIF YAML (defaults format to “sarif”) |
| `LoadSASTSARIFDefinitionsFromDir(dir)` | Load all SARIF YAMLs from directory |
| `LoadSASTHandoffDefinition(path)` | Load one handoff YAML |
| `LoadSASTHandoffDefinitionsFromDir(dir)` | Load all handoff YAMLs from directory |
| `SASTDefinitionsDir()` | Returns path to definitions/whitebox/ |
Test Helpers
| Function | Description |
|---|---|
| `stubPath(framework)` | Resolves to test/testdata/sast-stubs/{framework} |
| `sarifFixturePath(name)` | Resolves to test/testdata/sast-sarif/{name} |
| `definitionsDir()` | Resolves to test/benchmark/definitions/whitebox |
| `findRoute(routes, method, path)` | Search routes by method and path |
| `findFinding(findings, ruleID)` | Search findings by rule ID |
| `buildSeverityDistribution(findings)` | Count findings per severity level |
| `normalizeMethod(method)` | ANY/HANDLE/ALL/"" → GET |
| `shouldSkipRoute(route)` | True if route has empty path |
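
Two of these helpers are simple enough to sketch. The `Route` and `Finding` types below are simplified stand-ins, and the exact-match comparison in `findRoute` is an assumption; the real helpers operate on the project's own types and may differ.

```go
package main

import "fmt"

// Route and Finding are simplified stand-ins for the project's types.
type Route struct {
	Method string
	Path   string
}

type Finding struct {
	RuleID   string
	Severity string
}

// findRoute returns the first route matching method and path exactly.
func findRoute(routes []Route, method, path string) (Route, bool) {
	for _, r := range routes {
		if r.Method == method && r.Path == path {
			return r, true
		}
	}
	return Route{}, false
}

// buildSeverityDistribution counts findings per severity level.
func buildSeverityDistribution(findings []Finding) map[string]int {
	dist := make(map[string]int)
	for _, f := range findings {
		dist[f.Severity]++
	}
	return dist
}

func main() {
	routes := []Route{{"GET", "/users"}, {"POST", "/users"}}
	_, ok := findRoute(routes, "POST", "/users")
	fmt.Println("found POST /users:", ok)

	findings := []Finding{{"sqli", "high"}, {"xss", "high"}, {"weak-hash", "low"}}
	fmt.Println(buildSeverityDistribution(findings))
}
```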
CI Integration
Recommended Strategy
| Trigger | What to Run | Timeout | Notes |
|---|---|---|---|
| On every PR | make test-sast-sarif && make test-sast-handoff | 5 min | No external deps |
| On every PR | make test-sast | 10 min | Requires ast-grep binary |
| Nightly | make test-sast-e2e | 15 min | Full pipeline validation |
Example GitHub Actions Workflow
Troubleshooting
ast-grep binary not found
The benchmarks look for `ast-grep` in PATH or download it. Ensure internet access is available, or pre-install the binary.
ast-grep-config.yaml compatibility issue
Some ast-grep versions treat `ast-grep-config.yaml` in the rules directory as a rule file, causing a parse error. This is a known issue with the embedded rule extraction in `pkg/toolexec/astgrep/rules.go`. Extraction tests skip gracefully when this occurs. The DetectFramework tests always pass regardless, since they do not invoke the scanner.
SARIF fixture not found
Check that the `fixture` field in the YAML definition matches the filename exactly (including extension).
No insertion points created
If `TestHandoff_InsertionPointCreation` fails with zero insertion points, check that the raw HTTP request contains query parameters. The `CreateAllInsertionPoints()` function requires actual parameters in the request line to generate URL parameter insertion points.
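
The dependency on query parameters can be seen in a small standalone sketch. It does not call `CreateAllInsertionPoints()`; the `insertionPoint` type, the `"url_param"` label, and the sorted ordering are hypothetical and only illustrate why a request URI without a query string produces nothing.

```go
package main

import (
	"fmt"
	"net/url"
	"sort"
)

// insertionPoint is a simplified, hypothetical stand-in for the project's type.
type insertionPoint struct {
	Name  string
	Value string
	Type  string
}

// urlParamInsertionPoints derives one insertion point per query parameter
// value in the request URI, sorted by parameter name for stable output.
func urlParamInsertionPoints(requestURI string) ([]insertionPoint, error) {
	u, err := url.ParseRequestURI(requestURI)
	if err != nil {
		return nil, err
	}
	query := u.Query()
	names := make([]string, 0, len(query))
	for name := range query {
		names = append(names, name)
	}
	sort.Strings(names)

	var points []insertionPoint
	for _, name := range names {
		for _, value := range query[name] {
			points = append(points, insertionPoint{Name: name, Value: value, Type: "url_param"})
		}
	}
	return points, nil
}

func main() {
	for _, uri := range []string{"/search?q=test&page=1", "/search"} {
		points, err := urlParamInsertionPoints(uri)
		fmt.Printf("%s -> %d insertion points (err=%v)\n", uri, len(points), err)
	}
}
```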
Definition YAML parse errors
- Indentation must use spaces, not tabs
- Strings with special characters (`{`, `}`, `:`) should be quoted
- Boolean values should be written in lowercase (`true`/`false`)
