PR #1: Data Structures & Enrichment Layer #391

shivasurya · 2025-11-21T22:38:15Z

Implements foundational data structures and enrichment logic for output standardization. This PR introduces EnrichedDetection with resolved file paths, code snippets, and comprehensive metadata from raw detections. The enricher transforms FQN-based detections into user-friendly output with context lines and reference URLs.

The enrichment layer uses callgraph lookup with fallback heuristics for file path resolution and caches file contents for performance. Detection types are classified as pattern matches or taint flows (local/global scope).
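The lookup-then-fallback resolution described above could be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `resolveFile`, the `map[string]string` standing in for the callgraph, and the fallback rule (drop the last FQN segment, turn dots into slashes, append `.py`) are all assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveFile returns a file path for a fully qualified name (FQN).
// The callgraph argument is a hypothetical stand-in for the real
// callgraph lookup: a plain map from FQN to file path.
func resolveFile(fqn string, callgraph map[string]string) (string, bool) {
	// Preferred path: the callgraph already knows where the symbol lives.
	if path, ok := callgraph[fqn]; ok {
		return path, true
	}
	// Assumed fallback heuristic: treat the last segment as the function
	// name and map the remaining module path onto a Python file.
	idx := strings.LastIndex(fqn, ".")
	if idx <= 0 {
		return "", false // no module part to derive a path from
	}
	module := fqn[:idx]
	return strings.ReplaceAll(module, ".", "/") + ".py", true
}

func main() {
	graph := map[string]string{"app.db.run_query": "src/app/db.py"}
	for _, fqn := range []string{"app.db.run_query", "app.views.process_user", "main"} {
		path, ok := resolveFile(fqn, graph)
		fmt.Printf("%s -> %q (resolved=%v)\n", fqn, path, ok)
	}
}
```

Returning a boolean alongside the path lets a caller tell "resolved by heuristic" apart from "could not resolve", which matters when the path is later used to read a snippet.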

All tests pass with 96.3% coverage for the new output package. Binary builds successfully and passes linting with zero issues.

Part of the output standardization feature.

- EnrichedDetection with location, snippet, metadata
- RuleMetadata with CWE, OWASP, references
- DetectionType enum (pattern, taint-local, taint-global)
- OutputOptions with verbosity levels
- LocationInfo and CodeSnippet structures

Part of output standardization feature.

Co-Authored-By: Claude <[email protected]>
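For readers who have not opened the diff, the structures listed above might look something like the sketch below. The type names come from the commit message; every field name and the `IsTaint` helper are assumptions made for illustration and may not match the actual `output` package.

```go
package main

import "fmt"

// DetectionType classifies how a finding was produced.
type DetectionType string

const (
	DetectionPattern     DetectionType = "pattern"
	DetectionTaintLocal  DetectionType = "taint-local"
	DetectionTaintGlobal DetectionType = "taint-global"
)

// LocationInfo is a resolved source location (field names are assumed).
type LocationInfo struct {
	File     string
	Line     int
	Column   int
	Function string
}

// CodeSnippet holds the context lines around a finding.
type CodeSnippet struct {
	StartLine int
	Lines     []string
}

// RuleMetadata carries the rule's classification and reference links.
type RuleMetadata struct {
	CWE        []string
	OWASP      []string
	References []string
}

// EnrichedDetection bundles a raw detection with its resolved context.
type EnrichedDetection struct {
	Type     DetectionType
	Location LocationInfo
	Snippet  CodeSnippet
	Rule     RuleMetadata
}

// IsTaint is a hypothetical helper: true for both taint scopes.
func (d EnrichedDetection) IsTaint() bool {
	return d.Type == DetectionTaintLocal || d.Type == DetectionTaintGlobal
}

func main() {
	d := EnrichedDetection{
		Type:     DetectionTaintLocal,
		Location: LocationInfo{File: "src/main.py", Line: 42, Column: 8, Function: "process_user"},
		Rule:     RuleMetadata{CWE: []string{"CWE-89"}, OWASP: []string{"A03:2021"}},
	}
	fmt.Println(d.Type, d.Location.File, d.IsTaint())
}
```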

- FQN to file path resolution via callgraph lookup
- Fallback heuristic for FQN parsing
- Code snippet extraction with configurable context
- File content caching for performance
- Rule metadata extraction with CWE/OWASP URLs
- Taint path construction (source/sink only for v1)

Part of output standardization feature.

Co-Authored-By: Claude <[email protected]>
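The snippet extraction and file caching items above can be illustrated with a short sketch. It assumes 1-based line numbers and a symmetric context window clamped to the file bounds; `fileCache`, `newFileCache`, and `extractSnippet` are hypothetical names, and the loader is injected so the example runs without touching disk (in real use it would likely be `os.ReadFile`).

```go
package main

import (
	"fmt"
	"strings"
)

// fileCache memoizes file contents split into lines, so several
// detections in the same file trigger only one read.
type fileCache struct {
	load  func(path string) ([]byte, error) // e.g. os.ReadFile in real use
	lines map[string][]string
}

func newFileCache(load func(string) ([]byte, error)) *fileCache {
	return &fileCache{load: load, lines: make(map[string][]string)}
}

// get returns the cached lines for path, loading them on first access.
func (c *fileCache) get(path string) ([]string, error) {
	if cached, ok := c.lines[path]; ok {
		return cached, nil
	}
	data, err := c.load(path)
	if err != nil {
		return nil, err
	}
	split := strings.Split(string(data), "\n")
	c.lines[path] = split
	return split, nil
}

// extractSnippet returns the 1-based start line and the lines in
// [line-context, line+context], clamped to the file. It returns
// (0, nil) when line is out of range.
func extractSnippet(lines []string, line, context int) (int, []string) {
	if line < 1 || line > len(lines) {
		return 0, nil
	}
	start := line - context
	if start < 1 {
		start = 1
	}
	end := line + context
	if end > len(lines) {
		end = len(lines)
	}
	return start, lines[start-1 : end]
}

func main() {
	fake := func(string) ([]byte, error) {
		return []byte("import db\n\ndef run(user_id):\n    q = build(user_id)\n    db.execute(q)"), nil
	}
	cache := newFileCache(fake)
	lines, err := cache.get("src/main.py")
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	start, snippet := extractSnippet(lines, 4, 1)
	for i, l := range snippet {
		fmt.Printf("%d | %s\n", start+i, l)
	}
}
```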

- Enricher tests: detection type, FQN parsing, snippets
- File cache tests
- Location and metadata tests
- Coverage: 96.3% for output package

Part of output standardization feature.

Co-Authored-By: Claude <[email protected]>

safedep · 2025-11-21T22:38:19Z

SafeDep Report Summary

No dependency changes detected. Nothing to scan.

This report is generated by SafeDep Github App

codecov · 2025-11-21T22:39:26Z

Codecov Report

❌ Patch coverage is 96.11111% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 79.33%. Comparing base (8f2eb7b) to head (5b86eff).
⚠️ Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|--------------------------|---------|-------|
| sourcecode-parser/output/enricher.go | 95.39% | 4 Missing and 3 partials ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #391      +/-   ##
==========================================
+ Coverage   78.91%   79.33%   +0.42%     
==========================================
  Files          70       73       +3     
  Lines        7123     7303     +180     
==========================================
+ Hits         5621     5794     +173     
- Misses       1263     1267       +4     
- Partials      239      242       +3

☔ View full report in Codecov by Sentry.

Add comprehensive tests for ConfidenceLevel and DetectionBadge methods. Coverage for enriched_detection.go now at 100%. Co-Authored-By: Claude <[email protected]>

## Summary

Implements JSON and CSV output formatters for the `ci` command, replacing the old inline JSON generation with a modular, well-tested implementation.

**Part of output-standardization tech spec (Stacked PRs)**

- ✅ PR #1: Logging System Infrastructure (#391) - **Merged**
- ✅ PR #2: Output Package Foundation (#392) - **In Review**
- ✅ PR #3: Text Formatter for Scan Command (#393) - **In Review**
- 🔄 PR #4: JSON and CSV Formatters ← **This PR**

## Changes

### New Files

- `output/json_formatter.go` (235 lines)
  - Enhanced JSON output with rich metadata structure
  - Tool, scan, results, summary, and errors sections
  - Code snippets with configurable context lines
  - Taint flow source/sink information
  - CWE, OWASP, and reference metadata
- `output/csv_formatter.go` (123 lines)
  - CSV output for CI/CD integration
  - 17 columns: severity, confidence, rule_id, rule_name, cwe, owasp, file, line, column, function, message, detection_type, detection_scope, source_line, sink_line, tainted_var, sink_call
  - Proper escaping via encoding/csv package
- `output/json_formatter_test.go` (415 lines)
  - Comprehensive tests achieving 100% coverage
  - Structure validation, snippet handling, metadata, pattern vs taint detection
- `output/csv_formatter_test.go` (395 lines)
  - Comprehensive tests achieving 100% coverage
  - Header validation, escaping, multiple rows, zero values

### Modified Files

- `cmd/ci.go`
  - Replaced old `generateJSONOutput()` with new formatter integration
  - Added enrichment pipeline using `output.NewEnricher()`
  - Updated output format validation to include "csv"
  - Added CSV formatter support
  - Updated help text and examples
  - Exit code 1 when vulnerabilities found (for CI/CD)
- `cmd/ci_test.go`
  - Skipped obsolete `TestGenerateJSONOutput` (replaced by new formatter tests)
- `main_test.go`
  - Updated expected help text to include CSV output format

## JSON Output Structure

```json
{
  "tool": {
    "name": "Code Pathfinder",
    "version": "1.0.0",
    "url": "https://codepathfinder.dev"
  },
  "scan": {
    "target": "/path/to/project",
    "timestamp": "2025-01-21T10:30:00Z",
    "duration": 5.43,
    "rules_executed": 12
  },
  "results": [{
    "rule_id": "sql-injection",
    "rule_name": "SQL Injection",
    "message": "Unsanitized user input flows to SQL query",
    "severity": "critical",
    "confidence": "high",
    "location": {
      "file": "src/main.py",
      "line": 42,
      "column": 8,
      "function": "process_user",
      "snippet": {
        "start_line": 40,
        "end_line": 44,
        "lines": ["...", "query = f\"SELECT * FROM users WHERE id={user_id}\"", "..."]
      }
    },
    "detection": {
      "type": "taint-local",
      "scope": "intra-procedural",
      "confidence_score": 0.95,
      "source": {"line": 38, "variable": "user_id"},
      "sink": {"line": 42, "call": "execute"}
    },
    "metadata": {
      "cwe": ["CWE-89"],
      "owasp": ["A03:2021"],
      "references": ["https://..."]
    }
  }],
  "summary": {
    "total": 5,
    "by_severity": {"critical": 2, "high": 3},
    "by_detection_type": {"taint-local": 4, "pattern": 1}
  },
  "errors": []
}
```

## CSV Output Format

```csv
severity,confidence,rule_id,rule_name,cwe,owasp,file,line,column,function,message,detection_type,detection_scope,source_line,sink_line,tainted_var,sink_call
critical,high,sql-injection,SQL Injection,CWE-89,A03:2021,src/main.py,42,8,process_user,Unsanitized user input flows to SQL query,taint-local,intra-procedural,38,42,user_id,execute
```

## Testing

- All tests passing (100% coverage for both formatters)
- Output package overall: 98.1% coverage
- Linting checks passed
- Integration tests with ci command verified

## Usage Examples

```bash
# Generate JSON report
pathfinder ci --rules rules/ --project . --output json > results.json

# Generate CSV report
pathfinder ci --rules rules/ --project . --output csv > results.csv

# Generate SARIF report (existing)
pathfinder ci --rules rules/ --project . --output sarif > results.sarif
```

## Breaking Changes

- Old `generateJSONOutput()` function removed from cmd/ci.go
- JSON output structure changed to new rich format (snake_case fields)
- Exit code behavior unchanged (exits 1 when vulnerabilities found)

## Stack Status

This PR stacks on:

- **PR #3**: shiva/output-text-formatter (#393) ← base branch
- **PR #2**: shiva/output-logging-system (#392)
- **main**: Production branch

Next PR:

- PR #5: SARIF Formatter Enhancement (will stack on this PR)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
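The "proper escaping via encoding/csv" point in the PR #4 description above can be shown with a small sketch. It writes only five of the 17 columns to stay short; the `finding` type and the `writeCSV` and `renderCSV` helpers are hypothetical and are not taken from `output/csv_formatter.go`.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
	"io"
	"strconv"
)

// finding is a trimmed-down, hypothetical stand-in for an enriched detection.
type finding struct {
	Severity string
	RuleID   string
	File     string
	Line     int
	Message  string
}

// writeCSV emits a header row followed by one row per finding.
// encoding/csv quotes fields containing commas, quotes, or newlines.
func writeCSV(w io.Writer, findings []finding) error {
	cw := csv.NewWriter(w)
	if err := cw.Write([]string{"severity", "rule_id", "file", "line", "message"}); err != nil {
		return err
	}
	for _, f := range findings {
		row := []string{f.Severity, f.RuleID, f.File, strconv.Itoa(f.Line), f.Message}
		if err := cw.Write(row); err != nil {
			return err
		}
	}
	cw.Flush()
	return cw.Error() // surfaces any error from buffered writes
}

// renderCSV is a convenience wrapper that returns the CSV as a string.
func renderCSV(findings []finding) (string, error) {
	var buf bytes.Buffer
	if err := writeCSV(&buf, findings); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := renderCSV([]finding{
		{"critical", "sql-injection", "src/main.py", 42, `Input "q", unsanitized`},
	})
	if err != nil {
		fmt.Println("csv failed:", err)
		return
	}
	fmt.Print(out)
}
```

Delegating quoting to the standard library avoids hand-rolled escaping bugs when a message contains commas or double quotes, which rule messages often do.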

## Summary

Implements enhanced SARIF formatter with code flows, related locations, and rich metadata for optimal GitHub Code Scanning integration.

**Part of output-standardization tech spec (Stacked PRs)**

- ✅ PR #1: Logging System Infrastructure (#391) - **Merged**
- ✅ PR #2: Output Package Foundation (#392) - **In Review**
- ✅ PR #3: Text Formatter for Scan Command (#393) - **In Review**
- ✅ PR #4: JSON and CSV Formatters (#394) - **In Review**
- 🔄 PR #5: Enhanced SARIF Formatter ← **This PR**

## Changes

### New Files

- `output/sarif_formatter.go` (290 lines)
  - SARIF 2.1.0 compliant output formatter
  - Code flows for taint path visualization (source → sink)
  - Related locations for taint sources
  - Help text with markdown and CWE references
  - Security severity scores (9.0, 7.0, 5.0, 3.0)
  - Rule properties: tags, precision
  - Deduplicates rules across multiple detections
- `output/sarif_formatter_test.go` (519 lines)
  - Comprehensive tests achieving 97.5% coverage
  - Tests for version, tool metadata, rules, results
  - Code flow generation tests (taint-local, taint-global)
  - Related locations validation
  - Pattern vs taint detection differentiation

### Modified Files

- `cmd/ci.go`
  - Replaced old `generateSARIFOutput()` with new formatter
  - Uses enriched detections for rich output
  - Removed unused imports (sarif library, json, encoding/json)
  - Consistent pattern with JSON and CSV formatters
- `cmd/ci_test.go`
  - Skipped obsolete SARIF tests
  - Removed unused helper functions

## Key Features

### Code Flows

Taint detections automatically include code flows showing the path from source to sink:

```json
{
  "codeFlows": [{
    "message": {"text": "Taint flow from line 10 to line 20"},
    "threadFlows": [{
      "locations": [
        {
          "location": {"physicalLocation": {"region": {"startLine": 10}}},
          "message": {"text": "Taint source: user_input"}
        },
        {
          "location": {"physicalLocation": {"region": {"startLine": 20}}},
          "message": {"text": "Taint sink: os.system"}
        }
      ]
    }]
  }]
}
```

### Help Text with Markdown

Rules include rich help text with CWE references:

```markdown
## Command Injection

User input flows to shell command without sanitization

### References

- [CWE-78](https://cwe.mitre.org/data/definitions/78.html)
```

### Security Severity Scores

GitHub-compatible severity scores for prioritization:

- Critical: 9.0
- High: 7.0
- Medium: 5.0
- Low: 3.0

### Rule Properties

```json
{
  "properties": {
    "tags": ["security"],
    "security-severity": "9.0",
    "precision": "high"
  }
}
```

## Benefits over Old Implementation

| Feature | Old | New |
|---------|-----|-----|
| Code flows | ❌ None | ✅ Source → Sink visualization |
| Related locations | ❌ None | ✅ Taint sources highlighted |
| Help text | ❌ Plain text | ✅ Markdown with references |
| Security severity | ❌ Level only | ✅ Numeric scores for GitHub |
| Rule properties | ❌ None | ✅ Tags, precision |
| Pattern detection | ❌ Same as taint | ✅ No code flows (correct) |
| Test coverage | ❌ ~60% | ✅ 97.5% |

## Testing

- All tests passing (97.5% coverage on SARIF formatter)
- Output package overall: 97.5% coverage
- Linting checks passed
- Integration with ci command verified

## Usage Examples

```bash
# Generate enhanced SARIF report with code flows
pathfinder ci --rules rules/ --project . --output sarif > results.sarif

# Upload to GitHub Code Scanning
gh api /repos/:owner/:repo/code-scanning/sarifs -F [email protected]

# View in GitHub UI with code flows highlighted
```

## SARIF Output Sample

```json
{
  "version": "2.1.0",
  "runs": [{
    "tool": {
      "driver": {
        "name": "Code Pathfinder",
        "version": "0.0.25",
        "rules": [{
          "id": "sql-injection",
          "name": "SQL Injection",
          "fullDescription": {"text": "Unsanitized user input flows to SQL query (CWE-89, A03:2021)"},
          "helpUri": "https://github.com/shivasurya/code-pathfinder",
          "defaultConfiguration": {"level": "error"},
          "properties": {
            "tags": ["security"],
            "security-severity": "9.0",
            "precision": "high"
          }
        }]
      }
    },
    "results": [{
      "ruleId": "sql-injection",
      "message": {"text": "Unsanitized user input flows to SQL query (sink: execute, confidence: 95%)"},
      "locations": [{
        "physicalLocation": {
          "artifactLocation": {"uri": "src/db/queries.py"},
          "region": {"startLine": 42, "startColumn": 8}
        }
      }],
      "codeFlows": [...],
      "relatedLocations": [...]
    }]
  }]
}
```

## Breaking Changes

- Old `generateSARIFOutput()` function removed
- SARIF output structure enhanced with additional fields
- Pattern matches no longer include code flows (correct behavior)

## Stack Status

This PR stacks on:

- **PR #4**: shiva/output-json-csv-formatters (#394) ← base branch
- **PR #3**: shiva/output-text-formatter (#393)
- **PR #2**: shiva/output-logging-system (#392)
- **main**: Production branch

Next PR:

- PR #6: Exit Code Standardization (will stack on this PR)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
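To make the code flow and severity score behaviour in the PR #5 description above concrete, here is a rough sketch that builds the `codeFlows` fragment for a taint detection and returns nothing for a pattern match. The JSON field names follow the sample in the description; the Go types, `buildCodeFlows`, and the `securitySeverity` map are assumptions and are not the contents of `output/sarif_formatter.go`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// securitySeverity mirrors the scores listed in the description.
var securitySeverity = map[string]string{
	"critical": "9.0", "high": "7.0", "medium": "5.0", "low": "3.0",
}

type message struct {
	Text string `json:"text"`
}

type region struct {
	StartLine int `json:"startLine"`
}

type physicalLocation struct {
	Region region `json:"region"`
}

type location struct {
	PhysicalLocation physicalLocation `json:"physicalLocation"`
}

type threadFlowLocation struct {
	Location location `json:"location"`
	Message  message  `json:"message"`
}

type threadFlow struct {
	Locations []threadFlowLocation `json:"locations"`
}

type codeFlow struct {
	Message     message      `json:"message"`
	ThreadFlows []threadFlow `json:"threadFlows"`
}

// detection is a hypothetical, trimmed-down enriched detection.
type detection struct {
	Type       string // "pattern", "taint-local", or "taint-global"
	SourceLine int
	SinkLine   int
	Variable   string
	SinkCall   string
}

func at(line int) location {
	return location{PhysicalLocation: physicalLocation{Region: region{StartLine: line}}}
}

// buildCodeFlows returns a source-to-sink flow for taint detections
// and nil for pattern matches, which have no path to show.
func buildCodeFlows(d detection) []codeFlow {
	if d.Type != "taint-local" && d.Type != "taint-global" {
		return nil
	}
	return []codeFlow{{
		Message: message{Text: fmt.Sprintf("Taint flow from line %d to line %d", d.SourceLine, d.SinkLine)},
		ThreadFlows: []threadFlow{{Locations: []threadFlowLocation{
			{Location: at(d.SourceLine), Message: message{Text: "Taint source: " + d.Variable}},
			{Location: at(d.SinkLine), Message: message{Text: "Taint sink: " + d.SinkCall}},
		}}},
	}}
}

func main() {
	d := detection{Type: "taint-local", SourceLine: 10, SinkLine: 20, Variable: "user_input", SinkCall: "os.system"}
	out, err := json.MarshalIndent(map[string]any{"codeFlows": buildCodeFlows(d)}, "", "  ")
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(string(out))
	fmt.Println("security-severity for critical:", securitySeverity["critical"])
}
```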

shivasurya and others added 3 commits November 21, 2025 17:37

Add enricher and data structure tests

0efb396


shivasurya self-assigned this Nov 21, 2025

shivasurya added the enhancement and go labels Nov 21, 2025

Improve test coverage for enriched detection methods

5b86eff


shivasurya merged commit a6a882a into main Nov 21, 2025
5 checks passed

shivasurya deleted the shiva/output-data-structures branch November 21, 2025 22:43

This was referenced Nov 21, 2025

PR #4: Add JSON and CSV Output Formatters for CI Mode #394

Merged

PR #5: Enhanced SARIF Formatter with Code Flows #395

Merged
