PR #5: Enhanced SARIF Formatter with Code Flows #395

shivasurya · 2025-11-21T23:56:08Z

Summary

Implements enhanced SARIF formatter with code flows, related locations, and rich metadata for optimal GitHub Code Scanning integration.

Part of output-standardization tech spec (Stacked PRs)

✅ PR Tree-Sitter Implementation with Source Sink Analysis #1: Logging System Infrastructure (PR #1: Data Structures & Enrichment Layer #391) - Merged
✅ PR Parse full method name as identifier #2: Output Package Foundation (PR #2: Structured Logging System #392) - In Review
✅ PR Added support for method query like interface #3: Text Formatter for Scan Command (PR #3: Text Formatter for Scan Command #393) - In Review
✅ PR Feature: Add Method Declaration filter based on attributes #4: JSON and CSV Formatters (PR #4: Add JSON and CSV Output Formatters for CI Mode #394) - In Review
🔄 PR Created automated Github Action for building binary #5: Enhanced SARIF Formatter ← This PR

Changes

New Files

output/sarif_formatter.go (290 lines)
- SARIF 2.1.0 compliant output formatter
- Code flows for taint path visualization (source → sink)
- Related locations for taint sources
- Help text with markdown and CWE references
- Security severity scores (9.0, 7.0, 5.0, 3.0)
- Rule properties: tags, precision
- Deduplicates rules across multiple detections
output/sarif_formatter_test.go (519 lines)
- Comprehensive tests achieving 97.5% coverage
- Tests for version, tool metadata, rules, results
- Code flow generation tests (taint-local, taint-global)
- Related locations validation
- Pattern vs taint detection differentiation

Modified Files

cmd/ci.go
- Replaced old generateSARIFOutput() with new formatter
- Uses enriched detections for rich output
- Removed unused imports (sarif library, json, encoding/json)
- Consistent pattern with JSON and CSV formatters
cmd/ci_test.go
- Skipped obsolete SARIF tests
- Removed unused helper functions

Key Features

Code Flows

Taint detections automatically include code flows showing the path from source to sink:

{
  "codeFlows": [{
    "message": {"text": "Taint flow from line 10 to line 20"},
    "threadFlows": [{
      "locations": [
        {
          "location": {"physicalLocation": {"region": {"startLine": 10}}},
          "message": {"text": "Taint source: user_input"}
        },
        {
          "location": {"physicalLocation": {"region": {"startLine": 20}}},
          "message": {"text": "Taint sink: os.system"}
        }
      ]
    }]
  }]
}

Help Text with Markdown

Rules include rich help text with CWE references:

## Command Injection

User input flows to shell command without sanitization

### References
- [CWE-78](https://cwe.mitre.org/data/definitions/78.html)

Security Severity Scores

GitHub-compatible severity scores for prioritization:

Critical: 9.0
High: 7.0
Medium: 5.0
Low: 3.0

Rule Properties

{
  "properties": {
    "tags": ["security"],
    "security-severity": "9.0",
    "precision": "high"
  }
}

Benefits over Old Implementation

Feature	Old	New
Code flows	❌ None	✅ Source → Sink visualization
Related locations	❌ None	✅ Taint sources highlighted
Help text	❌ Plain text	✅ Markdown with references
Security severity	❌ Level only	✅ Numeric scores for GitHub
Rule properties	❌ None	✅ Tags, precision
Pattern detection	❌ Same as taint	✅ No code flows (correct)
Test coverage	❌ ~60%	✅ 97.5%

Testing

All tests passing (97.5% coverage on SARIF formatter)
Output package overall: 97.5% coverage
Linting checks passed
Integration with ci command verified

Usage Examples

# Generate enhanced SARIF report with code flows
pathfinder ci --rules rules/ --project . --output sarif > results.sarif

# Upload to GitHub Code Scanning
gh api /repos/:owner/:repo/code-scanning/sarifs -F [email protected]

# View in GitHub UI with code flows highlighted

SARIF Output Sample

{
  "version": "2.1.0",
  "runs": [{
    "tool": {
      "driver": {
        "name": "Code Pathfinder",
        "version": "0.0.25",
        "rules": [{
          "id": "sql-injection",
          "name": "SQL Injection",
          "fullDescription": {"text": "Unsanitized user input flows to SQL query (CWE-89, A03:2021)"},
          "helpUri": "https://github.com/shivasurya/code-pathfinder",
          "defaultConfiguration": {"level": "error"},
          "properties": {
            "tags": ["security"],
            "security-severity": "9.0",
            "precision": "high"
          }
        }]
      }
    },
    "results": [{
      "ruleId": "sql-injection",
      "message": {"text": "Unsanitized user input flows to SQL query (sink: execute, confidence: 95%)"},
      "locations": [{
        "physicalLocation": {
          "artifactLocation": {"uri": "src/db/queries.py"},
          "region": {"startLine": 42, "startColumn": 8}
        }
      }],
      "codeFlows": [...],
      "relatedLocations": [...]
    }]
  }]
}

Breaking Changes

Old generateSARIFOutput() function removed
SARIF output structure enhanced with additional fields
Pattern matches no longer include code flows (correct behavior)

Stack Status

This PR stacks on:

PR Feature: Add Method Declaration filter based on attributes #4: shiva/output-json-csv-formatters (PR #4: Add JSON and CSV Output Formatters for CI Mode #394) ← base branch
PR Added support for method query like interface #3: shiva/output-text-formatter (PR #3: Text Formatter for Scan Command #393)
PR Parse full method name as identifier #2: shiva/output-logging-system (PR #2: Structured Logging System #392)
main: Production branch

Next PR:

PR Update build.yml #6: Exit Code Standardization (will stack on this PR)

🤖 Generated with Claude Code

safedep · 2025-11-21T23:56:11Z

SafeDep Report Summary

No dependency changes detected. Nothing to scan.

_{This report is generated by SafeDep Github App}

codecov · 2025-11-21T23:57:20Z

Codecov Report

❌ Patch coverage is 89.15663% with 18 lines in your changes missing coverage. Please review.
✅ Project coverage is 79.76%. Comparing base (5e70827) to head (495743f).
⚠️ Report is 3 commits behind head on main.

Files with missing lines	Patch %	Lines
sourcecode-parser/cmd/ci.go	0.00%	10 Missing ⚠️
sourcecode-parser/output/sarif_formatter.go	94.87%	4 Missing and 4 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #395      +/-   ##
==========================================
+ Coverage   79.64%   79.76%   +0.12%     
==========================================
  Files          77       78       +1     
  Lines        7689     7799     +110     
==========================================
+ Hits         6124     6221      +97     
- Misses       1322     1333      +11     
- Partials      243      245       +2

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

shivasurya · 2025-11-22T00:37:39Z

This stack of pull requests is managed by Graphite. Learn more about stacking.

shivasurya · 2025-11-22T00:38:50Z

Merge activity

Nov 22, 12:38 AM UTC: A user started a stack merge that includes this pull request via Graphite.
Nov 22, 12:43 AM UTC: Graphite rebased this pull request as part of a merge.
Nov 22, 12:44 AM UTC: @shivasurya merged this pull request with Graphite.

- Rich rule metadata with help markdown and CWE references - Code flows for taint path visualization (source → sink) - Related locations for taint sources - Security severity scores for GitHub integration - Rule properties including tags and precision - Builder pattern API matching go-sarif library - Comprehensive tests achieving 97.5% coverage Features: - SARIF 2.1.0 compliance - Deduplicates rules across multiple detections - Supports both taint-local and taint-global detection types - Pattern matches do not include code flows (as expected) - Fallback from RelPath to FilePath for artifact locations Part of output standardization feature (PR #5). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>

- Replaced old generateSARIFOutput() with output.SARIFFormatter - Uses enriched detections for rich SARIF output with code flows - Removed unused imports (sarif, json, encoding/json) - Skipped obsolete SARIF tests (replaced by output/sarif_formatter_test.go) - Cleaned up unused helper functions in tests Benefits over old implementation: - Code flows for taint path visualization - Related locations for taint sources - Help text with markdown formatting - Security severity scores for GitHub - Rule properties (tags, precision) - Consistent with JSON and CSV formatter pattern All tests passing, linting clean. Part of output standardization feature (PR #5). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>

## Summary Implements standardized exit codes and the `--fail-on` flag for both `scan` and `ci` commands, enabling selective CI/CD pipeline failures based on security finding severities. ## Changes ### Core Exit Code Logic (`output/exit_code.go`) - **Exit Code Constants**: - `ExitCodeSuccess (0)`: No findings or no --fail-on match - `ExitCodeFindings (1)`: Findings match --fail-on severities - `ExitCodeError (2)`: Configuration or execution errors - **DetermineExitCode()**: Calculates appropriate exit code with error precedence - **ParseFailOn()**: Parses comma-separated severity values - **ValidateSeverities()**: Validates severity names (case-insensitive) - **InvalidSeverityError**: Custom error type for validation failures ### Command Integration - **scan command**: Add --fail-on flag and integrate exit code logic - **ci command**: Add --fail-on flag and integrate exit code logic - Both commands now: - Exit 0 by default (regardless of findings) - Exit 1 only when findings match --fail-on severities - Exit 2 on configuration/execution errors - Support case-insensitive severity validation ### Bug Fixes - Fixed SARIF output always exiting 0 (now respects --fail-on) ## Testing ### Unit Tests (`output/exit_code_test.go`) - 16 tests for `DetermineExitCode()` covering all exit scenarios - 12 tests for `ParseFailOn()` covering edge cases - 13 tests for `ValidateSeverities()` covering validation - All tests verify case-insensitive behavior ### Integration Tests (`cmd/exit_code_integration_test.go`) - Tests actual binary exit codes for both scan and ci commands - Tests all output formats (SARIF, JSON, CSV) - Tests invalid severity handling - Tests case-insensitive severity matching - Requires `INTEGRATION=1` and pre-built binary **Test Results**: All tests passing ✅ ```bash $ gradle testGo ok .../output 0.223s ok .../cmd 0.317s ``` ## Examples ```bash # Default: no exit on findings pathfinder scan --rules rules/ --project . # Exit: 0 (even if vulnerabilities found) # Fail on critical findings pathfinder scan --rules rules/ --project . --fail-on critical # Exit: 1 if critical findings, 0 otherwise # Fail on critical or high findings pathfinder ci --rules rules/ --project . --output sarif --fail-on critical,high # Exit: 1 if critical/high findings, 0 otherwise # Case insensitive pathfinder scan --rules rules/ --project . --fail-on CRITICAL,High,MeDiUm # Exit: 1 if any match # Invalid severity pathfinder scan --rules rules/ --project . --fail-on invalid # Error: invalid severity 'invalid', must be one of: critical, high, medium, low, info ``` ## Migration Notes ### Breaking Changes - **Default behavior changed**: Previously, any findings caused exit 1. Now requires explicit `--fail-on` flag. - **CI/CD pipelines**: Add `--fail-on critical,high` to maintain previous fail-on-findings behavior. ### Non-Breaking - Existing commands without `--fail-on` continue to work (exit 0) - All output formats work identically with exit codes ## Checklist - [x] Core exit code logic implemented - [x] Integrated with scan command - [x] Integrated with ci command - [x] Unit tests (95%+ coverage) - [x] Integration tests - [x] All tests passing - [x] Linter passing - [x] Binary builds successfully - [x] Documentation in commit messages ## Stacked PRs This PR stacks on top of: - PR #5: SARIF Formatter (#395) 🤖 Generated with [Claude Code](https://claude.com/claude-code)

This was referenced Nov 22, 2025

PR #6: Exit Code Standardization & --fail-on Flag #396

Merged

PR #2: Structured Logging System #392

Merged

PR #3: Text Formatter for Scan Command #393

Merged

This was referenced Nov 22, 2025

PR #4: Add JSON and CSV Output Formatters for CI Mode #394

Merged

PR #7: Command Cleanup & Documentation #397

Merged

shivasurya changed the base branch from shiva/output-json-csv-formatters to graphite-base/395 November 22, 2025 00:41

shivasurya changed the base branch from graphite-base/395 to main November 22, 2025 00:42

shivasurya and others added 2 commits November 22, 2025 00:43

shivasurya force-pushed the shiva/output-sarif-formatter branch from 022584e to 495743f Compare November 22, 2025 00:43

shivasurya merged commit 40c3ebe into main Nov 22, 2025
3 checks passed

shivasurya deleted the shiva/output-sarif-formatter branch November 22, 2025 00:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

PR #5: Enhanced SARIF Formatter with Code Flows #395

PR #5: Enhanced SARIF Formatter with Code Flows #395

Uh oh!

shivasurya commented Nov 21, 2025

Uh oh!

safedep bot commented Nov 21, 2025 •

edited

Loading

Uh oh!

codecov bot commented Nov 21, 2025 •

edited

Loading

Uh oh!

shivasurya commented Nov 22, 2025 •

edited

Loading

Uh oh!

shivasurya commented Nov 22, 2025 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

PR #5: Enhanced SARIF Formatter with Code Flows #395

PR #5: Enhanced SARIF Formatter with Code Flows #395

Uh oh!

Conversation

shivasurya commented Nov 21, 2025

Summary

Changes

New Files

Modified Files

Key Features

Code Flows

Help Text with Markdown

Security Severity Scores

Rule Properties

Benefits over Old Implementation

Testing

Usage Examples

SARIF Output Sample

Breaking Changes

Stack Status

Uh oh!

safedep bot commented Nov 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

SafeDep Report Summary

Uh oh!

codecov bot commented Nov 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

shivasurya commented Nov 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

shivasurya commented Nov 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Merge activity

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

safedep bot commented Nov 21, 2025 •

edited

Loading

codecov bot commented Nov 21, 2025 •

edited

Loading

shivasurya commented Nov 22, 2025 •

edited

Loading

shivasurya commented Nov 22, 2025 •

edited

Loading