@shivasurya
Owner

Implements foundational data structures and enrichment logic for output standardization. This PR introduces EnrichedDetection with resolved file paths, code snippets, and comprehensive metadata from raw detections. The enricher transforms FQN-based detections into user-friendly output with context lines and reference URLs.
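
As a rough illustration of the shapes described above, the types might look something like the following. Only the type names `EnrichedDetection`, `RuleMetadata`, `DetectionType`, `LocationInfo`, and `CodeSnippet` and the three detection type values come from this PR; every field name and the `IsTaint` helper are assumptions made for the sketch, not the actual definitions in the output package.

```go
package main

import "fmt"

// DetectionType names the three classes listed in this PR.
type DetectionType string

const (
	DetectionPattern     DetectionType = "pattern"
	DetectionTaintLocal  DetectionType = "taint-local"
	DetectionTaintGlobal DetectionType = "taint-global"
)

// IsTaint is a hypothetical helper: true for either taint flow class.
func (t DetectionType) IsTaint() bool {
	return t == DetectionTaintLocal || t == DetectionTaintGlobal
}

// LocationInfo is a resolved source position (field names are assumptions).
type LocationInfo struct {
	File     string
	Line     int
	Column   int
	Function string
}

// CodeSnippet holds the context lines around a finding.
type CodeSnippet struct {
	StartLine int
	EndLine   int
	Lines     []string
}

// RuleMetadata carries rule-level classification data.
type RuleMetadata struct {
	ID         string
	CWE        []string
	OWASP      []string
	References []string
}

// EnrichedDetection bundles what a formatter needs for one finding.
type EnrichedDetection struct {
	Type     DetectionType
	Location LocationInfo
	Snippet  CodeSnippet
	Rule     RuleMetadata
}

func main() {
	d := EnrichedDetection{
		Type:     DetectionTaintLocal,
		Location: LocationInfo{File: "src/main.py", Line: 42, Column: 8, Function: "process_user"},
		Rule:     RuleMetadata{ID: "sql-injection", CWE: []string{"CWE-89"}},
	}
	fmt.Printf("%s:%d [%s] taint=%v\n", d.Location.File, d.Location.Line, d.Rule.ID, d.Type.IsTaint())
}
```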

The enrichment layer uses callgraph lookup with fallback heuristics for file path resolution and caches file contents for performance. Detection types are classified as pattern matches or taint flows (local/global scope).
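
The resolution and caching flow described above could be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `enricher.go`: the callgraph is stood in for by a plain map, the fallback heuristic (drop the trailing function name and treat the rest as a dotted Python module path) is a guess at what "FQN parsing" means here, and `resolveFile`, `fileCache`, and `snippet` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveFile maps an FQN to a file path. It first consults the lookup table
// (a stand-in for the callgraph), then falls back to a heuristic: drop the
// last segment and turn the remaining dotted path into a .py path.
func resolveFile(fqn string, callgraph map[string]string) string {
	if path, ok := callgraph[fqn]; ok {
		return path
	}
	parts := strings.Split(fqn, ".")
	if len(parts) < 2 {
		return ""
	}
	return strings.Join(parts[:len(parts)-1], "/") + ".py"
}

// fileCache reads each file at most once and keeps its lines in memory.
type fileCache struct {
	read  func(path string) ([]byte, error) // injectable; os.ReadFile in real use
	lines map[string][]string
	reads int // number of cache misses, kept only for demonstration
}

func newFileCache(read func(string) ([]byte, error)) *fileCache {
	return &fileCache{read: read, lines: map[string][]string{}}
}

func (c *fileCache) get(path string) ([]string, error) {
	if l, ok := c.lines[path]; ok {
		return l, nil
	}
	data, err := c.read(path)
	if err != nil {
		return nil, err
	}
	c.reads++
	l := strings.Split(string(data), "\n")
	c.lines[path] = l
	return l, nil
}

// snippet returns the 1-based inclusive range and the lines surrounding
// `line`, with up to `context` lines on each side, clamped to the file.
func snippet(lines []string, line, context int) (start, end int, out []string) {
	if line < 1 || line > len(lines) {
		return 0, 0, nil
	}
	start, end = line-context, line+context
	if start < 1 {
		start = 1
	}
	if end > len(lines) {
		end = len(lines)
	}
	return start, end, lines[start-1 : end]
}

func main() {
	cg := map[string]string{"app.db.run_query": "src/app/db.py"}
	fmt.Println(resolveFile("app.db.run_query", cg)) // callgraph hit
	fmt.Println(resolveFile("app.views.index", cg))  // heuristic fallback

	cache := newFileCache(func(string) ([]byte, error) {
		return []byte("a\nb\nc\nd\ne"), nil
	})
	lines, err := cache.get("src/app/db.py")
	if err != nil {
		panic(err)
	}
	if _, err := cache.get("src/app/db.py"); err != nil { // served from cache
		panic(err)
	}
	start, end, ctx := snippet(lines, 4, 2)
	fmt.Println(start, end, ctx, cache.reads)
}
```

Injecting the read function keeps the sketch testable without touching the filesystem; passing `os.ReadFile` would give the on-disk behaviour.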

All tests pass with 96.3% coverage for the new output package. The binary builds successfully and linting reports zero issues.

Part of the output standardization feature.

shivasurya and others added 3 commits November 21, 2025 17:37
- EnrichedDetection with location, snippet, metadata
- RuleMetadata with CWE, OWASP, references
- DetectionType enum (pattern, taint-local, taint-global)
- OutputOptions with verbosity levels
- LocationInfo and CodeSnippet structures

Part of output standardization feature.

Co-Authored-By: Claude <[email protected]>

- FQN to file path resolution via callgraph lookup
- Fallback heuristic for FQN parsing
- Code snippet extraction with configurable context
- File content caching for performance
- Rule metadata extraction with CWE/OWASP URLs
- Taint path construction (source/sink only for v1)

Part of output standardization feature.

Co-Authored-By: Claude <[email protected]>

- Enricher tests: detection type, FQN parsing, snippets
- File cache tests
- Location and metadata tests
- Coverage: 96.3% for output package

Part of output standardization feature.

Co-Authored-By: Claude <[email protected]>

@safedep

safedep bot commented Nov 21, 2025

SafeDep Report Summary

Green Malicious Packages Badge Green Vulnerable Packages Badge Green Risky License Badge

No dependency changes detected. Nothing to scan.

This report is generated by SafeDep GitHub App

@codecov

codecov bot commented Nov 21, 2025

Codecov Report

❌ Patch coverage is 96.11111% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 79.33%. Comparing base (8f2eb7b) to head (5b86eff).
⚠️ Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
|--------------------------|---------|-------|
| sourcecode-parser/output/enricher.go | 95.39% | 4 Missing and 3 partials ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #391      +/-   ##
==========================================
+ Coverage   78.91%   79.33%   +0.42%     
==========================================
  Files          70       73       +3     
  Lines        7123     7303     +180     
==========================================
+ Hits         5621     5794     +173     
- Misses       1263     1267       +4     
- Partials      239      242       +3     

@shivasurya shivasurya self-assigned this Nov 21, 2025
@shivasurya shivasurya added the enhancement (New feature or request) and go (Pull requests that update go code) labels Nov 21, 2025
Add comprehensive tests for ConfidenceLevel and DetectionBadge methods. Coverage for enriched_detection.go now at 100%.

Co-Authored-By: Claude <[email protected]>
@shivasurya shivasurya merged commit a6a882a into main Nov 21, 2025
5 checks passed
@shivasurya shivasurya deleted the shiva/output-data-structures branch November 21, 2025 22:43
shivasurya added a commit that referenced this pull request Nov 22, 2025
## Summary
Implements JSON and CSV output formatters for the `ci` command, replacing the old inline JSON generation with a modular, well-tested implementation.

**Part of output-standardization tech spec (Stacked PRs)**
- ✅ PR #1: Logging System Infrastructure (#391) - **Merged**
- ✅ PR #2: Output Package Foundation (#392) - **In Review**
- ✅ PR #3: Text Formatter for Scan Command (#393) - **In Review**
- 🔄 PR #4: JSON and CSV Formatters ← **This PR**

## Changes

### New Files
- `output/json_formatter.go` (235 lines)
  - Enhanced JSON output with rich metadata structure
  - Tool, scan, results, summary, and errors sections
  - Code snippets with configurable context lines
  - Taint flow source/sink information
  - CWE, OWASP, and reference metadata
  
- `output/csv_formatter.go` (123 lines)
  - CSV output for CI/CD integration
  - 17 columns: severity, confidence, rule_id, rule_name, cwe, owasp, file, line, column, function, message, detection_type, detection_scope, source_line, sink_line, tainted_var, sink_call
  - Proper escaping via encoding/csv package

- `output/json_formatter_test.go` (415 lines)
  - Comprehensive tests achieving 100% coverage
  - Structure validation, snippet handling, metadata, pattern vs taint detection

- `output/csv_formatter_test.go` (395 lines)
  - Comprehensive tests achieving 100% coverage
  - Header validation, escaping, multiple rows, zero values

### Modified Files
- `cmd/ci.go`
  - Replaced old `generateJSONOutput()` with new formatter integration
  - Added enrichment pipeline using `output.NewEnricher()`
  - Updated output format validation to include "csv"
  - Added CSV formatter support
  - Updated help text and examples
  - Exit code 1 when vulnerabilities found (for CI/CD)

- `cmd/ci_test.go`
  - Skipped obsolete `TestGenerateJSONOutput` (replaced by new formatter tests)

- `main_test.go`
  - Updated expected help text to include CSV output format

## JSON Output Structure
```json
{
  "tool": {
    "name": "Code Pathfinder",
    "version": "1.0.0",
    "url": "https://codepathfinder.dev"
  },
  "scan": {
    "target": "/path/to/project",
    "timestamp": "2025-01-21T10:30:00Z",
    "duration": 5.43,
    "rules_executed": 12
  },
  "results": [{
    "rule_id": "sql-injection",
    "rule_name": "SQL Injection",
    "message": "Unsanitized user input flows to SQL query",
    "severity": "critical",
    "confidence": "high",
    "location": {
      "file": "src/main.py",
      "line": 42,
      "column": 8,
      "function": "process_user",
      "snippet": {
        "start_line": 40,
        "end_line": 44,
        "lines": ["...", "query = f\"SELECT * FROM users WHERE id={user_id}\"", "..."]
      }
    },
    "detection": {
      "type": "taint-local",
      "scope": "intra-procedural",
      "confidence_score": 0.95,
      "source": {"line": 38, "variable": "user_id"},
      "sink": {"line": 42, "call": "execute"}
    },
    "metadata": {
      "cwe": ["CWE-89"],
      "owasp": ["A03:2021"],
      "references": ["https://..."]
    }
  }],
  "summary": {
    "total": 5,
    "by_severity": {"critical": 2, "high": 3},
    "by_detection_type": {"taint-local": 4, "pattern": 1}
  },
  "errors": []
}
```
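
A structure like this maps naturally onto Go structs with snake_case JSON tags. The sketch below covers only a subset of the fields (results, summary, errors) and shows one way the summary counts could be derived; the type and function names are hypothetical and are not taken from `json_formatter.go`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type detection struct {
	Type string `json:"type"`
}

type result struct {
	RuleID    string    `json:"rule_id"`
	Severity  string    `json:"severity"`
	Detection detection `json:"detection"`
}

type summary struct {
	Total           int            `json:"total"`
	BySeverity      map[string]int `json:"by_severity"`
	ByDetectionType map[string]int `json:"by_detection_type"`
}

type report struct {
	Results []result `json:"results"`
	Summary summary  `json:"summary"`
	Errors  []string `json:"errors"`
}

// summarize counts results per severity and per detection type.
func summarize(results []result) summary {
	s := summary{
		Total:           len(results),
		BySeverity:      map[string]int{},
		ByDetectionType: map[string]int{},
	}
	for _, r := range results {
		s.BySeverity[r.Severity]++
		s.ByDetectionType[r.Detection.Type]++
	}
	return s
}

// render marshals the report; empty slices are used instead of nil so the
// output contains [] rather than null.
func render(results []result) ([]byte, error) {
	if results == nil {
		results = []result{}
	}
	rep := report{Results: results, Summary: summarize(results), Errors: []string{}}
	return json.MarshalIndent(rep, "", "  ")
}

func main() {
	out, err := render([]result{
		{RuleID: "sql-injection", Severity: "critical", Detection: detection{Type: "taint-local"}},
		{RuleID: "weak-hash", Severity: "high", Detection: detection{Type: "pattern"}},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```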

## CSV Output Format
```csv
severity,confidence,rule_id,rule_name,cwe,owasp,file,line,column,function,message,detection_type,detection_scope,source_line,sink_line,tainted_var,sink_call
critical,high,sql-injection,SQL Injection,CWE-89,A03:2021,src/main.py,42,8,process_user,Unsanitized user input flows to SQL query,taint-local,intra-procedural,38,42,user_id,execute
```
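
For readers unfamiliar with `encoding/csv`, a writer for this layout might look roughly like the sketch below. The 17 column names are copied from the header above; the `finding` struct, its `record` method, and `renderCSV` are hypothetical, and representing CWE and OWASP as single strings is a simplification.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
)

var csvHeader = []string{
	"severity", "confidence", "rule_id", "rule_name", "cwe", "owasp",
	"file", "line", "column", "function", "message",
	"detection_type", "detection_scope",
	"source_line", "sink_line", "tainted_var", "sink_call",
}

// finding is a flattened, hypothetical view of one enriched detection.
type finding struct {
	Severity, Confidence, RuleID, RuleName, CWE, OWASP string
	File                                               string
	Line, Column                                       int
	Function, Message, DetectionType, DetectionScope   string
	SourceLine, SinkLine                               int
	TaintedVar, SinkCall                               string
}

// record returns the fields in header order.
func (f finding) record() []string {
	return []string{
		f.Severity, f.Confidence, f.RuleID, f.RuleName, f.CWE, f.OWASP,
		f.File, strconv.Itoa(f.Line), strconv.Itoa(f.Column), f.Function,
		f.Message, f.DetectionType, f.DetectionScope,
		strconv.Itoa(f.SourceLine), strconv.Itoa(f.SinkLine),
		f.TaintedVar, f.SinkCall,
	}
}

// renderCSV writes the header and one row per finding. Quoting of commas,
// quotes, and newlines is left entirely to encoding/csv.
func renderCSV(findings []finding) (string, error) {
	var sb strings.Builder
	w := csv.NewWriter(&sb)
	if err := w.Write(csvHeader); err != nil {
		return "", err
	}
	for _, f := range findings {
		if err := w.Write(f.record()); err != nil {
			return "", err
		}
	}
	w.Flush()
	return sb.String(), w.Error()
}

func main() {
	out, err := renderCSV([]finding{{
		Severity: "critical", Confidence: "high", RuleID: "sql-injection",
		RuleName: "SQL Injection", CWE: "CWE-89", OWASP: "A03:2021",
		File: "src/main.py", Line: 42, Column: 8, Function: "process_user",
		Message:       `Input reaches "execute", unsanitized`,
		DetectionType: "taint-local", DetectionScope: "intra-procedural",
		SourceLine: 38, SinkLine: 42, TaintedVar: "user_id", SinkCall: "execute",
	}})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

Because the message in the example contains both a comma and double quotes, the writer wraps the field in quotes and doubles the inner quotes, which is the escaping behaviour the formatter relies on.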

## Testing
- All tests passing (100% coverage for both formatters)
- Output package overall: 98.1% coverage
- Linting checks passed
- Integration tests with ci command verified

## Usage Examples
```bash
# Generate JSON report
pathfinder ci --rules rules/ --project . --output json > results.json

# Generate CSV report  
pathfinder ci --rules rules/ --project . --output csv > results.csv

# Generate SARIF report (existing)
pathfinder ci --rules rules/ --project . --output sarif > results.sarif
```
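
The format validation mentioned under Modified Files could be as small as the following sketch. It assumes the `ci` command accepts exactly the three formats shown in the usage examples (`json`, `csv`, `sarif`); the function name and error wording are hypothetical and the real check in `cmd/ci.go` may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// supportedFormats lists the values shown in the usage examples.
var supportedFormats = []string{"json", "csv", "sarif"}

// validateOutputFormat returns nil for a supported format and a descriptive
// error otherwise. Matching is case-sensitive in this sketch.
func validateOutputFormat(format string) error {
	for _, f := range supportedFormats {
		if format == f {
			return nil
		}
	}
	return fmt.Errorf("unsupported output format %q (expected one of: %s)",
		format, strings.Join(supportedFormats, ", "))
}

func main() {
	for _, f := range []string{"csv", "xml"} {
		if err := validateOutputFormat(f); err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("ok:", f)
	}
}
```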

## Breaking Changes
- Old `generateJSONOutput()` function removed from cmd/ci.go
- JSON output structure changed to new rich format (snake_case fields)
- Exit code behavior unchanged (exits 1 when vulnerabilities found)

## Stack Status
This PR stacks on:
- **PR #3**: shiva/output-text-formatter (#393) ← base branch
- **PR #2**: shiva/output-logging-system (#392)
- **main**: Production branch

Next PR:
- PR #5: SARIF Formatter Enhancement (will stack on this PR)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
shivasurya added a commit that referenced this pull request Nov 22, 2025
## Summary
Implements enhanced SARIF formatter with code flows, related locations, and rich metadata for optimal GitHub Code Scanning integration.

**Part of output-standardization tech spec (Stacked PRs)**
- ✅ PR #1: Logging System Infrastructure (#391) - **Merged**
- ✅ PR #2: Output Package Foundation (#392) - **In Review**
- ✅ PR #3: Text Formatter for Scan Command (#393) - **In Review**
- ✅ PR #4: JSON and CSV Formatters (#394) - **In Review**
- 🔄 PR #5: Enhanced SARIF Formatter ← **This PR**

## Changes

### New Files
- `output/sarif_formatter.go` (290 lines)
  - SARIF 2.1.0 compliant output formatter
  - Code flows for taint path visualization (source → sink)
  - Related locations for taint sources
  - Help text with markdown and CWE references
  - Security severity scores (9.0, 7.0, 5.0, 3.0)
  - Rule properties: tags, precision
  - Deduplicates rules across multiple detections

- `output/sarif_formatter_test.go` (519 lines)
  - Comprehensive tests achieving 97.5% coverage
  - Tests for version, tool metadata, rules, results
  - Code flow generation tests (taint-local, taint-global)
  - Related locations validation
  - Pattern vs taint detection differentiation

### Modified Files
- `cmd/ci.go`
  - Replaced old `generateSARIFOutput()` with new formatter
  - Uses enriched detections for rich output
  - Removed unused imports (sarif library, encoding/json)
  - Consistent pattern with JSON and CSV formatters

- `cmd/ci_test.go`
  - Skipped obsolete SARIF tests
  - Removed unused helper functions

## Key Features

### Code Flows
Taint detections automatically include code flows showing the path from source to sink:

```json
{
  "codeFlows": [{
    "message": {"text": "Taint flow from line 10 to line 20"},
    "threadFlows": [{
      "locations": [
        {
          "location": {"physicalLocation": {"region": {"startLine": 10}}},
          "message": {"text": "Taint source: user_input"}
        },
        {
          "location": {"physicalLocation": {"region": {"startLine": 20}}},
          "message": {"text": "Taint sink: os.system"}
        }
      ]
    }]
  }]
}
```
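
One way to produce this shape is with a handful of small structs whose JSON tags mirror the SARIF property names in the sample. The sketch below is illustrative only: the struct and function names are hypothetical, and it omits `artifactLocation` and other fields a complete SARIF location would normally carry. Pattern matches get no code flow, in line with the behaviour described in this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type message struct {
	Text string `json:"text"`
}

type region struct {
	StartLine int `json:"startLine"`
}

type physicalLocation struct {
	Region region `json:"region"`
}

type location struct {
	PhysicalLocation physicalLocation `json:"physicalLocation"`
}

type threadFlowLocation struct {
	Location location `json:"location"`
	Message  message  `json:"message"`
}

type threadFlow struct {
	Locations []threadFlowLocation `json:"locations"`
}

type codeFlow struct {
	Message     message      `json:"message"`
	ThreadFlows []threadFlow `json:"threadFlows"`
}

// step builds one thread flow location at the given line.
func step(line int, text string) threadFlowLocation {
	return threadFlowLocation{
		Location: location{PhysicalLocation: physicalLocation{Region: region{StartLine: line}}},
		Message:  message{Text: text},
	}
}

// buildCodeFlows returns a single source-to-sink flow for taint detections
// and nil for anything else (for example pattern matches).
func buildCodeFlows(detType string, srcLine int, srcVar string, sinkLine int, sinkCall string) []codeFlow {
	if detType != "taint-local" && detType != "taint-global" {
		return nil
	}
	return []codeFlow{{
		Message: message{Text: fmt.Sprintf("Taint flow from line %d to line %d", srcLine, sinkLine)},
		ThreadFlows: []threadFlow{{Locations: []threadFlowLocation{
			step(srcLine, "Taint source: "+srcVar),
			step(sinkLine, "Taint sink: "+sinkCall),
		}}},
	}}
}

func main() {
	flows := buildCodeFlows("taint-local", 10, "user_input", 20, "os.system")
	out, err := json.MarshalIndent(map[string]any{"codeFlows": flows}, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```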

### Help Text with Markdown
Rules include rich help text with CWE references:

```markdown
## Command Injection

User input flows to shell command without sanitization

### References
- [CWE-78](https://cwe.mitre.org/data/definitions/78.html)
```
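
Help text in this layout can be assembled with a small string builder. The sketch below reproduces the sample above; `helpMarkdown` is a hypothetical name, and deriving the MITRE URL by stripping the `CWE-` prefix is an assumption based on the single reference shown.

```go
package main

import (
	"fmt"
	"strings"
)

// helpMarkdown renders a rule title, its description, and a References
// section with one link per CWE identifier (for example "CWE-78").
func helpMarkdown(name, description string, cwes []string) string {
	var sb strings.Builder
	fmt.Fprintf(&sb, "## %s\n\n%s\n", name, description)
	if len(cwes) > 0 {
		sb.WriteString("\n### References\n")
		for _, cwe := range cwes {
			id := strings.TrimPrefix(cwe, "CWE-")
			fmt.Fprintf(&sb, "- [%s](https://cwe.mitre.org/data/definitions/%s.html)\n", cwe, id)
		}
	}
	return sb.String()
}

func main() {
	fmt.Print(helpMarkdown(
		"Command Injection",
		"User input flows to shell command without sanitization",
		[]string{"CWE-78"},
	))
}
```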

### Security Severity Scores
GitHub-compatible severity scores for prioritization:
- Critical: 9.0
- High: 7.0
- Medium: 5.0
- Low: 3.0
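
The mapping above amounts to a simple lookup. In the sketch below, the four scores are taken from the list; the function name, the case-insensitive matching, and the choice to report unknown severities via a boolean rather than a default score are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// securitySeverity returns the score string for a severity label and false
// when the label is not one of the four listed levels.
func securitySeverity(severity string) (string, bool) {
	switch strings.ToLower(severity) {
	case "critical":
		return "9.0", true
	case "high":
		return "7.0", true
	case "medium":
		return "5.0", true
	case "low":
		return "3.0", true
	}
	return "", false
}

func main() {
	for _, s := range []string{"critical", "High", "info"} {
		score, ok := securitySeverity(s)
		fmt.Printf("%s -> %q (known=%v)\n", s, score, ok)
	}
}
```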

### Rule Properties
```json
{
  "properties": {
    "tags": ["security"],
    "security-severity": "9.0",
    "precision": "high"
  }
}
```
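
Since several detections can share a rule, the SARIF `rules` array needs one entry per rule ID. A first-seen-wins deduplication that also attaches the properties above might look like this sketch; the `detection` input type, `uniqueRules`, and the fixed `"security"` tag and `"high"` precision are assumptions for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type ruleProperties struct {
	Tags             []string `json:"tags"`
	SecuritySeverity string   `json:"security-severity"`
	Precision        string   `json:"precision"`
}

type rule struct {
	ID         string         `json:"id"`
	Properties ruleProperties `json:"properties"`
}

// detection is a hypothetical, minimal input: the rule it belongs to and
// the already computed security severity score.
type detection struct {
	RuleID string
	Score  string
}

// uniqueRules emits one rule per distinct RuleID, in first-seen order.
func uniqueRules(dets []detection) []rule {
	seen := make(map[string]bool, len(dets))
	rules := make([]rule, 0, len(dets))
	for _, d := range dets {
		if seen[d.RuleID] {
			continue
		}
		seen[d.RuleID] = true
		rules = append(rules, rule{
			ID: d.RuleID,
			Properties: ruleProperties{
				Tags:             []string{"security"},
				SecuritySeverity: d.Score,
				Precision:        "high",
			},
		})
	}
	return rules
}

func main() {
	rules := uniqueRules([]detection{
		{RuleID: "sql-injection", Score: "9.0"},
		{RuleID: "command-injection", Score: "9.0"},
		{RuleID: "sql-injection", Score: "9.0"},
	})
	out, err := json.MarshalIndent(rules, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```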

## Benefits over Old Implementation

| Feature | Old | New |
|---------|-----|-----|
| Code flows | ❌ None | ✅ Source → Sink visualization |
| Related locations | ❌ None | ✅ Taint sources highlighted |
| Help text | ❌ Plain text | ✅ Markdown with references |
| Security severity | ❌ Level only | ✅ Numeric scores for GitHub |
| Rule properties | ❌ None | ✅ Tags, precision |
| Pattern detection | ❌ Same as taint | ✅ No code flows (correct) |
| Test coverage | ❌ ~60% | ✅ 97.5% |

## Testing
- All tests passing (97.5% coverage on SARIF formatter)
- Output package overall: 97.5% coverage
- Linting checks passed
- Integration with ci command verified

## Usage Examples
```bash
# Generate enhanced SARIF report with code flows
pathfinder ci --rules rules/ --project . --output sarif > results.sarif

# Upload to GitHub Code Scanning
gh api /repos/:owner/:repo/code-scanning/sarifs -F sarif=@results.sarif

# View in GitHub UI with code flows highlighted
```

## SARIF Output Sample
```json
{
  "version": "2.1.0",
  "runs": [{
    "tool": {
      "driver": {
        "name": "Code Pathfinder",
        "version": "0.0.25",
        "rules": [{
          "id": "sql-injection",
          "name": "SQL Injection",
          "fullDescription": {"text": "Unsanitized user input flows to SQL query (CWE-89, A03:2021)"},
          "helpUri": "https://github.com/shivasurya/code-pathfinder",
          "defaultConfiguration": {"level": "error"},
          "properties": {
            "tags": ["security"],
            "security-severity": "9.0",
            "precision": "high"
          }
        }]
      }
    },
    "results": [{
      "ruleId": "sql-injection",
      "message": {"text": "Unsanitized user input flows to SQL query (sink: execute, confidence: 95%)"},
      "locations": [{
        "physicalLocation": {
          "artifactLocation": {"uri": "src/db/queries.py"},
          "region": {"startLine": 42, "startColumn": 8}
        }
      }],
      "codeFlows": [...],
      "relatedLocations": [...]
    }]
  }]
}
```

## Breaking Changes
- Old `generateSARIFOutput()` function removed
- SARIF output structure enhanced with additional fields
- Pattern matches no longer include code flows (correct behavior)

## Stack Status
This PR stacks on:
- **PR #4**: shiva/output-json-csv-formatters (#394) ← base branch
- **PR #3**: shiva/output-text-formatter (#393)
- **PR #2**: shiva/output-logging-system (#392)
- **main**: Production branch

Next PR:
- PR #6: Exit Code Standardization (will stack on this PR)

🤖 Generated with [Claude Code](https://claude.com/claude-code)