
@shivasurya commented Nov 4, 2025

Summary

This PR implements the foundation for intra-procedural taint tracking by adding core data structures. This is PR #1 of 5 in the intra-procedural dataflow feature stack.

Changes

New Files

  1. graph/callgraph/statement.go (208 lines)

    • StatementType enum with 11 Python statement types (assignment, call, return, if, for, while, with, try, raise, import, expression)
    • Statement struct for representing code statements with def-use information
    • DefUseChain for tracking variable definitions and uses across a function
    • Complete API for def-use chain construction and querying
  2. graph/callgraph/taint_summary.go (238 lines)

    • TaintInfo struct for detailed taint tracking (source/sink locations, propagation paths)
    • Confidence scoring (0.0-1.0) for detection quality (high/medium/low)
    • TaintSummary for complete function-level analysis results
    • Support for parameter and return value tainting
    • Error tracking for failed analyses
  3. graph/callgraph/statement_test.go (444 lines)

    • 16 comprehensive test functions covering all Statement functionality
    • Table-driven tests with multiple scenarios
    • Complex scenario test simulating real code patterns
  4. graph/callgraph/taint_summary_test.go (397 lines)

    • 21 comprehensive test functions covering all TaintSummary functionality
    • Table-driven tests for confidence level classification
    • Complex scenario test simulating SQL injection detection
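The statement.go types listed above can be illustrated with a minimal sketch. It assumes hypothetical constant and field names (`StmtAssignment`, `Def`, `Uses`, `BuildDefUseChain`, `DefLines`), since the PR only names `StatementType`, `Statement`, and `DefUseChain`. It shows the 11-value enum and a def-use chain that records every definition and use of each variable in statement order.

```go
package main

import "fmt"

// StatementType enumerates the 11 Python statement kinds listed in the PR.
// The constant names are illustrative, not the real identifiers.
type StatementType int

const (
	StmtAssignment StatementType = iota
	StmtCall
	StmtReturn
	StmtIf
	StmtFor
	StmtWhile
	StmtWith
	StmtTry
	StmtRaise
	StmtImport
	StmtExpression
)

// Statement carries def-use information for one statement (hypothetical fields).
type Statement struct {
	Type       StatementType
	LineNumber int
	Def        string   // variable defined here, "" if none
	Uses       []string // variables read here
}

// DefUseChain maps each variable to the statements that define and use it.
type DefUseChain struct {
	Defs map[string][]*Statement
	Uses map[string][]*Statement
}

// BuildDefUseChain records every definition and use, in statement order.
func BuildDefUseChain(stmts []*Statement) *DefUseChain {
	chain := &DefUseChain{
		Defs: make(map[string][]*Statement),
		Uses: make(map[string][]*Statement),
	}
	for _, s := range stmts {
		if s.Def != "" {
			chain.Defs[s.Def] = append(chain.Defs[s.Def], s)
		}
		for _, u := range s.Uses {
			chain.Uses[u] = append(chain.Uses[u], s)
		}
	}
	return chain
}

// DefLines returns the line numbers where variable v is defined.
func (c *DefUseChain) DefLines(v string) []int {
	lines := make([]int, 0, len(c.Defs[v]))
	for _, s := range c.Defs[v] {
		lines = append(lines, s.LineNumber)
	}
	return lines
}

func main() {
	// 1: user_id = request.GET["id"]
	// 2: query = "SELECT ... " + user_id
	// 3: cursor.execute(query)
	stmts := []*Statement{
		{Type: StmtAssignment, LineNumber: 1, Def: "user_id", Uses: []string{"request"}},
		{Type: StmtAssignment, LineNumber: 2, Def: "query", Uses: []string{"user_id"}},
		{Type: StmtCall, LineNumber: 3, Uses: []string{"cursor", "query"}},
	}
	chain := BuildDefUseChain(stmts)
	fmt.Println(chain.DefLines("query"))  // [2]
	fmt.Println(len(chain.Uses["query"])) // 1
}
```

Keeping definitions and uses as two indexes makes "where was this set?" and "where is this read?" both single map lookups, which is the kind of querying the PR description refers to.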

Test Coverage

100% code coverage (all 36 functions tested)

  • Total: 1,089 lines of code
  • 37 test cases covering all functionality
  • All edge cases tested (empty inputs, nil values, duplicates)

Quality Checks

✅ All tests pass (gradle testGo)
✅ Lint passes with 0 issues (gradle lintGo)
✅ Build succeeds (gradle buildGo)
✅ 100% code coverage

Technical Details

Statement Representation

  • Captures both def-use information and control flow structure
  • Supports nested statements (if/for/while/try blocks)
  • Line number tracking for precise error reporting
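To illustrate nested statements with line tracking, here is a rough sketch that assumes a hypothetical `Nested` field holding the body of an if/for/while/with/try block. It flattens the tree depth-first so that definitions inside nested blocks are still found and reported with their line numbers. The field and function names are assumptions, not the PR's API.

```go
package main

import "fmt"

// Statement is a pared-down, hypothetical version keeping only what this
// sketch needs: a kind, a line number, one defined variable, and children.
type Statement struct {
	Kind       string // e.g. "assignment", "if", "for"
	LineNumber int
	Def        string       // variable assigned here, "" if none
	Nested     []*Statement // body of if/for/while/with/try blocks
}

// Flatten walks the tree depth-first, so each block header is followed
// by its body in source order.
func Flatten(stmts []*Statement) []*Statement {
	var out []*Statement
	for _, s := range stmts {
		out = append(out, s)
		out = append(out, Flatten(s.Nested)...)
	}
	return out
}

// DefinitionLines reports every line that assigns v, including nested ones.
func DefinitionLines(stmts []*Statement, v string) []int {
	var lines []int
	for _, s := range Flatten(stmts) {
		if s.Def == v {
			lines = append(lines, s.LineNumber)
		}
	}
	return lines
}

func main() {
	// 1: data = ""
	// 2: if flag:
	// 3:     for item in items:
	// 4:         data = item
	body := []*Statement{
		{Kind: "assignment", LineNumber: 1, Def: "data"},
		{Kind: "if", LineNumber: 2, Nested: []*Statement{
			{Kind: "for", LineNumber: 3, Def: "item", Nested: []*Statement{
				{Kind: "assignment", LineNumber: 4, Def: "data"},
			}},
		}},
	}
	fmt.Println(DefinitionLines(body, "data")) // [1 4]
}
```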

Taint Tracking

  • Multi-path taint tracking (variables can have multiple taint sources)
  • Confidence-based detection (≥ 0.8 high, 0.5 to < 0.8 medium, < 0.5 low)
  • Sanitization tracking to reduce false positives
  • Propagation path recording for debugging
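The taint-tracking points above can be sketched as follows. This is an illustration, not the code in taint_summary.go. The field names, the `Add`/`IsTainted` helpers, and the choice to treat exactly 0.5 and 0.8 as the lower bounds of medium and high are all assumptions. A variable keeps a list of taint records, so it can have several sources. A sanitized record does not count toward "tainted", and each record keeps its propagation path for debugging.

```go
package main

import "fmt"

// TaintInfo mirrors the kinds of fields the PR describes; names are assumptions.
type TaintInfo struct {
	SourceLine      int
	SinkLine        int
	PropagationPath []string // variables the taint flowed through
	Confidence      float64  // 0.0-1.0
	Sanitized       bool
}

// ConfidenceLevel applies the PR's thresholds, assuming inclusive lower
// bounds: >= 0.8 high, >= 0.5 medium, otherwise low.
func ConfidenceLevel(c float64) string {
	switch {
	case c >= 0.8:
		return "high"
	case c >= 0.5:
		return "medium"
	default:
		return "low"
	}
}

// TaintSummary keeps every taint record per variable (multi-source tracking).
type TaintSummary struct {
	Taints map[string][]TaintInfo
}

// Add appends a taint record for variable v.
func (s *TaintSummary) Add(v string, info TaintInfo) {
	if s.Taints == nil {
		s.Taints = make(map[string][]TaintInfo)
	}
	s.Taints[v] = append(s.Taints[v], info)
}

// IsTainted is true if at least one unsanitized record reaches v.
func (s *TaintSummary) IsTainted(v string) bool {
	for _, t := range s.Taints[v] {
		if !t.Sanitized {
			return true
		}
	}
	return false
}

func main() {
	var sum TaintSummary
	sum.Add("query", TaintInfo{SourceLine: 1, SinkLine: 3,
		PropagationPath: []string{"user_id", "query"}, Confidence: 0.9})
	sum.Add("query", TaintInfo{SourceLine: 2, SinkLine: 3, Confidence: 0.6, Sanitized: true})
	fmt.Println(sum.IsTainted("query"), len(sum.Taints["query"]))                  // true 2
	fmt.Println(ConfidenceLevel(0.9), ConfidenceLevel(0.6), ConfidenceLevel(0.3)) // high medium low
}
```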

Design Decisions

  • Conservative approach: Track ALL definitions (not just reaching definitions)
  • Future-proof: Supports both intra-procedural and inter-procedural analysis
  • Memory-efficient: Only metadata stored, not full code snippets
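One reading of "track ALL definitions" is that the chain keeps every definition of a variable instead of computing which ones reach a given use. The hypothetical sketch below contrasts the two on straight-line code. The reaching-definition shortcut used here (take the last definition before the use) is valid only without branches or loops, which is exactly where the conservative all-definitions set stays safe.

```go
package main

import "fmt"

// def is a hypothetical (variable, line) definition record.
type def struct {
	Var  string
	Line int
}

// AllDefs returns every definition of v in the function (conservative).
func AllDefs(defs []def, v string) []int {
	var lines []int
	for _, d := range defs {
		if d.Var == v {
			lines = append(lines, d.Line)
		}
	}
	return lines
}

// ReachingDefStraightLine returns the last definition of v before useLine.
// Only correct for straight-line code with no branches or loops.
func ReachingDefStraightLine(defs []def, v string, useLine int) (int, bool) {
	best, found := 0, false
	for _, d := range defs {
		if d.Var == v && d.Line < useLine && d.Line > best {
			best, found = d.Line, true
		}
	}
	return best, found
}

func main() {
	// 1: x = request.args["q"]   (tainted)
	// 2: y = 1
	// 3: x = "constant"          (safe)
	// 4: run(x)
	defs := []def{{"x", 1}, {"y", 2}, {"x", 3}}
	fmt.Println(AllDefs(defs, "x")) // [1 3]
	line, _ := ReachingDefStraightLine(defs, "x", 4)
	fmt.Println(line) // 3
}
```

The conservative set would still link line 1 to the use on line 4. In this straight-line case that is a false positive, but it avoids missing flows once branches make a single reaching definition impossible to pick.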

Next Steps

This PR provides the foundational data structures. Future PRs will implement:

  • Statement extraction from Python AST
  • Def-use chain construction algorithms
  • Intra-procedural taint propagation engine
  • Integration into the call graph builder

🤖 Generated with Claude Code

Co-Authored-By: Claude <[email protected]>

…nalysis

Implements the foundation for intra-procedural taint tracking with two new
core data structures:

## Statement (`statement.go`)
- StatementType enum with 11 Python statement types
- Statement struct for representing code statements with def-use information
- DefUseChain for tracking variable definitions and uses
- Full API for def-use chain construction and querying

## TaintSummary (`taint_summary.go`)
- TaintInfo struct for detailed taint tracking (source, sink, propagation path)
- Confidence scoring (0.0-1.0) for detection quality
- TaintSummary for complete function-level analysis results
- Support for parameter and return value tainting

## Test Coverage
- 100% code coverage (37 test cases)
- statement_test.go: 16 comprehensive tests
- taint_summary_test.go: 21 comprehensive tests
- Complex scenario tests simulating real security issues

This is PR #1 of 5 in the intra-procedural dataflow feature stack.
Next: PR #2 will implement statement extraction from Python AST.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@shivasurya marked this pull request as ready for review November 4, 2025 00:10
safedep bot commented Nov 4, 2025

SafeDep Report Summary

Badges: Malicious Packages (green), Vulnerable Packages (green), Risky License (green)

No dependency changes detected. Nothing to scan.

This report is generated by the SafeDep GitHub App.

codecov bot commented Nov 4, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 75.50%. Comparing base (3ed6a7e) to head (aeccb4a).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #343      +/-   ##
==========================================
+ Coverage   74.91%   75.50%   +0.58%     
==========================================
  Files          47       49       +2     
  Lines        5566     5699     +133     
==========================================
+ Hits         4170     4303     +133     
  Misses       1221     1221              
  Partials      175      175              

shivasurya commented Nov 4, 2025

Merge activity

  • Nov 4, 1:51 AM UTC: A user started a stack merge that includes this pull request via Graphite.
  • Nov 4, 1:51 AM UTC: @shivasurya merged this pull request with Graphite.

@shivasurya merged commit 1f7bc7a into main Nov 4, 2025
5 checks passed
@shivasurya deleted the feat/intra-procedural-dataflow-pr1-data-structures branch November 4, 2025 01:51
shivasurya added a commit that referenced this pull request Nov 4, 2025
… dataflow (#344)

## Summary

Implements Python statement extraction from AST to support intra-procedural dataflow analysis. This is part 2 of the intra-procedural dataflow feature.

## Changes

- Add statement extraction for Python functions
- Extract assignments, calls, and returns with def-use information
- Comprehensive test coverage (87.3%)
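As a rough illustration of extracting def-use information for assignments, calls, and returns, here is a sketch over a hypothetical toy `Node` type. It is not the project's parser output or its extractor. An assignment defines its target and uses every identifier in its value. A call or return only uses identifiers. The callee name sits in `Node.Name` and is deliberately not counted as a use.

```go
package main

import "fmt"

// Node is a tiny stand-in for a parsed Python AST node (hypothetical).
type Node struct {
	Kind     string // "assignment", "call", "return", "identifier", "string"
	Name     string // identifier name, or callee name for calls
	Children []*Node
}

// Extracted holds what one statement defines and uses.
type Extracted struct {
	Kind string
	Def  string
	Uses []string
}

// identifiers collects every identifier under n, left to right.
func identifiers(n *Node, out *[]string) {
	if n == nil {
		return
	}
	if n.Kind == "identifier" {
		*out = append(*out, n.Name)
	}
	for _, c := range n.Children {
		identifiers(c, out)
	}
}

// Extract handles assignments, calls, and returns. For an assignment,
// Children[0] is the target and Children[1] is the value (an assumption).
func Extract(n *Node) Extracted {
	e := Extracted{Kind: n.Kind}
	switch n.Kind {
	case "assignment":
		e.Def = n.Children[0].Name
		identifiers(n.Children[1], &e.Uses)
	case "call", "return":
		for _, c := range n.Children {
			identifiers(c, &e.Uses)
		}
	}
	return e
}

func main() {
	// query = build(user_id, "LIMIT 1")
	assign := &Node{Kind: "assignment", Children: []*Node{
		{Kind: "identifier", Name: "query"},
		{Kind: "call", Name: "build", Children: []*Node{
			{Kind: "identifier", Name: "user_id"},
			{Kind: "string", Name: `"LIMIT 1"`},
		}},
	}}
	e := Extract(assign)
	fmt.Println(e.Def, e.Uses) // query [user_id]
}
```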

## Testing

- 20+ tests covering all statement types
- All tests passing
- Build and lint clean

Stacked on #343

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

Labels

enhancement (New feature or request), go (Pull requests that update go code)

2 participants