feat(dataflow): Add core data structures for intra-procedural taint analysis #343

shivasurya · 2025-11-04T00:09:29Z

Summary

This PR implements the foundation for intra-procedural taint tracking by adding core data structures. This is PR #1 of 5 in the intra-procedural dataflow feature stack.

Changes

New Files

graph/callgraph/statement.go (208 lines)
- StatementType enum with 11 Python statement types (assignment, call, return, if, for, while, with, try, raise, import, expression)
- Statement struct for representing code statements with def-use information
- DefUseChain for tracking variable definitions and uses across a function
- Complete API for def-use chain construction and querying
graph/callgraph/taint_summary.go (238 lines)
- TaintInfo struct for detailed taint tracking (source/sink locations, propagation paths)
- Confidence scoring (0.0-1.0) for detection quality (high/medium/low)
- TaintSummary for complete function-level analysis results
- Support for parameter and return value tainting
- Error tracking for failed analyses
graph/callgraph/statement_test.go (444 lines)
- 16 comprehensive test functions covering all Statement functionality
- Table-driven tests with multiple scenarios
- Complex scenario test simulating real code patterns
graph/callgraph/taint_summary_test.go (397 lines)
- 21 comprehensive test functions covering all TaintSummary functionality
- Table-driven tests for confidence level classification
- Complex scenario test simulating SQL injection detection
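The PR only names the types in statement.go, so the sketch below shows one way a Statement and a DefUseChain could fit together. Field names, constant names, and the methods NewDefUseChain, AddStatement, and AllVariables are hypothetical and not taken from the PR. The sketch also covers the edge cases listed under Test Coverage: nil statements are ignored, and a variable read twice in one statement is recorded once.

```go
package main

import "sort"

// StatementType is a hypothetical string-backed enum for the 11 Python
// statement kinds listed in the PR description (constant names assumed).
type StatementType string

const (
	StatementTypeAssignment StatementType = "assignment"
	StatementTypeCall       StatementType = "call"
	StatementTypeReturn     StatementType = "return"
	StatementTypeIf         StatementType = "if"
	StatementTypeFor        StatementType = "for"
	StatementTypeWhile      StatementType = "while"
	StatementTypeWith       StatementType = "with"
	StatementTypeTry        StatementType = "try"
	StatementTypeRaise      StatementType = "raise"
	StatementTypeImport     StatementType = "import"
	StatementTypeExpression StatementType = "expression"
)

// Statement is a hypothetical shape: one statement plus its def-use facts.
type Statement struct {
	Type       StatementType
	LineNumber uint32
	Def        string   // variable assigned by this statement, "" if none
	Uses       []string // variables read by this statement
}

// DefUseChain indexes, per variable, the statements that define and use it.
type DefUseChain struct {
	Defs map[string][]*Statement
	Uses map[string][]*Statement
}

// NewDefUseChain returns an empty chain with initialized maps.
func NewDefUseChain() *DefUseChain {
	return &DefUseChain{
		Defs: make(map[string][]*Statement),
		Uses: make(map[string][]*Statement),
	}
}

// AddStatement records the statement's definition and uses. Nil statements
// are ignored, and a variable read twice in one statement is recorded once.
func (c *DefUseChain) AddStatement(stmt *Statement) {
	if stmt == nil {
		return
	}
	if stmt.Def != "" {
		c.Defs[stmt.Def] = append(c.Defs[stmt.Def], stmt)
	}
	seen := make(map[string]bool)
	for _, v := range stmt.Uses {
		if v == "" || seen[v] {
			continue
		}
		seen[v] = true
		c.Uses[v] = append(c.Uses[v], stmt)
	}
}

// AllVariables returns every variable that is defined or used, sorted.
func (c *DefUseChain) AllVariables() []string {
	set := make(map[string]bool)
	for v := range c.Defs {
		set[v] = true
	}
	for v := range c.Uses {
		set[v] = true
	}
	vars := make([]string, 0, len(set))
	for v := range set {
		vars = append(vars, v)
	}
	sort.Strings(vars)
	return vars
}
```

Indexing by variable in both directions keeps "where is x defined?" and "where is x read?" as single map lookups, which is what a later propagation pass would ask most often.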

Test Coverage

✅ 100% code coverage (all 36 functions tested)

Total: 1,089 lines of code
37 test cases covering all functionality
All edge cases tested (empty inputs, nil values, duplicates)

Quality Checks

✅ All tests pass (gradle testGo)
✅ Lint passes with 0 issues (gradle lintGo)
✅ Build succeeds (gradle buildGo)
✅ 100% code coverage

Technical Details

Statement Representation

Captures both def-use information and control flow structure
Supports nested statements (if/for/while/try blocks)
Line number tracking for precise error reporting
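To illustrate how nested statements and line numbers could work together, the sketch below assumes compound statements (if/for/while/with/try) keep their body in a hypothetical NestedStatements field. A pre-order walk then yields statements in source order, and a query can report definition lines at any nesting depth. Flatten, DefinitionLines, and the trimmed Statement shape (with Type as a plain string) are illustrative assumptions, not the PR's API.

```go
package main

// Statement is a hypothetical, trimmed-down shape: compound statements
// (if/for/while/with/try) keep their body in NestedStatements.
type Statement struct {
	Type             string
	LineNumber       uint32
	Def              string
	Uses             []string
	NestedStatements []*Statement
}

// Flatten returns stmts and all nested statements in pre-order, so a
// compound statement appears before the statements in its body.
func Flatten(stmts []*Statement) []*Statement {
	var out []*Statement
	var walk func([]*Statement)
	walk = func(list []*Statement) {
		for _, s := range list {
			if s == nil {
				continue
			}
			out = append(out, s)
			walk(s.NestedStatements)
		}
	}
	walk(stmts)
	return out
}

// DefinitionLines reports the line numbers of every statement, at any
// nesting depth, that defines the variable name.
func DefinitionLines(stmts []*Statement, name string) []uint32 {
	var lines []uint32
	for _, s := range Flatten(stmts) {
		if s.Def == name {
			lines = append(lines, s.LineNumber)
		}
	}
	return lines
}
```

For a Python body such as `x = 1` / `if cond:` / `x = input()` / `for i in xs:` / `y = x` on lines 1 to 5, Flatten visits lines 1 through 5 in order and DefinitionLines for `x` reports lines 1 and 3, the second coming from inside the `if` block.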

Taint Tracking

Multi-path taint tracking (variables can have multiple taint sources)
Confidence-based detection (0.8+ high, 0.5-0.8 medium, <0.5 low)
Sanitization tracking to reduce false positives
Propagation path recording for debugging
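The taint features above could be modeled roughly as follows. This is a sketch assuming hypothetical field and method names (SourceVar, SinkCall, Sanitized, ConfidenceLevel, AddTaintedVar, IsTainted, Detections, TaintedParams, TaintedReturn). Only the type names TaintInfo and TaintSummary and the 0.8/0.5 thresholds come from the PR, and the band boundaries are assumed to be inclusive at the lower end.

```go
package main

// TaintInfo is a hypothetical shape for one source-to-sink flow.
type TaintInfo struct {
	SourceLine      uint32   // where tainted data entered
	SourceVar       string   // e.g. "request.args"
	SinkLine        uint32   // 0 if no sink was reached
	SinkCall        string   // e.g. "cursor.execute"; "" if no sink
	PropagationPath []string // variables the taint passed through
	Confidence      float64  // 0.0-1.0
	Sanitized       bool     // true if a sanitizer was applied on the path
}

// ConfidenceLevel buckets the score with the PR's thresholds,
// assuming each band's lower bound is inclusive.
func (t *TaintInfo) ConfidenceLevel() string {
	switch {
	case t.Confidence >= 0.8:
		return "high"
	case t.Confidence >= 0.5:
		return "medium"
	default:
		return "low"
	}
}

// TaintSummary is a hypothetical per-function result. A variable may map
// to several TaintInfo entries, one per source (multi-path tracking).
type TaintSummary struct {
	FunctionFQN   string
	TaintedVars   map[string][]*TaintInfo
	TaintedParams map[string]bool // hypothetical: parameters marked tainted
	TaintedReturn bool            // hypothetical: return value carries taint
}

// NewTaintSummary returns an empty summary for the named function.
func NewTaintSummary(fqn string) *TaintSummary {
	return &TaintSummary{
		FunctionFQN:   fqn,
		TaintedVars:   make(map[string][]*TaintInfo),
		TaintedParams: make(map[string]bool),
	}
}

// AddTaintedVar appends a flow for name; nil infos are ignored.
func (s *TaintSummary) AddTaintedVar(name string, info *TaintInfo) {
	if info == nil {
		return
	}
	s.TaintedVars[name] = append(s.TaintedVars[name], info)
}

// IsTainted reports whether any recorded flow for name is unsanitized.
func (s *TaintSummary) IsTainted(name string) bool {
	for _, info := range s.TaintedVars[name] {
		if !info.Sanitized {
			return true
		}
	}
	return false
}

// Detections returns unsanitized flows that reached a sink.
func (s *TaintSummary) Detections() []*TaintInfo {
	var out []*TaintInfo
	for _, infos := range s.TaintedVars {
		for _, info := range infos {
			if !info.Sanitized && info.SinkCall != "" {
				out = append(out, info)
			}
		}
	}
	return out
}
```

Keeping sanitized flows in the summary, rather than dropping them, preserves the propagation path for debugging while still letting IsTainted and Detections filter them out to reduce false positives.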

Design Decisions

Conservative approach: Track ALL definitions (not just reaching definitions)
Future-proof: Supports both intra-procedural and inter-procedural analysis
Memory-efficient: Only metadata stored, not full code snippets
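To make the "track ALL definitions" trade-off concrete, the sketch below contrasts the conservative policy with a reaching-definition query. The Def record and both function names are hypothetical, and the reaching-definition helper deliberately handles only straight-line code. With branches or loops, several definitions can reach a use, which is the extra precision this PR chooses not to model yet.

```go
package main

// Def is a hypothetical record of one assignment: variable Var is
// (re)defined at source line Line.
type Def struct {
	Var  string
	Line uint32
}

// AllDefinitions follows the conservative policy: it returns the line of
// every definition of v, wherever it appears in the function.
func AllDefinitions(defs []Def, v string) []uint32 {
	var lines []uint32
	for _, d := range defs {
		if d.Var == v {
			lines = append(lines, d.Line)
		}
	}
	return lines
}

// ReachingDefinitionStraightLine is the more precise alternative, valid only
// for straight-line code with no branches or loops: the definition that
// reaches useLine is the last one strictly before it.
func ReachingDefinitionStraightLine(defs []Def, v string, useLine uint32) (uint32, bool) {
	var best uint32
	found := false
	for _, d := range defs {
		if d.Var == v && d.Line < useLine && (!found || d.Line > best) {
			best, found = d.Line, true
		}
	}
	return best, found
}
```

For `q = request.args["q"]` (line 1), `q = "SELECT 1"` (line 2), and `cursor.execute(q)` (line 3), the conservative policy keeps both definitions of `q`, so the tainted line 1 is still considered at the sink, while the straight-line reaching query returns only line 2. The conservative choice can produce a false positive here, but it cannot miss a flow that a branch would make reachable.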

Next Steps

This PR provides the foundational data structures. Future PRs will implement:

Statement extraction from Python AST
Def-use chain construction algorithms
Intra-procedural taint propagation engine
Integration into the call graph builder

🤖 Generated with Claude Code

Co-Authored-By: Claude [email protected]

…nalysis

Implements the foundation for intra-procedural taint tracking with two new core data structures:

## Statement (`statement.go`)
- StatementType enum with 11 Python statement types
- Statement struct for representing code statements with def-use information
- DefUseChain for tracking variable definitions and uses
- Full API for def-use chain construction and querying

## TaintSummary (`taint_summary.go`)
- TaintInfo struct for detailed taint tracking (source, sink, propagation path)
- Confidence scoring (0.0-1.0) for detection quality
- TaintSummary for complete function-level analysis results
- Support for parameter and return value tainting

## Test Coverage
- 100% code coverage (37 test cases)
- statement_test.go: 16 comprehensive tests
- taint_summary_test.go: 21 comprehensive tests
- Complex scenario tests simulating real security issues

This is PR #1 of 5 in the intra-procedural dataflow feature stack.
Next: PR #2 will implement statement extraction from Python AST.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

shivasurya · 2025-11-04T00:09:43Z

This stack of pull requests is managed by Graphite. Learn more about stacking.

safedep · 2025-11-04T00:10:05Z

SafeDep Report Summary

No dependency changes detected. Nothing to scan.

This report is generated by SafeDep GitHub App.

codecov · 2025-11-04T00:10:33Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 75.50%. Comparing base (3ed6a7e) to head (aeccb4a).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #343      +/-   ##
==========================================
+ Coverage   74.91%   75.50%   +0.58%     
==========================================
  Files          47       49       +2     
  Lines        5566     5699     +133     
==========================================
+ Hits         4170     4303     +133     
  Misses       1221     1221              
  Partials      175      175


shivasurya · 2025-11-04T01:51:36Z

Merge activity

Nov 4, 1:51 AM UTC: A user started a stack merge that includes this pull request via Graphite.
Nov 4, 1:51 AM UTC: @shivasurya merged this pull request with Graphite.

… dataflow (#344)

## Summary
Implements Python statement extraction from AST to support intra-procedural dataflow analysis. This is part 2 of the intra-procedural dataflow feature.

## Changes
- Add statement extraction for Python functions
- Extract assignments, calls, and returns with def-use information
- Comprehensive test coverage (87.3%)

## Testing
- 20+ tests covering all statement types
- All tests passing
- Build and lint clean

Stacked on #343

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

shivasurya marked this pull request as ready for review November 4, 2025 00:10

shivasurya self-assigned this Nov 4, 2025

shivasurya added the enhancement (New feature or request) and go (Pull requests that update go code) labels Nov 4, 2025

shivasurya merged commit 1f7bc7a into main Nov 4, 2025
5 checks passed

shivasurya deleted the feat/intra-procedural-dataflow-pr1-data-structures branch November 4, 2025 01:51
