@shivasurya
Owner

No description provided.

@shivasurya shivasurya merged commit 0e990a9 into main Apr 21, 2024
@shivasurya shivasurya deleted the shiva/build branch April 21, 2024 02:48
shivasurya added a commit that referenced this pull request Oct 26, 2025
This PR completes the 3-pass algorithm for building Python call graphs
by implementing the final pass that resolves call targets and constructs
the complete graph structure with edges linking callers to callees.

## Changes

### Core Implementation (builder.go)

1. **BuildCallGraph()**: Main entry point for Pass 3
   - Indexes all function definitions from code graph
   - Iterates through all Python files in the registry
   - Extracts imports and call sites for each file
   - Resolves each call site to its target function
   - Builds edges and stores call site details
   - Returns complete CallGraph with all relationships

2. **indexFunctions()**: Function indexing
   - Scans code graph for all function/method definitions
   - Maps each function to its FQN using module registry
   - Populates CallGraph.Functions map for quick lookup

3. **getFunctionsInFile()**: File-scoped function retrieval
   - Filters code graph nodes by file path
   - Returns only function/method definitions in that file
   - Used for finding containing functions of call sites

4. **findContainingFunction()**: Call site parent resolution
   - Determines which function contains a given call site
   - Compares line numbers using a nearest-match algorithm
   - Picks the function whose start line is the highest one ≤ the call line
   - Returns empty string for module-level calls

5. **resolveCallTarget()**: Core resolution logic
   - Handles simple names: sanitize() → myapp.utils.sanitize
   - Handles qualified names: utils.sanitize() → myapp.utils.sanitize
   - Resolves through import maps first
   - Falls back to same-module resolution
   - Validates FQNs against module registry
   - Returns two values: the FQN and a resolved flag (string, bool)

6. **validateFQN()**: FQN validation
   - Checks if a fully qualified name exists in registry
   - Handles both modules and functions within modules
   - Validates parent module for function FQNs

7. **readFileBytes()**: File reading helper
   - Reads source files for parsing
   - Handles absolute path conversion
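
The nearest-match rule described for findContainingFunction() can be sketched as follows. This is a minimal illustration, not the code from builder.go: the `FuncDef` type and its fields are hypothetical stand-ins for code graph nodes. Note that a comparison based only on start lines cannot tell a module-level call placed after a function body from a call inside that body; the source does not say whether the real implementation also checks end lines.

```go
package main

// FuncDef is a hypothetical, simplified stand-in for a function node
// in the code graph (not the real type from the repository).
type FuncDef struct {
	FQN       string // fully qualified name, e.g. "myapp.views.index"
	StartLine int    // line of the "def" statement (1-based)
}

// findContainingFunction returns the FQN of the function whose start line
// is the highest one that is still <= callLine. It returns "" when no
// function starts at or before the call, i.e. a module-level call.
func findContainingFunction(funcs []FuncDef, callLine int) string {
	best := ""
	bestLine := -1
	for _, f := range funcs {
		if f.StartLine <= callLine && f.StartLine > bestLine {
			best, bestLine = f.FQN, f.StartLine
		}
	}
	return best
}
```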

### Comprehensive Tests (builder_test.go)

Created 15 test functions covering:

**Resolution Tests:**
- Simple imported function resolution
- Qualified import resolution (module.function)
- Same-module function resolution
- Unresolved method calls (obj.method)
- Non-existent function handling

**Validation Tests:**
- Module existence validation
- Function-in-module validation
- Non-existent module handling

**Helper Function Tests:**
- Function indexing from code graph
- Functions-in-file filtering
- Containing function detection with edge cases

**Integration Tests:**
- Simple single-file call graph
- Multi-file call graph with imports
- Real test fixture integration

## Test Coverage

- Overall: 91.8%
- BuildCallGraph: 80.8%
- indexFunctions: 87.5%
- getFunctionsInFile: 100.0%
- findContainingFunction: 100.0%
- resolveCallTarget: 85.0%
- validateFQN: 100.0%
- readFileBytes: 75.0%

## Algorithm Overview

Pass 3 ties together all previous work:

### Pass 1 (PR #2): BuildModuleRegistry
- Maps file paths to module paths
- Enables FQN generation

### Pass 2 (PRs #3-5): Import & Call Site Extraction
- ExtractImports: Maps local names to FQNs
- ExtractCallSites: Finds all function calls in AST

### Pass 3 (This PR): Call Graph Construction
- Resolves call targets using import maps
- Links callers to callees with edges
- Validates resolutions against registry
- Stores detailed call site information

## Resolution Strategy

The resolver uses a multi-step approach:

1. **Simple names** (no dots):
   - Check import map first
   - Fall back to same-module lookup
   - Return unresolved if neither works

2. **Qualified names** (with dots):
   - Split into base + rest
   - Resolve base through imports
   - Append rest to get full FQN
   - Try current module if not imported

3. **Validation**:
   - Check if target exists in registry
   - For functions, validate parent module exists
   - Mark resolution success/failure
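
The strategy above can be sketched roughly as below. This is an illustration under stated assumptions, not the actual builder.go code: `Registry` is a hypothetical set of known module paths, the import map is modeled as a plain `map[string]string`, and the parameter list is invented for the example.

```go
package main

import "strings"

// Registry is a hypothetical stand-in for the module registry:
// the set of known module paths.
type Registry map[string]bool

// validateFQN reports whether fqn is a known module, or names something
// whose parent module is known (e.g. a function inside a module).
func validateFQN(fqn string, reg Registry) bool {
	if reg[fqn] {
		return true
	}
	if i := strings.LastIndex(fqn, "."); i > 0 {
		return reg[fqn[:i]]
	}
	return false
}

// resolveCallTarget maps a callee name as written at the call site to an FQN.
// The import map is consulted first; the current module is the fallback.
func resolveCallTarget(target string, imports map[string]string, currentModule string, reg Registry) (string, bool) {
	// For simple names, base == target and qualified is false.
	base, rest, qualified := strings.Cut(target, ".")
	if fqn, ok := imports[base]; ok {
		if qualified {
			fqn += "." + rest
		}
		return fqn, validateFQN(fqn, reg)
	}
	// Not imported: try the name inside the current module.
	fqn := currentModule + "." + target
	return fqn, validateFQN(fqn, reg)
}
```

With this sketch, `obj.method` inside `myapp.views` becomes `myapp.views.obj.method`, which fails validation because `myapp.views.obj` is not a module, so the call is reported as unresolved.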

## Design Decisions

1. **Containing function detection**:
   - Uses a nearest-match algorithm based on line numbers
   - Picks the function whose start line is the highest one ≤ the call line
   - Handles module-level calls by returning empty FQN

2. **Resolution priority**:
   - Import map takes precedence over same-module
   - Explicit imports always respected even if unresolved
   - Same-module only tried when not in imports

3. **Validation vs Resolution**:
   - Resolution finds FQN from imports/context
   - Validation checks if FQN exists in registry
   - Both pieces of information stored in CallSite

4. **Error handling**:
   - Continues processing even if some files fail
   - Marks individual call sites as unresolved
   - Returns partial graph instead of failing completely
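
The error-handling decision (keep going, return a partial result) might look like the loop below. This is a hedged sketch only: `processFiles`, its callback signature, and `errUnreadable` are hypothetical names for illustration, and call sites are reduced to plain strings.

```go
package main

import "errors"

// errUnreadable is a hypothetical sentinel error used for illustration.
var errUnreadable = errors.New("unreadable file")

// processFiles runs process on every path and keeps going when one fails.
// It returns everything gathered from the files that succeeded, plus the
// list of paths that failed, instead of aborting on the first error.
func processFiles(paths []string, process func(path string) ([]string, error)) (sites []string, failed []string) {
	for _, p := range paths {
		got, err := process(p)
		if err != nil {
			failed = append(failed, p)
			continue // skip this file, keep the partial result
		}
		sites = append(sites, got...)
	}
	return sites, failed
}
```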

## Next Steps

The call graph infrastructure is now complete. Future PRs will:

- PR #7: Add CFG data structures for control flow analysis
- PR #8: Implement pattern matching for security rules
- PR #9: Integrate into main initialization pipeline
- PR #10: Add comprehensive documentation and examples
- PR #11: Performance optimizations (caching, pooling)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
shivasurya added a commit that referenced this pull request Oct 29, 2025
* feat: Add core data structures for call graph (PR #1)

Add foundational data structures for Python call graph construction:

New Types:
- CallSite: Represents function call locations with arguments and resolution status
- CallGraph: Maps functions to callees with forward/reverse edges
- ModuleRegistry: Maps Python file paths to module paths
- ImportMap: Tracks imports per file for name resolution
- Location: Source code position tracking
- Argument: Function call argument metadata

Features:
- 100% test coverage with comprehensive unit tests
- Bidirectional call graph edges (forward and reverse)
- Support for ambiguous short names in module registry
- Helper functions for module path manipulation
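
To illustrate the bidirectional edges mentioned above, here is a minimal sketch of a graph that records each call in both directions. The field names (`Edges`, `ReverseEdges`) and the `AddEdge` method are assumptions made for this example and may not match the real CallGraph type.

```go
package main

// CallGraph is a simplified sketch; the field names are assumptions.
type CallGraph struct {
	Edges        map[string][]string // caller FQN -> callee FQNs
	ReverseEdges map[string][]string // callee FQN -> caller FQNs
}

// NewCallGraph returns an empty graph with both maps initialized.
func NewCallGraph() *CallGraph {
	return &CallGraph{
		Edges:        make(map[string][]string),
		ReverseEdges: make(map[string][]string),
	}
}

// AddEdge records caller -> callee in both directions, ignoring duplicates.
func (g *CallGraph) AddEdge(caller, callee string) {
	for _, c := range g.Edges[caller] {
		if c == callee {
			return // edge already present
		}
	}
	g.Edges[caller] = append(g.Edges[caller], callee)
	g.ReverseEdges[callee] = append(g.ReverseEdges[callee], caller)
}
```

Keeping the reverse map up to date at insert time makes "who calls this function?" a direct lookup rather than a scan over all forward edges.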

This establishes the foundation for 3-pass call graph algorithm:
- Pass 1 (next PR): Module registry builder
- Pass 2 (next PR): Import extraction and resolution
- Pass 3 (next PR): Call graph construction

Related: Phase 1 - Call Graph Construction & 3-Pass Algorithm

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* feat: Implement module registry - Pass 1 of 3-pass algorithm (PR #2)

Implement the first pass of the call graph construction algorithm: building
a complete registry of Python modules by walking the directory tree.

New Features:
- BuildModuleRegistry: Walks directory tree and maps file paths to module paths
- convertToModulePath: Converts file system paths to Python import paths
- shouldSkipDirectory: Filters out venv, __pycache__, build dirs, etc.

Module Path Conversion:
- Handles regular files: myapp/views.py → myapp.views
- Handles packages: myapp/utils/__init__.py → myapp.utils
- Supports deep nesting: myapp/api/v1/endpoints/users.py → myapp.api.v1.endpoints.users
- Cross-platform: Normalizes Windows/Unix path separators
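
The conversion rules above can be sketched as below. This is an approximation for illustration, assuming the input is already a path relative to the project root; the signature of the real convertToModulePath may differ.

```go
package main

import "strings"

// convertToModulePath turns a project-relative file path into a dotted
// Python module path. Sketch only: assumes relPath is relative to the root.
func convertToModulePath(relPath string) string {
	p := strings.ReplaceAll(relPath, "\\", "/") // normalize Windows separators
	p = strings.TrimSuffix(p, ".py")
	p = strings.TrimSuffix(p, "/__init__") // a package is named by its directory
	p = strings.Trim(p, "/")
	return strings.ReplaceAll(p, "/", ".")
}
```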

Performance Optimizations:
- Skips 15+ common non-source directories (venv, __pycache__, .git, dist, build, etc.)
- Avoids scanning thousands of dependency files
- Indexes both full module paths and short names for ambiguity detection
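
Directory skipping during the walk could be done along these lines. The skip set below contains only the five names mentioned above, not the full list of 15+, and `listPythonFiles` is a hypothetical helper written for this example.

```go
package main

import (
	"io/fs"
	"path/filepath"
	"strings"
)

// skipDirs holds only the directory names named in the text above;
// the real list is longer.
var skipDirs = map[string]bool{
	"venv": true, "__pycache__": true, ".git": true, "dist": true, "build": true,
}

// shouldSkipDirectory reports whether a directory name is a known
// non-source directory.
func shouldSkipDirectory(name string) bool {
	return skipDirs[name]
}

// listPythonFiles walks root and collects .py files, pruning skipped
// directories so their contents are never visited.
func listPythonFiles(root string) ([]string, error) {
	var files []string
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			if shouldSkipDirectory(d.Name()) {
				return filepath.SkipDir // prune the whole subtree
			}
			return nil
		}
		if strings.HasSuffix(d.Name(), ".py") {
			files = append(files, path)
		}
		return nil
	})
	return files, err
}
```

Returning `filepath.SkipDir` from the callback is what avoids descending into dependency trees at all, rather than filtering their files afterwards.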

Test Coverage: 93%
- Comprehensive unit tests for all conversion scenarios
- Integration tests with real Python project structure
- Edge case handling: empty dirs, non-Python files, deep nesting, permissions
- Error path testing: walk errors, invalid paths, system errors
- Test fixtures: test-src/python/simple_project/ with realistic structure
- Documented: Remaining 7% are untestable OS-level errors (filepath.Abs failures)

This establishes Pass 1 of 3:
- ✅ Pass 1: Module registry (this PR)
- Next: Pass 2 - Import extraction and resolution
- Next: Pass 3 - Call graph construction

Related: Phase 1 - Call Graph Construction & 3-Pass Algorithm
Base Branch: shiva/callgraph-infra-1 (PR #1)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* feat: Implement import extraction with tree-sitter - Pass 2 Part A

This PR implements comprehensive import extraction for Python code using
tree-sitter AST parsing. It handles all three main import styles:

1. Simple imports: `import module`
2. From imports: `from module import name`
3. Aliased imports: `import module as alias` and `from module import name as alias`

The implementation uses direct AST traversal instead of tree-sitter queries
for better compatibility and control. It properly handles:
- Multiple imports per line (`from json import dumps, loads`)
- Nested module paths (`import xml.etree.ElementTree`)
- Whitespace variations
- Invalid/malformed syntax (fault-tolerant parsing)
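
How the three import styles might map a local name to an FQN is sketched below. The `bindImport` helper is hypothetical and exists only for this illustration; it leaves out AST parsing entirely and does not cover how nested paths such as `import xml.etree.ElementTree` are keyed.

```go
package main

// bindImport returns the local name an import introduces and the FQN it
// refers to. name is empty for "import module"; alias is empty when there
// is no "as" clause. Hypothetical helper for illustration only.
func bindImport(module, name, alias string) (local, fqn string) {
	local, fqn = module, module
	if name != "" { // from module import name
		local = name
		fqn = module + "." + name
	}
	if alias != "" { // ... as alias
		local = alias
	}
	return local, fqn
}
```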

Key functions:
- ExtractImports(): Main entry point that parses code and builds ImportMap
- traverseForImports(): Recursively traverses AST to find import statements
- processImportStatement(): Handles simple and aliased imports
- processImportFromStatement(): Handles from-import statements with proper
  module name skipping to avoid duplicate entries

Test coverage: 92.8% overall, 90-95% for import extraction functions

Test fixtures include:
- simple_imports.py: Basic import statements
- from_imports.py: From import statements with multiple names
- aliased_imports.py: Aliased imports (both simple and from)
- mixed_imports.py: Mixed import styles

All tests passing, linting clean, builds successfully.

This is Pass 2 Part A of the 3-pass call graph algorithm.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* feat: Implement relative import resolution - Pass 2 Part B

This PR implements relative import resolution for Python as part of Pass 2
of the 3-pass algorithm. It extends the import extraction system from PR #3 to
handle Python's relative import syntax with dot notation.

Key Changes:

1. **Added FileToModule reverse mapping to ModuleRegistry**
   - Enables O(1) lookup from file path to module path
   - Required for resolving relative imports
   - Updated AddModule() to maintain bidirectional mapping

2. **Implemented resolveRelativeImport() function**
   - Handles single dot (.) for current package
   - Handles multiple dots (.., ...) for parent/grandparent packages
   - Navigates package hierarchy using module path components
   - Clamps excessive dots to root package level
   - Falls back gracefully when file not in registry

3. **Enhanced processImportFromStatement() for relative imports**
   - Detects relative_import nodes in tree-sitter AST
   - Extracts import_prefix (dots) and optional module suffix
   - Resolves relative paths to absolute module paths before adding to ImportMap

4. **Comprehensive test coverage (94.5% overall)**
   - Unit tests for resolveRelativeImport with various dot counts
   - Integration tests with ExtractImports
   - Tests for deeply nested packages
   - Tests for mixed absolute and relative imports
   - Real fixture files with project structure

Relative Import Examples:
- `from . import utils` → "currentpackage.utils"
- `from .. import config` → "parentpackage.config"
- `from ..utils import helper` → "parentpackage.utils.helper"
- `from ...db import query` → "grandparent.db.query"
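
The dot-counting logic can be sketched as follows. This is a simplified illustration with assumptions: the importing file is a regular module (not an `__init__.py`), its dotted module path is already known, and "clamping" means never going above the first path component. The real resolveRelativeImport() takes different inputs.

```go
package main

import "strings"

// resolveRelativeImport resolves the module part of a relative import.
// currentModule is the importing file's module path, dots is the number of
// leading dots (>= 1), and suffix is the optional module after the dots.
// Sketch only; assumes currentModule is a regular module, not a package.
func resolveRelativeImport(currentModule string, dots int, suffix string) string {
	parts := strings.Split(currentModule, ".")
	// One dot drops the module name itself; each extra dot goes up a package.
	keep := len(parts) - dots
	if keep < 1 {
		keep = 1 // clamp excessive dots to the root package
	}
	base := strings.Join(parts[:keep], ".")
	if suffix == "" {
		return base
	}
	return base + "." + suffix
}
```

For `myapp.submodule.handler`, `from ..utils import helper` gives the base `myapp.utils`, and the imported name `helper` is then appended to form `myapp.utils.helper`.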

Test Fixtures:
- Created myapp/submodule/handler.py with all relative import styles
- Created supporting package structure with __init__.py files
- Tests verify correct resolution across package hierarchy

All tests passing, linting clean, builds successfully.

This is Pass 2 Part B of the 3-pass call graph algorithm.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* feat: Implement call site extraction from AST - Pass 2 Part C

This PR implements call site extraction from Python source code using
tree-sitter AST parsing. It builds on the import resolution work from
PRs #3 and #4 to prepare for call graph construction in Pass 3.

## Changes

### Core Implementation (callsites.go)

1. **ExtractCallSites()**: Main entry point for extracting call sites
   - Parses Python source with tree-sitter
   - Traverses AST to find all call expressions
   - Returns slice of CallSite objects with location information

2. **traverseForCalls()**: Recursive AST traversal
   - Tracks function context while traversing
   - Updates context when entering function definitions
   - Finds and processes call expressions

3. **processCallExpression()**: Call site processing
   - Extracts callee name (function/method being called)
   - Parses arguments (positional and keyword)
   - Creates CallSite with source location
   - Parameters for importMap and caller reserved for Pass 3

4. **extractCalleeName()**: Callee name extraction
   - Handles simple identifiers: foo()
   - Handles attributes: obj.method(), obj.attr.method()
   - Recursively builds dotted names

5. **extractArguments()**: Argument parsing
   - Extracts all positional arguments
   - Preserves keyword arguments as "name=value" in Value field
   - Tracks argument position and variable status

6. **convertArgumentsToSlice()**: Helper for struct conversion
   - Converts []*Argument to []Argument for CallSite struct
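
The recursive construction of dotted callee names can be illustrated with a tiny stand-in AST. The `Node` type below is hypothetical and far simpler than a tree-sitter node; how the real extractCalleeName() treats chained calls such as `obj.method1().method2()` is not stated here, so this sketch simply falls back to the attribute name in that case.

```go
package main

// Node is a hypothetical, minimal stand-in for a tree-sitter node.
type Node struct {
	Kind      string // "identifier" or "attribute" in this sketch
	Text      string // identifier text, when Kind == "identifier"
	Object    *Node  // left-hand side, when Kind == "attribute"
	Attribute string // right-hand name, when Kind == "attribute"
}

// calleeName builds a dotted name such as "obj.attr.method" by recursing
// through attribute nodes. Unknown node kinds yield "".
func calleeName(n *Node) string {
	if n == nil {
		return ""
	}
	switch n.Kind {
	case "identifier":
		return n.Text
	case "attribute":
		obj := calleeName(n.Object)
		if obj == "" {
			return n.Attribute // object is not a plain name (assumption)
		}
		return obj + "." + n.Attribute
	}
	return ""
}
```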

### Comprehensive Tests (callsites_test.go)

Created 17 test functions covering:
- Simple function calls: foo(), bar()
- Method calls: obj.method(), self.helper()
- Arguments: positional, keyword, mixed
- Nested calls: foo(bar(x))
- Multiple functions in one file
- Class methods
- Chained calls: obj.method1().method2()
- Module-level calls (no function context)
- Source location tracking
- Empty files
- Complex arguments: expressions, lists, dicts, lambdas
- Nested method calls: obj.attr.method()
- Real file fixture integration

### Test Fixture (simple_calls.py)

Created realistic test file with:
- Function definitions with various call patterns
- Method calls on objects
- Calls with arguments (positional and keyword)
- Nested calls
- Class methods with self references

## Test Coverage

- Overall: 93.3%
- ExtractCallSites: 90.0%
- traverseForCalls: 93.3%
- processCallExpression: 83.3%
- extractCalleeName: 91.7%
- extractArguments: 87.5%
- convertArgumentsToSlice: 100.0%

## Design Decisions

1. **Keyword argument handling**: Store as "name=value" in Value field
   - Tree-sitter provides full keyword_argument node content
   - Preserves complete argument information for later analysis
   - Separating name/value would require additional parsing

2. **Caller context tracking**: Parameter reserved but not used yet
   - Will be populated in Pass 3 during call graph construction
   - Enables linking call sites to their containing functions

3. **Import map parameter**: Reserved for Pass 3 resolution
   - Will be used to resolve qualified names to FQNs
   - Enables cross-file call graph construction

4. **Location tracking**: Store exact position for each call site
   - File, line, column information
   - Enables precise error reporting and code navigation
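
Since keyword arguments are kept as "name=value" strings, later analysis would need to split them again. One possible way is sketched below; `splitKeywordArg` and `isIdent` are hypothetical helpers, and the identifier check is ASCII-only. The guard on the name is what keeps expressions like `x == 1` from being mistaken for keyword arguments.

```go
package main

import "strings"

// isIdent reports whether s looks like an ASCII Python identifier.
func isIdent(s string) bool {
	if s == "" {
		return false
	}
	for i, r := range s {
		switch {
		case r == '_' || (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z'):
		case r >= '0' && r <= '9' && i > 0:
		default:
			return false
		}
	}
	return true
}

// splitKeywordArg splits an argument stored as "name=value". For anything
// that is not a keyword argument it returns the input unchanged as val.
func splitKeywordArg(value string) (name, val string, isKeyword bool) {
	before, after, found := strings.Cut(value, "=")
	name = strings.TrimSpace(before)
	// Reject "==" comparisons and left-hand sides that are not plain names.
	if !found || strings.HasPrefix(after, "=") || !isIdent(name) {
		return "", value, false
	}
	return name, strings.TrimSpace(after), true
}
```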

## Testing Strategy

- Unit tests for each extraction function
- Integration tests with tree-sitter AST
- Real file fixture for end-to-end validation
- Edge cases: empty files, no context, nested structures

## Next Steps (PR #6)

Pass 3 will use this call site data to:
1. Build the complete call graph structure
2. Resolve call targets to function definitions
3. Link caller and callee through edges
4. Handle disambiguation for overloaded names

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* feat: Implement call graph builder - Pass 3


---------

Co-authored-by: Claude <[email protected]>
shivasurya added a commit that referenced this pull request Oct 29, 2025
…#328)

* feat: Add core data structures for call graph (PR #1)

Add foundational data structures for Python call graph construction:

New Types:
- CallSite: Represents function call locations with arguments and resolution status
- CallGraph: Maps functions to callees with forward/reverse edges
- ModuleRegistry: Maps Python file paths to module paths
- ImportMap: Tracks imports per file for name resolution
- Location: Source code position tracking
- Argument: Function call argument metadata

Features:
- 100% test coverage with comprehensive unit tests
- Bidirectional call graph edges (forward and reverse)
- Support for ambiguous short names in module registry
- Helper functions for module path manipulation

This establishes the foundation for 3-pass call graph algorithm:
- Pass 1 (next PR): Module registry builder
- Pass 2 (next PR): Import extraction and resolution
- Pass 3 (next PR): Call graph construction

Related: Phase 1 - Call Graph Construction & 3-Pass Algorithm

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* feat: Implement module registry - Pass 1 of 3-pass algorithm (PR #2)

Implement the first pass of the call graph construction algorithm: building
a complete registry of Python modules by walking the directory tree.

New Features:
- BuildModuleRegistry: Walks directory tree and maps file paths to module paths
- convertToModulePath: Converts file system paths to Python import paths
- shouldSkipDirectory: Filters out venv, __pycache__, build dirs, etc.

Module Path Conversion:
- Handles regular files: myapp/views.py → myapp.views
- Handles packages: myapp/utils/__init__.py → myapp.utils
- Supports deep nesting: myapp/api/v1/endpoints/users.py → myapp.api.v1.endpoints.users
- Cross-platform: Normalizes Windows/Unix path separators

Performance Optimizations:
- Skips 15+ common non-source directories (venv, __pycache__, .git, dist, build, etc.)
- Avoids scanning thousands of dependency files
- Indexes both full module paths and short names for ambiguity detection

Test Coverage: 93%
- Comprehensive unit tests for all conversion scenarios
- Integration tests with real Python project structure
- Edge case handling: empty dirs, non-Python files, deep nesting, permissions
- Error path testing: walk errors, invalid paths, system errors
- Test fixtures: test-src/python/simple_project/ with realistic structure
- Documented: Remaining 7% are untestable OS-level errors (filepath.Abs failures)

This establishes Pass 1 of 3:
- ✅ Pass 1: Module registry (this PR)
- Next: Pass 2 - Import extraction and resolution
- Next: Pass 3 - Call graph construction

Related: Phase 1 - Call Graph Construction & 3-Pass Algorithm
Base Branch: shiva/callgraph-infra-1 (PR #1)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* feat: Implement import extraction with tree-sitter - Pass 2 Part A

This PR implements comprehensive import extraction for Python code using
tree-sitter AST parsing. It handles all three main import styles:

1. Simple imports: `import module`
2. From imports: `from module import name`
3. Aliased imports: `import module as alias` and `from module import name as alias`

The implementation uses direct AST traversal instead of tree-sitter queries
for better compatibility and control. It properly handles:
- Multiple imports per line (`from json import dumps, loads`)
- Nested module paths (`import xml.etree.ElementTree`)
- Whitespace variations
- Invalid/malformed syntax (fault-tolerant parsing)

Key functions:
- ExtractImports(): Main entry point that parses code and builds ImportMap
- traverseForImports(): Recursively traverses AST to find import statements
- processImportStatement(): Handles simple and aliased imports
- processImportFromStatement(): Handles from-import statements with proper
  module name skipping to avoid duplicate entries

Test coverage: 92.8% overall, 90-95% for import extraction functions

Test fixtures include:
- simple_imports.py: Basic import statements
- from_imports.py: From import statements with multiple names
- aliased_imports.py: Aliased imports (both simple and from)
- mixed_imports.py: Mixed import styles

All tests passing, linting clean, builds successfully.

This is Pass 2 Part A of the 3-pass call graph algorithm.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* feat: Implement relative import resolution - Pass 2 Part B

This PR implements comprehensive relative import resolution for Python using
a 3-pass algorithm. It extends the import extraction system from PR #3 to handle
Python's relative import syntax with dot notation.

Key Changes:

1. **Added FileToModule reverse mapping to ModuleRegistry**
   - Enables O(1) lookup from file path to module path
   - Required for resolving relative imports
   - Updated AddModule() to maintain bidirectional mapping

2. **Implemented resolveRelativeImport() function**
   - Handles single dot (.) for current package
   - Handles multiple dots (.., ...) for parent/grandparent packages
   - Navigates package hierarchy using module path components
   - Clamps excessive dots to root package level
   - Falls back gracefully when file not in registry

3. **Enhanced processImportFromStatement() for relative imports**
   - Detects relative_import nodes in tree-sitter AST
   - Extracts import_prefix (dots) and optional module suffix
   - Resolves relative paths to absolute module paths before adding to ImportMap

4. **Comprehensive test coverage (94.5% overall)**
   - Unit tests for resolveRelativeImport with various dot counts
   - Integration tests with ExtractImports
   - Tests for deeply nested packages
   - Tests for mixed absolute and relative imports
   - Real fixture files with project structure

Relative Import Examples:
- `from . import utils` → "currentpackage.utils"
- `from .. import config` → "parentpackage.config"
- `from ..utils import helper` → "parentpackage.utils.helper"
- `from ...db import query` → "grandparent.db.query"

Test Fixtures:
- Created myapp/submodule/handler.py with all relative import styles
- Created supporting package structure with __init__.py files
- Tests verify correct resolution across package hierarchy

All tests passing, linting clean, builds successfully.

This is Pass 2 Part B of the 3-pass call graph algorithm.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* feat: Implement call site extraction from AST - Pass 2 Part C

This PR implements call site extraction from Python source code using
tree-sitter AST parsing. It builds on the import resolution work from
PRs #3 and #4 to prepare for call graph construction in Pass 3.

## Changes

### Core Implementation (callsites.go)

1. **ExtractCallSites()**: Main entry point for extracting call sites
   - Parses Python source with tree-sitter
   - Traverses AST to find all call expressions
   - Returns slice of CallSite objects with location information

2. **traverseForCalls()**: Recursive AST traversal
   - Tracks function context while traversing
   - Updates context when entering function definitions
   - Finds and processes call expressions

3. **processCallExpression()**: Call site processing
   - Extracts callee name (function/method being called)
   - Parses arguments (positional and keyword)
   - Creates CallSite with source location
   - Parameters for importMap and caller reserved for Pass 3

4. **extractCalleeName()**: Callee name extraction
   - Handles simple identifiers: foo()
   - Handles attributes: obj.method(), obj.attr.method()
   - Recursively builds dotted names

5. **extractArguments()**: Argument parsing
   - Extracts all positional arguments
   - Preserves keyword arguments as "name=value" in Value field
   - Tracks argument position and variable status

6. **convertArgumentsToSlice()**: Helper for struct conversion
   - Converts []*Argument to []Argument for CallSite struct

### Comprehensive Tests (callsites_test.go)

Created 17 test functions covering:
- Simple function calls: foo(), bar()
- Method calls: obj.method(), self.helper()
- Arguments: positional, keyword, mixed
- Nested calls: foo(bar(x))
- Multiple functions in one file
- Class methods
- Chained calls: obj.method1().method2()
- Module-level calls (no function context)
- Source location tracking
- Empty files
- Complex arguments: expressions, lists, dicts, lambdas
- Nested method calls: obj.attr.method()
- Real file fixture integration

### Test Fixture (simple_calls.py)

Created realistic test file with:
- Function definitions with various call patterns
- Method calls on objects
- Calls with arguments (positional and keyword)
- Nested calls
- Class methods with self references

## Test Coverage

- Overall: 93.3%
- ExtractCallSites: 90.0%
- traverseForCalls: 93.3%
- processCallExpression: 83.3%
- extractCalleeName: 91.7%
- extractArguments: 87.5%
- convertArgumentsToSlice: 100.0%

## Design Decisions

1. **Keyword argument handling**: Store as "name=value" in Value field
   - Tree-sitter provides full keyword_argument node content
   - Preserves complete argument information for later analysis
   - Separating name/value would require additional parsing

2. **Caller context tracking**: Parameter reserved but not used yet
   - Will be populated in Pass 3 during call graph construction
   - Enables linking call sites to their containing functions

3. **Import map parameter**: Reserved for Pass 3 resolution
   - Will be used to resolve qualified names to FQNs
   - Enables cross-file call graph construction

4. **Location tracking**: Store exact position for each call site
   - File, line, column information
   - Enables precise error reporting and code navigation

## Testing Strategy

- Unit tests for each extraction function
- Integration tests with tree-sitter AST
- Real file fixture for end-to-end validation
- Edge cases: empty files, no context, nested structures

## Next Steps (PR #6)

Pass 3 will use this call site data to:
1. Build the complete call graph structure
2. Resolve call targets to function definitions
3. Link caller and callee through edges
4. Handle disambiguation for overloaded names

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* feat: Implement call graph builder - Pass 3

This PR completes the 3-pass algorithm for building Python call graphs
by implementing the final pass that resolves call targets and constructs
the complete graph structure with edges linking callers to callees.

## Changes

### Core Implementation (builder.go)

1. **BuildCallGraph()**: Main entry point for Pass 3
   - Indexes all function definitions from code graph
   - Iterates through all Python files in the registry
   - Extracts imports and call sites for each file
   - Resolves each call site to its target function
   - Builds edges and stores call site details
   - Returns complete CallGraph with all relationships

2. **indexFunctions()**: Function indexing
   - Scans code graph for all function/method definitions
   - Maps each function to its FQN using module registry
   - Populates CallGraph.Functions map for quick lookup

3. **getFunctionsInFile()**: File-scoped function retrieval
   - Filters code graph nodes by file path
   - Returns only function/method definitions in that file
   - Used for finding containing functions of call sites

4. **findContainingFunction()**: Call site parent resolution
   - Determines which function contains a given call site
   - Uses line number comparison with nearest-match algorithm
   - Picks the function whose starting line is the highest one ≤ the call line
   - Returns empty string for module-level calls

5. **resolveCallTarget()**: Core resolution logic
   - Handles simple names: sanitize() → myapp.utils.sanitize
   - Handles qualified names: utils.sanitize() → myapp.utils.sanitize
   - Resolves through import maps first
   - Falls back to same-module resolution
   - Validates FQNs against module registry
   - Returns (FQN, resolved bool) tuple

6. **validateFQN()**: FQN validation
   - Checks if a fully qualified name exists in registry
   - Handles both modules and functions within modules
   - Validates parent module for function FQNs

7. **readFileBytes()**: File reading helper
   - Reads source files for parsing
   - Handles absolute path conversion
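
The nearest-match rule used by `findContainingFunction()` can be sketched as follows. This is a minimal illustration rather than the code in builder.go: the `funcDef` type and its fields are hypothetical stand-ins for the function nodes of the code graph.

```go
package main

// funcDef is a hypothetical stand-in for a function node in the code graph.
type funcDef struct {
	FQN       string
	StartLine int
}

// findContainingFunction returns the FQN of the function whose start line is
// the highest one that is still <= callLine. It returns "" when no function
// starts at or before the call, which is treated as a module-level call.
func findContainingFunction(funcs []funcDef, callLine int) string {
	best := ""
	bestLine := -1
	for _, f := range funcs {
		if f.StartLine <= callLine && f.StartLine > bestLine {
			best = f.FQN
			bestLine = f.StartLine
		}
	}
	return best
}
```

With functions starting at lines 10 and 30, a call on line 12 is attributed to the first function and a call on line 5 to no function at all.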

### Comprehensive Tests (builder_test.go)

Created 15 test functions covering:

**Resolution Tests:**
- Simple imported function resolution
- Qualified import resolution (module.function)
- Same-module function resolution
- Unresolved method calls (obj.method)
- Non-existent function handling

**Validation Tests:**
- Module existence validation
- Function-in-module validation
- Non-existent module handling

**Helper Function Tests:**
- Function indexing from code graph
- Functions-in-file filtering
- Containing function detection with edge cases

**Integration Tests:**
- Simple single-file call graph
- Multi-file call graph with imports
- Real test fixture integration

## Test Coverage

- Overall: 91.8%
- BuildCallGraph: 80.8%
- indexFunctions: 87.5%
- getFunctionsInFile: 100.0%
- findContainingFunction: 100.0%
- resolveCallTarget: 85.0%
- validateFQN: 100.0%
- readFileBytes: 75.0%

## Algorithm Overview

Pass 3 ties together all previous work:

### Pass 1 (PR #2): BuildModuleRegistry
- Maps file paths to module paths
- Enables FQN generation

### Pass 2 (PRs #3-5): Import & Call Site Extraction
- ExtractImports: Maps local names to FQNs
- ExtractCallSites: Finds all function calls in AST

### Pass 3 (This PR): Call Graph Construction
- Resolves call targets using import maps
- Links callers to callees with edges
- Validates resolutions against registry
- Stores detailed call site information

## Resolution Strategy

The resolver uses a multi-step approach:

1. **Simple names** (no dots):
   - Check import map first
   - Fall back to same-module lookup
   - Return unresolved if neither works

2. **Qualified names** (with dots):
   - Split into base + rest
   - Resolve base through imports
   - Append rest to get full FQN
   - Try current module if not imported

3. **Validation**:
   - Check if target exists in registry
   - For functions, validate parent module exists
   - Mark resolution success/failure
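
The three steps above can be sketched roughly as below. The `resolver` struct, its fields, and the choice to return the callee unchanged when nothing matches are assumptions made for illustration; the real `resolveCallTarget()` works on the ImportMap and ModuleRegistry types and may differ in detail.

```go
package main

import "strings"

// resolver bundles the lookups the resolution steps need. All fields are
// hypothetical simplifications of the ImportMap and ModuleRegistry types.
type resolver struct {
	imports       map[string]string // local name -> FQN, from ExtractImports
	modules       map[string]bool   // known module paths, from Pass 1
	functions     map[string]bool   // indexed function FQNs
	currentModule string            // module path of the file being analyzed
}

// validate reports whether fqn names a known module, or a name whose parent
// module is known.
func (r resolver) validate(fqn string) bool {
	if r.modules[fqn] {
		return true
	}
	if i := strings.LastIndex(fqn, "."); i > 0 {
		return r.modules[fqn[:i]]
	}
	return false
}

// resolve maps a callee name as written at the call site to an FQN and
// reports whether the result could be confirmed.
func (r resolver) resolve(callee string) (string, bool) {
	base, rest, qualified := strings.Cut(callee, ".")
	if fqn, ok := r.imports[base]; ok {
		// Imports take precedence, even when validation then fails.
		if qualified {
			fqn += "." + rest
		}
		return fqn, r.validate(fqn)
	}
	// Not imported: fall back to a function defined in the current module.
	if fqn := r.currentModule + "." + callee; r.functions[fqn] {
		return fqn, true
	}
	return callee, false
}
```

In this sketch both `sanitize()` (imported by name) and `utils.sanitize()` (imported module) end up as `myapp.utils.sanitize`, while `obj.method()` on an unknown object stays unresolved.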

## Design Decisions

1. **Containing function detection**:
   - Uses nearest-match algorithm based on line numbers
   - Picks the function whose starting line is the highest one ≤ the call line
   - Handles module-level calls by returning empty FQN

2. **Resolution priority**:
   - Import map takes precedence over same-module
   - Explicit imports always respected even if unresolved
   - Same-module only tried when not in imports

3. **Validation vs Resolution**:
   - Resolution finds FQN from imports/context
   - Validation checks if FQN exists in registry
   - Both pieces of information stored in CallSite

4. **Error handling**:
   - Continues processing even if some files fail
   - Marks individual call sites as unresolved
   - Returns partial graph instead of failing completely

## Next Steps

The call graph infrastructure is now complete. Future PRs will:

- PR #7: Add CFG data structures for control flow analysis
- PR #8: Implement pattern matching for security rules
- PR #9: Integrate into main initialization pipeline
- PR #10: Add comprehensive documentation and examples
- PR #11: Performance optimizations (caching, pooling)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* feat: Create CFG data structures for control flow analysis

This PR implements Control Flow Graph (CFG) data structures to enable
intra-procedural analysis of execution paths through functions. CFGs are
essential for security analysis patterns like taint tracking and detecting
missing sanitization on all paths.

## Changes

### Core Implementation (cfg.go)

1. **BlockType**: Enumeration of basic block types
   - Entry: Function entry point
   - Exit: Function exit point
   - Normal: Sequential execution block
   - Conditional: Branch blocks (if/else)
   - Loop: Loop header blocks (while/for)
   - Switch: Switch/match statement blocks
   - Try/Catch/Finally: Exception handling blocks

2. **BasicBlock**: Represents a single basic block
   - ID: Unique identifier within CFG
   - Type: Block category for analysis
   - StartLine/EndLine: Source code location
   - Instructions: CallSites occurring in this block
   - Successors: Blocks that can execute next
   - Predecessors: Blocks that can execute before
   - Condition: Condition expression (for conditional blocks)
   - Dominators: Blocks that always execute before this one

3. **ControlFlowGraph**: Complete CFG for a function
   - FunctionFQN: Fully qualified function name
   - Blocks: Map of block ID to BasicBlock
   - EntryBlockID/ExitBlockID: Special block identifiers
   - CallGraph: Reference for inter-procedural analysis

4. **CFG Operations**:
   - NewControlFlowGraph(): Creates CFG with entry/exit blocks
   - AddBlock(): Adds basic block to CFG
   - AddEdge(): Connects blocks with control flow edges
   - GetBlock(): Retrieves block by ID
   - GetSuccessors(): Returns successor blocks
   - GetPredecessors(): Returns predecessor blocks

5. **Dominator Analysis**:
   - ComputeDominators(): Calculates dominator sets using iterative data flow
   - IsDominator(): Checks if one block dominates another
   - Used to verify sanitization always occurs before usage

6. **Path Analysis**:
   - GetAllPaths(): Enumerates all execution paths from entry to exit
   - dfsAllPaths(): DFS-based path enumeration
   - Used for exhaustive security analysis

7. **Helper Functions**:
   - intersect(): Set intersection for dominator computation
   - slicesEqual(): Compare string slices for fixed-point detection
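
A reduced sketch of the structures and edge handling described above is shown here. It keeps only a subset of the fields, and the string-valued `BlockType`, the constant names, and the `fqn + ":entry"` ID scheme are assumptions for illustration, not necessarily what cfg.go uses.

```go
package main

// BlockType categorizes a basic block. Only some of the kinds are shown.
type BlockType string

const (
	BlockTypeEntry       BlockType = "entry"
	BlockTypeExit        BlockType = "exit"
	BlockTypeNormal      BlockType = "normal"
	BlockTypeConditional BlockType = "conditional"
)

// BasicBlock mirrors a subset of the fields listed above.
type BasicBlock struct {
	ID           string
	Type         BlockType
	Successors   []string
	Predecessors []string
}

// ControlFlowGraph holds the blocks of one function.
type ControlFlowGraph struct {
	FunctionFQN  string
	Blocks       map[string]*BasicBlock
	EntryBlockID string
	ExitBlockID  string
}

// NewControlFlowGraph creates a CFG that already contains entry and exit blocks.
func NewControlFlowGraph(fqn string) *ControlFlowGraph {
	cfg := &ControlFlowGraph{
		FunctionFQN:  fqn,
		Blocks:       map[string]*BasicBlock{},
		EntryBlockID: fqn + ":entry", // hypothetical ID scheme
		ExitBlockID:  fqn + ":exit",
	}
	cfg.AddBlock(&BasicBlock{ID: cfg.EntryBlockID, Type: BlockTypeEntry})
	cfg.AddBlock(&BasicBlock{ID: cfg.ExitBlockID, Type: BlockTypeExit})
	return cfg
}

// AddBlock registers a block under its ID.
func (c *ControlFlowGraph) AddBlock(b *BasicBlock) { c.Blocks[b.ID] = b }

// AddEdge links from -> to in both directions. Unknown block IDs and
// duplicate edges are ignored.
func (c *ControlFlowGraph) AddEdge(from, to string) {
	src, okFrom := c.Blocks[from]
	dst, okTo := c.Blocks[to]
	if !okFrom || !okTo {
		return
	}
	for _, s := range src.Successors {
		if s == to {
			return
		}
	}
	src.Successors = append(src.Successors, to)
	dst.Predecessors = append(dst.Predecessors, from)
}
```

Keeping successor and predecessor lists in sync inside `AddEdge` means later analyses can walk the graph in either direction without rebuilding an index.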

### Comprehensive Tests (cfg_test.go)

Created 23 test functions covering:

**Construction Tests:**
- CFG creation with entry/exit blocks
- Basic block creation with all fields
- Block addition to CFG

**Edge Management Tests:**
- Adding edges between blocks
- Duplicate edge handling
- Non-existent block edge handling

**Graph Navigation Tests:**
- Block retrieval by ID
- Successor block retrieval
- Predecessor block retrieval

**Dominator Analysis Tests:**
- Linear CFG dominators (A→B→C)
- Branching CFG dominators (if/else merge)
- Dominator checking

**Path Analysis Tests:**
- All paths in linear CFG
- All paths in branching CFG

**Helper Function Tests:**
- Set intersection operations
- Slice equality checking

**Complex Integration Test:**
- Realistic function CFG with branches
- Multiple blocks and paths
- Dominator relationships verification

## Test Coverage

- Overall: 92.7%
- NewControlFlowGraph: 100.0%
- AddBlock: 100.0%
- AddEdge: 100.0%
- GetBlock: 100.0%
- GetSuccessors: 87.5%
- GetPredecessors: 87.5%
- ComputeDominators: 100.0%
- IsDominator: 75.0%
- GetAllPaths: 100.0%
- dfsAllPaths: 91.7%
- intersect: 100.0%
- slicesEqual: 100.0%

## Design Decisions

1. **Entry/Exit blocks always created**:
   - Simplifies analysis by providing single entry/exit points
   - Standard CFG construction practice

2. **Dominator computation uses iterative algorithm**:
   - Simple fixed-point iteration
   - Converges quickly for most real-world CFGs
   - Simpler than asymptotically faster algorithms such as Lengauer-Tarjan, and typically fast enough for the small CFGs of individual functions

3. **Path enumeration with cycle detection**:
   - Avoids infinite loops in cyclic CFGs
   - Uses visited tracking during DFS
   - WARNING: Can be exponential for complex CFGs

4. **Blocks store CallSites as instructions**:
   - Links CFG to call graph for inter-procedural analysis
   - Enables tracking tainted data through function calls

5. **Condition stored as string**:
   - Simple representation for conditional blocks
   - Could be enhanced with AST expression nodes later
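
The path enumeration with cycle detection from decision 3 can be sketched as follows. The adjacency-list input and the function name are assumptions for illustration; `GetAllPaths()` and `dfsAllPaths()` operate on the CFG type itself.

```go
package main

// allPaths enumerates every acyclic path from entry to exit in a graph given
// as an adjacency list. A block that is already on the current path is not
// revisited, so loops in the CFG cannot cause infinite recursion.
func allPaths(succ map[string][]string, entry, exit string) [][]string {
	var paths [][]string
	var path []string
	onPath := map[string]bool{}

	var dfs func(id string)
	dfs = func(id string) {
		onPath[id] = true
		path = append(path, id)
		if id == exit {
			// Copy the path, because the backing slice keeps changing.
			paths = append(paths, append([]string(nil), path...))
		} else {
			for _, next := range succ[id] {
				if !onPath[next] {
					dfs(next)
				}
			}
		}
		// Backtrack so sibling branches can pass through this block again.
		path = path[:len(path)-1]
		onPath[id] = false
	}
	dfs(entry)
	return paths
}
```

Because every branch point can double the number of paths, the result grows exponentially with the number of sequential conditionals, which is the cost the warning above refers to.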

## Use Cases

CFGs enable several security analysis patterns:

**Taint Analysis:**
- Track data flow through execution paths
- Detect if tainted data reaches sensitive sinks

**Sanitization Verification:**
- Use dominators to check if sanitization always occurs
- Detect missing sanitization on some paths

**Dead Code Detection:**
- Find unreachable blocks
- Identify code that never executes

**Inter-Procedural Analysis:**
- Combine CFG with call graph
- Track data flow across function boundaries
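
As one concrete instance of these use cases, dead code detection reduces to a reachability walk from the entry block. The sketch below assumes the CFG is available as an adjacency list plus a list of block IDs; both the representation and the function name are hypothetical.

```go
package main

import "sort"

// unreachableBlocks returns the IDs of all blocks that cannot be reached
// from entry by following successor edges, in sorted order.
func unreachableBlocks(succ map[string][]string, blocks []string, entry string) []string {
	seen := map[string]bool{entry: true}
	queue := []string{entry}
	for len(queue) > 0 {
		id := queue[0]
		queue = queue[1:]
		for _, next := range succ[id] {
			if !seen[next] {
				seen[next] = true
				queue = append(queue, next)
			}
		}
	}
	var dead []string
	for _, id := range blocks {
		if !seen[id] {
			dead = append(dead, id)
		}
	}
	sort.Strings(dead)
	return dead
}
```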

## Example CFG

```python
def process_user(user_id):
    user = get_user(user_id)        # Block 1 (entry)
    if user.is_admin():              # Block 2 (conditional)
        grant_access()               # Block 3 (true branch)
    else:
        deny_access()                # Block 4 (false branch)
    log_action(user)                 # Block 5 (merge point)
    return                           # Block 6 (exit)
```

CFG Structure:
```
Entry → Block1 → Block2 → Block3 → Block5 → Exit
                       ↘ Block4 ↗
```

Dominators:
- Block1 dominates every block that follows it (every path from Entry passes through it)
- Block2 dominates Block3, Block4, Block5
- Block3 does NOT dominate Block5 (false branch skips it)
- Block4 does NOT dominate Block5 (true branch skips it)
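
The dominator sets for this example can be reproduced with the standard iterative data-flow formulation. The sketch below is an illustration under assumptions: it takes a plain adjacency list and returns sets as maps, whereas `ComputeDominators()` stores the result on each BasicBlock.

```go
package main

// computeDominators returns, for every block, the set of blocks that
// dominate it:
//
//	Dom(entry) = {entry}
//	Dom(n)     = {n} ∪ intersection of Dom(p) over all predecessors p of n
func computeDominators(succ map[string][]string, entry string) map[string]map[string]bool {
	// Collect all nodes and build predecessor lists.
	nodes := map[string]bool{entry: true}
	preds := map[string][]string{}
	for from, tos := range succ {
		nodes[from] = true
		for _, to := range tos {
			nodes[to] = true
			preds[to] = append(preds[to], from)
		}
	}
	// Initialize: entry is dominated only by itself, every other block
	// starts with the full node set.
	dom := map[string]map[string]bool{}
	for n := range nodes {
		dom[n] = map[string]bool{}
		if n == entry {
			dom[n][entry] = true
			continue
		}
		for m := range nodes {
			dom[n][m] = true
		}
	}
	// Iterate until no set changes. Sets only ever shrink, so a change in
	// size is enough to detect a change in content.
	for changed := true; changed; {
		changed = false
		for n := range nodes {
			if n == entry {
				continue
			}
			next := map[string]bool{n: true}
			for m := range nodes {
				inAll := len(preds[n]) > 0
				for _, p := range preds[n] {
					if !dom[p][m] {
						inAll = false
						break
					}
				}
				if inAll {
					next[m] = true
				}
			}
			if len(next) != len(dom[n]) {
				dom[n] = next
				changed = true
			}
		}
	}
	return dom
}
```

For a sanitization check, a sanitizer call in block S covers a sink in block K on every path only if `dom[K][S]` is true.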

## Next Steps

Future PRs will:
- PR #8: Implement pattern registry for security rules
- Use CFG to detect missing sanitization patterns
- Implement taint tracking across CFG paths
- Combine CFG with call graph for full analysis

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

---------

Co-authored-by: Claude <[email protected]>
shivasurya added a commit that referenced this pull request Oct 29, 2025
…xample (#329)

* feat: Add core data structures for call graph (PR #1)

Add foundational data structures for Python call graph construction:

New Types:
- CallSite: Represents function call locations with arguments and resolution status
- CallGraph: Maps functions to callees with forward/reverse edges
- ModuleRegistry: Maps Python file paths to module paths
- ImportMap: Tracks imports per file for name resolution
- Location: Source code position tracking
- Argument: Function call argument metadata

Features:
- 100% test coverage with comprehensive unit tests
- Bidirectional call graph edges (forward and reverse)
- Support for ambiguous short names in module registry
- Helper functions for module path manipulation
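
A minimal sketch of the bidirectional edge storage, assuming string FQNs as node keys. The field and method names here are illustrative and may not match the actual CallGraph type.

```go
package main

// CallGraph is a reduced, hypothetical version of the type described above.
type CallGraph struct {
	Edges        map[string][]string // caller FQN -> callee FQNs
	ReverseEdges map[string][]string // callee FQN -> caller FQNs
}

// NewCallGraph returns an empty graph with both maps initialized.
func NewCallGraph() *CallGraph {
	return &CallGraph{
		Edges:        map[string][]string{},
		ReverseEdges: map[string][]string{},
	}
}

// AddEdge records caller -> callee in both directions, skipping duplicates.
func (cg *CallGraph) AddEdge(caller, callee string) {
	for _, existing := range cg.Edges[caller] {
		if existing == callee {
			return
		}
	}
	cg.Edges[caller] = append(cg.Edges[caller], callee)
	cg.ReverseEdges[callee] = append(cg.ReverseEdges[callee], caller)
}

// Callees returns the functions called by caller.
func (cg *CallGraph) Callees(caller string) []string { return cg.Edges[caller] }

// Callers returns the functions that call callee.
func (cg *CallGraph) Callers(callee string) []string { return cg.ReverseEdges[callee] }
```

The reverse map answers "who calls this sink?" without scanning every forward edge, which is the lookup security rules tend to need.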

This establishes the foundation for the 3-pass call graph algorithm:
- Pass 1 (next PR): Module registry builder
- Pass 2 (next PR): Import extraction and resolution
- Pass 3 (next PR): Call graph construction

Related: Phase 1 - Call Graph Construction & 3-Pass Algorithm

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* feat: Implement module registry - Pass 1 of 3-pass algorithm (PR #2)

Implement the first pass of the call graph construction algorithm: building
a complete registry of Python modules by walking the directory tree.

New Features:
- BuildModuleRegistry: Walks directory tree and maps file paths to module paths
- convertToModulePath: Converts file system paths to Python import paths
- shouldSkipDirectory: Filters out venv, __pycache__, build dirs, etc.

Module Path Conversion:
- Handles regular files: myapp/views.py → myapp.views
- Handles packages: myapp/utils/__init__.py → myapp.utils
- Supports deep nesting: myapp/api/v1/endpoints/users.py → myapp.api.v1.endpoints.users
- Cross-platform: Normalizes Windows/Unix path separators
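
The conversion rules above can be sketched as a small pure function. It assumes the input is already relative to the project root, which is an assumption of this sketch; the real `convertToModulePath` may take different arguments.

```go
package main

import "strings"

// convertToModulePath turns a file path relative to the project root into a
// dotted Python module path.
func convertToModulePath(relPath string) string {
	p := strings.ReplaceAll(relPath, "\\", "/") // normalize Windows separators
	p = strings.TrimSuffix(p, ".py")
	p = strings.TrimSuffix(p, "/__init__") // a package is named by its directory
	return strings.ReplaceAll(p, "/", ".")
}
```

For example, `myapp/utils/__init__.py` first becomes `myapp/utils/__init__`, then `myapp/utils`, and finally `myapp.utils`.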

Performance Optimizations:
- Skips 15+ common non-source directories (venv, __pycache__, .git, dist, build, etc.)
- Avoids scanning thousands of dependency files
- Indexes both full module paths and short names for ambiguity detection

Test Coverage: 93%
- Comprehensive unit tests for all conversion scenarios
- Integration tests with real Python project structure
- Edge case handling: empty dirs, non-Python files, deep nesting, permissions
- Error path testing: walk errors, invalid paths, system errors
- Test fixtures: test-src/python/simple_project/ with realistic structure
- Documented: Remaining 7% are untestable OS-level errors (filepath.Abs failures)

This establishes Pass 1 of 3:
- ✅ Pass 1: Module registry (this PR)
- Next: Pass 2 - Import extraction and resolution
- Next: Pass 3 - Call graph construction

Related: Phase 1 - Call Graph Construction & 3-Pass Algorithm
Base Branch: shiva/callgraph-infra-1 (PR #1)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* feat: Implement import extraction with tree-sitter - Pass 2 Part A

This PR implements comprehensive import extraction for Python code using
tree-sitter AST parsing. It handles all three main import styles:

1. Simple imports: `import module`
2. From imports: `from module import name`
3. Aliased imports: `import module as alias` and `from module import name as alias`

The implementation uses direct AST traversal instead of tree-sitter queries
for better compatibility and control. It properly handles:
- Multiple imports per line (`from json import dumps, loads`)
- Nested module paths (`import xml.etree.ElementTree`)
- Whitespace variations
- Invalid/malformed syntax (fault-tolerant parsing)

Key functions:
- ExtractImports(): Main entry point that parses code and builds ImportMap
- traverseForImports(): Recursively traverses AST to find import statements
- processImportStatement(): Handles simple and aliased imports
- processImportFromStatement(): Handles from-import statements with proper
  module name skipping to avoid duplicate entries
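
To illustrate what the extraction produces, the sketch below shows how the three import styles could be recorded as local-name to FQN entries. It deliberately leaves out tree-sitter parsing. The `ImportMap` shape and the `AddImport` signature are assumptions, as is keying a plain `import a.b.c` by its full dotted path.

```go
package main

// ImportMap is a reduced, hypothetical version of the per-file import table:
// it maps the name usable in the file to the FQN that name refers to.
type ImportMap struct {
	Imports map[string]string
}

// AddImport records one imported name. name is empty for `import module`,
// and alias is empty when the statement has no `as` clause.
func (m *ImportMap) AddImport(module, name, alias string) {
	fqn, local := module, module
	if name != "" {
		// from module import name
		fqn, local = module+"."+name, name
	}
	if alias != "" {
		// ... as alias
		local = alias
	}
	m.Imports[local] = fqn
}
```

Under these assumptions, `from json import dumps, loads` yields two entries (`dumps` and `loads`), and `import numpy as np` yields a single entry keyed by `np`.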

Test coverage: 92.8% overall, 90-95% for import extraction functions

Test fixtures include:
- simple_imports.py: Basic import statements
- from_imports.py: From import statements with multiple names
- aliased_imports.py: Aliased imports (both simple and from)
- mixed_imports.py: Mixed import styles

All tests passing, linting clean, builds successfully.

This is Pass 2 Part A of the 3-pass call graph algorithm.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* feat: Implement relative import resolution - Pass 2 Part B

This PR implements comprehensive relative import resolution for Python as part
of the 3-pass algorithm. It extends the import extraction system from PR #3 to
handle Python's relative import syntax with dot notation.

Key Changes:

1. **Added FileToModule reverse mapping to ModuleRegistry**
   - Enables O(1) lookup from file path to module path
   - Required for resolving relative imports
   - Updated AddModule() to maintain bidirectional mapping

2. **Implemented resolveRelativeImport() function**
   - Handles single dot (.) for current package
   - Handles multiple dots (.., ...) for parent/grandparent packages
   - Navigates package hierarchy using module path components
   - Clamps excessive dots to root package level
   - Falls back gracefully when file not in registry

3. **Enhanced processImportFromStatement() for relative imports**
   - Detects relative_import nodes in tree-sitter AST
   - Extracts import_prefix (dots) and optional module suffix
   - Resolves relative paths to absolute module paths before adding to ImportMap

4. **Comprehensive test coverage (94.5% overall)**
   - Unit tests for resolveRelativeImport with various dot counts
   - Integration tests with ExtractImports
   - Tests for deeply nested packages
   - Tests for mixed absolute and relative imports
   - Real fixture files with project structure

Relative Import Examples:
- `from . import utils` → "currentpackage.utils"
- `from .. import config` → "parentpackage.config"
- `from ..utils import helper` → "parentpackage.utils.helper"
- `from ...db import query` → "grandparent.db.query"
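
The dot-counting logic behind these examples can be sketched as below. This assumes the current file is a regular module (not an `__init__.py`) whose module path is already known, and the signature is hypothetical; the real `resolveRelativeImport()` looks the path up through FileToModule.

```go
package main

import "strings"

// resolveRelativeImport resolves a relative import written inside the module
// currentModule. dots is the number of leading dots (at least 1) and suffix
// is the optional module name after them ("" for `from . import x`).
func resolveRelativeImport(currentModule string, dots int, suffix string) string {
	parts := strings.Split(currentModule, ".")
	// One dot refers to the package containing the current module, so the
	// module's own name is dropped; each further dot climbs one more level.
	keep := len(parts) - dots
	if keep < 1 {
		keep = 1 // clamp excessive dots to the root package
	}
	base := strings.Join(parts[:keep], ".")
	if suffix == "" {
		return base
	}
	return base + "." + suffix
}
```

Inside `myapp.submodule.handler`, `from ..utils import helper` resolves the module part to `myapp.utils`, and the caller then appends the imported name to get `myapp.utils.helper`.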

Test Fixtures:
- Created myapp/submodule/handler.py with all relative import styles
- Created supporting package structure with __init__.py files
- Tests verify correct resolution across package hierarchy

All tests passing, linting clean, builds successfully.

This is Pass 2 Part B of the 3-pass call graph algorithm.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* feat: Implement call site extraction from AST - Pass 2 Part C

This PR implements call site extraction from Python source code using
tree-sitter AST parsing. It builds on the import resolution work from
PRs #3 and #4 to prepare for call graph construction in Pass 3.

## Changes

### Core Implementation (callsites.go)

1. **ExtractCallSites()**: Main entry point for extracting call sites
   - Parses Python source with tree-sitter
   - Traverses AST to find all call expressions
   - Returns slice of CallSite objects with location information

2. **traverseForCalls()**: Recursive AST traversal
   - Tracks function context while traversing
   - Updates context when entering function definitions
   - Finds and processes call expressions

3. **processCallExpression()**: Call site processing
   - Extracts callee name (function/method being called)
   - Parses arguments (positional and keyword)
   - Creates CallSite with source location
   - Parameters for importMap and caller reserved for Pass 3

4. **extractCalleeName()**: Callee name extraction
   - Handles simple identifiers: foo()
   - Handles attributes: obj.method(), obj.attr.method()
   - Recursively builds dotted names

5. **extractArguments()**: Argument parsing
   - Extracts all positional arguments
   - Preserves keyword arguments as "name=value" in Value field
   - Tracks argument position and variable status

6. **convertArgumentsToSlice()**: Helper for struct conversion
   - Converts []*Argument to []Argument for CallSite struct
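
The recursive construction of dotted callee names can be sketched on a toy AST. The `node` type below is a hypothetical stand-in for tree-sitter nodes and covers only identifiers and attribute access; how the real `extractCalleeName()` names chained calls is not shown here.

```go
package main

// node is a hypothetical, minimal AST node used only for this sketch.
type node struct {
	kind   string // "identifier" or "attribute"
	text   string // identifier: its source text
	object *node  // attribute: the expression left of the dot
	attr   string // attribute: the name right of the dot
}

// calleeName builds the dotted name for a callee expression, for example
// "obj.attr.method" for obj.attr.method(). Unsupported expressions yield "".
func calleeName(n *node) string {
	if n == nil {
		return ""
	}
	switch n.kind {
	case "identifier":
		return n.text
	case "attribute":
		prefix := calleeName(n.object)
		if prefix == "" {
			return n.attr
		}
		return prefix + "." + n.attr
	}
	return ""
}
```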

### Comprehensive Tests (callsites_test.go)

Created 17 test functions covering:
- Simple function calls: foo(), bar()
- Method calls: obj.method(), self.helper()
- Arguments: positional, keyword, mixed
- Nested calls: foo(bar(x))
- Multiple functions in one file
- Class methods
- Chained calls: obj.method1().method2()
- Module-level calls (no function context)
- Source location tracking
- Empty files
- Complex arguments: expressions, lists, dicts, lambdas
- Nested method calls: obj.attr.method()
- Real file fixture integration

### Test Fixture (simple_calls.py)

Created realistic test file with:
- Function definitions with various call patterns
- Method calls on objects
- Calls with arguments (positional and keyword)
- Nested calls
- Class methods with self references

## Test Coverage

- Overall: 93.3%
- ExtractCallSites: 90.0%
- traverseForCalls: 93.3%
- processCallExpression: 83.3%
- extractCalleeName: 91.7%
- extractArguments: 87.5%
- convertArgumentsToSlice: 100.0%

## Design Decisions

1. **Keyword argument handling**: Store as "name=value" in Value field
   - Tree-sitter provides full keyword_argument node content
   - Preserves complete argument information for later analysis
   - Separating name/value would require additional parsing

2. **Caller context tracking**: Parameter reserved but not used yet
   - Will be populated in Pass 3 during call graph construction
   - Enables linking call sites to their containing functions

3. **Import map parameter**: Reserved for Pass 3 resolution
   - Will be used to resolve qualified names to FQNs
   - Enables cross-file call graph construction

4. **Location tracking**: Store exact position for each call site
   - File, line, column information
   - Enables precise error reporting and code navigation
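
Since keyword arguments are kept as a single "name=value" string, a later consumer has to split them itself. One possible way to do that is sketched below; the `Argument` fields are reduced, `splitKeyword` is a hypothetical helper that is not part of the PR, and identifiers are assumed to be ASCII.

```go
package main

import "strings"

// Argument is a reduced, hypothetical version of the call argument record.
type Argument struct {
	Value    string // source text; keyword arguments are stored as "name=value"
	Position int
}

// isIdentifier reports whether s is an ASCII Python identifier.
func isIdentifier(s string) bool {
	if s == "" {
		return false
	}
	for i, r := range s {
		letter := r == '_' || (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z')
		digit := r >= '0' && r <= '9'
		if !letter && !(digit && i > 0) {
			return false
		}
	}
	return true
}

// splitKeyword splits a stored "name=value" argument. ok is false for
// positional arguments, including expressions such as `a == b`.
func splitKeyword(a Argument) (name, value string, ok bool) {
	before, after, found := strings.Cut(a.Value, "=")
	name = strings.TrimSpace(before)
	if !found || !isIdentifier(name) || strings.HasPrefix(after, "=") {
		return "", "", false
	}
	return name, strings.TrimSpace(after), true
}
```

Splitting only at the first `=` keeps values such as `url="http://x?a=b"` intact, and the identifier check rejects comparisons and string literals that merely contain an equals sign.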

## Testing Strategy

- Unit tests for each extraction function
- Integration tests with tree-sitter AST
- Real file fixture for end-to-end validation
- Edge cases: empty files, no context, nested structures

## Next Steps (PR #6)

Pass 3 will use this call site data to:
1. Build the complete call graph structure
2. Resolve call targets to function definitions
3. Link caller and callee through edges
4. Handle disambiguation for overloaded names

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* feat: Implement call graph builder - Pass 3

* feat: Create CFG data structures for control flow analysis

* feat: Add pattern registry with hardcoded code injection example

Implements pattern matching infrastructure for security analysis with one example pattern (code injection via eval). Additional patterns will be loaded from queries in future PRs. Includes pattern types (source-sink, missing-sanitizer, dangerous-function) and matching algorithms with 92.4% test coverage.

---------

Co-authored-by: Claude <[email protected]>
shivasurya added a commit that referenced this pull request Oct 29, 2025
* feat: Add core data structures for call graph (PR #1)

Add foundational data structures for Python call graph construction:

New Types:
- CallSite: Represents function call locations with arguments and resolution status
- CallGraph: Maps functions to callees with forward/reverse edges
- ModuleRegistry: Maps Python file paths to module paths
- ImportMap: Tracks imports per file for name resolution
- Location: Source code position tracking
- Argument: Function call argument metadata

Features:
- 100% test coverage with comprehensive unit tests
- Bidirectional call graph edges (forward and reverse)
- Support for ambiguous short names in module registry
- Helper functions for module path manipulation

This establishes the foundation for the 3-pass call graph algorithm:
- Pass 1 (next PR): Module registry builder
- Pass 2 (next PR): Import extraction and resolution
- Pass 3 (next PR): Call graph construction

Related: Phase 1 - Call Graph Construction & 3-Pass Algorithm

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* feat: Implement module registry - Pass 1 of 3-pass algorithm (PR #2)

Implement the first pass of the call graph construction algorithm: building
a complete registry of Python modules by walking the directory tree.

New Features:
- BuildModuleRegistry: Walks directory tree and maps file paths to module paths
- convertToModulePath: Converts file system paths to Python import paths
- shouldSkipDirectory: Filters out venv, __pycache__, build dirs, etc.

Module Path Conversion:
- Handles regular files: myapp/views.py → myapp.views
- Handles packages: myapp/utils/__init__.py → myapp.utils
- Supports deep nesting: myapp/api/v1/endpoints/users.py → myapp.api.v1.endpoints.users
- Cross-platform: Normalizes Windows/Unix path separators
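
The conversion rules above can be sketched in a few lines. This is an illustrative Python version, not the Go code in this PR; the function name `convert_to_module_path` and its `root` parameter are assumptions made for the example.

```python
from pathlib import PurePosixPath


def convert_to_module_path(file_path: str, root: str) -> str:
    """Map a .py file under `root` to a dotted module path (hypothetical helper)."""
    # Normalize Windows separators so both path styles take the same code path.
    normalized_file = PurePosixPath(file_path.replace("\\", "/"))
    normalized_root = PurePosixPath(root.replace("\\", "/"))
    relative = normalized_file.relative_to(normalized_root)

    # Drop the ".py" suffix and split into path components.
    parts = list(relative.with_suffix("").parts)

    # A package's __init__.py is imported under the package name itself.
    if parts and parts[-1] == "__init__":
        parts.pop()
    return ".".join(parts)
```

For example, `convert_to_module_path("/proj/myapp/utils/__init__.py", "/proj")` yields `myapp.utils`.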

Performance Optimizations:
- Skips 15+ common non-source directories (venv, __pycache__, .git, dist, build, etc.)
- Avoids scanning thousands of dependency files
- Indexes both full module paths and short names for ambiguity detection

Test Coverage: 93%
- Comprehensive unit tests for all conversion scenarios
- Integration tests with real Python project structure
- Edge case handling: empty dirs, non-Python files, deep nesting, permissions
- Error path testing: walk errors, invalid paths, system errors
- Test fixtures: test-src/python/simple_project/ with realistic structure
- Documented: Remaining 7% are untestable OS-level errors (filepath.Abs failures)

This establishes Pass 1 of 3:
- ✅ Pass 1: Module registry (this PR)
- Next: Pass 2 - Import extraction and resolution
- Next: Pass 3 - Call graph construction

Related: Phase 1 - Call Graph Construction & 3-Pass Algorithm
Base Branch: shiva/callgraph-infra-1 (PR #1)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* feat: Implement import extraction with tree-sitter - Pass 2 Part A

This PR implements comprehensive import extraction for Python code using
tree-sitter AST parsing. It handles all three main import styles:

1. Simple imports: `import module`
2. From imports: `from module import name`
3. Aliased imports: `import module as alias` and `from module import name as alias`

The implementation uses direct AST traversal instead of tree-sitter queries
for better compatibility and control. It properly handles:
- Multiple imports per line (`from json import dumps, loads`)
- Nested module paths (`import xml.etree.ElementTree`)
- Whitespace variations
- Invalid/malformed syntax (fault-tolerant parsing)

Key functions:
- ExtractImports(): Main entry point that parses code and builds ImportMap
- traverseForImports(): Recursively traverses AST to find import statements
- processImportStatement(): Handles simple and aliased imports
- processImportFromStatement(): Handles from-import statements with proper
  module name skipping to avoid duplicate entries
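
A rough sketch of what the extracted ImportMap contains for the three import styles. It deliberately swaps tree-sitter for Python's standard `ast` module to stay self-contained, so unlike the real implementation it is not fault-tolerant on malformed syntax. The name `extract_imports` and the plain-dict return type are assumptions for illustration.

```python
import ast


def extract_imports(source: str) -> dict:
    """Return {local name: fully qualified name} for absolute imports (sketch)."""
    import_map = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # `import module` and `import module as alias`
            for alias in node.names:
                import_map[alias.asname or alias.name] = alias.name
        elif isinstance(node, ast.ImportFrom) and node.level == 0:
            # `from module import name [as alias]`; relative imports
            # (level > 0) are out of scope for this sketch.
            for alias in node.names:
                local = alias.asname or alias.name
                import_map[local] = f"{node.module}.{alias.name}"
    return import_map
```

Mapping the local name to the fully qualified name is what lets a later pass turn a bare call such as `dumps(...)` into `json.dumps`.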

Test coverage: 92.8% overall, 90-95% for import extraction functions

Test fixtures include:
- simple_imports.py: Basic import statements
- from_imports.py: From import statements with multiple names
- aliased_imports.py: Aliased imports (both simple and from)
- mixed_imports.py: Mixed import styles

All tests passing, linting clean, builds successfully.

This is Pass 2 Part A of the 3-pass call graph algorithm.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* feat: Implement relative import resolution - Pass 2 Part B

This PR implements comprehensive relative import resolution for Python as part of
the 3-pass call graph algorithm. It extends the import extraction system from PR #3 to handle
Python's relative import syntax with dot notation.

Key Changes:

1. **Added FileToModule reverse mapping to ModuleRegistry**
   - Enables O(1) lookup from file path to module path
   - Required for resolving relative imports
   - Updated AddModule() to maintain bidirectional mapping

2. **Implemented resolveRelativeImport() function**
   - Handles single dot (.) for current package
   - Handles multiple dots (.., ...) for parent/grandparent packages
   - Navigates package hierarchy using module path components
   - Clamps excessive dots to root package level
   - Falls back gracefully when file not in registry

3. **Enhanced processImportFromStatement() for relative imports**
   - Detects relative_import nodes in tree-sitter AST
   - Extracts import_prefix (dots) and optional module suffix
   - Resolves relative paths to absolute module paths before adding to ImportMap

4. **Comprehensive test coverage (94.5% overall)**
   - Unit tests for resolveRelativeImport with various dot counts
   - Integration tests with ExtractImports
   - Tests for deeply nested packages
   - Tests for mixed absolute and relative imports
   - Real fixture files with project structure

Relative Import Examples:
- `from . import utils` → "currentpackage.utils"
- `from .. import config` → "parentpackage.config"
- `from ..utils import helper` → "parentpackage.utils.helper"
- `from ...db import query` → "grandparent.db.query"
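
The dot-counting logic behind these examples might look roughly like the sketch below. It assumes the importing file is a regular module (not an `__init__.py`) and that "clamping" means never going above the top-level package; both are assumptions of this example, and the signature is hypothetical rather than the Go `resolveRelativeImport()` signature.

```python
def resolve_relative_import(module_path: str, dots: int, suffix: str = "") -> str:
    """Resolve a relative import to an absolute module path (sketch).

    module_path: module of the importing file, e.g. "myapp.submodule.handler"
    dots:        number of leading dots in the import
    suffix:      optional module text after the dots, e.g. "utils"
    """
    # The containing package is the module path minus its last component.
    package = module_path.split(".")[:-1]

    # One dot means "current package"; each extra dot climbs one level.
    levels_up = dots - 1

    # Clamp so that excessive dots stop at the root package.
    keep = max(len(package) - levels_up, 1)
    base = package[:keep]

    if suffix:
        base = base + suffix.split(".")
    return ".".join(base)
```

With the fixture layout, `from ..utils import helper` inside `myapp/submodule/handler.py` resolves its module part to `myapp.utils`, and the imported name is then appended to form `myapp.utils.helper`.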

Test Fixtures:
- Created myapp/submodule/handler.py with all relative import styles
- Created supporting package structure with __init__.py files
- Tests verify correct resolution across package hierarchy

All tests passing, linting clean, builds successfully.

This is Pass 2 Part B of the 3-pass call graph algorithm.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* feat: Implement call site extraction from AST - Pass 2 Part C

This PR implements call site extraction from Python source code using
tree-sitter AST parsing. It builds on the import resolution work from
PRs #3 and #4 to prepare for call graph construction in Pass 3.

## Changes

### Core Implementation (callsites.go)

1. **ExtractCallSites()**: Main entry point for extracting call sites
   - Parses Python source with tree-sitter
   - Traverses AST to find all call expressions
   - Returns slice of CallSite objects with location information

2. **traverseForCalls()**: Recursive AST traversal
   - Tracks function context while traversing
   - Updates context when entering function definitions
   - Finds and processes call expressions

3. **processCallExpression()**: Call site processing
   - Extracts callee name (function/method being called)
   - Parses arguments (positional and keyword)
   - Creates CallSite with source location
   - Parameters for importMap and caller reserved for Pass 3

4. **extractCalleeName()**: Callee name extraction
   - Handles simple identifiers: foo()
   - Handles attributes: obj.method(), obj.attr.method()
   - Recursively builds dotted names

5. **extractArguments()**: Argument parsing
   - Extracts all positional arguments
   - Preserves keyword arguments as "name=value" in Value field
   - Tracks argument position and variable status

6. **convertArgumentsToSlice()**: Helper for struct conversion
   - Converts []*Argument to []Argument for CallSite struct
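
To illustrate the shape of the extracted data, here is a small sketch of callee-name and argument extraction. It uses Python's standard `ast` module as a stand-in for tree-sitter; `callee_name` and `extract_call_sites` are hypothetical names, and the tuple layout is only an approximation of the CallSite struct.

```python
import ast


def callee_name(func: ast.expr) -> str:
    """Build a dotted name for the called expression (sketch)."""
    if isinstance(func, ast.Name):
        return func.id  # foo()
    if isinstance(func, ast.Attribute):
        base = callee_name(func.value)  # recurse: obj.attr.method()
        return f"{base}.{func.attr}" if base else func.attr
    return ""  # anything else is left unnamed in this sketch


def extract_call_sites(source: str) -> list:
    """Return (callee, line, arguments) tuples in source order."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            args = [ast.unparse(a) for a in node.args]
            # Keyword arguments are kept whole as "name=value".
            args += [f"{kw.arg}={ast.unparse(kw.value)}"
                     for kw in node.keywords if kw.arg]
            found.append((node.lineno, node.col_offset,
                          callee_name(node.func), args))
    found.sort(key=lambda item: (item[0], item[1]))
    return [(name, line, args) for line, _col, name, args in found]
```

Nested calls such as `foo(bar(x))` produce two entries, one for the outer call and one for the inner call.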

### Comprehensive Tests (callsites_test.go)

Created 17 test functions covering:
- Simple function calls: foo(), bar()
- Method calls: obj.method(), self.helper()
- Arguments: positional, keyword, mixed
- Nested calls: foo(bar(x))
- Multiple functions in one file
- Class methods
- Chained calls: obj.method1().method2()
- Module-level calls (no function context)
- Source location tracking
- Empty files
- Complex arguments: expressions, lists, dicts, lambdas
- Nested method calls: obj.attr.method()
- Real file fixture integration

### Test Fixture (simple_calls.py)

Created realistic test file with:
- Function definitions with various call patterns
- Method calls on objects
- Calls with arguments (positional and keyword)
- Nested calls
- Class methods with self references

## Test Coverage

- Overall: 93.3%
- ExtractCallSites: 90.0%
- traverseForCalls: 93.3%
- processCallExpression: 83.3%
- extractCalleeName: 91.7%
- extractArguments: 87.5%
- convertArgumentsToSlice: 100.0%

## Design Decisions

1. **Keyword argument handling**: Store as "name=value" in Value field
   - Tree-sitter provides full keyword_argument node content
   - Preserves complete argument information for later analysis
   - Separating name/value would require additional parsing

2. **Caller context tracking**: Parameter reserved but not used yet
   - Will be populated in Pass 3 during call graph construction
   - Enables linking call sites to their containing functions

3. **Import map parameter**: Reserved for Pass 3 resolution
   - Will be used to resolve qualified names to FQNs
   - Enables cross-file call graph construction

4. **Location tracking**: Store exact position for each call site
   - File, line, column information
   - Enables precise error reporting and code navigation

## Testing Strategy

- Unit tests for each extraction function
- Integration tests with tree-sitter AST
- Real file fixture for end-to-end validation
- Edge cases: empty files, no context, nested structures

## Next Steps (PR #6)

Pass 3 will use this call site data to:
1. Build the complete call graph structure
2. Resolve call targets to function definitions
3. Link caller and callee through edges
4. Handle disambiguation for overloaded names

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* feat: Implement call graph builder - Pass 3

This PR completes the 3-pass algorithm for building Python call graphs
by implementing the final pass that resolves call targets and constructs
the complete graph structure with edges linking callers to callees.

## Changes

### Core Implementation (builder.go)

1. **BuildCallGraph()**: Main entry point for Pass 3
   - Indexes all function definitions from code graph
   - Iterates through all Python files in the registry
   - Extracts imports and call sites for each file
   - Resolves each call site to its target function
   - Builds edges and stores call site details
   - Returns complete CallGraph with all relationships

2. **indexFunctions()**: Function indexing
   - Scans code graph for all function/method definitions
   - Maps each function to its FQN using module registry
   - Populates CallGraph.Functions map for quick lookup

3. **getFunctionsInFile()**: File-scoped function retrieval
   - Filters code graph nodes by file path
   - Returns only function/method definitions in that file
   - Used for finding containing functions of call sites

4. **findContainingFunction()**: Call site parent resolution
   - Determines which function contains a given call site
   - Uses line number comparison with nearest-match algorithm
   - Finds function with highest line number ≤ call line
   - Returns empty string for module-level calls

5. **resolveCallTarget()**: Core resolution logic
   - Handles simple names: sanitize() → myapp.utils.sanitize
   - Handles qualified names: utils.sanitize() → myapp.utils.sanitize
   - Resolves through import maps first
   - Falls back to same-module resolution
   - Validates FQNs against module registry
   - Returns (FQN, resolved bool) tuple

6. **validateFQN()**: FQN validation
   - Checks if a fully qualified name exists in registry
   - Handles both modules and functions within modules
   - Validates parent module for function FQNs

7. **readFileBytes()**: File reading helper
   - Reads source files for parsing
   - Handles absolute path conversion
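
The nearest-match rule used by `findContainingFunction()` can be illustrated with a short sketch. The `(fqn, start_line)` pair representation is an assumption of this example. Note that a rule based only on start lines would attribute a module-level call placed after a function body to that function; this sketch shares that limitation and does not claim to mirror how the Go code handles it.

```python
def find_containing_function(call_line: int, functions: list) -> str:
    """Pick the function whose start line is the highest one <= call_line.

    functions: list of (fqn, start_line) pairs for a single file.
    Returns "" when no function starts at or before the call.
    """
    best_fqn, best_line = "", -1
    for fqn, start_line in functions:
        if best_line < start_line <= call_line:
            best_fqn, best_line = fqn, start_line
    return best_fqn
```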

### Comprehensive Tests (builder_test.go)

Created 15 test functions covering:

**Resolution Tests:**
- Simple imported function resolution
- Qualified import resolution (module.function)
- Same-module function resolution
- Unresolved method calls (obj.method)
- Non-existent function handling

**Validation Tests:**
- Module existence validation
- Function-in-module validation
- Non-existent module handling

**Helper Function Tests:**
- Function indexing from code graph
- Functions-in-file filtering
- Containing function detection with edge cases

**Integration Tests:**
- Simple single-file call graph
- Multi-file call graph with imports
- Real test fixture integration

## Test Coverage

- Overall: 91.8%
- BuildCallGraph: 80.8%
- indexFunctions: 87.5%
- getFunctionsInFile: 100.0%
- findContainingFunction: 100.0%
- resolveCallTarget: 85.0%
- validateFQN: 100.0%
- readFileBytes: 75.0%

## Algorithm Overview

Pass 3 ties together all previous work:

### Pass 1 (PR #2): BuildModuleRegistry
- Maps file paths to module paths
- Enables FQN generation

### Pass 2 (PRs #3-5): Import & Call Site Extraction
- ExtractImports: Maps local names to FQNs
- ExtractCallSites: Finds all function calls in AST

### Pass 3 (This PR): Call Graph Construction
- Resolves call targets using import maps
- Links callers to callees with edges
- Validates resolutions against registry
- Stores detailed call site information

## Resolution Strategy

The resolver uses a multi-step approach:

1. **Simple names** (no dots):
   - Check import map first
   - Fall back to same-module lookup
   - Return unresolved if neither works

2. **Qualified names** (with dots):
   - Split into base + rest
   - Resolve base through imports
   - Append rest to get full FQN
   - Try current module if not imported

3. **Validation**:
   - Check if target exists in registry
   - For functions, validate parent module exists
   - Mark resolution success/failure
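
As a minimal sketch of the strategy above (not the Go implementation), the resolver can be expressed as follows. The import map is modeled as a plain dict and the registry as a set of known module paths; `resolve_call_target` and `validate_fqn` are illustrative names, and the validation is simplified to a module or parent-module lookup.

```python
def validate_fqn(fqn: str, modules: set) -> bool:
    """True if fqn is a known module, or a name inside a known module."""
    parent = fqn.rpartition(".")[0]
    return fqn in modules or parent in modules


def resolve_call_target(name: str, import_map: dict,
                        current_module: str, modules: set) -> tuple:
    """Return (fqn, resolved) for a call target name."""
    if "." not in name:
        # Simple name: the import map wins, else assume the same module.
        fqn = import_map.get(name, f"{current_module}.{name}")
    else:
        # Qualified name: resolve the base through imports, append the rest.
        base, rest = name.split(".", 1)
        if base in import_map:
            fqn = f"{import_map[base]}.{rest}"
        else:
            fqn = f"{current_module}.{name}"
    return fqn, validate_fqn(fqn, modules)
```

With `import_map = {"utils": "myapp.utils"}`, a call to `utils.sanitize()` resolves to `myapp.utils.sanitize`, while `obj.method()` on a local variable stays unresolved because no module named `myapp.views.obj` exists.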

## Design Decisions

1. **Containing function detection**:
   - Uses nearest-match algorithm based on line numbers
   - Finds function with highest line number ≤ call line
   - Handles module-level calls by returning empty FQN

2. **Resolution priority**:
   - Import map takes precedence over same-module
   - Explicit imports always respected even if unresolved
   - Same-module only tried when not in imports

3. **Validation vs Resolution**:
   - Resolution finds FQN from imports/context
   - Validation checks if FQN exists in registry
   - Both pieces of information stored in CallSite

4. **Error handling**:
   - Continues processing even if some files fail
   - Marks individual call sites as unresolved
   - Returns partial graph instead of failing completely

## Next Steps

The call graph infrastructure is now complete. Future PRs will:

- PR #7: Add CFG data structures for control flow analysis
- PR #8: Implement pattern matching for security rules
- PR #9: Integrate into main initialization pipeline
- PR #10: Add comprehensive documentation and examples
- PR #11: Performance optimizations (caching, pooling)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* feat: Create CFG data structures for control flow analysis

This PR implements Control Flow Graph (CFG) data structures to enable
intra-procedural analysis of execution paths through functions. CFGs are
essential for security analysis patterns like taint tracking and detecting
missing sanitization on all paths.

## Changes

### Core Implementation (cfg.go)

1. **BlockType**: Enumeration of basic block types
   - Entry: Function entry point
   - Exit: Function exit point
   - Normal: Sequential execution block
   - Conditional: Branch blocks (if/else)
   - Loop: Loop header blocks (while/for)
   - Switch: Switch/match statement blocks
   - Try/Catch/Finally: Exception handling blocks

2. **BasicBlock**: Represents a single basic block
   - ID: Unique identifier within CFG
   - Type: Block category for analysis
   - StartLine/EndLine: Source code location
   - Instructions: CallSites occurring in this block
   - Successors: Blocks that can execute next
   - Predecessors: Blocks that can execute before
   - Condition: Condition expression (for conditional blocks)
   - Dominators: Blocks that always execute before this one

3. **ControlFlowGraph**: Complete CFG for a function
   - FunctionFQN: Fully qualified function name
   - Blocks: Map of block ID to BasicBlock
   - EntryBlockID/ExitBlockID: Special block identifiers
   - CallGraph: Reference for inter-procedural analysis

4. **CFG Operations**:
   - NewControlFlowGraph(): Creates CFG with entry/exit blocks
   - AddBlock(): Adds basic block to CFG
   - AddEdge(): Connects blocks with control flow edges
   - GetBlock(): Retrieves block by ID
   - GetSuccessors(): Returns successor blocks
   - GetPredecessors(): Returns predecessor blocks

5. **Dominator Analysis**:
   - ComputeDominators(): Calculates dominator sets using iterative data flow
   - IsDominator(): Checks if one block dominates another
   - Used to verify sanitization always occurs before usage

6. **Path Analysis**:
   - GetAllPaths(): Enumerates all execution paths from entry to exit
   - dfsAllPaths(): DFS-based path enumeration
   - Used for exhaustive security analysis

7. **Helper Functions**:
   - intersect(): Set intersection for dominator computation
   - slicesEqual(): Compare string slices for fixed-point detection

### Comprehensive Tests (cfg_test.go)

Created 23 test functions covering:

**Construction Tests:**
- CFG creation with entry/exit blocks
- Basic block creation with all fields
- Block addition to CFG

**Edge Management Tests:**
- Adding edges between blocks
- Duplicate edge handling
- Non-existent block edge handling

**Graph Navigation Tests:**
- Block retrieval by ID
- Successor block retrieval
- Predecessor block retrieval

**Dominator Analysis Tests:**
- Linear CFG dominators (A→B→C)
- Branching CFG dominators (if/else merge)
- Dominator checking

**Path Analysis Tests:**
- All paths in linear CFG
- All paths in branching CFG

**Helper Function Tests:**
- Set intersection operations
- Slice equality checking

**Complex Integration Test:**
- Realistic function CFG with branches
- Multiple blocks and paths
- Dominator relationships verification

## Test Coverage

- Overall: 92.7%
- NewControlFlowGraph: 100.0%
- AddBlock: 100.0%
- AddEdge: 100.0%
- GetBlock: 100.0%
- GetSuccessors: 87.5%
- GetPredecessors: 87.5%
- ComputeDominators: 100.0%
- IsDominator: 75.0%
- GetAllPaths: 100.0%
- dfsAllPaths: 91.7%
- intersect: 100.0%
- slicesEqual: 100.0%

## Design Decisions

1. **Entry/Exit blocks always created**:
   - Simplifies analysis by providing single entry/exit points
   - Standard CFG construction practice

2. **Dominator computation uses an iterative algorithm**:
   - Simple fixed-point iteration
   - Converges quickly for most real-world CFGs
   - Often faster in practice than asymptotically better algorithms (e.g. Lengauer-Tarjan) on small, function-sized graphs

3. **Path enumeration with cycle detection**:
   - Avoids infinite loops in cyclic CFGs
   - Uses visited tracking during DFS
   - WARNING: Can be exponential for complex CFGs

4. **Blocks store CallSites as instructions**:
   - Links CFG to call graph for inter-procedural analysis
   - Enables tracking tainted data through function calls

5. **Condition stored as string**:
   - Simple representation for conditional blocks
   - Could be enhanced with AST expression nodes later
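
For design decision 3, the following sketch shows one way to enumerate entry-to-exit paths with cycle detection. The CFG is modeled as a plain successor dict and `all_paths` is a hypothetical name; the Go `GetAllPaths()`/`dfsAllPaths()` pair may differ in detail.

```python
def all_paths(successors: dict, entry: str, exit_block: str) -> list:
    """Enumerate simple paths from entry to exit using DFS (sketch)."""
    paths = []

    def dfs(block, path, on_path):
        path.append(block)
        on_path.add(block)
        if block == exit_block:
            paths.append(list(path))  # record a copy of the finished path
        else:
            for nxt in successors.get(block, []):
                # Skipping blocks already on the current path prevents
                # infinite recursion through loop back edges.
                if nxt not in on_path:
                    dfs(nxt, path, on_path)
        # Backtrack so sibling branches can reuse this block.
        path.pop()
        on_path.remove(block)

    dfs(entry, [], set())
    return paths
```

Each independent if/else roughly doubles the number of paths, which is where the exponential warning comes from: a function with n sequential branches has on the order of 2^n paths.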

## Use Cases

CFGs enable several security analysis patterns:

**Taint Analysis:**
- Track data flow through execution paths
- Detect if tainted data reaches sensitive sinks

**Sanitization Verification:**
- Use dominators to check if sanitization always occurs
- Detect missing sanitization on some paths

**Dead Code Detection:**
- Find unreachable blocks
- Identify code that never executes

**Inter-Procedural Analysis:**
- Combine CFG with call graph
- Track data flow across function boundaries
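
As one concrete instance of these use cases, dead code detection reduces to a reachability check from the entry block. The sketch below assumes a successor-dict representation of the CFG; `unreachable_blocks` is an illustrative name and is not part of this PR.

```python
from collections import deque


def unreachable_blocks(successors: dict, entry: str) -> set:
    """Return the blocks that cannot be reached from the entry block."""
    seen = {entry}
    queue = deque([entry])
    while queue:
        for nxt in successors.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    # Every block mentioned anywhere in the graph, as a source or a target.
    all_blocks = set(successors)
    for targets in successors.values():
        all_blocks.update(targets)
    return all_blocks - seen
```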

## Example CFG

```python
def process_user(user_id):
    user = get_user(user_id)        # Block 1 (first block after Entry)
    if user.is_admin():              # Block 2 (conditional)
        grant_access()               # Block 3 (true branch)
    else:
        deny_access()                # Block 4 (false branch)
    log_action(user)                 # Block 5 (merge point)
    return                           # flows to Exit
```

CFG Structure:
```
Entry → Block1 → Block2 → Block3 → Block5 → Exit
                       ↘ Block4 ↗
```

Dominators:
- Block1 dominates all blocks (always executes)
- Block2 dominates Block3, Block4, Block5
- Block3 does NOT dominate Block5 (false branch skips it)
- Block4 does NOT dominate Block5 (true branch skips it)
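
The dominator sets listed above can be reproduced with a small fixed-point iteration. This is a Python sketch of the standard iterative data-flow formulation, run on the example CFG; the successor-dict representation and the name `compute_dominators` are assumptions, not the Go `ComputeDominators()` API.

```python
def compute_dominators(successors: dict, entry: str) -> dict:
    """Return {block: set of blocks that dominate it} (sketch)."""
    blocks = set(successors)
    for targets in successors.values():
        blocks.update(targets)

    predecessors = {b: set() for b in blocks}
    for block, targets in successors.items():
        for target in targets:
            predecessors[target].add(block)

    # Start from "everything dominates everything" and shrink to a fixed point.
    dom = {b: set(blocks) for b in blocks}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for block in blocks - {entry}:
            incoming = [dom[p] for p in predecessors[block]]
            # A block is dominated by itself plus whatever dominates
            # all of its predecessors.
            new = set.intersection(*incoming) | {block} if incoming else {block}
            if new != dom[block]:
                dom[block] = new
                changed = True
    return dom
```

On the example, the set for Block5 contains Entry, Block1, Block2, and Block5 itself. Block3 and Block4 drop out because each lies on only one of the two branches.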

## Next Steps

Future PRs will:
- PR #8: Implement pattern registry for security rules
- Use CFG to detect missing sanitization patterns
- Implement taint tracking across CFG paths
- Combine CFG with call graph for full analysis

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* feat: Add pattern registry with hardcoded code injection example

Implements pattern matching infrastructure for security analysis with one example pattern (code injection via eval). Additional patterns will be loaded from queries in future PRs. Includes pattern types (source-sink, missing-sanitizer, dangerous-function) and matching algorithms with 92.4% test coverage.

* feat: Integrate call graph into initialization pipeline

Adds InitializeCallGraph() to wire together the 3-pass algorithm (module registry, call graph building, pattern loading) and AnalyzePatterns() for security pattern detection. Includes end-to-end integration tests with 92.6% coverage.

* add callgraph integration

* chore: comment the debugging code

* cpf/enhancement: Benchmark suite test for callgraph (#331)

* feat: Add comprehensive benchmark suite for performance testing

This commit adds a complete benchmark suite to measure performance across
small, medium, and large Python projects. The benchmarks establish baseline
metrics for future optimization work.

Changes:
- Add benchmark_test.go with benchmarks for:
  * Module registry building (Pass 1)
  * Import extraction (Pass 2A)
  * Call site extraction (Pass 2C)
  * Call target resolution
  * Pattern matching
- Test against 3 real-world codebases:
  * Small: simple_project (~5 files)
  * Medium: label-studio (~1000 files)
  * Large: salt (~10,000 files)
- Fix patterns_test.go assertions for PatternMatchDetails return type
- Fix godot lint errors in builder.go

Baseline Performance Results (Apple M2 Max, 5 iterations):
- BuildModuleRegistry_Small: 80µs (target: <10ms) ✓
- BuildModuleRegistry_Medium: 6.5ms (target: <500ms) ✓
- BuildModuleRegistry_Large: 3.3ms (target: <2s) ✓
- ExtractImports_Small: 101µs (target: <20ms) ✓
- ExtractImports_Medium: 433ms (target: <2s) ✓
- ExtractCallSites_Small: 91µs (target: <30ms) ✓
- ResolveCallTarget: 533ns (target: <1µs) ✓

All benchmarks meet performance targets. Medium/Large project benchmarks
are skipped by default to keep CI fast. Enable manually with:
  go test -bench=Medium -run=^$
  go test -bench=Large -run=^$

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* feat: Add ImportMap caching with sync.RWMutex for performance

This commit implements thread-safe caching of ImportMap instances to avoid
re-parsing imports from the same file multiple times. This provides significant
performance improvements when the same imports are needed repeatedly.

Changes:
- Add ImportMapCache struct with RWMutex-protected cache map
- Implement Get(), Put(), and GetOrExtract() cache methods
- Update BuildCallGraph to use import caching
- Add comprehensive cache_test.go with:
  * Basic CRUD operations tests
  * Cache hit/miss scenarios
  * Concurrent access safety tests
  * Performance benchmarks

Performance characteristics:
- Get operation: O(1) with read lock (allows concurrent reads)
- Put operation: O(1) with write lock (exclusive access)
- Thread-safe for concurrent access from multiple goroutines
- Cache hit avoids expensive tree-sitter parsing
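
The Get/Put/GetOrExtract pattern can be sketched as below. Python's standard library has no reader-writer lock, so this example substitutes a plain `threading.Lock` for Go's `sync.RWMutex` (reads are therefore not concurrent here). The class layout and the `extract` callback parameter are assumptions made for illustration.

```python
import threading


class ImportMapCache:
    """Thread-safe cache of per-file import maps (illustrative sketch)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._cache = {}

    def get(self, file_path):
        with self._lock:
            return self._cache.get(file_path)

    def put(self, file_path, import_map):
        with self._lock:
            self._cache[file_path] = import_map

    def get_or_extract(self, file_path, extract):
        cached = self.get(file_path)
        if cached is not None:
            return cached  # cache hit: no parsing needed
        # Parse outside the lock so slow extraction does not block others.
        import_map = extract(file_path)
        with self._lock:
            # If another thread stored a map meanwhile, keep that one.
            return self._cache.setdefault(file_path, import_map)
```

Parsing outside the lock means two threads may occasionally extract the same file at once; `setdefault` makes sure only one result is kept.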

Test coverage:
- NewImportMapCache: 100%
- Get: 100%
- Put: 100%
- GetOrExtract: 85.7%
- All tests pass including concurrent access tests

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* fix: Correct matchesFunctionName test expectations

The test was incorrectly expecting 'evaluation' to match 'eval' via
substring matching, but the implementation correctly only supports:
- Exact matches: 'eval' == 'eval'
- Suffix matches: 'myapp.utils.eval' ends with '.eval'
- Prefix matches: 'request.GET.get' starts with 'request.GET.'

This prevents false positives like matching 'evaluation' to 'eval'.
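
The three matching rules can be summarized in a short sketch. It assumes prefix patterns are written without the trailing dot (for example `request.GET`), which is a guess about the pattern format; `matches_function_name` here is an illustrative Python rendering, not the Go function.

```python
def matches_function_name(name: str, pattern: str) -> bool:
    """Match on exact name, dotted suffix, or dotted prefix (sketch)."""
    if name == pattern:
        return True  # exact: "eval" == "eval"
    if name.endswith("." + pattern):
        return True  # suffix: "myapp.utils.eval" ends with ".eval"
    if name.startswith(pattern + "."):
        return True  # prefix: "request.GET.get" starts with "request.GET."
    return False
```

Requiring a dot at the boundary is what keeps `evaluation` from matching `eval`.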

Updated test case to expect false for 'evaluation' vs 'eval' match.
All tests now pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* fix: Update main_test.go to include analyze command in expected output

The analyze command was added in a previous commit (cmd/analyze.go) but the
main_test.go wasn't updated to reflect this new command in the help output.

This caused TestExecute/Successful_execution to fail because it expected
the old command list without 'analyze'.

Updated expected output to include:
  analyze     Analyze source code for security vulnerabilities using call graph

All tests now pass with gradle testGo.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* feature: add diagnostic report command for callgraph resolution

* cpf/enhancement: added resolution for framework and its corresponding support (#332)

* feature: added resolution for framework and its corresponding support

* chore: fixed lint issues

---------

Co-authored-by: Claude <[email protected]>

---------

Co-authored-by: Claude <[email protected]>
shivasurya added a commit that referenced this pull request Oct 29, 2025
* feat: Add core data structures for call graph (PR #1)

Add foundational data structures for Python call graph construction:

New Types:
- CallSite: Represents function call locations with arguments and resolution status
- CallGraph: Maps functions to callees with forward/reverse edges
- ModuleRegistry: Maps Python file paths to module paths
- ImportMap: Tracks imports per file for name resolution
- Location: Source code position tracking
- Argument: Function call argument metadata

Features:
- 100% test coverage with comprehensive unit tests
- Bidirectional call graph edges (forward and reverse)
- Support for ambiguous short names in module registry
- Helper functions for module path manipulation

This establishes the foundation for the 3-pass call graph algorithm:
- Pass 1 (next PR): Module registry builder
- Pass 2 (next PR): Import extraction and resolution
- Pass 3 (next PR): Call graph construction

Related: Phase 1 - Call Graph Construction & 3-Pass Algorithm

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* feat: Implement module registry - Pass 1 of 3-pass algorithm (PR #2)

Implement the first pass of the call graph construction algorithm: building
a complete registry of Python modules by walking the directory tree.

New Features:
- BuildModuleRegistry: Walks directory tree and maps file paths to module paths
- convertToModulePath: Converts file system paths to Python import paths
- shouldSkipDirectory: Filters out venv, __pycache__, build dirs, etc.

Module Path Conversion:
- Handles regular files: myapp/views.py → myapp.views
- Handles packages: myapp/utils/__init__.py → myapp.utils
- Supports deep nesting: myapp/api/v1/endpoints/users.py → myapp.api.v1.endpoints.users
- Cross-platform: Normalizes Windows/Unix path separators

Performance Optimizations:
- Skips 15+ common non-source directories (venv, __pycache__, .git, dist, build, etc.)
- Avoids scanning thousands of dependency files
- Indexes both full module paths and short names for ambiguity detection

Test Coverage: 93%
- Comprehensive unit tests for all conversion scenarios
- Integration tests with real Python project structure
- Edge case handling: empty dirs, non-Python files, deep nesting, permissions
- Error path testing: walk errors, invalid paths, system errors
- Test fixtures: test-src/python/simple_project/ with realistic structure
- Documented: Remaining 7% are untestable OS-level errors (filepath.Abs failures)

This establishes Pass 1 of 3:
- ✅ Pass 1: Module registry (this PR)
- Next: Pass 2 - Import extraction and resolution
- Next: Pass 3 - Call graph construction

Related: Phase 1 - Call Graph Construction & 3-Pass Algorithm
Base Branch: shiva/callgraph-infra-1 (PR #1)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* feat: Implement import extraction with tree-sitter - Pass 2 Part A

This PR implements comprehensive import extraction for Python code using
tree-sitter AST parsing. It handles all three main import styles:

1. Simple imports: `import module`
2. From imports: `from module import name`
3. Aliased imports: `import module as alias` and `from module import name as alias`

The implementation uses direct AST traversal instead of tree-sitter queries
for better compatibility and control. It properly handles:
- Multiple imports per line (`from json import dumps, loads`)
- Nested module paths (`import xml.etree.ElementTree`)
- Whitespace variations
- Invalid/malformed syntax (fault-tolerant parsing)

Key functions:
- ExtractImports(): Main entry point that parses code and builds ImportMap
- traverseForImports(): Recursively traverses AST to find import statements
- processImportStatement(): Handles simple and aliased imports
- processImportFromStatement(): Handles from-import statements with proper
  module name skipping to avoid duplicate entries

Test coverage: 92.8% overall, 90-95% for import extraction functions

Test fixtures include:
- simple_imports.py: Basic import statements
- from_imports.py: From import statements with multiple names
- aliased_imports.py: Aliased imports (both simple and from)
- mixed_imports.py: Mixed import styles

All tests passing, linting clean, builds successfully.

This is Pass 2 Part A of the 3-pass call graph algorithm.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* feat: Implement relative import resolution - Pass 2 Part B

This PR implements comprehensive relative import resolution for Python as part of
the 3-pass call graph algorithm. It extends the import extraction system from PR #3 to handle
Python's relative import syntax with dot notation.

Key Changes:

1. **Added FileToModule reverse mapping to ModuleRegistry**
   - Enables O(1) lookup from file path to module path
   - Required for resolving relative imports
   - Updated AddModule() to maintain bidirectional mapping

2. **Implemented resolveRelativeImport() function**
   - Handles single dot (.) for current package
   - Handles multiple dots (.., ...) for parent/grandparent packages
   - Navigates package hierarchy using module path components
   - Clamps excessive dots to root package level
   - Falls back gracefully when file not in registry

3. **Enhanced processImportFromStatement() for relative imports**
   - Detects relative_import nodes in tree-sitter AST
   - Extracts import_prefix (dots) and optional module suffix
   - Resolves relative paths to absolute module paths before adding to ImportMap

4. **Comprehensive test coverage (94.5% overall)**
   - Unit tests for resolveRelativeImport with various dot counts
   - Integration tests with ExtractImports
   - Tests for deeply nested packages
   - Tests for mixed absolute and relative imports
   - Real fixture files with project structure

Relative Import Examples:
- `from . import utils` → "currentpackage.utils"
- `from .. import config` → "parentpackage.config"
- `from ..utils import helper` → "parentpackage.utils.helper"
- `from ...db import query` → "grandparent.db.query"
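
A minimal sketch of the dot-counting logic, assuming the importing file is a regular module (not an `__init__.py`) and that the fallback for an unregistered file is to return the suffix unchanged. The struct layout and the `resolveRelativeImport` signature are assumptions for illustration, not the PR's exact API.

```go
package main

import (
	"fmt"
	"strings"
)

// ModuleRegistry keeps both directions of the module/file mapping.
type ModuleRegistry struct {
	Modules      map[string]string // module path -> file path
	FileToModule map[string]string // file path -> module path
}

func NewModuleRegistry() *ModuleRegistry {
	return &ModuleRegistry{Modules: map[string]string{}, FileToModule: map[string]string{}}
}

// AddModule records the mapping in both directions.
func (r *ModuleRegistry) AddModule(modulePath, filePath string) {
	r.Modules[modulePath] = filePath
	r.FileToModule[filePath] = modulePath
}

// resolveRelativeImport returns the absolute module path for a relative
// import with the given number of leading dots and optional module suffix.
func resolveRelativeImport(r *ModuleRegistry, filePath string, dots int, suffix string) string {
	current, ok := r.FileToModule[filePath]
	if !ok {
		return suffix // assumed fallback when the file is not registered
	}
	parts := strings.Split(current, ".")
	pkg := parts[:len(parts)-1] // drop the module's own name to get its package
	up := dots - 1              // one dot is the current package; each extra dot climbs one level
	if up > len(pkg)-1 {
		up = len(pkg) - 1 // clamp excessive dots to the root package
	}
	if up < 0 {
		up = 0
	}
	resolved := append([]string{}, pkg[:len(pkg)-up]...)
	if suffix != "" {
		resolved = append(resolved, suffix)
	}
	return strings.Join(resolved, ".")
}

func main() {
	reg := NewModuleRegistry()
	reg.AddModule("myapp.submodule.handler", "myapp/submodule/handler.py")
	file := "myapp/submodule/handler.py"
	// from . import utils: resolve the package, then append the imported name.
	fmt.Println(resolveRelativeImport(reg, file, 1, "") + ".utils")
	// from ..utils import helper
	fmt.Println(resolveRelativeImport(reg, file, 2, "utils") + ".helper")
}
```

The function resolves only the module part; the caller appends the imported name before storing the entry in the import map.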

Test Fixtures:
- Created myapp/submodule/handler.py with all relative import styles
- Created supporting package structure with __init__.py files
- Tests verify correct resolution across package hierarchy

All tests passing, linting clean, builds successfully.

This is Pass 2 Part B of the 3-pass call graph algorithm.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* feat: Implement call site extraction from AST - Pass 2 Part C

This PR implements call site extraction from Python source code using
tree-sitter AST parsing. It builds on the import resolution work from
PRs #3 and #4 to prepare for call graph construction in Pass 3.

## Changes

### Core Implementation (callsites.go)

1. **ExtractCallSites()**: Main entry point for extracting call sites
   - Parses Python source with tree-sitter
   - Traverses AST to find all call expressions
   - Returns slice of CallSite objects with location information

2. **traverseForCalls()**: Recursive AST traversal
   - Tracks function context while traversing
   - Updates context when entering function definitions
   - Finds and processes call expressions

3. **processCallExpression()**: Call site processing
   - Extracts callee name (function/method being called)
   - Parses arguments (positional and keyword)
   - Creates CallSite with source location
   - Parameters for importMap and caller reserved for Pass 3

4. **extractCalleeName()**: Callee name extraction
   - Handles simple identifiers: foo()
   - Handles attributes: obj.method(), obj.attr.method()
   - Recursively builds dotted names

5. **extractArguments()**: Argument parsing
   - Extracts all positional arguments
   - Preserves keyword arguments as "name=value" in Value field
   - Tracks argument position and variable status

6. **convertArgumentsToSlice()**: Helper for struct conversion
   - Converts []*Argument to []Argument for CallSite struct

### Comprehensive Tests (callsites_test.go)

Created 17 test functions covering:
- Simple function calls: foo(), bar()
- Method calls: obj.method(), self.helper()
- Arguments: positional, keyword, mixed
- Nested calls: foo(bar(x))
- Multiple functions in one file
- Class methods
- Chained calls: obj.method1().method2()
- Module-level calls (no function context)
- Source location tracking
- Empty files
- Complex arguments: expressions, lists, dicts, lambdas
- Nested method calls: obj.attr.method()
- Real file fixture integration

### Test Fixture (simple_calls.py)

Created realistic test file with:
- Function definitions with various call patterns
- Method calls on objects
- Calls with arguments (positional and keyword)
- Nested calls
- Class methods with self references

## Test Coverage

- Overall: 93.3%
- ExtractCallSites: 90.0%
- traverseForCalls: 93.3%
- processCallExpression: 83.3%
- extractCalleeName: 91.7%
- extractArguments: 87.5%
- convertArgumentsToSlice: 100.0%

## Design Decisions

1. **Keyword argument handling**: Store as "name=value" in Value field
   - Tree-sitter provides full keyword_argument node content
   - Preserves complete argument information for later analysis
   - Separating name/value would require additional parsing

2. **Caller context tracking**: Parameter reserved but not used yet
   - Will be populated in Pass 3 during call graph construction
   - Enables linking call sites to their containing functions

3. **Import map parameter**: Reserved for Pass 3 resolution
   - Will be used to resolve qualified names to FQNs
   - Enables cross-file call graph construction

4. **Location tracking**: Store exact position for each call site
   - File, line, column information
   - Enables precise error reporting and code navigation

## Testing Strategy

- Unit tests for each extraction function
- Integration tests with tree-sitter AST
- Real file fixture for end-to-end validation
- Edge cases: empty files, no context, nested structures

## Next Steps (PR #6)

Pass 3 will use this call site data to:
1. Build the complete call graph structure
2. Resolve call targets to function definitions
3. Link caller and callee through edges
4. Handle disambiguation for overloaded names

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* feat: Implement call graph builder - Pass 3

This PR completes the 3-pass algorithm for building Python call graphs
by implementing the final pass that resolves call targets and constructs
the complete graph structure with edges linking callers to callees.

## Changes

### Core Implementation (builder.go)

1. **BuildCallGraph()**: Main entry point for Pass 3
   - Indexes all function definitions from code graph
   - Iterates through all Python files in the registry
   - Extracts imports and call sites for each file
   - Resolves each call site to its target function
   - Builds edges and stores call site details
   - Returns complete CallGraph with all relationships

2. **indexFunctions()**: Function indexing
   - Scans code graph for all function/method definitions
   - Maps each function to its FQN using module registry
   - Populates CallGraph.Functions map for quick lookup

3. **getFunctionsInFile()**: File-scoped function retrieval
   - Filters code graph nodes by file path
   - Returns only function/method definitions in that file
   - Used for finding containing functions of call sites

4. **findContainingFunction()**: Call site parent resolution
   - Determines which function contains a given call site
   - Uses line number comparison with nearest-match algorithm
   - Finds function with highest line number ≤ call line
   - Returns empty string for module-level calls

5. **resolveCallTarget()**: Core resolution logic
   - Handles simple names: sanitize() → myapp.utils.sanitize
   - Handles qualified names: utils.sanitize() → myapp.utils.sanitize
   - Resolves through import maps first
   - Falls back to same-module resolution
   - Validates FQNs against module registry
   - Returns (FQN, resolved bool) tuple

6. **validateFQN()**: FQN validation
   - Checks if a fully qualified name exists in registry
   - Handles both modules and functions within modules
   - Validates parent module for function FQNs

7. **readFileBytes()**: File reading helper
   - Reads source files for parsing
   - Handles absolute path conversion

### Comprehensive Tests (builder_test.go)

Created 15 test functions covering:

**Resolution Tests:**
- Simple imported function resolution
- Qualified import resolution (module.function)
- Same-module function resolution
- Unresolved method calls (obj.method)
- Non-existent function handling

**Validation Tests:**
- Module existence validation
- Function-in-module validation
- Non-existent module handling

**Helper Function Tests:**
- Function indexing from code graph
- Functions-in-file filtering
- Containing function detection with edge cases

**Integration Tests:**
- Simple single-file call graph
- Multi-file call graph with imports
- Real test fixture integration

## Test Coverage

- Overall: 91.8%
- BuildCallGraph: 80.8%
- indexFunctions: 87.5%
- getFunctionsInFile: 100.0%
- findContainingFunction: 100.0%
- resolveCallTarget: 85.0%
- validateFQN: 100.0%
- readFileBytes: 75.0%

## Algorithm Overview

Pass 3 ties together all previous work:

### Pass 1 (PR #2): BuildModuleRegistry
- Maps file paths to module paths
- Enables FQN generation

### Pass 2 (PRs #3-5): Import & Call Site Extraction
- ExtractImports: Maps local names to FQNs
- ExtractCallSites: Finds all function calls in AST

### Pass 3 (This PR): Call Graph Construction
- Resolves call targets using import maps
- Links callers to callees with edges
- Validates resolutions against registry
- Stores detailed call site information

## Resolution Strategy

The resolver uses a multi-step approach:

1. **Simple names** (no dots):
   - Check import map first
   - Fall back to same-module lookup
   - Return unresolved if neither works

2. **Qualified names** (with dots):
   - Split into base + rest
   - Resolve base through imports
   - Append rest to get full FQN
   - Try current module if not imported

3. **Validation**:
   - Check if target exists in registry
   - For functions, validate parent module exists
   - Mark resolution success/failure
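
The strategy above can be sketched as follows. The signatures are assumptions: the import map and the module registry are reduced to plain Go maps, and validation only checks that the FQN or its parent is a known module, as the description suggests.

```go
package main

import (
	"fmt"
	"strings"
)

// validateFQN reports whether fqn names a known module, or a member whose
// parent module is known.
func validateFQN(fqn string, modules map[string]bool) bool {
	if modules[fqn] {
		return true
	}
	if i := strings.LastIndex(fqn, "."); i > 0 {
		return modules[fqn[:i]]
	}
	return false
}

// resolveCallTarget resolves a call target to an FQN. The base name is looked
// up in the import map first; otherwise it is assumed to live in the current
// module. The second result tells whether the FQN passed validation.
func resolveCallTarget(target string, imports map[string]string, currentModule string, modules map[string]bool) (string, bool) {
	base, rest, qualified := strings.Cut(target, ".")
	fqn, imported := imports[base]
	if !imported {
		fqn = currentModule + "." + base // same-module fallback
	}
	if qualified {
		fqn += "." + rest
	}
	return fqn, validateFQN(fqn, modules)
}

func main() {
	imports := map[string]string{
		"sanitize": "myapp.utils.sanitize",
		"utils":    "myapp.utils",
	}
	modules := map[string]bool{"myapp.utils": true, "myapp.views": true}
	for _, target := range []string{"sanitize", "utils.sanitize", "helper", "obj.method"} {
		fqn, ok := resolveCallTarget(target, imports, "myapp.views", modules)
		fmt.Printf("%-15s -> %-28s resolved=%v\n", target, fqn, ok)
	}
}
```

An imported name is used even when validation later fails, which matches the rule that explicit imports take precedence over same-module lookup.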

## Design Decisions

1. **Containing function detection**:
   - Uses nearest-match algorithm based on line numbers
   - Finds function with highest line number ≤ call line
   - Handles module-level calls by returning empty FQN

2. **Resolution priority**:
   - Import map takes precedence over same-module
   - Explicit imports always respected even if unresolved
   - Same-module only tried when not in imports

3. **Validation vs Resolution**:
   - Resolution finds FQN from imports/context
   - Validation checks if FQN exists in registry
   - Both pieces of information stored in CallSite

4. **Error handling**:
   - Continues processing even if some files fail
   - Marks individual call sites as unresolved
   - Returns partial graph instead of failing completely

## Next Steps

The call graph infrastructure is now complete. Future PRs will:

- PR #7: Add CFG data structures for control flow analysis
- PR #8: Implement pattern matching for security rules
- PR #9: Integrate into main initialization pipeline
- PR #10: Add comprehensive documentation and examples
- PR #11: Performance optimizations (caching, pooling)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* feat: Create CFG data structures for control flow analysis

This PR implements Control Flow Graph (CFG) data structures to enable
intra-procedural analysis of execution paths through functions. CFGs are
essential for security analysis patterns like taint tracking and detecting
missing sanitization on all paths.

## Changes

### Core Implementation (cfg.go)

1. **BlockType**: Enumeration of basic block types
   - Entry: Function entry point
   - Exit: Function exit point
   - Normal: Sequential execution block
   - Conditional: Branch blocks (if/else)
   - Loop: Loop header blocks (while/for)
   - Switch: Switch/match statement blocks
   - Try/Catch/Finally: Exception handling blocks

2. **BasicBlock**: Represents a single basic block
   - ID: Unique identifier within CFG
   - Type: Block category for analysis
   - StartLine/EndLine: Source code location
   - Instructions: CallSites occurring in this block
   - Successors: Blocks that can execute next
   - Predecessors: Blocks that can execute before
   - Condition: Condition expression (for conditional blocks)
   - Dominators: Blocks that always execute before this one

3. **ControlFlowGraph**: Complete CFG for a function
   - FunctionFQN: Fully qualified function name
   - Blocks: Map of block ID to BasicBlock
   - EntryBlockID/ExitBlockID: Special block identifiers
   - CallGraph: Reference for inter-procedural analysis

4. **CFG Operations**:
   - NewControlFlowGraph(): Creates CFG with entry/exit blocks
   - AddBlock(): Adds basic block to CFG
   - AddEdge(): Connects blocks with control flow edges
   - GetBlock(): Retrieves block by ID
   - GetSuccessors(): Returns successor blocks
   - GetPredecessors(): Returns predecessor blocks

5. **Dominator Analysis**:
   - ComputeDominators(): Calculates dominator sets using iterative data flow
   - IsDominator(): Checks if one block dominates another
   - Used to verify sanitization always occurs before usage

6. **Path Analysis**:
   - GetAllPaths(): Enumerates all execution paths from entry to exit
   - dfsAllPaths(): DFS-based path enumeration
   - Used for exhaustive security analysis

7. **Helper Functions**:
   - intersect(): Set intersection for dominator computation
   - slicesEqual(): Compare string slices for fixed-point detection

### Comprehensive Tests (cfg_test.go)

Created 23 test functions covering:

**Construction Tests:**
- CFG creation with entry/exit blocks
- Basic block creation with all fields
- Block addition to CFG

**Edge Management Tests:**
- Adding edges between blocks
- Duplicate edge handling
- Non-existent block edge handling

**Graph Navigation Tests:**
- Block retrieval by ID
- Successor block retrieval
- Predecessor block retrieval

**Dominator Analysis Tests:**
- Linear CFG dominators (A→B→C)
- Branching CFG dominators (if/else merge)
- Dominator checking

**Path Analysis Tests:**
- All paths in linear CFG
- All paths in branching CFG

**Helper Function Tests:**
- Set intersection operations
- Slice equality checking

**Complex Integration Test:**
- Realistic function CFG with branches
- Multiple blocks and paths
- Dominator relationships verification

## Test Coverage

- Overall: 92.7%
- NewControlFlowGraph: 100.0%
- AddBlock: 100.0%
- AddEdge: 100.0%
- GetBlock: 100.0%
- GetSuccessors: 87.5%
- GetPredecessors: 87.5%
- ComputeDominators: 100.0%
- IsDominator: 75.0%
- GetAllPaths: 100.0%
- dfsAllPaths: 91.7%
- intersect: 100.0%
- slicesEqual: 100.0%

## Design Decisions

1. **Entry/Exit blocks always created**:
   - Simplifies analysis by providing single entry/exit points
   - Standard CFG construction practice

2. **Dominator computation uses iterative algorithm**:
   - Simple fixed-point iteration
   - Converges quickly for most real-world CFGs
   - Simpler than near-linear algorithms such as Lengauer-Tarjan, and fast enough for small graphs

3. **Path enumeration with cycle detection**:
   - Avoids infinite loops in cyclic CFGs
   - Uses visited tracking during DFS
   - WARNING: Can be exponential for complex CFGs

4. **Blocks store CallSites as instructions**:
   - Links CFG to call graph for inter-procedural analysis
   - Enables tracking tainted data through function calls

5. **Condition stored as string**:
   - Simple representation for conditional blocks
   - Could be enhanced with AST expression nodes later
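
Path enumeration with cycle detection (design decision 3) can be sketched as a depth-first search that refuses to revisit a block already on the current path. The adjacency-map representation and the `allPaths` name are assumptions; the PR's GetAllPaths() works on its own block types.

```go
package main

import (
	"fmt"
	"strings"
)

// allPaths enumerates every acyclic path from entry to exit.
// succs maps a block ID to the IDs of its successors.
func allPaths(succs map[string][]string, entry, exit string) [][]string {
	var paths [][]string
	var path []string
	onPath := map[string]bool{}

	var dfs func(node string)
	dfs = func(node string) {
		if onPath[node] {
			return // block is already on the current path: back edge, stop here
		}
		onPath[node] = true
		path = append(path, node)
		if node == exit {
			paths = append(paths, append([]string(nil), path...)) // store a copy
		} else {
			for _, next := range succs[node] {
				dfs(next)
			}
		}
		path = path[:len(path)-1]
		onPath[node] = false // allow the block on other paths
	}
	dfs(entry)
	return paths
}

func main() {
	succs := map[string][]string{
		"Entry": {"A"},
		"A":     {"B", "C"},
		"B":     {"D"},
		"C":     {"A", "D"}, // C -> A is a loop back edge
		"D":     {"Exit"},
	}
	for _, p := range allPaths(succs, "Entry", "Exit") {
		fmt.Println(strings.Join(p, " -> "))
	}
}
```

Each independent branch multiplies the number of paths, which is the exponential growth the warning refers to.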

## Use Cases

CFGs enable several security analysis patterns:

**Taint Analysis:**
- Track data flow through execution paths
- Detect if tainted data reaches sensitive sinks

**Sanitization Verification:**
- Use dominators to check if sanitization always occurs
- Detect missing sanitization on some paths

**Dead Code Detection:**
- Find unreachable blocks
- Identify code that never executes

**Inter-Procedural Analysis:**
- Combine CFG with call graph
- Track data flow across function boundaries

## Example CFG

```python
def process_user(user_id):
    user = get_user(user_id)        # Block 1 (first block after Entry)
    if user.is_admin():              # Block 2 (conditional)
        grant_access()               # Block 3 (true branch)
    else:
        deny_access()                # Block 4 (false branch)
    log_action(user)                 # Block 5 (merge point)
    return                           # flows to Exit
```

CFG Structure:
```
Entry → Block1 → Block2 → Block3 → Block5 → Exit
                       ↘ Block4 ↗
```

Dominators:
- Block1 dominates every block after it (every path from Entry passes through Block1)
- Block2 dominates Block3, Block4, Block5
- Block3 does NOT dominate Block5 (false branch skips it)
- Block4 does NOT dominate Block5 (true branch skips it)
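
The dominator claims above can be checked with a small fixed-point computation. This is a sketch of the standard iterative data-flow formulation, not the PR's ComputeDominators(); block IDs are plain strings and the helper names are hypothetical.

```go
package main

import "fmt"

// predecessors inverts an edge list into a predecessor map.
func predecessors(edges [][2]string) map[string][]string {
	preds := make(map[string][]string)
	for _, e := range edges {
		preds[e[1]] = append(preds[e[1]], e[0])
	}
	return preds
}

// computeDominators returns dom[n][d] == true when block d dominates block n.
// dom(entry) = {entry}; dom(n) = {n} plus the intersection of dom(p) over all
// predecessors p, iterated until nothing changes.
func computeDominators(nodes []string, preds map[string][]string, entry string) map[string]map[string]bool {
	dom := make(map[string]map[string]bool)
	for _, n := range nodes {
		dom[n] = make(map[string]bool)
		if n == entry {
			dom[n][entry] = true
			continue
		}
		for _, m := range nodes {
			dom[n][m] = true // start from "everything dominates n"
		}
	}
	for changed := true; changed; {
		changed = false
		for _, n := range nodes {
			if n == entry {
				continue
			}
			next := map[string]bool{n: true}
			for _, cand := range nodes {
				inAll := len(preds[n]) > 0
				for _, p := range preds[n] {
					if !dom[p][cand] {
						inAll = false
						break
					}
				}
				if inAll {
					next[cand] = true
				}
			}
			// Sets only ever shrink, so a size change means the set changed.
			if len(next) != len(dom[n]) {
				dom[n] = next
				changed = true
			}
		}
	}
	return dom
}

func main() {
	nodes := []string{"Entry", "Block1", "Block2", "Block3", "Block4", "Block5", "Exit"}
	edges := [][2]string{
		{"Entry", "Block1"}, {"Block1", "Block2"},
		{"Block2", "Block3"}, {"Block2", "Block4"},
		{"Block3", "Block5"}, {"Block4", "Block5"},
		{"Block5", "Exit"},
	}
	dom := computeDominators(nodes, predecessors(edges), "Entry")
	fmt.Println("Block2 dominates Block5:", dom["Block5"]["Block2"])
	fmt.Println("Block3 dominates Block5:", dom["Block5"]["Block3"])
}
```

For sanitization checks, a sanitizer call in Block2 would be guaranteed to run before Block5, while one in Block3 would not.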

## Next Steps

Future PRs will:
- PR #8: Implement pattern registry for security rules
- Use CFG to detect missing sanitization patterns
- Implement taint tracking across CFG paths
- Combine CFG with call graph for full analysis

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* feat: Add pattern registry with hardcoded code injection example

Implements pattern matching infrastructure for security analysis with one example pattern (code injection via eval). Additional patterns will be loaded from queries in future PRs. Includes pattern types (source-sink, missing-sanitizer, dangerous-function) and matching algorithms with 92.4% test coverage.

* feat: Integrate call graph into initialization pipeline

Adds InitializeCallGraph() to wire together the 3-pass algorithm (module registry, call graph building, pattern loading) and AnalyzePatterns() for security pattern detection. Includes end-to-end integration tests with 92.6% coverage.

* add callgraph integration

* chore: comment the debugging code

* feat: Add comprehensive benchmark suite for performance testing

This commit adds a complete benchmark suite to measure performance across
small, medium, and large Python projects. The benchmarks establish baseline
metrics for future optimization work.

Changes:
- Add benchmark_test.go with benchmarks for:
  * Module registry building (Pass 1)
  * Import extraction (Pass 2 Part A)
  * Call site extraction (Pass 2 Part C)
  * Call target resolution
  * Pattern matching
- Test against 3 real-world codebases:
  * Small: simple_project (~5 files)
  * Medium: label-studio (~1000 files)
  * Large: salt (~10,000 files)
- Fix patterns_test.go assertions for PatternMatchDetails return type
- Fix godot lint errors in builder.go

Baseline Performance Results (Apple M2 Max, 5 iterations):
- BuildModuleRegistry_Small: 80µs (target: <10ms) ✓
- BuildModuleRegistry_Medium: 6.5ms (target: <500ms) ✓
- BuildModuleRegistry_Large: 3.3ms (target: <2s) ✓
- ExtractImports_Small: 101µs (target: <20ms) ✓
- ExtractImports_Medium: 433ms (target: <2s) ✓
- ExtractCallSites_Small: 91µs (target: <30ms) ✓
- ResolveCallTarget: 533ns (target: <1µs) ✓

All benchmarks meet performance targets. Medium/Large project benchmarks
are skipped by default to keep CI fast. Enable manually with:
  go test -bench=Medium -run=^$
  go test -bench=Large -run=^$

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* feat: Add ImportMap caching with sync.RWMutex for performance

This commit implements thread-safe caching of ImportMap instances to avoid
re-parsing imports from the same file multiple times. This provides significant
performance improvements when the same imports are needed repeatedly.

Changes:
- Add ImportMapCache struct with RWMutex-protected cache map
- Implement Get(), Put(), and GetOrExtract() cache methods
- Update BuildCallGraph to use import caching
- Add comprehensive cache_test.go with:
  * Basic CRUD operations tests
  * Cache hit/miss scenarios
  * Concurrent access safety tests
  * Performance benchmarks

Performance characteristics:
- Get operation: O(1) with read lock (allows concurrent reads)
- Put operation: O(1) with write lock (exclusive access)
- Thread-safe for concurrent access from multiple goroutines
- Cache hit avoids expensive tree-sitter parsing
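
A minimal sketch of such a cache, assuming `ImportMap` is a simple alias-to-FQN map and that GetOrExtract() takes the extraction function as a parameter; the actual signatures in the PR may differ.

```go
package main

import (
	"fmt"
	"sync"
)

// ImportMap is a hypothetical stand-in: local name -> fully qualified name.
type ImportMap map[string]string

// ImportMapCache caches one ImportMap per file path.
type ImportMapCache struct {
	mu    sync.RWMutex
	cache map[string]ImportMap
}

func NewImportMapCache() *ImportMapCache {
	return &ImportMapCache{cache: make(map[string]ImportMap)}
}

// Get takes the read lock, so concurrent readers do not block each other.
func (c *ImportMapCache) Get(path string) (ImportMap, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	m, ok := c.cache[path]
	return m, ok
}

// Put takes the write lock for exclusive access.
func (c *ImportMapCache) Put(path string, m ImportMap) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.cache[path] = m
}

// GetOrExtract returns the cached map, or runs extract and stores its result.
// Two goroutines missing at the same time may both extract; the later Put wins.
func (c *ImportMapCache) GetOrExtract(path string, extract func(string) (ImportMap, error)) (ImportMap, error) {
	if m, ok := c.Get(path); ok {
		return m, nil
	}
	m, err := extract(path)
	if err != nil {
		return nil, err
	}
	c.Put(path, m)
	return m, nil
}

func main() {
	cache := NewImportMapCache()
	calls := 0
	extract := func(path string) (ImportMap, error) {
		calls++ // stands in for the expensive tree-sitter parse
		return ImportMap{"np": "numpy"}, nil
	}
	for i := 0; i < 3; i++ {
		if _, err := cache.GetOrExtract("app.py", extract); err != nil {
			fmt.Println("extract failed:", err)
		}
	}
	fmt.Println("extractions:", calls)
}
```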

Test coverage:
- NewImportMapCache: 100%
- Get: 100%
- Put: 100%
- GetOrExtract: 85.7%
- All tests pass including concurrent access tests

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* fix: Correct matchesFunctionName test expectations

The test was incorrectly expecting 'evaluation' to match 'eval' via
substring matching, but the implementation correctly only supports:
- Exact matches: 'eval' == 'eval'
- Suffix matches: 'myapp.utils.eval' ends with '.eval'
- Prefix matches: 'request.GET.get' starts with 'request.GET.'

This prevents false positives like matching 'evaluation' to 'eval'.

Updated test case to expect false for 'evaluation' vs 'eval' match.
All tests now pass.
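
The three matching rules can be written down directly. This is a sketch with an assumed parameter order (`name`, then `pattern`); it is not copied from the implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// matchesFunctionName reports whether name matches pattern exactly, ends with
// ".pattern" (suffix match), or starts with "pattern." (prefix match).
// Requiring the dot boundary is what keeps "evaluation" from matching "eval".
func matchesFunctionName(name, pattern string) bool {
	return name == pattern ||
		strings.HasSuffix(name, "."+pattern) ||
		strings.HasPrefix(name, pattern+".")
}

func main() {
	fmt.Println(matchesFunctionName("eval", "eval"))
	fmt.Println(matchesFunctionName("myapp.utils.eval", "eval"))
	fmt.Println(matchesFunctionName("request.GET.get", "request.GET"))
	fmt.Println(matchesFunctionName("evaluation", "eval"))
}
```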

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* fix: Update main_test.go to include analyze command in expected output

The analyze command was added in a previous commit (cmd/analyze.go) but the
main_test.go wasn't updated to reflect this new command in the help output.

This caused TestExecute/Successful_execution to fail because it expected
the old command list without 'analyze'.

Updated expected output to include:
  analyze     Analyze source code for security vulnerabilities using call graph

All tests now pass with gradle testGo.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* feature: add diagnostic report command for callgraph resolution

* feature: added resolution for framework and its corresponding support

* chore: fixed lint issues

* added orm related resolutions with framework support

---------

Co-authored-by: Claude <[email protected]>
shivasurya added a commit that referenced this pull request Nov 6, 2025
…word matching

Instead of relying solely on keyword matching, now ask the LLM to directly
categorize potential failure modes in its response.

Changes:

1. **DataflowTestCase struct** (types.go):
   - Added `FailureCategory` field with JSON tag
   - Documents all 12 category types in comments

2. **LLM Prompt** (prompt.go):
   - Added guideline #7: Explicitly asks LLM to categorize each test case
   - Lists all 12 categories with descriptions
   - Added "EXAMPLE DATAFLOW PATTERNS WITH CATEGORIES" section
   - Shows concrete examples of each category with proper annotation

3. **Categorization Logic** (comparator.go):
   - Strategy 1: Use LLM-provided category (most reliable)
   - Strategy 2: Fallback to keyword matching (backwards compatible)
   - Preserves all existing keyword matching logic

Benefits:

- **More Accurate**: LLM understands context better than keyword matching
- **Self-Documenting**: LLM explains WHY it chose each category
- **Backwards Compatible**: Falls back to keyword matching if LLM doesn't provide category
- **Future-Proof**: Easy to add new categories by updating prompt

Example LLM output:
{
  "test_id": 1,
  "description": "Flow through conditional branch",
  "reasoning": "Direct flow from user input to eval() through variable 'dangerous' in a conditional branch",
  "failure_category": "control_flow_branch"
}
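
The two-strategy categorization can be sketched as below. Only `FailureCategory` and the JSON keys shown in the example output come from the commit message; the other Go field names, the keyword table, and the `categorize` function are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// DataflowTestCase mirrors the JSON keys of the example output.
type DataflowTestCase struct {
	TestID          int    `json:"test_id"`
	Description     string `json:"description"`
	Reasoning       string `json:"reasoning"`
	FailureCategory string `json:"failure_category,omitempty"`
}

// keywordRules is an illustrative fallback table, checked in order.
var keywordRules = []struct{ keyword, category string }{
	{"branch", "control_flow_branch"},
	{"append", "container_operation"},
	{"assign", "assignment_chain"},
}

// categorize prefers the LLM-provided category and falls back to keyword
// matching on the description and reasoning text.
func categorize(tc DataflowTestCase) string {
	if tc.FailureCategory != "" {
		return tc.FailureCategory // strategy 1: trust the LLM
	}
	text := strings.ToLower(tc.Description + " " + tc.Reasoning)
	for _, rule := range keywordRules { // strategy 2: keyword fallback
		if strings.Contains(text, rule.keyword) {
			return rule.category
		}
	}
	return "unknown"
}

func main() {
	raw := `{"test_id": 1, "description": "Flow through conditional branch",
	         "reasoning": "Direct flow from user input to eval()",
	         "failure_category": "control_flow_branch"}`
	var tc DataflowTestCase
	if err := json.Unmarshal([]byte(raw), &tc); err != nil {
		fmt.Println("invalid JSON:", err)
		return
	}
	fmt.Println(categorize(tc))
}
```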

Results with improved prompt:
- Before: 50% "unknown" failures
- After: 75% properly categorized, 25% "unknown"
- Categories: assignment_chain, container_operation, control_flow_branch

This enables more precise failure analysis and data-driven algorithm improvement.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
shivasurya added a commit that referenced this pull request Nov 15, 2025
This PR extracts ~1050 LOC of call graph builder logic from builder.go
into a dedicated builder/ package for better modularity and organization.

## Changes

### New Files Created

1. **builder/cache.go** (100 LOC)
   - ImportMapCache for thread-safe import map caching
   - Comprehensive tests with concurrent access validation

2. **builder/builder.go** (800 LOC)
   - BuildCallGraph - main orchestration function
   - indexFunctions, getFunctionsInFile, findContainingFunction
   - resolveCallTarget - core resolution logic with type inference
   - validateStdlibFQN, validateFQN - validation functions
   - detectPythonVersion - Python version detection
   - All functions have public wrappers for external use

3. **builder/helpers.go** (50 LOC)
   - ReadFileBytes - file reading utility
   - FindFunctionAtLine - AST traversal for function lookup

4. **builder/taint.go** (80 LOC)
   - GenerateTaintSummaries - taint analysis (Pass 5)

5. **builder/integration.go** (50 LOC)
   - BuildCallGraphFromPath - convenience function for 3-pass build

6. **builder/doc.go** (60 LOC)
   - Package documentation with usage examples

### Files Modified

1. **graph/callgraph/builder.go** - Backward compatibility layer
   - Type aliases for ImportMapCache
   - Wrapper functions delegating to builder package
   - All wrappers marked as Deprecated with migration path

2. **graph/callgraph/integration.go** - Simplified
   - Now uses builder.BuildCallGraphFromPath
   - Maintains same public API

3. **graph/callgraph/python_version_detector.go** - Simplified
   - Delegates to builder.DetectPythonVersion

### Files Removed

1. **cache_test.go** - Moved to builder/cache_test.go
2. **python_version_detector_test.go** - Tests moved to builder package

## Testing

✅ All tests pass (18 packages)
✅ Build succeeds (gradle buildGo)
✅ Lint passes (0 issues)
✅ Zero breaking changes - backward compatibility maintained

## Architecture

The builder package now contains all call graph construction logic:
- Pass 1: Module registry (registry package)
- Pass 2: Import extraction (resolution package)
- Pass 3: Call graph building (builder package) ← This PR
- Pass 4: Type inference integration
- Pass 5: Taint analysis

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
shivasurya added a commit that referenced this pull request Nov 16, 2025
## Summary
Creates dedicated `patterns` package for security pattern detection and framework identification. This PR isolates pattern matching logic into a clean, testable package structure.

## Changes

### New Package Structure
```
graph/callgraph/patterns/
├── detector.go        # Pattern matching & vulnerability detection (475 LOC)
├── frameworks.go      # Framework detection helpers (52 LOC)
├── helpers.go         # AST traversal helpers (34 LOC)
├── doc.go            # Package documentation (32 LOC)
└── detector_test.go   # Comprehensive tests (15 tests)
```

### Files Modified
- `patterns.go` - Backward compatibility wrappers with type aliases

### Key Features
- **PatternRegistry** for managing security patterns
- **3 Pattern Types**: SourceSink, MissingSanitizer, DangerousFunction
- **Framework Detection**: Django, Flask, FastAPI, Tornado, etc.
- **Taint Analysis Integration**: Intra-procedural vulnerability detection
- **Full Backward Compatibility**: All existing code continues to work

### Pattern Matching
```go
registry := patterns.NewPatternRegistry()
pattern := &patterns.Pattern{
    ID:         "SQL-INJECTION-001",
    Type:       patterns.PatternTypeMissingSanitizer,
    Sources:    []string{"request.GET", "request.POST"},
    Sinks:      []string{"execute", "executemany"},
    Sanitizers: []string{"escape_sql"},
}
registry.AddPattern(pattern)

// callGraph is a previously built call graph (core.CallGraph).
match := registry.MatchPattern(pattern, callGraph)
if match.Matched {
    fmt.Printf("Vulnerability: %s -> %s\n", match.SourceFQN, match.SinkFQN)
}
```

### Framework Detection
```go
framework := patterns.DetectFramework(importMap)
if framework != nil {
    fmt.Printf("Using %s (%s)\n", framework.Name, framework.Category)
}
```

## Test Coverage
- **Coverage**: 77.8% of statements
- **Tests**: 15 tests, all passing
- **Test file**: Moved from `patterns_test.go` to `patterns/detector_test.go`

## Build Verification
```bash
✅ gradle buildGo - SUCCESS
✅ go test ./graph/callgraph/... - ALL PASS
✅ All existing tests pass - NO BREAKING CHANGES
```

## Dependencies
- Imports from `core/`, `extraction/`, `analysis/taint/`
- Uses `core.CallGraph` for pattern matching
- Integrates with taint analysis for vulnerability detection

## Graphite Stack
```
main
 └─ refactor/05-advanced-resolution (#376)
     └─ refactor/06-patterns (#XXX) ← THIS PR
```

## Related PRs
- Depends on: #376 (PR #5: Advanced Resolution)
- Blocks: PR #7 (Builder Refactor)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
shivasurya added a commit that referenced this pull request Nov 16, 2025
This PR extracts ~1050 LOC of call graph builder logic from builder.go
into a dedicated builder/ package for better modularity and organization.

## Changes

### New Files Created

1. **builder/cache.go** (100 LOC)
   - ImportMapCache for thread-safe import map caching
   - Comprehensive tests with concurrent access validation

2. **builder/builder.go** (800 LOC)
   - BuildCallGraph - main orchestration function
   - indexFunctions, getFunctionsInFile, findContainingFunction
   - resolveCallTarget - core resolution logic with type inference
   - validateStdlibFQN, validateFQN - validation functions
   - detectPythonVersion - Python version detection
   - All functions have public wrappers for external use

3. **builder/helpers.go** (50 LOC)
   - ReadFileBytes - file reading utility
   - FindFunctionAtLine - AST traversal for function lookup

4. **builder/taint.go** (80 LOC)
   - GenerateTaintSummaries - taint analysis (Pass 5)

5. **builder/integration.go** (50 LOC)
   - BuildCallGraphFromPath - convenience function for 3-pass build

6. **builder/doc.go** (60 LOC)
   - Package documentation with usage examples

### Files Modified

1. **graph/callgraph/builder.go** - Backward compatibility layer
   - Type aliases for ImportMapCache
   - Wrapper functions delegating to builder package
   - All wrappers marked as Deprecated with migration path

2. **graph/callgraph/integration.go** - Simplified
   - Now uses builder.BuildCallGraphFromPath
   - Maintains same public API

3. **graph/callgraph/python_version_detector.go** - Simplified
   - Delegates to builder.DetectPythonVersion

### Files Removed

1. **cache_test.go** - Moved to builder/cache_test.go
2. **python_version_detector_test.go** - Tests moved to builder package

## Testing

✅ All tests pass (18 packages)
✅ Build succeeds (gradle buildGo)
✅ Lint passes (0 issues)
✅ Zero breaking changes - backward compatibility maintained

## Architecture

The builder package now contains the call graph construction logic. It fits into the overall pipeline as follows:
- Pass 1: Module registry (registry package)
- Pass 2: Import extraction (resolution package)
- Pass 3: Call graph building (builder package) ← This PR
- Pass 4: Type inference integration
- Pass 5: Taint analysis

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
shivasurya added a commit that referenced this pull request Nov 16, 2025
## Summary

This PR extracts ~1050 LOC of call graph builder logic from `builder.go` into a dedicated `builder/` package for better modularity and organization.

## Changes

### New Files Created

1. **builder/cache.go** (100 LOC)
   - ImportMapCache for thread-safe import map caching
   - Comprehensive tests with concurrent access validation

2. **builder/builder.go** (800 LOC)
   - BuildCallGraph - main orchestration function
   - indexFunctions, getFunctionsInFile, findContainingFunction
   - resolveCallTarget - core resolution logic with type inference
   - validateStdlibFQN, validateFQN - validation functions
   - detectPythonVersion - Python version detection
   - All functions have public wrappers for external use

3. **builder/helpers.go** (50 LOC)
   - ReadFileBytes - file reading utility
   - FindFunctionAtLine - AST traversal for function lookup

4. **builder/taint.go** (80 LOC)
   - GenerateTaintSummaries - taint analysis (Pass 5)

5. **builder/integration.go** (50 LOC)
   - BuildCallGraphFromPath - convenience function for 3-pass build

6. **builder/doc.go** (60 LOC)
   - Package documentation with usage examples
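
For readers unfamiliar with `findContainingFunction`, the nearest-match rule described earlier in this conversation (pick the function with the highest start line that is at or before the call line, and return an empty string otherwise) can be sketched as follows. `FunctionDef` is a hypothetical, simplified stand-in for the code graph node type, and the signature is an assumption.

```go
package main

// FunctionDef is a simplified, hypothetical view of a function node.
type FunctionDef struct {
	FQN       string
	StartLine int
}

// findContainingFunction returns the FQN of the function whose start line
// is the highest one that is still <= callLine. It returns "" when no
// function starts at or before the call, which this sketch treats as a
// module-level call.
func findContainingFunction(callLine int, funcs []FunctionDef) string {
	best := ""
	bestLine := -1
	for _, f := range funcs {
		if f.StartLine <= callLine && f.StartLine > bestLine {
			best = f.FQN
			bestLine = f.StartLine
		}
	}
	return best
}
```

Because only start lines are compared, a module-level call that appears after a function body would be attributed to that function in this sketch. Handling that case would need end-line information.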

### Files Modified

1. **graph/callgraph/builder.go** - Backward compatibility layer
   - Type aliases for ImportMapCache
   - Wrapper functions delegating to builder package
   - All wrappers marked as Deprecated with migration path

2. **graph/callgraph/integration.go** - Simplified
   - Now uses builder.BuildCallGraphFromPath
   - Maintains same public API

3. **graph/callgraph/python_version_detector.go** - Simplified
   - Delegates to builder.DetectPythonVersion
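
The backward compatibility layer described for `graph/callgraph/builder.go` relies on two Go features: type aliases and `Deprecated:` doc comments. The sketch below inlines a stand-in for the builder package so that it compiles as a single file. The names `builderImportMapCache` and `builderNewImportMapCache` are hypothetical placeholders for what would be `builder.ImportMapCache` and its constructor.

```go
package main

// builderImportMapCache stands in for the type that would live in the
// builder package. It is inlined here only to keep the sketch in one file.
type builderImportMapCache struct {
	entries map[string]map[string]string
}

// builderNewImportMapCache stands in for the builder package constructor.
func builderNewImportMapCache() *builderImportMapCache {
	return &builderImportMapCache{entries: make(map[string]map[string]string)}
}

// ImportMapCache is a type alias, so the old and new names denote the
// same type and values can be passed between old and new call sites.
//
// Deprecated: use the builder package type instead.
type ImportMapCache = builderImportMapCache

// NewImportMapCache keeps the old entry point alive by delegating.
//
// Deprecated: use the builder package constructor instead.
func NewImportMapCache() *ImportMapCache {
	return builderNewImportMapCache()
}
```

An alias (`type A = B`) differs from a new defined type (`type A B`): no conversion is needed at call sites, which is what lets existing callers keep compiling unchanged.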

### Files Removed

1. **cache_test.go** - Moved to builder/cache_test.go
2. **python_version_detector_test.go** - Tests moved to builder package

## Testing

✅ All tests pass (18 packages)
✅ Build succeeds (gradle buildGo)
✅ Lint passes (0 issues)
✅ Zero breaking changes - backward compatibility maintained

## Architecture

The builder package now contains the call graph construction logic. It fits into the overall pipeline as follows:
- Pass 1: Module registry (registry package)
- Pass 2: Import extraction (resolution package)
- Pass 3: Call graph building (builder package) ← This PR
- Pass 4: Type inference integration
- Pass 5: Taint analysis

## Dependencies

- Parent PR: #6 (Patterns Package)
- Stack: refactor/01-foundation-types → refactor/02-infrastructure-core → refactor/03-stdlib-taint → refactor/04-ast-extraction → refactor/05-advanced-resolution → refactor/06-patterns → **refactor/07-builder** (this PR)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
shivasurya added a commit that referenced this pull request Nov 21, 2025
This fixes three critical bugs discovered during validation testing. The first fix prevents wildcard pattern flags from incorrectly propagating to argument constraints. The second ensures argument validation is actually performed during scans. The third changes tuple extraction to properly distinguish between error conditions and valid empty string values using Go idioms. These fixes are essential for correct operation of the argument matching features.
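
The third fix refers to a common Go idiom: returning a second boolean so that callers can tell "not found" apart from a value that happens to be the empty string. A minimal sketch, assuming a hypothetical helper `extractTupleElement` that works on a slice of strings (the real function and its input type are not shown in this message):

```go
package main

// extractTupleElement returns the element at idx and true when idx is in
// range. It returns "" and false when idx is out of range, so an empty
// string with ok == true is a valid value and not an error.
// Hypothetical helper used only to illustrate the comma-ok idiom.
func extractTupleElement(tuple []string, idx int) (string, bool) {
	if idx < 0 || idx >= len(tuple) {
		return "", false
	}
	return tuple[idx], true
}
```

A caller would write `v, ok := extractTupleElement(t, 0)` and branch on `ok` instead of comparing `v` to `""`.
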
shivasurya added a commit that referenced this pull request Nov 22, 2025
## Summary

Final PR in the output standardization stack. Removes deprecated `query` and `analyze` commands and adds comprehensive documentation for all output formats, verbosity levels, and exit codes.

## Changes

### Deprecated Commands Removed
- **`cmd/query.go`** - Removed entirely
- **`cmd/query_test.go`** - Removed entirely
- **`cmd/analyze.go`** - Removed entirely
- **`main_test.go`** - Updated to remove references to deprecated commands

**BREAKING CHANGE**: No backward compatibility provided per requirements.

### Documentation Updates

#### README.md
- **Usage Examples**: Scan and CI command examples
- **Output Formats**: Text, JSON, CSV, SARIF examples with real output
- **Verbosity Levels**: Table showing default/verbose/debug behavior
- **Exit Codes**: Table and examples for exit code 0, 1, 2

#### docs/CLI.md (New)
- **Command Reference**: Complete flag documentation for all commands
- **Output Format Reference**: JSON schema, CSV columns, SARIF features
- **Exit Code Reference**: Detailed exit code behavior and --fail-on syntax

### Verification Tests
- **`cmd/command_cleanup_test.go`**: Integration tests verifying:
  - `query` command returns "unknown command"
  - `analyze` command returns "unknown command"
  - Help text no longer mentions removed commands

## Test Results

All tests passing ✅

```bash
$ gradle testGo
ok  	.../cmd	0.343s

$ ./build/go/pathfinder query
Error: unknown command "query" for "pathfinder"

$ ./build/go/pathfinder analyze
Error: unknown command "analyze" for "pathfinder"

$ ./build/go/pathfinder --help | grep -E "(query|analyze)"
# (no output - commands not shown)
```

## Documentation Examples

### Scan Command
```bash
pathfinder scan --rules rules/ --project /path/to/project
pathfinder scan --rules rules/ --project . --verbose
pathfinder scan --rules rules/ --project . --fail-on=critical,high
```

### CI Command
```bash
pathfinder ci --rules rules/ --project . --output json > results.json
pathfinder ci --rules rules/ --project . --output sarif > results.sarif
pathfinder ci --rules rules/ --project . --output csv --fail-on=critical
```

### Exit Code Behavior
```bash
# Default: always exit 0
pathfinder scan --rules rules/ --project .
echo $?  # 0 even with findings

# Fail on critical or high
pathfinder scan --rules rules/ --project . --fail-on=critical,high
echo $?  # 1 if critical/high found, 0 otherwise
```
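
The exit-code rule shown above (0 by default, 1 when a finding matches a severity listed in `--fail-on`) can be sketched in Go. `parseFailOn` and `exitCode` are hypothetical names, not the project's actual functions, and exit code 2 is left out because its meaning is not stated here.

```go
package main

import "strings"

// parseFailOn turns a value such as "critical,high" into a set of
// lowercase severities. An empty value yields an empty set.
func parseFailOn(value string) map[string]bool {
	set := make(map[string]bool)
	for _, part := range strings.Split(value, ",") {
		s := strings.ToLower(strings.TrimSpace(part))
		if s != "" {
			set[s] = true
		}
	}
	return set
}

// exitCode returns 1 if any finding severity is in failOn and 0
// otherwise. With an empty failOn set it always returns 0, matching the
// default behavior described above.
func exitCode(findingSeverities []string, failOn map[string]bool) int {
	for _, sev := range findingSeverities {
		if failOn[strings.ToLower(sev)] {
			return 1
		}
	}
	return 0
}
```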

## Migration Notes

### Breaking Changes
- **`query` command removed**: Use `scan` command instead
- **`analyze` command removed**: Use `scan` or `ci` command instead
- No migration path provided per requirements

### Non-Breaking
- All existing `scan` and `ci` commands continue to work
- Documentation is backwards compatible

## Checklist

- [x] query command removed
- [x] analyze command removed
- [x] main_test.go updated
- [x] Verification tests added
- [x] README.md updated with comprehensive docs
- [x] docs/CLI.md created
- [x] All tests passing
- [x] Linter passing
- [x] Binary builds successfully
- [x] Help text verified

## Stacked PRs

This PR stacks on top of:
- PR #6: Exit Code Standardization (#396)

This is the **final PR** in the output standardization feature stack.

## Verification

Commands removed successfully:
```bash
$ ./build/go/pathfinder query
Error: unknown command "query" for "pathfinder"

$ ./build/go/pathfinder analyze
Error: unknown command "analyze" for "pathfinder"
```

Valid commands work:
```bash
$ ./build/go/pathfinder scan --help
# Shows scan command help

$ ./build/go/pathfinder ci --help
# Shows ci command help
```

🤖 Generated with [Claude Code](https://claude.com/claude-code)
shivasurya added a commit that referenced this pull request Dec 9, 2025
Adds advanced DSL features for complex container rules:
- all_of(), any_of(), none_of() logic combinators
- instruction_after(), instruction_before() sequence validation
- stage(), final_stage_has() for multi-stage builds
- custom_check() for programmatic validation
- DockerfileAccess and ComposeAccess wrapper classes

All components have 97-100% test coverage (44 new tests).

Files added:
- python-dsl/rules/container_combinators.py
- python-dsl/rules/container_programmatic.py
- python-dsl/tests/test_container_combinators.py
- python-dsl/tests/test_container_programmatic.py

Files modified:
- python-dsl/rules/__init__.py (added new exports)

Part of: Dockerfile & Docker Compose Support
Depends on: PR #5 (Python DSL Core)
Next PR: #7 Integration & Rule Library
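
The semantics of the `all_of()`, `any_of()`, and `none_of()` combinators can be illustrated with plain predicates. The actual DSL is written in Python; the sketch below uses Go only to stay consistent with the rest of the codebase. `Check`, `HasInstruction`, and the slice-of-strings representation of a Dockerfile are all assumptions made for this example.

```go
package main

// Check is a hypothetical predicate over a Dockerfile's instruction names.
type Check func(instructions []string) bool

// HasInstruction matches when the named instruction appears at least once.
func HasInstruction(name string) Check {
	return func(instructions []string) bool {
		for _, ins := range instructions {
			if ins == name {
				return true
			}
		}
		return false
	}
}

// AllOf matches when every check matches.
func AllOf(checks ...Check) Check {
	return func(instructions []string) bool {
		for _, c := range checks {
			if !c(instructions) {
				return false
			}
		}
		return true
	}
}

// AnyOf matches when at least one check matches.
func AnyOf(checks ...Check) Check {
	return func(instructions []string) bool {
		for _, c := range checks {
			if c(instructions) {
				return true
			}
		}
		return false
	}
}

// NoneOf matches when no check matches; it is the negation of AnyOf.
func NoneOf(checks ...Check) Check {
	anyMatch := AnyOf(checks...)
	return func(instructions []string) bool {
		return !anyMatch(instructions)
	}
}
```

Because each combinator returns a `Check`, they nest freely, for example `AllOf(HasInstruction("USER"), NoneOf(HasInstruction("ADD")))`.
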
shivasurya added a commit that referenced this pull request Dec 10, 2025