@shivasurya (Owner) commented Dec 9, 2025

Summary

Integrates container security scanning into the scan command automatically. No flags are required: the scan discovers and checks Dockerfile and docker-compose files transparently.

Depends on: #422 (container rules infrastructure)

Implementation

New: container_scanner.go (352 lines)

  • TryContainerScan(): Silent integration point, returns nil if unavailable
  • DiscoverContainerFiles(): Finds Dockerfiles and docker-compose.yml via filepath.Walk
  • CompileContainerRules(): Compiles Python DSL to JSON IR with 30s timeout
  • ScanContainerFiles(): Executes rules via ContainerRuleExecutor
  • convertToEnrichedDetection(): Maps RuleMatch → EnrichedDetection

Modified: scan.go

  • Added Step 6: Automatic container scanning
  • Merges container findings with dataflow detections
  • Changed "no source files" to warning (not error)
  • Updated help text

Other Changes

  • Removed compiled_rules.json from .gitignore
  • Ships pre-compiled rules (10KB, 18 rules)
  • Fixed 13 golangci-lint errors

Behavior

pathfinder scan --rules rules.py --project /path/to/project

Automatic Steps:

  1. Build code graph (source files)
  2. Build callgraph (dataflow analysis)
  3. Execute Python DSL rules → dataflow detections
  4. Discover container files (Dockerfile, docker-compose.yml)
  5. Load/compile container rules (pre-compiled or on-demand)
  6. Execute container rules → pattern detections
  7. Merge findings → unified EnrichedDetection[]
  8. Display with severity grouping

Silent Operation:

  • No container files → skip silently
  • Rules unavailable → skip silently
  • Python missing → skip silently
  • Logs at DEBUG level only

Output Format

Critical Issues (1):
  [critical] [Pattern] COMPOSE-SEC-001: Service Running in Privileged Mode
    CWE-250
    docker-compose.yml:8

Medium Issues (2):
  [medium] [Pattern] DOCKER-AUD-003: Dockerfile:10
  [medium] [Pattern] DOCKER-BP-001: Dockerfile:1

Summary: 6 findings (1 critical, 2 medium, 3 low)
Detection Methods: pattern (6)
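
The summary line above can be produced by tallying findings per severity. The following is a minimal sketch, not the PR's GetContainerSummary(): the `finding` type and `summarize` function are hypothetical names introduced for illustration, and the sketch assumes severities are already lowercase strings as the formatter expects.

```go
package main

import (
	"fmt"
	"strings"
)

// finding is a hypothetical stand-in for EnrichedDetection;
// only the severity matters for this sketch.
type finding struct {
	RuleID   string
	Severity string // critical, high, medium, or low
}

// severityOrder lists severities from most to least severe.
var severityOrder = []string{"critical", "high", "medium", "low"}

// summarize builds the one-line summary, omitting severities with no findings.
func summarize(findings []finding) string {
	counts := make(map[string]int, len(severityOrder))
	for _, f := range findings {
		counts[strings.ToLower(f.Severity)]++
	}
	parts := make([]string, 0, len(severityOrder))
	for _, sev := range severityOrder {
		if n := counts[sev]; n > 0 {
			parts = append(parts, fmt.Sprintf("%d %s", n, sev))
		}
	}
	return fmt.Sprintf("Summary: %d findings (%s)", len(findings), strings.Join(parts, ", "))
}

func main() {
	findings := []finding{
		{"COMPOSE-SEC-001", "critical"},
		{"DOCKER-AUD-003", "medium"},
		{"DOCKER-BP-001", "medium"},
	}
	fmt.Println(summarize(findings))
}
```

Iterating over a fixed severity slice rather than the map keeps the output order deterministic, since Go map iteration order is unspecified.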

Technical Details

Detection Types

  • Source code: DetectionTypeTaintLocal, DetectionTypeTaintGlobal
  • Containers: DetectionTypePattern
  • Unified format for output formatter
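
Because both scanners emit the same structure, merging comes down to concatenating two slices and sorting by severity. The sketch below illustrates that idea under stated assumptions: `detection`, `mergeFindings`, the method labels, and the rule ID `SQLI-001` are all hypothetical and are not taken from the PR's code.

```go
package main

import (
	"fmt"
	"sort"
)

// detection is a hypothetical, trimmed-down stand-in for EnrichedDetection.
type detection struct {
	RuleID   string
	Severity string // lowercase, as the formatter expects
	Method   string // e.g. "taint-local" or "pattern" (labels assumed)
}

// severityRank orders severities from most to least severe.
var severityRank = map[string]int{"critical": 0, "high": 1, "medium": 2, "low": 3}

// rank places unknown severities after all known ones.
func rank(severity string) int {
	if r, ok := severityRank[severity]; ok {
		return r
	}
	return len(severityRank)
}

// mergeFindings concatenates both result sets and sorts them by severity.
// A nil container slice (scanner skipped) needs no special handling.
func mergeFindings(dataflow, container []detection) []detection {
	merged := make([]detection, 0, len(dataflow)+len(container))
	merged = append(merged, dataflow...)
	merged = append(merged, container...)
	sort.SliceStable(merged, func(i, j int) bool {
		return rank(merged[i].Severity) < rank(merged[j].Severity)
	})
	return merged
}

func main() {
	dataflow := []detection{{"SQLI-001", "high", "taint-local"}}
	container := []detection{
		{"DOCKER-BP-001", "medium", "pattern"},
		{"COMPOSE-SEC-001", "critical", "pattern"},
	}
	for _, d := range mergeFindings(dataflow, container) {
		fmt.Printf("[%s] [%s] %s\n", d.Severity, d.Method, d.RuleID)
	}
}
```

A stable sort keeps findings of equal severity in their original relative order, so dataflow results stay ahead of container results within each severity group.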

Rule Compilation

  1. Try python-dsl/compiled_rules.json (instant)
  2. Fallback: exec.CommandContext with 30s timeout
  3. Skip if unavailable (no errors)
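
The three-step fallback above could look roughly like the sketch below. It is an illustration only: `loadRules` and `compileWithPython` are hypothetical names, and the sketch assumes the compiler script prints the JSON IR to stdout, which may not match how compile_container_rules.py actually behaves.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/exec"
	"time"
)

// compileTimeout mirrors the 30s limit described above.
const compileTimeout = 30 * time.Second

// compileWithPython runs the compiler script and returns its stdout.
// Assumption: the script writes the JSON IR to stdout.
func compileWithPython(script string) ([]byte, error) {
	ctx, cancel := context.WithTimeout(context.Background(), compileTimeout)
	defer cancel()
	// CommandContext kills the process if the deadline expires.
	return exec.CommandContext(ctx, "python3", script).Output()
}

// loadRules tries the pre-compiled file first, then on-demand compilation.
// ok == false means "skip container scanning silently".
func loadRules(cachePath, script string) (rules []byte, ok bool) {
	if data, err := os.ReadFile(cachePath); err == nil {
		return data, true
	}
	data, err := compileWithPython(script)
	if err != nil {
		// Python missing, script missing, or timeout: not an error for the user.
		return nil, false
	}
	return data, true
}

func main() {
	rules, ok := loadRules("python-dsl/compiled_rules.json", "python-dsl/compile_container_rules.py")
	if !ok {
		fmt.Println("container rules unavailable, skipping")
		return
	}
	fmt.Printf("loaded %d bytes of rule IR\n", len(rules))
}
```

Returning a boolean instead of an error makes the "skip silently" contract explicit at the call site; a real implementation would likely also log the underlying error at DEBUG level.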

File Discovery

  • filepath.Walk with skip rules: .git, node_modules, __pycache__
  • Matches: Dockerfile*, docker-compose*.{yml,yaml}
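
A minimal sketch of this discovery step, assuming the match is done on the base file name. The names `isContainerFile`, `discover`, and `skipDirs` are hypothetical and not taken from container_scanner.go.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"strings"
)

// skipDirs holds directory names that are never descended into.
var skipDirs = map[string]bool{".git": true, "node_modules": true, "__pycache__": true}

// isContainerFile reports whether a base name matches
// Dockerfile* or docker-compose*.{yml,yaml}.
func isContainerFile(name string) bool {
	if strings.HasPrefix(name, "Dockerfile") {
		return true
	}
	if strings.HasPrefix(name, "docker-compose") {
		ext := filepath.Ext(name)
		return ext == ".yml" || ext == ".yaml"
	}
	return false
}

// discover walks root and returns the paths of all container files.
func discover(root string) ([]string, error) {
	var files []string
	err := filepath.Walk(root, func(path string, info fs.FileInfo, err error) error {
		if err != nil {
			return err // propagate walk errors instead of swallowing them
		}
		if info.IsDir() {
			if skipDirs[info.Name()] {
				return filepath.SkipDir
			}
			return nil
		}
		if isContainerFile(info.Name()) {
			files = append(files, path)
		}
		return nil
	})
	return files, err
}

func main() {
	root := "."
	if len(os.Args) > 1 {
		root = os.Args[1]
	}
	files, err := discover(root)
	if err != nil {
		fmt.Fprintln(os.Stderr, "walk failed:", err)
		os.Exit(1)
	}
	for _, f := range files {
		fmt.Println(f)
	}
}
```

Note that a plain prefix match on `Dockerfile` also accepts names such as `Dockerfile.prod` and, less desirably, `Dockerfiles.md`; a stricter pattern may be worth considering.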

Error Handling

  • Parser errors → warning, continue scanning
  • Missing rules → silent skip
  • Propagate filepath.Walk errors

Future Extensibility

Same pattern for:

  • Kubernetes manifests (YAML)
  • Terraform files (HCL)
  • Cloud-init scripts
  • GitHub Actions workflows

Testing

Verified with test project:

  • app.py: SQL injection (dataflow rule)
  • Dockerfile: 4 security issues (pattern rules)
  • docker-compose.yml: 2 security issues (pattern rules)

Results:

  • ✅ 6 container findings detected
  • ✅ Unified output format
  • ✅ No flags required
  • ✅ Silent when unavailable

Migration

Users: No changes needed, automatic detection
Developers: Use same EnrichedDetection format for all scanners

shivasurya and others added 3 commits December 9, 2025 14:34
Redesigns container security scanning to be automatic and transparent,
integrating seamlessly with dataflow analysis like Python DSL rules.

Key Changes:
- Created container_scanner.go with TryContainerScan() for automatic scanning
- Removed --skip-container flag - container scanning now happens automatically
- Integrated into scan command as Step 6 (transparent to users)
- Unified output format - all findings use EnrichedDetection structure
- Silent operation - gracefully skips if no container files or rules exist
- Future-proof architecture - easy to add YAML, Terraform, K8s scanners

Integration Flow:
1. Scan discovers container files automatically (Dockerfile, docker-compose.yml)
2. Attempts to load pre-compiled rules or compile on-demand
3. Executes container security rules if available
4. Merges findings with source code detections
5. Displays unified output with severity grouping

Behavior:
- No special flags needed - works like Python DSL rules
- Scans both source code and container files in single pass
- Supports mixed rule types (dataflow + pattern-based)
- Container findings labeled with [Pattern] detection method
- All findings sorted by severity: critical, high, medium, low

Fixes:
- Fixed severity casing (lowercase for formatter compatibility)
- Proper EnrichedDetection structure with all required fields
- Correct CWE/OWASP field types ([]string not string)

Testing:
- Verified with test project containing Python + Dockerfile + docker-compose
- Successfully detected 6 container security issues automatically
- Output displays correctly with severity grouping

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
Commits compiled_rules.json so users can use container scanning
without needing Python installed. The rules are pre-defined examples
that ship with pathfinder as documented in PR #422.

Changes:
- Removed compiled_rules.json from .gitignore
- Committed python-dsl/compiled_rules.json (10KB, 18 rules)
- Contains 10 Dockerfile rules + 8 docker-compose rules
- Rules automatically compile on-demand if file is missing

This enables container scanning to work out-of-the-box without
requiring users to have Python and codepathfinder package installed.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
Fixes all 13 linting issues reported by golangci-lint:

- godot (7 fixes): Added periods to all function/type comments
- nilerr (1 fix): Changed to propagate errors instead of returning nil
- noctx (1 fix): Use exec.CommandContext with 30s timeout instead of exec.Command
- prealloc (2 fixes): Pre-allocate slices with capacity for better performance
- unparam (2 fixes): Removed unused logger parameters from internal functions

Changes:
- Use context.WithTimeout for Python compilation (prevents hangs)
- Pre-allocate findings slices with len(matches) capacity
- Remove logger from scanDockerfile and scanComposeFile signatures
- Propagate filepath.Walk errors properly

All linting checks now pass (0 issues).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
safedep bot commented Dec 9, 2025

SafeDep Report Summary

No dependency changes detected. Nothing to scan.

This report is generated by SafeDep Github App

@shivasurya self-assigned this Dec 9, 2025
@shivasurya added the enhancement (New feature or request) and docker (Docker/Dockerfile related changes) labels Dec 9, 2025
codecov bot commented Dec 9, 2025

Codecov Report

❌ Patch coverage is 56.96203% with 102 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.13%. Comparing base (bf64169) to head (1b5e34b).

Files with missing lines                Patch %   Lines
sast-engine/cmd/container_scanner.go    61.36%    81 Missing and 4 partials ⚠️
sast-engine/cmd/scan.go                  0.00%    10 Missing ⚠️
sast-engine/cmd/ci.go                    0.00%     7 Missing ⚠️
Additional details and impacted files
@@                          Coverage Diff                           @@
##           docker/07-integration-rule-library     #423      +/-   ##
======================================================================
- Coverage                               81.74%   81.13%   -0.61%     
======================================================================
  Files                                      84       85       +1     
  Lines                                    8649     8880     +231     
======================================================================
+ Hits                                     7070     7205     +135     
- Misses                                   1310     1402      +92     
- Partials                                  269      273       +4     

@shivasurya changed the base branch from main to docker/07-integration-rule-library December 9, 2025 09:06
@shivasurya changed the title from "feat: Seamless container scanning integration" to "feat(docker): Seamless container scanning integration" Dec 9, 2025
shivasurya and others added 2 commits December 9, 2025 14:44
…ning

Enhances container scanning with complete test coverage, code snippet
extraction, and seamless CI command integration.

Key Changes:

1. Code Snippet Extraction
   - Added extractCodeSnippet() to read and display actual code
   - Shows 3 lines of context around findings
   - Highlights vulnerable line with '>' marker
   - Uses bufio.Scanner for efficient file reading

2. Comprehensive Test Coverage (container_scanner_test.go)
   - TestDiscoverContainerFiles: File discovery with 5 scenarios
   - TestCompileContainerRules: Rule compilation edge cases
   - TestFilterByType: File type filtering
   - TestConvertToEnrichedDetection: RuleMatch conversion with 3 cases
   - TestGetContainerSummary: Severity counting
   - TestFindProjectRoot: Project root detection
   - TestScanContainerFiles: End-to-end scanning
   - TestTryContainerScan: Integration point
   - All tests passing ✅

3. CI Command Integration
   - Added automatic container scanning to 'ci' command
   - Uses same TryContainerScan() integration point
   - Merges container findings with dataflow detections
   - Updated help text to mention container scanning
   - Changed "no source files" from error to warning
   - Works with SARIF, JSON, CSV outputs

Technical Details:

**Code Snippets:**
Before: Empty Lines[] array
After: Populated with actual code from Dockerfile/compose files

Example output:
```
docker-compose.yml:8
  > 8 | privileged: true
    9 | environment:
   10 |   - DEBUG=1
```
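
The snippet extraction described under "Code Snippet Extraction" can be sketched with bufio.Scanner as follows. This is an illustration, not the PR's extractCodeSnippet(): the names `snippet` and `snippetLines` are hypothetical, and the window is assumed to be the target line plus a configurable number of lines on each side, which may differ from the actual context rule.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// snippet returns the lines from target-around through target+around
// (1-indexed), prefixing the target line with '>'.
func snippet(r io.Reader, target, around int) ([]string, error) {
	var out []string
	sc := bufio.NewScanner(r)
	for n := 1; sc.Scan(); n++ {
		if n < target-around {
			continue
		}
		if n > target+around {
			break // stop early; the rest of the file is not needed
		}
		marker := " "
		if n == target {
			marker = ">"
		}
		out = append(out, fmt.Sprintf("%s %3d | %s", marker, n, sc.Text()))
	}
	return out, sc.Err()
}

// snippetLines is a convenience wrapper for in-memory text.
func snippetLines(text string, target, around int) ([]string, error) {
	return snippet(strings.NewReader(text), target, around)
}

func main() {
	compose := "services:\n  web:\n    image: nginx\n    privileged: true\n    environment:\n      - DEBUG=1\n"
	lines, err := snippetLines(compose, 4, 1)
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	for _, l := range lines {
		fmt.Println(l)
	}
}
```

For real files, the result of os.Open can be passed to `snippet` directly, since *os.File satisfies io.Reader. Scanning line by line avoids loading the whole file into memory.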

**Test Structure:**
- Uses testify/assert for assertions
- Table-driven tests where applicable
- Temp directories for isolation
- VerbosityDefault for quiet logging

**CI Integration Flow:**
1. Build code graph (source files)
2. Execute dataflow rules
3. Scan container files automatically
4. Merge all findings
5. Generate SARIF/JSON/CSV output

Behavior:
- Code snippets show automatically for all findings
- Tests achieve good coverage of core functionality
- CI command follows same pattern as scan command
- Silent operation when containers unavailable

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
Removes pre-compiled container rules from git as it's a generated file
that should compile on-demand.

Rationale:
- Generated files shouldn't be in version control
- Can get out of sync with source rules
- On-demand compilation works fine (~1s with 30s timeout)
- Similar to Python .pyc files or Go binaries

Behavior:
1. First run: Compiles rules automatically
2. Cached locally at python-dsl/compiled_rules.json
3. Recompiles when rules change
4. Falls back gracefully if Python unavailable

Users with Python (required for dataflow rules anyway) will have
container rules compile transparently on first scan.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
@shivasurya (Owner, Author)

Closing in favor of PR #426 which implements proper runtime rule loading from --rules path instead of hardcoded compiled_rules.json. PR #426 has the correct architecture following Python DSL patterns.

@shivasurya shivasurya closed this Dec 9, 2025
shivasurya added a commit that referenced this pull request Dec 10, 2025
… 7/8) (#422)

## Summary

Container security scanning infrastructure with Dockerfile and docker-compose parsers, Python DSL, rule executor, and 18 OWASP-aligned security rules.

**Next PR:** #423 (seamless integration into scan command)

## Components

### Parsers
- **Dockerfile Parser**: Tree-sitter based, supports all 18 instructions
- **Compose Parser**: YAML parser with security-focused queries
- **Test Coverage**: 100% for parsers

### Python DSL
- Declarative rule syntax: `@dockerfile_rule`, `@compose_rule`
- Matchers: `Instruction()`, `Service()`, `Command()`, `Port()`, `Volume()`
- Combinators: `And()`, `Or()`, `Not()`
- Pattern support: wildcards, regex

### Rule Executor
- Executes compiled JSON IR against container graphs
- Returns structured `RuleMatch` with file/line/service metadata
- **Coverage**: 94.6%

### Security Rules (18)

**Dockerfile (10):**
- `DOCKER-BP-001`: Using :latest tag (MEDIUM)
- `DOCKER-BP-003`: Deprecated MAINTAINER (LOW)
- `DOCKER-BP-005`: apt-get without --no-install-recommends (LOW)
- `DOCKER-BP-007`: apk without --no-cache (LOW)
- `DOCKER-BP-008`: pip without --no-cache-dir (LOW)
- `DOCKER-BP-022`: Missing HEALTHCHECK (LOW)
- `DOCKER-AUD-003`: Privileged port exposed (MEDIUM)
- `DOCKER-SEC-001`: Running as root (HIGH)
- `DOCKER-SEC-005`: Secret in build arg (CRITICAL)
- `DOCKER-SEC-006`: Docker socket mounted (HIGH)

**Compose (8):**
- `COMPOSE-SEC-001`: Privileged mode (CRITICAL)
- `COMPOSE-SEC-002`: Docker socket exposed (HIGH)
- `COMPOSE-SEC-003`: Seccomp disabled (HIGH)
- `COMPOSE-SEC-006`: Writable filesystem (LOW)
- `COMPOSE-SEC-007`: Host network mode (HIGH)
- `COMPOSE-SEC-008`: Dangerous capabilities (MEDIUM)
- `COMPOSE-SEC-009`: Host PID mode (MEDIUM)
- `COMPOSE-SEC-010`: Host IPC mode (MEDIUM)

## Structure

```
rules/docker/              # 10 Dockerfile rules
rules/docker-compose/      # 8 Compose rules
python-dsl/
├── compile_container_rules.py
├── compiled_rules.json    # Pre-compiled IR (10KB)
└── codepathfinder/rules/  # DSL implementation
sast-engine/
├── graph/docker/          # Dockerfile parser
├── graph/compose_parser.go
└── executor/container_executor.go
```

## Example Rule

```python
@dockerfile_rule(
    id="DOCKER-SEC-001",
    name="Container Running as Root",
    severity="HIGH",
    cwe=["CWE-250"],
    owasp=["A01:2021"]
)
def missing_user_instruction():
    return Not(Instruction("USER"))
```
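
To make the combinator semantics concrete, here is a small Go sketch of how `Not(Instruction("USER"))` could be evaluated. It is a simplification under stated assumptions: the real executor runs compiled JSON IR against graph nodes, whereas this sketch models a Dockerfile as a plain list of instruction keywords, and the names `dockerfile`, `matcher`, `instruction`, `not`, `and`, and `or` are hypothetical.

```go
package main

import "fmt"

// dockerfile is a hypothetical, simplified view of a parsed Dockerfile:
// just the instruction keywords in file order.
type dockerfile []string

// matcher reports whether a rule condition holds for a Dockerfile.
type matcher func(dockerfile) bool

// instruction matches when the keyword appears at least once.
func instruction(name string) matcher {
	return func(df dockerfile) bool {
		for _, ins := range df {
			if ins == name {
				return true
			}
		}
		return false
	}
}

// not inverts a matcher.
func not(m matcher) matcher {
	return func(df dockerfile) bool { return !m(df) }
}

// and matches when every child matcher matches.
func and(ms ...matcher) matcher {
	return func(df dockerfile) bool {
		for _, m := range ms {
			if !m(df) {
				return false
			}
		}
		return true
	}
}

// or matches when at least one child matcher matches.
func or(ms ...matcher) matcher {
	return func(df dockerfile) bool {
		for _, m := range ms {
			if m(df) {
				return true
			}
		}
		return false
	}
}

func main() {
	missingUser := not(instruction("USER")) // mirrors DOCKER-SEC-001 above

	fmt.Println(missingUser(dockerfile{"FROM", "RUN", "CMD"}))  // true: no USER, rule fires
	fmt.Println(missingUser(dockerfile{"FROM", "USER", "CMD"})) // false: USER present
}
```

Representing matchers as closures keeps composition trivial; a JSON IR version would decode a tagged tree and build the same closures recursively.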

## Testing

```bash
# Parser tests (100%)
cd sast-engine && go test ./graph/docker/...

# Executor tests (94.6%)
cd python-dsl && pytest tests/test_container_*.py --cov

# Rule compilation
cd python-dsl && python3 compile_container_rules.py
```

## Technical Details

- **AST Representation**: Tree-sitter for Dockerfile, YAML parsing for Compose
- **Rule Format**: JSON IR with matchers, conditions, metadata
- **Execution Model**: Pattern matching against graph nodes
- **Output**: Structured `RuleMatch` with location/severity/CWE