AGENTS.md - AI Agent Instructions

This file helps AI agents quickly understand and contribute to the Context Builder codebase.

Project Overview

Context Builder is a blazing-fast Rust CLI for aggregating entire codebases into single, LLM-friendly markdown files. Published on crates.io under MIT license.

If this is your first time: Read this file, then run cargo run -- --help to see all options.

Tech Stack

Technology	Usage
Language	Rust (Edition 2024)
Build	Cargo (no npm/bun/node)
CLI	`clap` (derive)
Parallelism	`rayon` (optional, default on) + `crossbeam-channel`
Diffing	`similar` (unified diffs)
File traversal	`ignore` crate (gitignore-aware)
Token counting	`tiktoken-rs` (`cl100k_base`)
Caching	JSON + `fs2` file locking
Config	TOML (`context-builder.toml`)
Encoding	`encoding_rs` (transcoding non-UTF-8)
Logging	`env_logger`
Branch	`master` (not `main`)

Project Structure

context-builder/
├── src/
│   ├── main.rs              # Entry point — calls lib::run()
│   ├── lib.rs               # Core orchestration, run_with_args(), Prompter trait, --init
│   ├── cli.rs               # Args struct via clap derive
│   ├── config.rs            # Config struct, TOML deserialization
│   ├── config_resolver.rs   # Merges CLI args + TOML config (CLI > config > defaults)
│   ├── file_utils.rs        # .gitignore-aware traversal, OverrideBuilder for custom ignores
│   ├── tree.rs              # BTreeMap file tree (deterministic ordering)
│   ├── state.rs             # ProjectState/FileState structured snapshots
│   ├── markdown.rs          # Streaming file renderer, binary detection, encoding, parallel
│   ├── cache.rs             # JSON-based caching with fs2 locking, old cache migration
│   ├── diff.rs              # Per-file unified diffs via similar
│   └── token_count.rs       # Real tokenization via tiktoken-rs (cl100k_base, lazy init)
├── tests/                   # 10 integration test files
├── benches/                 # Criterion benchmark suite
├── scripts/                 # generate_samples.rs (benchmark dataset generator)
├── context-builder.toml     # Project's own config file
├── Cargo.toml               # Crate metadata, dependencies, features
├── DEVELOPMENT.md           # Contributor guide
├── BENCHMARKS.md            # Performance benchmarking guide
├── CHANGELOG.md             # Release history
└── .github/workflows/ci.yml # CI: fmt, clippy, build, test, security audit (ubuntu/win/macos)

Key Commands

# Build
cargo build

# Run
cargo run -- --help
cargo run -- -d . -o out.md -f rs -f toml
cargo run -- --preview        # File tree only, no output
cargo run -- --init           # Create config file with auto-detected filters

# Test (MUST use single thread — tests share CWD)
cargo test -- --test-threads=1

# Lint (must pass -D warnings)
cargo clippy --all-targets --all-features -- -D warnings

# Format
cargo fmt --all

Key Design Patterns

Prompter trait — Abstracts user confirmation (overwrite/processing). Tests use MockPrompter/TestPrompter. Never add stdin reads in library code.
Streaming writes — markdown.rs processes files line-by-line for low memory. With parallel feature, uses crossbeam channels for concurrent processing.
Structured state — v0.5.0 replaced fragile text-based cache parsing with JSON ProjectState snapshots for reliable auto-diff.
Deterministic output — BTreeMap everywhere ensures identical output across runs.
Config precedence — CLI args > TOML config > defaults, with explicit detection in config_resolver.rs.

Feature Flags

Feature	Default	Purpose
`parallel`	✅	Rayon for parallel file processing
`samples-bin`	❌	Exposes `generate_samples` binary for benchmarking

Environment Variables

Variable	Purpose
`CB_SILENT`	`"1"` suppresses user-facing prints (benchmarks set this)
`CB_BENCH_MEDIUM`	`"1"` enables heavier benchmark datasets
`CB_BENCH_DATASET_DIR`	External benchmark dataset root
`RUST_LOG`	Controls `env_logger` verbosity (e.g., `RUST_LOG=info`)

Code Style Guidelines

Error handling — Use io::Result. Prefer returning errors over panicking. unwrap()/expect() OK in tests, NOT in library code.
Cross-platform — Normalize path separators in tests for string comparisons.
New CLI flags — Add in cli.rs, update tests in same file, propagate through run_with_args.
Language detection — Keep simple and deterministic; add mappings in one place.
Binary detection — Lightweight: NUL byte check + UTF-8 validity.
Logging — Use log::{info, warn, error}. Let env_logger control emission.

Test Organization

Unit tests: Inline #[cfg(test)] modules in every source file
Integration tests (10 files in tests/):
- test_auto_diff.rs — Auto-diff workflow (largest test file)
- test_determinism.rs — Output determinism verification
- test_config_resolution.rs — CLI/config merge behavior
- test_cwd_independence.rs — Path independence
- test_comprehensive_edge_cases.rs — Edge cases
- cli_integration.rs — End-to-end CLI tests
- test_binary_file_autodiff.rs, test_parallel_memory.rs, test_phase4_integration.rs, diff_integration.rs
Benchmarks: Criterion suite at benches/context_bench.rs

Critical: Tests MUST run with --test-threads=1 (CI enforces this). Many tests use set_current_dir() which is process-global. Use #[serial] attribute where order matters.

Known Hazards

Year in tests: Watch for hardcoded year strings in timestamp assertions. Use dynamic Utc::now().format("%Y") instead.
CWD mutation: Tests that set_current_dir() must restore the original directory in all code paths (including panics).
Config from CWD: load_config() reads from CWD. load_config_from_path() reads from explicit root. Prefer the latter in tests.
Cache collisions: Cache keys are project-path + config hash. Different configs = different cache files.

Release Process

cargo fmt --all && cargo clippy --all-targets --all-features -- -D warnings && cargo test -- --test-threads=1
Bump version in Cargo.toml, add entry to CHANGELOG.md
git commit -am "chore(release): vX.Y.Z" && git tag vX.Y.Z && git push && git push --tags
cargo publish

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

AGENTS.md - AI Agent Instructions

Project Overview

Tech Stack

Project Structure

Key Commands

Key Design Patterns

Feature Flags

Environment Variables

Code Style Guidelines

Test Organization

Known Hazards

Release Process

FilesExpand file tree

AGENTS.md

Latest commit

History

AGENTS.md

File metadata and controls

AGENTS.md - AI Agent Instructions

Project Overview

Tech Stack

Project Structure

Key Commands

Key Design Patterns

Feature Flags

Environment Variables

Code Style Guidelines

Test Organization

Known Hazards

Release Process