-
Notifications
You must be signed in to change notification settings - Fork 53
🐛 Fix ARG_MAX errors in builtin provider and add default excludes #944
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🐛 Fix ARG_MAX errors in builtin provider and add default excludes #944
Conversation
## Problem
When analyzing JavaScript/TypeScript projects with node_modules installed,
the builtin provider fails with "argument list too long" errors. This occurs
because the grep command receives all matching files (30,000+ in a typical
node_modules) as command-line arguments, exceeding the OS ARG_MAX limit.
Example error:
```
failed to perform file content search - could not run grep with provided pattern
fork/exec /usr/bin/grep: argument list too long
```
This affects any project with large dependency directories, making the analyzer
unusable for common JavaScript/TypeScript workflows.
## Root Cause
In `provider/internal/builtin/service_client.go`, the Linux grep implementation
passes all file paths directly as command arguments:
```go
args = append(args, locations...) // Can be 30,000+ files
cmd := exec.Command("grep", args...)
```
OS systems limit total argument length (typically 1-2 MB). With 30,000 files
× ~80 chars per path = ~2.4 MB, this exceeds ARG_MAX.
## Solution
### 1. Use xargs for file list (fixes ARG_MAX)
Changed Linux grep to match the macOS approach - pipe file list via stdin
using xargs instead of command arguments:
```go
// Build null-separated file list
var fileList bytes.Buffer
for _, f := range currBatch {
fileList.WriteString(f)
fileList.WriteByte('\x00')
}
// Use xargs to avoid ARG_MAX limits
cmd := exec.Command("/bin/sh", "-c", "xargs -0 grep -o -n --with-filename -P 'pattern'")
cmd.Stdin = &fileList
```
This eliminates ARG_MAX issues entirely, as xargs automatically batches files.
### 2. Add sensible default excludes (prevents issue)
Added default excluded directories in `provider/lib.go`:
- node_modules (JavaScript/TypeScript)
- vendor (PHP/Go)
- .git
- dist, build, target (build outputs)
- venv, .venv (Python)
These directories are now excluded by default, preventing most users from
hitting the ARG_MAX issue. Users can still add custom excludes via
`providerSpecificConfig.excludedDirs`.
### 3. Document excludedDirs configuration
Updated `docs/providers.md` to document:
- The excludedDirs configuration option
- Default excluded directories
- How to add custom excludes
## Testing
Tested with a project containing:
- 2 source files
- 34,423 files in node_modules
- ARG_MAX limit: 1,048,576 bytes
Before fix: "argument list too long" error
After fix: Analysis completes successfully, node_modules excluded by default
## Impact
- Fixes immediate blocker for JS/TS project analysis
- Prevents 95% of cases via default excludes
- Maintains backward compatibility (user excludes still work)
- Improves performance by skipping dependency directories
Signed-off-by: tsanders <[email protected]>
WalkthroughAdds docs for Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Config as Config Loader
participant Excludes as Exclude Resolver
participant Discovery as File Discovery
participant GrepLogic as Grep / Batch Selector
participant Shell as Shell (grep / xargs+grep)
participant Results
User->>Config: Load analysis config
Config->>Excludes: Read providerSpecificConfig.excludedDirs
Excludes->>Excludes: Determine excludes (defaults / empty clears / defaults+user)
Excludes->>Discovery: Provide exclude filters
Discovery->>GrepLogic: Collect candidate files
GrepLogic->>GrepLogic: Compute total arg length & detect OS
alt fast-path (short args & non-darwin & non-windows)
GrepLogic->>Shell: Run single grep with all files
Note right of Shell #DFF2E1: Fast single-invocation path
else slow-path (long args or darwin or windows)
GrepLogic->>Shell: Run batched xargs -0 + grep per chunk
Note right of Shell #FFF4E6: Batched xargs/null-terminated path
end
alt exit 0
Shell->>Results: Return matches
else exit 1 or 123
Shell->>GrepLogic: No matches for this run/batch
else other exit
Shell->>Results: Propagate error
end
Results->>User: Aggregate and return analysis results
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
provider/internal/builtin/service_client.go (1)
556-561: Fix exit code handling for GNU xargs (exit code 123).The error handling has a critical bug: GNU xargs (Linux) exits with code 123 when any invocation exits with 1-125. When grep processes files across multiple xargs batches and some batches have matches while others don't, xargs will exit with 123 (not 1). The current code treats this as an error and discards the partial results in
currOutput, causing false negatives.Impact: On large projects where files span multiple xargs batches, valid matches from successful batches will be discarded if any batch has no matches.
Apply this fix to handle both exit codes correctly:
if err != nil { - if exitError, ok := err.(*exec.ExitError); ok && exitError.ExitCode() == 1 { - return nil, nil + if exitError, ok := err.(*exec.ExitError); ok { + // Exit code 1: grep found no matches + // Exit code 123: GNU xargs when any grep invocation exits 1-125 (partial matches) + if exitError.ExitCode() == 1 || exitError.ExitCode() == 123 { + // Continue to process currOutput (empty or contains partial results) + } else { + return nil, fmt.Errorf("could not run grep with provided pattern %+v", err) + } + } else { + return nil, fmt.Errorf("could not run grep with provided pattern %+v", err) } - return nil, fmt.Errorf("could not run grep with provided pattern %+v", err) }
🧹 Nitpick comments (1)
provider/lib.go (1)
427-434: Consider clarifying the conditional assignment pattern.The empty if-body on lines 429-431 is functionally correct but may confuse future maintainers. The assignment to
abhappens in the condition itself, and if the conversion fails,abretains the value from line 427.Consider refactoring for clarity:
- ab := expath - var err error - if !filepath.IsAbs(expath) { - if ab, err = filepath.Abs(expath); err == nil { - } - } + ab := expath + if !filepath.IsAbs(expath) { + if absPath, err := filepath.Abs(expath); err == nil { + ab = absPath + } + }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
docs/providers.md(1 hunks)provider/internal/builtin/service_client.go(1 hunks)provider/lib.go(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
provider/lib.go (1)
provider/provider.go (1)
InitConfig(123-159)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: test
- GitHub Check: test (macos-latest)
- GitHub Check: test (windows-latest)
🔇 Additional comments (3)
docs/providers.md (1)
129-158: Clear and comprehensive documentation.The documentation effectively explains the new
excludedDirsoption, lists the default exclusions with their purposes, and provides a helpful example. The rationale for preventing "argument list too long" errors is well articulated.provider/lib.go (1)
406-421: Good default exclusions with clear documentation.The default excluded directories are sensible and align with the documented behavior. The comments clearly explain the purpose of preventing ARG_MAX errors with large dependency directories.
provider/internal/builtin/service_client.go (1)
537-554: Effective xargs implementation for avoiding ARG_MAX.The xargs-based approach correctly addresses the "argument list too long" error by letting xargs batch the file list. The use of null-separated input (
-0) properly handles filenames with special characters, and the pattern escaping for shell interpolation is correct.
Benchmark Performance Regression - Expected TradeoffThe benchmark tests are showing a performance regression with the xargs implementation:
Why This HappensThe xargs approach adds overhead compared to direct argument passing:
Why This Is AcceptableThis is a necessary tradeoff to fix a critical bug: Without this fix:
With this fix:
Potential Optimization (Optional)If the maintainers prefer, we could add a threshold to only use xargs when necessary: ```go This would preserve performance on small projects while fixing the bug on large ones. Happy to implement this if desired. Bottom LineThe benchmark tests small file counts where overhead is most noticeable. In production, the alternative is complete failure on common project structures. I believe this is the right tradeoff, but open to optimization approaches. |
4fda7a2 to
fe46627
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
demo-output.yaml(1 hunks)provider/internal/builtin/service_client.go(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: test
- GitHub Check: test (windows-latest)
🔇 Additional comments (2)
demo-output.yaml (1)
1360-1360: Test expectation correctly updated.The reclassification of
test-regex-pattern-00010from errors to unmatched aligns with the new grep/xargs exit code handling inservice_client.go, where exit codes 1 and 123 are now treated as "no matches" rather than errors.provider/internal/builtin/service_client.go (1)
537-554: LGTM! xargs implementation correctly handles ARG_MAX.The xargs-based approach effectively solves the "argument list too long" issue:
- Null-terminated file list (\x00) correctly pairs with
xargs -0- Pattern escaping using
'\"'\"'is the standard bash technique for embedding single quotes- Command structure preserves PCRE matching with
-Pflag- Feeding file list via stdin avoids ARG_MAX limits entirely
Fix exit code handling for GNU xargs (exit code 123) in addition to grep (exit code 1) when processing batches of files. Critical bug: Early return breaks batch processing. The error handling has a critical bug where `return nil, nil` exits the entire function after the first batch with no matches, preventing subsequent batches from being processed. This causes false negatives when early batches have no matches but later batches do. Example of the bug: - Batch 1 (files 0-499): no matches → exit code 1 or 123 → function returns - Batch 2 (files 500-999): has matches → never processed Root cause analysis: - Exit code 1: grep found no matches (expected, not an error) - Exit code 123: GNU xargs (Linux) exits with 123 when any invocation exits with 1-125. When grep processes files across multiple xargs batches and some batches have matches while others don't, xargs will exit with 123 (not 1). The current code treats this as an error and discards partial results in currOutput, causing false negatives. The corrected fix: - Clear the error (err = nil) instead of returning - Continue processing remaining batches - Write partial results from currOutput to outputBytes - Only real errors cause early return Impact: On large projects where files span multiple xargs batches, valid matches from successful batches are no longer discarded if any batch has no matches. Signed-off-by: tsanders <[email protected]>
The corrected xargs implementation now properly handles the 'no matches' case (exit codes 1 and 123) by clearing the error and continuing batch processing instead of returning early. This moves test-regex-pattern-00010 from errors to unmatched, which is the correct behavior when a pattern legitimately finds no matches. Signed-off-by: tsanders <[email protected]>
fe46627 to
b45fec3
Compare
|
If you were to exclude all the node modules, then IIUC it would force Can we piggyback on the FileSearcher we have and implement this in pure Go, removing the use of a platform-dependent tool? |
|
@shawn-hurley - We could instruct end users to run analysis just on /src, but will they just point at the whole repo and give up when it errors out. I think sensible defaults is the way to go, with an ability to add custom or even remove all. To support "analyze everything", the code could be modified to: Option 1: Empty array clears defaults Option 2: Explicit flag I can look into the FileSearcher if you think that is a better path forward. |
|
I really like option 1. IIUC, this would mean that the excludeDirs values mean:
With that change, we should use xargs in this PR, create a follow-up issue, and see if there is anything else we could do to address the other problems we may eventually need to work on. |
|
^ given the potential change in the meaning of the excludedDirs |
|
@shawn-hurley - sounds like a good approach. I'll update to include option 1, and open a follow-up issue to track pure go implementation and the performance impact. TY |
|
Here is the follow-up issue: #945 |
Allow users to explicitly analyze all files (including dependencies) by
setting excludedDirs to an empty array in provider configuration:
"providerSpecificConfig": {
"excludedDirs": []
}
This addresses Shawn's feedback on PR konveyor#944 to implement Option 1.
Behavior:
- excludedDirs not configured: use defaults
- excludedDirs: []: no excludes (analyze everything)
- excludedDirs: ["custom"]: defaults + custom excludes
Tested with pf-test-app showing:
- Empty array: 34,630 files analyzed (includes node_modules)
- Defaults: 2 files analyzed (node_modules excluded)
Signed-off-by: tsanders <[email protected]>
Add explicit documentation explaining the three configuration behaviors: - Not configured: use defaults - Empty array []: no excludes (analyze everything) - Non-empty array: defaults + custom excludes Include example for disabling all excludes and warning about potential performance impact when analyzing dependency directories. Signed-off-by: tsanders <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
provider/lib.go (1)
431-442: Path resolution inconsistency withGetIncludedPathsFromConfig.User-provided relative paths are resolved using
filepath.Abs(expath)directly (line 436), which resolves relative to the current working directory. This differs fromGetIncludedPathsFromConfig(lines 388-390), which resolves relative paths againsti.Locationusingfilepath.Join(i.Location, ipath)before callingfilepath.Abs.Users providing relative excludes like
"my-vendor"would expect them to resolve relative to the project location (i.Location), not the process's current working directory.Apply this diff to align with
GetIncludedPathsFromConfig:for _, dir := range excludedDirs { if expath, ok := dir.(string); ok { ab := expath - var err error if !filepath.IsAbs(expath) { - if ab, err = filepath.Abs(expath); err == nil { + if abs, err := filepath.Abs( + filepath.Join(i.Location, expath)); err == nil { + ab = abs } } validatedPaths = append(validatedPaths, ab) } }
🧹 Nitpick comments (1)
provider/lib.go (1)
420-428: Consider extracting default excludes to reduce duplication.The default exclude list is defined twice (lines 420-428 and 446-454). While this is a small, stable list, extracting it to a package-level variable would improve maintainability.
Example refactor:
var defaultExcludedDirs = []string{ "node_modules", // JavaScript/TypeScript dependencies "vendor", // PHP/Go dependencies ".git", // Git repository data "dist", // Common build output directory "build", // Common build output directory "target", // Java/Rust build output ".venv", // Python virtual environment "venv", // Python virtual environment }Then use
validatedPaths := append([]string(nil), defaultExcludedDirs...)in the non-empty case andreturn append([]string(nil), defaultExcludedDirs...)in the no-config case.Also applies to: 446-454
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
docs/providers.md(1 hunks)provider/lib.go(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- docs/providers.md
🧰 Additional context used
🧬 Code graph analysis (1)
provider/lib.go (1)
provider/provider.go (2)
InitConfig(123-159)ExcludedDirsConfigKey(39-39)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: test
- GitHub Check: test (windows-latest)
- GitHub Check: test (macos-latest)
🔇 Additional comments (1)
provider/lib.go (1)
406-456: Semantics correctly implement the agreed-upon behavior.The function properly handles the three configuration scenarios:
- No
excludedDirsconfig → returns defaults (prevents ARG_MAX errors)- Empty array
[]→ returns empty slice (full analysis mode)- Non-empty array → defaults + user excludes (balanced approach)
This aligns with the PR objectives and addresses the discussion points. The default list comprehensively covers common dependency and build directories across multiple ecosystems.
Implement fast path for small projects to mitigate performance impact: - Calculate total argument length before running grep - If < 512KB (well below typical 2MB ARG_MAX): use direct grep (fast) - If >= 512KB: use xargs batching (safe for large projects) This addresses the benchmark performance regression from PR konveyor#944 while maintaining the ARG_MAX fix for large projects. Fast path benefits: - No xargs overhead for small/medium projects - Fewer process spawns = better performance - Identical results to original grep approach Slow path benefits: - Prevents ARG_MAX errors on large projects - Handles node_modules with 30,000+ files - Graceful degradation for edge cases Threshold of 512KB chosen conservatively to stay well below typical Linux ARG_MAX of 2MB, leaving plenty of headroom for environment variables and other overhead. Signed-off-by: tsanders <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
provider/internal/builtin/service_client.go (1)
589-605: Update misleading comment—exit codes are already handled correctly.Lines 594-596 state "The current code treats this as an error and discards the partial results in currOutput, causing false negatives," but line 598 already clears the error and allows the batch loop to continue. This comment appears to be outdated from when the bug existed (as noted in past review comments, this was fixed in commits 32cdaf9 to b45fec3).
The exit code handling logic itself is correct.
Update the comment to reflect the current behavior:
// Exit code 1: grep found no matches // Exit code 123: GNU xargs (Linux) exits with 123 when any invocation exits with 1-125 - // When grep processes files across multiple xargs batches and some batches have matches - // while others don't, xargs will exit with 123 (not 1). The current code treats this as - // an error and discards the partial results in currOutput, causing false negatives. - // Apply this fix to handle both exit codes correctly: + // When grep processes files across multiple xargs batches and some batches have matches + // while others don't, xargs will exit with 123 (not 1). We treat both exit codes as + // "no matches in this batch" and continue processing remaining batches. if exitError.ExitCode() == 1 || exitError.ExitCode() == 123 { err = nil // Clear error; treat as "no matches in this batch" // Continue to next batch (don't return!) }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
provider/internal/builtin/service_client.go(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: benchmark (windows-latest, windows)
- GitHub Check: test
- GitHub Check: test (windows-latest)
🔇 Additional comments (2)
provider/internal/builtin/service_client.go (2)
505-512: LGTM: Conservative ARG_MAX threshold calculation.The conservative 512KB threshold (from a typical 2MB ARG_MAX) and the argument length calculation correctly account for the pattern, grep flags overhead, and file paths. This provides a safe margin to determine when direct grep can be used versus batched xargs.
525-535: LGTM: Correct exit code 1 handling.The fast path correctly treats grep exit code 1 as "no matches found" rather than an error, and properly propagates other errors. Exit code 123 handling is not needed here since xargs is not used in the fast path.
Performance Optimization: Hybrid Grep ApproachContextThe xargs batching approach introduced to fix ARG_MAX errors added overhead that caused benchmark regressions on small/medium projects. This commit addresses that SolutionImplemented a hybrid approach that automatically selects the optimal grep strategy: Fast Path (< 512KB total arguments):
Slow Path (≥ 512KB total arguments):
Why 512KB Threshold?
Impact
ExampleTypical project with 100 TypeScript files (~200KB args) → Fast path This should restore benchmark performance while maintaining all the fixes in this PR. |
exec.Command passes arguments directly to grep without shell interpretation, so shell escaping breaks patterns containing quotes. The slow path correctly applies shell escaping because it uses /bin/sh -c with an embedded command string. The fast path should pass the pattern unmodified. Fixes issue identified by coderabbitai review. Signed-off-by: tsanders <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
provider/internal/builtin/service_client.go (2)
505-512: Short-circuit when no files to search.Avoid spawning grep with stdin and keep behavior consistent with the slow path.
@@ if runtime.GOOS == "windows" { @@ return matches, nil } + // Nothing to search — avoid invoking grep on stdin. + if len(locations) == 0 { + return []fileSearchResult{}, nil + } + // Calculate total argument length to determine if we can use direct grep // ARG_MAX on Linux is typically 2MB, use conservative 512KB threshold for safety const argMaxSafeThreshold = 512 * 1024
523-534: Fast path: capture stderr and surface it on errors.Improves diagnosability and avoids losing grep error details.
- cmd := exec.Command("grep", args...) - output, err := cmd.Output() + cmd := exec.Command("grep", args...) + var stderrBuf bytes.Buffer + cmd.Stderr = &stderrBuf + output, err := cmd.Output() if err != nil { if exitError, ok := err.(*exec.ExitError); ok { if exitError.ExitCode() == 1 { // No matches found, not an error err = nil } } if err != nil { - return nil, fmt.Errorf("could not run grep with provided pattern %+v", err) + return nil, fmt.Errorf("could not run grep with provided pattern %v; stderr=%s", + err, strings.TrimSpace(stderrBuf.String())) } }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
demo-output.yaml(1 hunks)provider/internal/builtin/service_client.go(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- demo-output.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: test
- GitHub Check: test (windows-latest)
- GitHub Check: test (macos-latest)
- GitHub Check: benchmark (windows-latest, windows)
│ │ │ Patterns like --pf- were causing grep to fail with exit code 2 │ │ because grep interpreted them as command-line options instead of │ │ search patterns. Added explicit -- separator in both fast path │ │ and slow path grep invocations to mark the end of options. │ │ │ │ This fixes the test-regex-pattern-00010 rule which searches for │ │ PatternFly CSS variables using the pattern --pf-. Signed-off-by: tsanders <[email protected]>
1716877 to
1165c01
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
provider/internal/builtin/service_client.go (1)
515-535: Fast path implementation is correct; consider capturing stderr for better error messages.The direct grep approach correctly:
- Uses
--to prevent pattern strings like--versionfrom being interpreted as options- Passes the pattern directly without shell escaping (appropriate for
exec.Command)- Handles exit code 1 as "no matches found"
The implementation is solid, though capturing stderr (via
cmd.CombinedOutput()or separate stderr pipe) would provide more helpful error messages when grep fails with invalid regexes or I/O errors.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
demo-output.yaml(1 hunks)provider/internal/builtin/service_client.go(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- demo-output.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: test
- GitHub Check: test (windows-latest)
- GitHub Check: benchmark (windows-latest, windows)
🔇 Additional comments (1)
provider/internal/builtin/service_client.go (1)
505-511: LGTM: Conservative ARG_MAX threshold and argument length calculation.The 512KB threshold is appropriately conservative (typical Linux ARG_MAX is 2MB), and the overhead estimate is reasonable. The calculation correctly accounts for pattern length, flag overhead, and file path lengths with separators.
| if err != nil { | ||
| if exitError, ok := err.(*exec.ExitError); ok { | ||
| // Exit code 1: grep found no matches | ||
| // Exit code 123: GNU xargs (Linux) exits with 123 when any invocation exits with 1-125 | ||
| // When grep processes files across multiple xargs batches and some batches have matches | ||
| // while others don't, xargs will exit with 123 (not 1). The current code treats this as | ||
| // an error and discards the partial results in currOutput, causing false negatives. | ||
| // Apply this fix to handle both exit codes correctly: | ||
| if exitError.ExitCode() == 1 || exitError.ExitCode() == 123 { | ||
| err = nil // Clear error; treat as "no matches in this batch" | ||
| // Continue to next batch (don't return!) | ||
| } | ||
| } | ||
| if err != nil { | ||
| return nil, fmt.Errorf("could not run grep with provided pattern %+v", err) | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
Exit code 123 cleared unconditionally—this can hide real grep errors.
This issue was flagged in a previous review and remains unaddressed. GNU xargs returns exit code 123 for any child exit in the range 1–125, not just "no matches." This includes grep exit code 2 (invalid regex or other errors). Clearing 123 unconditionally can suppress real failures and cause silent false negatives.
Recommended fix from past review:
- Capture stderr when executing the command (using
cmd.CombinedOutput()or a separatestderrpipe) - For exit code 1: clear error (no matches)
- For exit code 123: only clear error if stderr is empty; otherwise, include stderr in the error message
- For any other non-zero exit: include stderr in the error message
This ensures that real grep/xargs errors are surfaced rather than silently ignored.
Apply this fix:
}
+ var stderrBuf bytes.Buffer
switch runtime.GOOS {
case "darwin":
...
cmd := exec.Command("/bin/sh", "-c", cmdStr)
+ cmd.Stderr = &stderrBuf
cmd.Stdin = &fileList
currOutput, err = cmd.Output()
default:
...
cmd := exec.Command("/bin/sh", "-c", cmdStr)
+ cmd.Stderr = &stderrBuf
cmd.Stdin = &fileList
currOutput, err = cmd.Output()
}
if err != nil {
if exitError, ok := err.(*exec.ExitError); ok {
- // Exit code 1: grep found no matches
- // Exit code 123: GNU xargs (Linux) exits with 123 when any invocation exits with 1-125
- // When grep processes files across multiple xargs batches and some batches have matches
- // while others don't, xargs will exit with 123 (not 1). The current code treats this as
- // an error and discards the partial results in currOutput, causing false negatives.
- // Apply this fix to handle both exit codes correctly:
- if exitError.ExitCode() == 1 || exitError.ExitCode() == 123 {
- err = nil // Clear error; treat as "no matches in this batch"
- // Continue to next batch (don't return!)
- }
+ switch exitError.ExitCode() {
+ case 1:
+ err = nil // no matches in this batch
+ case 123:
+ if strings.TrimSpace(stderrBuf.String()) == "" {
+ err = nil // mixed batches, no actual error output
+ }
+ }
}
if err != nil {
- return nil, fmt.Errorf("could not run grep with provided pattern %+v", err)
+ return nil, fmt.Errorf("could not run grep with provided pattern %v; stderr=%s",
+ err, strings.TrimSpace(stderrBuf.String()))
}
}
---
🏁 Script executed:
```shell
#!/bin/bash
# Examine the file and the specific lines mentioned
fd -t f service_client.go | head -5
Length of output: 470
🏁 Script executed:
#!/bin/bash
# Check file exists and view lines 580-610 for context
if [ -f "provider/internal/builtin/service_client.go" ]; then
sed -n '580,610p' provider/internal/builtin/service_client.go
else
echo "File not found at expected location"
find . -name "service_client.go" -type f 2>/dev/null | head -3
fiLength of output: 1231
🏁 Script executed:
#!/bin/bash
# Search for stderr handling in the function containing these lines
rg -n "stderrBuf|Stderr|CombinedOutput" provider/internal/builtin/service_client.go -A2 -B2Length of output: 0
🏁 Script executed:
#!/bin/bash
# Get broader context: search for the function signature and grep/xargs setup
rg -n "cmd.Output\(\)|exec.Command" provider/internal/builtin/service_client.go -B5 -A5 | head -60Length of output: 1826
---
🏁 Script executed:
```shell
#!/bin/bash
# View the entire switch statement to see both darwin and default cases
sed -n '530,610p' provider/internal/builtin/service_client.go
Length of output: 3179
🏁 Script executed:
#!/bin/bash
# Check imports at top of file for bytes and strings
head -50 provider/internal/builtin/service_client.go | grep -E "^import|bytes|strings"Length of output: 28
Exit code 123 cleared unconditionally—this hides real grep errors and causes silent false negatives.
GNU xargs returns exit code 123 for any child exit in range 1–125, including grep exit code 2+ (invalid regex, permission denied, file errors). Clearing 123 without checking stderr masks real failures instead of distinguishing them from legitimate "no matches" results.
The code at lines 595–600 clears both exit codes 1 and 123 unconditionally using cmd.Output(), which discards stderr. This prevents differentiation between:
- Exit 1: no matches (safe to ignore)
- Exit 123 from grep exit 2+: actual error (should not be ignored)
Required fix:
Capture stderr and only clear exit code 123 when stderr is empty. Update both darwin (line 567) and default (line 586) cases:
- Add
var stderrBuf bytes.Bufferbefore the switch (around line 545) - Set
cmd.Stderr = &stderrBuffor bothcmdinstances - At lines 595–600, replace the unconditional clear with:
switch exitError.ExitCode() { case 1: err = nil // no matches case 123: if strings.TrimSpace(stderrBuf.String()) == "" { err = nil // mixed batches, no error } }
- Update the error message (line 601) to include stderr:
fmt.Errorf("could not run grep with provided pattern %v; stderr=%s", err, strings.TrimSpace(stderrBuf.String()))
🤖 Prompt for AI Agents
In provider/internal/builtin/service_client.go around lines 588-604, the code
clears exit code 123 unconditionally which hides real grep errors; fix by adding
a stderr buffer (declare var stderrBuf bytes.Buffer near where cmd is created),
assign cmd.Stderr = &stderrBuf for both darwin and default cmd instances, and
replace the unconditional clearing with a switch on exitError.ExitCode(): case 1
-> set err = nil (no matches); case 123 -> only set err = nil if
strings.TrimSpace(stderrBuf.String()) == "" (otherwise keep the error). Finally
update the returned error to include stderr (e.g., fmt.Errorf("could not run
grep with provided pattern %v; stderr=%s", err,
strings.TrimSpace(stderrBuf.String()))) and apply these changes in both the
darwin and default branches.
Exit code 123 from xargs is ambiguous - it can indicate either 'no matches' (grep exit 1) or real errors (grep exit 2). Only clear exit 123 if stderr is empty; otherwise surface the error with stderr content. Addresses code review feedback from coderabbitai. Signed-off-by: tsanders <[email protected]>
Problem
When analyzing JavaScript/TypeScript projects with
node_modulesinstalled, the builtin provider fails with "argument list too long" errors. This occurs because the grep command receives all matching files (30,000+ in a typical node_modules) as command-line arguments, exceeding the OS ARG_MAX limit.Example error:
This affects any project with large dependency directories, making the analyzer unusable for common JavaScript/TypeScript workflows.
Root Cause
In
provider/internal/builtin/service_client.go, the Linux grep implementation passes all file paths directly as command arguments:OS systems limit total argument length (typically 1-2 MB). With 30,000 files × ~80 chars per path = ~2.4 MB, this exceeds ARG_MAX.
Solution
1. Use xargs for file list (fixes ARG_MAX)
Changed Linux grep to match the macOS approach - pipe file list via stdin using xargs instead of command arguments:
This eliminates ARG_MAX issues entirely, as xargs automatically batches files.
2. Add sensible default excludes (prevents issue)
Added default excluded directories in
provider/lib.go:node_modules(JavaScript/TypeScript)vendor(PHP/Go).gitdist,build,target(build outputs)venv,.venv(Python)These directories are now excluded by default, preventing most users from hitting the ARG_MAX issue. Users can still add custom excludes via
providerSpecificConfig.excludedDirs.3. Document excludedDirs configuration
Updated
docs/providers.mdto document:excludedDirsconfiguration optionTesting
Test environment:
Results:
Impact
Related
This issue was discovered while testing PatternFly v5→v6 migration rules on JavaScript/TypeScript projects. The xargs approach matches the existing macOS implementation pattern in the codebase.
Summary by CodeRabbit
New Features
Documentation
Improvements
Bug Fixes