🐛 perf: Optimize NodeJS provider file processing (fixes #969) #970
Conversation
Eliminates dual bottlenecks in TypeScript/JavaScript file analysis.

**Problem:**
1. 2-second sleep before EACH file (582 files × 2s = 1,164s wasted)
2. `GetAllDeclarations` called 18 times (once per 32-file batch)

**Solution:**
- Open all files in a single loop (no per-file sleep)
- Sleep 500ms once after all didOpen notifications
- Call `GetAllDeclarations` once after all files are indexed
- Simplified didClose loop

**Performance Results:** Test: tackle2-ui (582 TypeScript files, 56 rules)
- Before: 20-25 minutes (1200-1500s)
- After: 1m 43s (103s)
- Improvement: 91% faster (12x speedup)

The single sleep after all files prevents the race condition where symbol queries happen before the LSP server finishes indexing, without the massive overhead of sleeping before each file.

Signed-off-by: tsanders <[email protected]>
Walkthrough

The NodeJS provider's file processing workflow is refactored to eliminate per-file 2-second sleeps. Instead of batching files incrementally with sleep-per-file, the system now opens all files in a single pass, applies one 500ms delay, queries symbols once, and closes all files in one consolidated pass.
Sequence Diagram

```mermaid
sequenceDiagram
    autonumber
    participant Client
    participant OldFlow as Old Flow (Per-File)
    participant NewFlow as New Flow (Batch)
    rect rgba(200, 100, 100, 0.2)
        Note over OldFlow: ❌ Incremental Batching
        loop For each file in batch
            OldFlow->>OldFlow: Sleep 2s
            OldFlow->>OldFlow: didOpen(file)
            OldFlow->>OldFlow: Query symbols
            OldFlow->>OldFlow: Close file
        end
        Note over OldFlow: 582 files × 2s = 19 min
    end
    rect rgba(100, 200, 100, 0.2)
        Note over NewFlow: ✅ Batch Processing
        NewFlow->>NewFlow: Open all files
        NewFlow->>NewFlow: Sleep 500ms (once)
        NewFlow->>NewFlow: Query all symbols
        NewFlow->>NewFlow: Evaluate
        NewFlow->>NewFlow: Close all files
        Note over NewFlow: ~4s total sleep overhead
    end
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (5 passed)
Actionable comments posted: 0
🧹 Nitpick comments (2)
external-providers/generic-external-provider/pkg/server_configurations/nodejs/service_client.go (2)
**218-225: Make the post-open delay ctx-aware and less "magic"**

The single 500ms sleep is a big improvement over 2s per file but is still a hard-coded magic value and ignores `ctx` cancellation. You could keep behavior while improving robustness by using a ctx-aware wait and a named constant:

```diff
-	// Sleep once after all files are opened to allow LSP server to process
-	// all didOpen notifications before querying for symbols.
-	// This prevents the race condition without requiring sleep before each file.
-	time.Sleep(500 * time.Millisecond)
+	const indexWait = 500 * time.Millisecond
+	// Sleep once after all files are opened to allow the LSP server to process
+	// didOpen notifications before querying for symbols.
+	select {
+	case <-time.After(indexWait):
+	case <-ctx.Done():
+		return resp{}, ctx.Err()
+	}
```

Optionally, `indexWait` could later be made configurable if you see variability across environments.
**231-235: Close-all loop is correct; consider future tracking of successfully opened files**

Closing all URIs in one pass is simpler and should be fine given the preceding logic. If you ever hit partial failures in the open loop, a small enhancement would be to track which files actually succeeded in `didOpen` and only close those, but that's not blocking for this PR.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
`external-providers/generic-external-provider/pkg/server_configurations/nodejs/service_client.go` (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
external-providers/generic-external-provider/pkg/server_configurations/nodejs/service_client.go (1)
provider/provider.go (1)
`ProviderEvaluateResponse` (303-307)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
- GitHub Check: test
- GitHub Check: test (windows-latest, windows, amd64)
- GitHub Check: test (macos-latest, darwin, arm64)
- GitHub Check: test (macos-latest, darwin, amd64)
- GitHub Check: test (ubuntu-latest, linux, amd64)
- GitHub Check: test (ubuntu-latest, linux, arm64)
- GitHub Check: benchmark (macos-latest, mac)
- GitHub Check: benchmark (windows-latest, windows)
- GitHub Check: benchmark (ubuntu-latest, linux)
- GitHub Check: test (windows-latest, windows, amd64)
- GitHub Check: test (ubuntu-latest, linux, arm64)
- GitHub Check: test (ubuntu-latest, linux, amd64)
- GitHub Check: test (macos-latest, darwin, amd64)
- GitHub Check: test (macos-latest, darwin, arm64)
🔇 Additional comments (1)
external-providers/generic-external-provider/pkg/server_configurations/nodejs/service_client.go (1)
**203-215: Open-all didOpen loop achieves the intended perf improvement**

Sequentially reading each file and issuing `didOpen` once per entry is straightforward and removes the per-file sleep bottleneck while preserving existing error handling. This looks correct and aligned with the PR's goals.
Have we tested this with the language server that is being shipped with the generic provider? Just to be clear, I am not opposed to merging; I just want to make sure that the kantra/hub results are not impacted.
Looks like the demo test does in fact have nodejs references that are still being found and passing the test. This gives me some comfort that it is not fully breaking those results, so merging now.
Resolved conflict in nodejs/service_client.go by keeping the import-based search implementation, which supersedes the previous file batching optimization from PR konveyor#970. The import-based search provides better semantic accuracy by:

- Finding imports first (regex scan)
- Using LSP definitions to locate symbols
- Using LSP references to find all usages

This is fundamentally different from and more accurate than the workspace/symbol approach that was being optimized in the upstream changes.

Signed-off-by: Todd Sanders <[email protected]>
Signed-off-by: tsanders <[email protected]>
Summary
Eliminates dual bottlenecks in TypeScript/JavaScript file analysis that caused 20-25 minute delays on large codebases.
Fixes #969
Problem
The NodeJS provider had two major bottlenecks:
1. **Sleep bottleneck:** a 2-second sleep before EACH file
2. **Query bottleneck:** `GetAllDeclarations` called 18 times

Solution

Modified `external-providers/generic-external-provider/pkg/server_configurations/nodejs/service_client.go`:

**Before:**
- 2-second sleep before each `didOpen` notification
- `GetAllDeclarations` called after each batch (18 times total)

**After:**
- All `didOpen` notifications sent in a single loop (no per-file sleep)
- Single 500ms sleep once after all notifications
- `GetAllDeclarations` called once after all files are indexed
- Simplified `didClose` loop

The single sleep after all files prevents the race condition (symbol queries happening before the LSP server finishes indexing) without the massive overhead of sleeping before each file.
Performance Results
Test environment: tackle2-ui (582 TypeScript files, 56 PatternFly rules)
Testing
Additional Notes
This fix is even better than the original proposal because it eliminates both the sleep bottleneck AND the redundant workspace queries.