forked from konveyor/analyzer-lsp
-
Notifications
You must be signed in to change notification settings - Fork 0
Implement import-based search for nodejs.referenced capability #2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
tsanders-rh
wants to merge
19
commits into
main
Choose a base branch
from
fix/nodejs-referenced-import-search
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+1,288
−84
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This commit implements a new import-based search algorithm for the
nodejs.referenced capability, replacing the previous workspace/symbol
approach which had significant limitations with TypeScript LSP servers.
Problem
-------
The nodejs.referenced capability was not working reliably because:
1. TypeScript LSP workspace/symbol only returns top-level declarations
2. Exported components like PatternFly's Card, Button, etc. are not
top-level declarations in their source packages
3. This caused false negatives where imported symbols were not detected
4. Previous workaround used combo rules with builtin.filecontent, but
this was inefficient and semantically incorrect
Solution
--------
Implement import-based search using a three-step algorithm:
1. FAST SCAN: Scan all TypeScript/JavaScript files for import statements
containing the pattern using regex (no LSP required)
- Handles multiline imports via normalization
- Supports named imports: import { Card } from "package"
- Supports default imports: import Card from "package"
- Correctly identifies word boundaries to avoid false matches
2. GET DEFINITIONS: For each import found, use LSP textDocument/definition
to locate where the symbol is actually defined
3. GET REFERENCES: For each definition, use LSP textDocument/references
to find all usage locations in the workspace
This approach is both faster and more accurate than workspace/symbol
search, as it leverages the fact that developers must import symbols
before using them.
Implementation Details
---------------------
- Added ImportLocation and fileInfo types to track file metadata
- Implemented findImportStatements() for regex-based import detection
- Implemented normalizeMultilineImports() to handle formatted code
- Added helper functions for identifier validation
- Added GetDependencies/GetDependenciesDAG overrides to prevent errors
- Enhanced logging in base service client for better debugging
- Removed debug printf statements, using structured logging only
Files Changed
-------------
- nodejs/service_client.go: Main implementation (~480 lines added)
- base_service_client.go: Enhanced logging for workspace/symbol calls
- cmd_dialer.go: Minor updates
- go.mod/go.sum: Dependency updates
Testing
-------
Tested with PatternFly v5→v6 migration ruleset on tackle2-ui codebase:
- 56 rules for React component migrations
- Successfully detects imports and references for all PatternFly components
- No longer requires builtin.filecontent workarounds
Benefits
--------
1. Accurate detection of imported symbols regardless of export structure
2. Faster execution by avoiding expensive workspace/symbol calls
3. Eliminates need for combo rules with builtin.filecontent
4. Better semantic correctness - finds actual usage, not just text patterns
5. Comprehensive documentation with detailed docstrings
Signed-off-by: Todd Sanders <[email protected]>
Signed-off-by: tsanders <[email protected]>
Resolved conflict in nodejs/service_client.go by keeping the import-based search implementation which supersedes the previous file batching optimization from PR konveyor#970. The import-based search provides better semantic accuracy by: - Finding imports first (regex scan) - Using LSP definitions to locate symbols - Using LSP references to find all usages This is fundamentally different from and more accurate than the workspace/symbol approach that was being optimized in the upstream changes. Signed-off-by: Todd Sanders <[email protected]> Signed-off-by: tsanders <[email protected]>
This commit adds test coverage for the nodejs.referenced import-based search functionality introduced in the main PR. Changes: - Export ImportLocation struct fields (FileURI, LangID, Position, Line) to enable testing - Add test helper functions to expose private methods for testing: - NormalizeMultilineImportsPublic() - FindImportStatementsPublic() - IsIdentifierCharPublic() - Create comprehensive test suite (import_search_test.go) with 38 test cases covering: - Multiline import normalization (5 tests) - Import statement finding (8 tests) - Identifier character detection (12 tests) - Word boundary matching (6 tests) - Edge cases (7 tests) All tests pass (100% success rate) and validate core import detection logic including word boundaries, multiline imports, and various quote/ whitespace styles. Signed-off-by: tsanders <[email protected]>
f66148c to
1ed790f
Compare
This commit updates the demo baseline for violations/incidents to reflect improved precision in the import-based search implementation for the nodejs.referenced condition. Changes: - demo-output.yaml: Removed incorrect references to quarkus-1-6-2-jar-exploded files that don't exist in fresh builds. One fewer nodejs incident due to improved word boundary detection. The rule node-sample-rule-001 no longer matches file:///examples/nodejs/test_a.ts at line 5 because the improved detection correctly identifies that 'greeter' is not the same as 'greet'. - demo-dep-output.yaml: Updated to include java-project dependencies that are correctly generated by fresh builds (matches main branch baseline). Signed-off-by: tsanders <[email protected]>
1ed790f to
6eb0d54
Compare
Accessing sc.Config.WorkspaceFolders[0] without verifying the slice is non-empty could cause a panic. This adds a defensive check that returns a clear error message if no workspace folders are configured. Signed-off-by: $(git config user.name) <$(git config user.email)> Signed-off-by: tsanders <[email protected]>
The 100ms sleep was a code smell relying on timing rather than proper synchronization. This replaces it with retry logic that attempts the textDocument/definition call up to 3 times with exponential backoff (50ms, 100ms, 200ms). This approach is more robust across different LSP server implementations and varying system loads. Signed-off-by: $(git config user.name) <$(git config user.email)> Signed-off-by: tsanders <[email protected]>
Add defensive checks before accessing sc.BaseConfig.WorkspaceFolders[0] at lines 360 and 431 to prevent potential runtime panics. These checks ensure the slice is non-empty before accessing the first element. Signed-off-by: $(git config user.name) <$(git config user.email)> Signed-off-by: tsanders <[email protected]>
Replace simple backslash check with proper escape counting. The previous check (content[i-1] != '\\') didn't correctly handle double-escaped quotes like \\". This fix counts preceding backslashes - if there's an even number (including 0), the quote is not escaped. This correctly handles cases like: - "test\"" (escaped, odd backslashes) - "test\\"" (not escaped, even backslashes) Signed-off-by: $(git config user.name) <$(git config user.email)> Signed-off-by: tsanders <[email protected]>
Replace strings.Contains check with regex using word boundaries to avoid false positives. The previous check matched "from" anywhere in the string, including inside identifiers (e.g., "dataFromAPI") or string literals. The new regex pattern \bfrom\s+["'] ensures: - \b matches word boundary (not part of a larger identifier) - from matches the literal keyword - \s+ matches one or more whitespace characters - ["'] matches a quote character The regex is compiled once at package level for efficiency. Signed-off-by: $(git config user.name) <$(git config user.email)> Signed-off-by: tsanders <[email protected]>
Fix critical consistency bug where Config.WorkspaceFolders was initialized at line 58 but BaseConfig.WorkspaceFolders was never synchronized. This caused lines 363 and 434 to use an uninitialized BaseConfig.WorkspaceFolders, leading to potential silent failures in reference filtering logic. After initializing the base client, explicitly synchronize BaseConfig.WorkspaceFolders from Config.WorkspaceFolders to ensure both configurations remain consistent throughout the code. Signed-off-by: $(git config user.name) <$(git config user.email)> Signed-off-by: tsanders <[email protected]>
…erference In normalizeMultilineImports, the "from" detection on newlines was using result.String() which scans the entire accumulated file content. This could match "from" tokens from previous imports or other code that falls within the 50-character window, causing the current import to be prematurely treated as complete. Now restrict the check to content[importStart:i] to inspect only the current import statement. This avoids spurious matches from earlier code and eliminates the O(n²) behavior from repeatedly calling result.String(). Signed-off-by: $(git config user.name) <$(git config user.email)> Signed-off-by: tsanders <[email protected]>
Extends import detection to handle all valid JavaScript/TypeScript patterns:
- Namespace imports: import * as Pattern from "package"
- Mixed default + named: import React, { useState } from "react"
- Mixed default + namespace: import React, * as All from "react"
- TypeScript type imports: import type { Card } from "package"
Updated regex pattern with 5 capture groups:
1. Default import (when mixed with named/namespace)
2. Named imports (braced list)
3. Default import (when standalone)
4. Namespace import
5. Package name
The regex now includes optional "type" keyword support for TypeScript
type-only imports, ensuring comprehensive coverage of modern JavaScript
and TypeScript import patterns.
Pattern matching logic updated to:
- Check all three import types (default, named, namespace)
- Use exact match for single identifiers (default, namespace)
- Use substring match for comma-separated lists (named imports)
- Added inline comments explaining matching strategy
Test coverage (25 test cases):
- Namespace imports (single line, multiline, multiple per file)
- Mixed default + named imports (both parts searchable, multiline)
- Mixed default + namespace imports (rare but valid)
- TypeScript type imports (named, default, namespace, multiline)
- Side-effect imports (correctly ignored)
- Word boundaries (avoid partial matches like "Card" vs "CardHelper")
- Special characters in package names (@scoped/packages)
- Stress test with 20 named imports in single statement
- Pattern not found scenarios
All 25 tests passing. Build successful.
Addresses code review feedback item #1.
Signed-off-by: tsanders <[email protected]>
Fixes a critical bug where import detection would miss valid matches
when the pattern appeared as a prefix of another symbol on the same line.
Example that was failing:
import { CardFooter, Card } from '@patternfly/react-core';
When searching for "Card":
- Previous: Would find "Card" in "CardFooter", fail word boundary check,
and stop searching - missing the standalone "Card" later on the line
- Fixed: Now searches all occurrences on the line until finding a match
with valid word boundaries
Changes:
- Added inner loop to check multiple occurrences of pattern per line
- Added `found` flag to break from outer line-searching loop once matched
- Prevents false negatives when pattern is substring of other identifiers
Test coverage:
- Pattern as prefix: "CardFooter, Card" - now correctly finds "Card"
- Pattern in middle: "Button, Card, CardBody" - correctly finds "Card"
All 27 tests passing.
Addresses CodeRabbit review feedback.
Signed-off-by: tsanders <[email protected]>
Addresses CodeRabbit feedback: - Remove unused LOG_FILE variable - Replace with CONSOLE_LOG for consistency - Update all references to use correct log file variables The script now correctly references console.log throughout. Signed-off-by: tsanders <[email protected]>
These files are local development artifacts: - CODE_REVIEW_RESPONSE_PLAN.md - planning document - NODEJS_REFERENCED_ANALYSIS.md - analysis notes - test-patternfly-direct.sh - local test script Keeping them local only. Signed-off-by: tsanders <[email protected]>
Extracts complex logic into well-documented helper functions for better maintainability and testability. **New helper functions:** - `isStringDelimiter()` - Checks for quotes and backticks - `isEscapedQuote()` - Counts preceding backslashes to detect escaped quotes - `hasCompletedImportStatement()` - Detects "from" keyword to identify complete imports - `normalizeImportStatement()` - Processes individual import statement normalization **Main function improvements:** - `normalizeMultilineImports()` now delegates to helpers - Added `\r\n` → `\n` preprocessing for cross-platform compatibility - Simplified switch statement in import processing - Removed deeply nested "from" detection logic **Benefits:** - Each helper is independently testable - Main function logic is clearer (26 lines vs 100 lines) - Better separation of concerns - Improved code documentation **Cross-platform line ending handling:** - Normalizes Windows `\r\n` to Unix `\n` at start - Simplifies processing to only check `\n` in switch - Added comments explaining line ending approach **Test coverage:** Added 5 new test cases: - Windows line endings (CRLF) normalization - Mixed line endings normalization - Escaped quotes in import strings - Double backslash before quote - Template literals with backticks All 10 normalization tests passing + 27 import tests passing. Addresses code review feedback items #2, konveyor#3, and konveyor#4. Signed-off-by: tsanders <[email protected]>
Follow Go testing conventions by accessing private methods directly from tests in the same package. This eliminates unnecessary public wrappers that were only used for testing. Changes: - Changed import_search_test.go package from nodejs_test to nodejs - Removed FileInfo exported struct - Removed NormalizeMultilineImportsPublic wrapper - Removed FindImportStatementsPublic wrapper - Removed IsIdentifierCharPublic wrapper - Updated all test calls to use private methods directly All import search tests pass (37 test cases across 5 test functions). Addresses code review feedback on public vs private method exposure. Signed-off-by: tsanders <[email protected]>
Make workspace filtering conditional based on the analysis mode setting. This allows users to control whether they want comprehensive analysis or source-only analysis. Behavior: - source-only mode: Filter references to workspace folders only, excluding dependency folders (existing behavior preserved) - full mode: Include all references regardless of location Changes: - Added AnalysisMode field to NodeServiceClient - Store analysis mode from InitConfig during initialization - Apply workspace filtering only when AnalysisMode is source-only - Log analysis mode in initialization output This addresses the code review feedback about making workspace filtering configurable based on the user's analysis intent. Addresses shawn-hurley's feedback on workspace filtering. Signed-off-by: tsanders <[email protected]>
Refactor to follow the same pattern as builtin provider, building results incrementally during filesystem traversal rather than collecting all file paths first then searching them. Changes: - Extract findImportStatementsInFile to search a single file - Call findImportStatementsInFile during filepath.Walk for each file - Accumulate results incrementally as files are processed - Simplify findImportStatements to delegate to findImportStatementsInFile for backward compatibility with tests - Improve logging to show files processed and imports found Benefits: - Single pass through filesystem (more efficient) - Follows established pattern in builtin provider - More memory efficient (don't hold all file paths in memory) - Cleaner separation: one function per file, one function for batches No functional changes - all 37 import search tests pass. Addresses shawn-hurley's feedback about following builtin provider pattern. Signed-off-by: tsanders <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Overview
This PR implements a new import-based search algorithm for the
nodejs.referencedcapability, replacing the previousworkspace/symbolapproach which had significant limitations with TypeScript LSP servers.Problem Statement
The
nodejs.referencedcapability was not working reliably for the following reasons:workspace/symbolonly returns top-level declarations in each fileCard,Button, etc. are not top-level declarations in their source packages - they're exported from re-export barrel filesnodejs.referencedANDbuiltin.filecontent, which was:Solution: Import-Based Search
This PR implements a three-step algorithm that leverages the fact that developers must import symbols before using them:
Algorithm
FAST SCAN (no LSP required)
import { Card } from "package"import Card from "package"import { Card, CardBody } from "package"GET DEFINITIONS (LSP
textDocument/definition)GET REFERENCES (LSP
textDocument/references)Why This Works Better
workspace/symbolcalls; regex scan is much fasterImplementation Details
New Code (
nodejs/service_client.go)Type Definitions:
Key Methods:
EvaluateReferenced()- Main entry point (~200 lines)findImportStatements()- Import detection (~95 lines)normalizeMultilineImports()- Preprocessing (~90 lines)Helper Functions:
isIdentifierChar()- Validates JavaScript identifier charactersmin()- Utility functionMethod Overrides:
GetDependencies()- Returns empty map (nodejs doesn't use external dependency providers)GetDependenciesDAG()- Returns empty mapEnhanced Logging (
base_service_client.go)workspace/symbolcallsfmt.PrintfstatementsTesting
Tested with PatternFly v5→v6 migration ruleset on
tackle2-uicodebase:builtin.filecontentworkaroundsCard,Button,Chip,Badge,Label, etc.Example Rule (Before vs After)
Before (Combo Rule):
After (Clean):
Benefits
workspace/symbolcallsbuiltin.filecontentFiles Changed
external-providers/generic-external-provider/pkg/server_configurations/nodejs/service_client.go- Main implementation (~480 lines added)lsp/base_service_client/base_service_client.go- Enhanced logginglsp/base_service_client/cmd_dialer.go- Minor updatesexternal-providers/dotnet-external-provider/go.mod- Dependency updatesexternal-providers/dotnet-external-provider/go.sum- Dependency updatesexternal-providers/golang-dependency-provider/go.mod- Dependency updatesexternal-providers/golang-dependency-provider/go.sum- Dependency updatesTotal: 7 files changed, 538 insertions(+), 104 deletions(-)
Impact on Existing Rules
This change enables existing
nodejs.referencedrules to work correctly. Rules that were using combo patterns withbuiltin.filecontentcan be simplified to use onlynodejs.referenced.No breaking changes - existing rules continue to work, but can now be simplified.
Next Steps
After this PR is merged:
analyzer-rule-generatorto stop creating combo rules (guide available)builtin.filecontentworkaroundsRelated Issues
Fixes issues with PatternFly v5→v6 migration rules where components were not being detected.
Signed-off-by: Todd Sanders [email protected]