Skip to content

Conversation

@tsanders-rh
Copy link
Owner

Overview

This PR implements a new import-based search algorithm for the nodejs.referenced capability, replacing the previous workspace/symbol approach which had significant limitations with TypeScript LSP servers.

Problem Statement

The nodejs.referenced capability was not working reliably for the following reasons:

  1. LSP Limitation: TypeScript LSP's workspace/symbol only returns top-level declarations in each file
  2. Export Structure: Exported components like PatternFly's Card, Button, etc. are not top-level declarations in their source packages - they're exported from re-export barrel files
  3. False Negatives: This caused imported symbols to not be detected, resulting in rules failing to match
  4. Inefficient Workaround: Previous solution used combo rules combining nodejs.referenced AND builtin.filecontent, which was:
    • Semantically incorrect (text pattern matching instead of semantic analysis)
    • Slower (required both LSP calls and regex scans)
    • Fragile (required maintaining regex patterns for each symbol)

Solution: Import-Based Search

This PR implements a three-step algorithm that leverages the fact that developers must import symbols before using them:

Algorithm

  1. FAST SCAN (no LSP required)

    • Scan all TypeScript/JavaScript files for import statements containing the pattern
    • Uses regex matching on normalized content
    • Handles multiline imports via preprocessing
    • Supports:
      • Named imports: import { Card } from "package"
      • Default imports: import Card from "package"
      • Multiple symbols: import { Card, CardBody } from "package"
    • Correctly identifies word boundaries to avoid false matches
  2. GET DEFINITIONS (LSP textDocument/definition)

    • For each import location found, use LSP to locate where the symbol is actually defined
    • Opens files in LSP as needed
    • Tracks processed definitions to avoid duplicates
  3. GET REFERENCES (LSP textDocument/references)

    • For each definition, use LSP to find all usage locations in the workspace
    • Filters results to workspace folders only
    • Excludes dependency folders
    • Deduplicates incidents

Why This Works Better

  • Accurate: Finds actual semantic usage, not just text patterns
  • Faster: Avoids expensive workspace/symbol calls; regex scan is much faster
  • Comprehensive: Correctly handles all import patterns including multiline
  • Maintainable: No need for per-symbol regex patterns

Implementation Details

New Code (nodejs/service_client.go)

Type Definitions:

// ImportLocation tracks where a symbol is imported
type ImportLocation struct {
    fileURI  string
    langID   string
    position protocol.Position
    line     string
}

// fileInfo tracks a source file
type fileInfo struct {
    path   string
    langID string
}

Key Methods:

  1. EvaluateReferenced() - Main entry point (~200 lines)

    • Comprehensive docstring explaining the algorithm
    • Scans for imports
    • Opens files in LSP
    • Gets definitions and references
    • Filters and deduplicates results
  2. findImportStatements() - Import detection (~95 lines)

    • Regex-based search for import statements
    • Handles both named and default imports
    • Correctly identifies word boundaries
    • Returns positions for LSP definition lookup
  3. normalizeMultilineImports() - Preprocessing (~90 lines)

    • Converts multiline imports to single lines
    • Preserves string literals
    • Handles brace depth tracking
    • Enables regex matching on formatted code
  4. Helper Functions:

    • isIdentifierChar() - Validates JavaScript identifier characters
    • min() - Utility function
  5. Method Overrides:

    • GetDependencies() - Returns empty map (nodejs doesn't use external dependency providers)
    • GetDependenciesDAG() - Returns empty map

Enhanced Logging (base_service_client.go)

  • Added structured logging for workspace/symbol calls
  • Logs query, result count, and error states
  • Removed debug fmt.Printf statements
  • Helps diagnose fallback behavior

Testing

Tested with PatternFly v5→v6 migration ruleset on tackle2-ui codebase:

  • 56 rules for React component migrations
  • Successfully detects imports and references for all PatternFly components
  • No longer requires builtin.filecontent workarounds
  • Verified with components: Card, Button, Chip, Badge, Label, etc.

Example Rule (Before vs After)

Before (Combo Rule):

when:
  and:
  - nodejs.referenced:
      pattern: Card
  - builtin.filecontent:
      pattern: import.*\{[^}]*\bCard\b[^}]*\}.*from ['"]@patternfly/react-core['"]
      filePattern: \.(j|t)sx?$

After (Clean):

when:
  nodejs.referenced:
    pattern: Card

Benefits

  1. Accurate Detection - Finds actual semantic usage regardless of export structure
  2. Faster Execution - Avoids expensive workspace/symbol calls
  3. Eliminates Workarounds - No need for combo rules with builtin.filecontent
  4. Better Semantics - True semantic analysis instead of text pattern matching
  5. Comprehensive Docs - Detailed docstrings for all major methods
  6. Maintainable - Clear algorithm, well-structured code

Files Changed

  • external-providers/generic-external-provider/pkg/server_configurations/nodejs/service_client.go - Main implementation (~480 lines added)
  • lsp/base_service_client/base_service_client.go - Enhanced logging
  • lsp/base_service_client/cmd_dialer.go - Minor updates
  • external-providers/dotnet-external-provider/go.mod - Dependency updates
  • external-providers/dotnet-external-provider/go.sum - Dependency updates
  • external-providers/golang-dependency-provider/go.mod - Dependency updates
  • external-providers/golang-dependency-provider/go.sum - Dependency updates

Total: 7 files changed, 538 insertions(+), 104 deletions(-)

Impact on Existing Rules

This change enables existing nodejs.referenced rules to work correctly. Rules that were using combo patterns with builtin.filecontent can be simplified to use only nodejs.referenced.

No breaking changes - existing rules continue to work, but can now be simplified.

Next Steps

After this PR is merged:

  1. Update analyzer-rule-generator to stop creating combo rules (guide available)
  2. Simplify existing rulesets to remove builtin.filecontent workarounds
  3. Consider applying similar import-based approach to other language providers if needed

Related Issues

Fixes issues with PatternFly v5→v6 migration rules where components were not being detected.


Signed-off-by: Todd Sanders [email protected]

This commit implements a new import-based search algorithm for the
nodejs.referenced capability, replacing the previous workspace/symbol
approach which had significant limitations with TypeScript LSP servers.

Problem
-------
The nodejs.referenced capability was not working reliably because:
1. TypeScript LSP workspace/symbol only returns top-level declarations
2. Exported components like PatternFly's Card, Button, etc. are not
   top-level declarations in their source packages
3. This caused false negatives where imported symbols were not detected
4. Previous workaround used combo rules with builtin.filecontent, but
   this was inefficient and semantically incorrect

Solution
--------
Implement import-based search using a three-step algorithm:

1. FAST SCAN: Scan all TypeScript/JavaScript files for import statements
   containing the pattern using regex (no LSP required)
   - Handles multiline imports via normalization
   - Supports named imports: import { Card } from "package"
   - Supports default imports: import Card from "package"
   - Correctly identifies word boundaries to avoid false matches

2. GET DEFINITIONS: For each import found, use LSP textDocument/definition
   to locate where the symbol is actually defined

3. GET REFERENCES: For each definition, use LSP textDocument/references
   to find all usage locations in the workspace

This approach is both faster and more accurate than workspace/symbol
search, as it leverages the fact that developers must import symbols
before using them.

Implementation Details
---------------------
- Added ImportLocation and fileInfo types to track file metadata
- Implemented findImportStatements() for regex-based import detection
- Implemented normalizeMultilineImports() to handle formatted code
- Added helper functions for identifier validation
- Added GetDependencies/GetDependenciesDAG overrides to prevent errors
- Enhanced logging in base service client for better debugging
- Removed debug printf statements, using structured logging only

Files Changed
-------------
- nodejs/service_client.go: Main implementation (~480 lines added)
- base_service_client.go: Enhanced logging for workspace/symbol calls
- cmd_dialer.go: Minor updates
- go.mod/go.sum: Dependency updates

Testing
-------
Tested with PatternFly v5→v6 migration ruleset on tackle2-ui codebase:
- 56 rules for React component migrations
- Successfully detects imports and references for all PatternFly components
- No longer requires builtin.filecontent workarounds

Benefits
--------
1. Accurate detection of imported symbols regardless of export structure
2. Faster execution by avoiding expensive workspace/symbol calls
3. Eliminates need for combo rules with builtin.filecontent
4. Better semantic correctness - finds actual usage, not just text patterns
5. Comprehensive documentation with detailed docstrings

Signed-off-by: Todd Sanders <[email protected]>
Signed-off-by: tsanders <[email protected]>
Resolved conflict in nodejs/service_client.go by keeping the import-based
search implementation which supersedes the previous file batching optimization
from PR konveyor#970.

The import-based search provides better semantic accuracy by:
- Finding imports first (regex scan)
- Using LSP definitions to locate symbols
- Using LSP references to find all usages

This is fundamentally different from and more accurate than the workspace/symbol
approach that was being optimized in the upstream changes.

Signed-off-by: Todd Sanders <[email protected]>
Signed-off-by: tsanders <[email protected]>
This commit adds test coverage for the nodejs.referenced import-based
search functionality introduced in the main PR.

Changes:
- Export ImportLocation struct fields (FileURI, LangID, Position, Line)
  to enable testing
- Add test helper functions to expose private methods for testing:
  - NormalizeMultilineImportsPublic()
  - FindImportStatementsPublic()
  - IsIdentifierCharPublic()
- Create comprehensive test suite (import_search_test.go) with 38 test
  cases covering:
  - Multiline import normalization (5 tests)
  - Import statement finding (8 tests)
  - Identifier character detection (12 tests)
  - Word boundary matching (6 tests)
  - Edge cases (7 tests)

All tests pass (100% success rate) and validate core import detection
logic including word boundaries, multiline imports, and various quote/
whitespace styles.

Signed-off-by: tsanders <[email protected]>
@tsanders-rh tsanders-rh force-pushed the fix/nodejs-referenced-import-search branch 3 times, most recently from f66148c to 1ed790f Compare November 20, 2025 12:26
This commit updates the demo baseline for violations/incidents to reflect
improved precision in the import-based search implementation for the
nodejs.referenced condition.

Changes:
- demo-output.yaml: Removed incorrect references to quarkus-1-6-2-jar-exploded
  files that don't exist in fresh builds. One fewer nodejs incident due to
  improved word boundary detection. The rule node-sample-rule-001 no longer
  matches file:///examples/nodejs/test_a.ts at line 5 because the improved
  detection correctly identifies that 'greeter' is not the same as 'greet'.
- demo-dep-output.yaml: Updated to include java-project dependencies that
  are correctly generated by fresh builds (matches main branch baseline).

Signed-off-by: tsanders <[email protected]>
@tsanders-rh tsanders-rh force-pushed the fix/nodejs-referenced-import-search branch from 1ed790f to 6eb0d54 Compare November 20, 2025 12:38
Accessing sc.Config.WorkspaceFolders[0] without verifying the slice is non-empty could cause a panic. This adds a defensive check that returns a clear error message if no workspace folders are configured.

Signed-off-by: $(git config user.name) <$(git config user.email)>
Signed-off-by: tsanders <[email protected]>
The 100ms sleep was a code smell relying on timing rather than proper
synchronization. This replaces it with retry logic that attempts the
textDocument/definition call up to 3 times with exponential backoff
(50ms, 100ms, 200ms). This approach is more robust across different
LSP server implementations and varying system loads.

Signed-off-by: $(git config user.name) <$(git config user.email)>
Signed-off-by: tsanders <[email protected]>
Add defensive checks before accessing sc.BaseConfig.WorkspaceFolders[0]
at lines 360 and 431 to prevent potential runtime panics. These checks
ensure the slice is non-empty before accessing the first element.

Signed-off-by: $(git config user.name) <$(git config user.email)>
Signed-off-by: tsanders <[email protected]>
Replace simple backslash check with proper escape counting. The previous
check (content[i-1] != '\\') didn't correctly handle double-escaped quotes
like \\". This fix counts preceding backslashes - if there's an even number
(including 0), the quote is not escaped. This correctly handles cases like:
- "test\"" (escaped, odd backslashes)
- "test\\"" (not escaped, even backslashes)

Signed-off-by: $(git config user.name) <$(git config user.email)>
Signed-off-by: tsanders <[email protected]>
Replace strings.Contains check with regex using word boundaries to avoid
false positives. The previous check matched "from" anywhere in the string,
including inside identifiers (e.g., "dataFromAPI") or string literals.

The new regex pattern \bfrom\s+["'] ensures:
- \b matches word boundary (not part of a larger identifier)
- from matches the literal keyword
- \s+ matches one or more whitespace characters
- ["'] matches a quote character

The regex is compiled once at package level for efficiency.

Signed-off-by: $(git config user.name) <$(git config user.email)>
Signed-off-by: tsanders <[email protected]>
Fix critical consistency bug where Config.WorkspaceFolders was initialized
at line 58 but BaseConfig.WorkspaceFolders was never synchronized. This
caused lines 363 and 434 to use an uninitialized BaseConfig.WorkspaceFolders,
leading to potential silent failures in reference filtering logic.

After initializing the base client, explicitly synchronize
BaseConfig.WorkspaceFolders from Config.WorkspaceFolders to ensure both
configurations remain consistent throughout the code.

Signed-off-by: $(git config user.name) <$(git config user.email)>
Signed-off-by: tsanders <[email protected]>
…erference

In normalizeMultilineImports, the "from" detection on newlines was using
result.String() which scans the entire accumulated file content. This could
match "from" tokens from previous imports or other code that falls within
the 50-character window, causing the current import to be prematurely
treated as complete.

Now restrict the check to content[importStart:i] to inspect only the
current import statement. This avoids spurious matches from earlier code
and eliminates the O(n²) behavior from repeatedly calling result.String().

Signed-off-by: $(git config user.name) <$(git config user.email)>
Signed-off-by: tsanders <[email protected]>
Extends import detection to handle all valid JavaScript/TypeScript patterns:
- Namespace imports: import * as Pattern from "package"
- Mixed default + named: import React, { useState } from "react"
- Mixed default + namespace: import React, * as All from "react"
- TypeScript type imports: import type { Card } from "package"

Updated regex pattern with 5 capture groups:
1. Default import (when mixed with named/namespace)
2. Named imports (braced list)
3. Default import (when standalone)
4. Namespace import
5. Package name

The regex now includes optional "type" keyword support for TypeScript
type-only imports, ensuring comprehensive coverage of modern JavaScript
and TypeScript import patterns.

Pattern matching logic updated to:
- Check all three import types (default, named, namespace)
- Use exact match for single identifiers (default, namespace)
- Use substring match for comma-separated lists (named imports)
- Added inline comments explaining matching strategy

Test coverage (25 test cases):
- Namespace imports (single line, multiline, multiple per file)
- Mixed default + named imports (both parts searchable, multiline)
- Mixed default + namespace imports (rare but valid)
- TypeScript type imports (named, default, namespace, multiline)
- Side-effect imports (correctly ignored)
- Word boundaries (avoid partial matches like "Card" vs "CardHelper")
- Special characters in package names (@scoped/packages)
- Stress test with 20 named imports in single statement
- Pattern not found scenarios

All 25 tests passing. Build successful.

Addresses code review feedback item #1.

Signed-off-by: tsanders <[email protected]>
Fixes a critical bug where import detection would miss valid matches
when the pattern appeared as a prefix of another symbol on the same line.

Example that was failing:
  import { CardFooter, Card } from '@patternfly/react-core';

When searching for "Card":
- Previous: Would find "Card" in "CardFooter", fail word boundary check,
  and stop searching - missing the standalone "Card" later on the line
- Fixed: Now searches all occurrences on the line until finding a match
  with valid word boundaries

Changes:
- Added inner loop to check multiple occurrences of pattern per line
- Added `found` flag to break from outer line-searching loop once matched
- Prevents false negatives when pattern is substring of other identifiers

Test coverage:
- Pattern as prefix: "CardFooter, Card" - now correctly finds "Card"
- Pattern in middle: "Button, Card, CardBody" - correctly finds "Card"

All 27 tests passing.

Addresses CodeRabbit review feedback.

Signed-off-by: tsanders <[email protected]>
Addresses CodeRabbit feedback:
- Remove unused LOG_FILE variable
- Replace with CONSOLE_LOG for consistency
- Update all references to use correct log file variables

The script now correctly references console.log throughout.

Signed-off-by: tsanders <[email protected]>
These files are local development artifacts:
- CODE_REVIEW_RESPONSE_PLAN.md - planning document
- NODEJS_REFERENCED_ANALYSIS.md - analysis notes
- test-patternfly-direct.sh - local test script

Keeping them local only.

Signed-off-by: tsanders <[email protected]>
Extracts complex logic into well-documented helper functions for better
maintainability and testability.

**New helper functions:**
- `isStringDelimiter()` - Checks for quotes and backticks
- `isEscapedQuote()` - Counts preceding backslashes to detect escaped quotes
- `hasCompletedImportStatement()` - Detects "from" keyword to identify complete imports
- `normalizeImportStatement()` - Processes individual import statement normalization

**Main function improvements:**
- `normalizeMultilineImports()` now delegates to helpers
- Added `\r\n` → `\n` preprocessing for cross-platform compatibility
- Simplified switch statement in import processing
- Removed deeply nested "from" detection logic

**Benefits:**
- Each helper is independently testable
- Main function logic is clearer (26 lines vs 100 lines)
- Better separation of concerns
- Improved code documentation

**Cross-platform line ending handling:**
- Normalizes Windows `\r\n` to Unix `\n` at start
- Simplifies processing to only check `\n` in switch
- Added comments explaining line ending approach

**Test coverage:**
Added 5 new test cases:
- Windows line endings (CRLF) normalization
- Mixed line endings normalization
- Escaped quotes in import strings
- Double backslash before quote
- Template literals with backticks

All 10 normalization tests passing + 27 import tests passing.

Addresses code review feedback items #2, konveyor#3, and konveyor#4.

Signed-off-by: tsanders <[email protected]>
Follow Go testing conventions by accessing private methods directly
from tests in the same package. This eliminates unnecessary public
wrappers that were only used for testing.

Changes:
- Changed import_search_test.go package from nodejs_test to nodejs
- Removed FileInfo exported struct
- Removed NormalizeMultilineImportsPublic wrapper
- Removed FindImportStatementsPublic wrapper
- Removed IsIdentifierCharPublic wrapper
- Updated all test calls to use private methods directly

All import search tests pass (37 test cases across 5 test functions).

Addresses code review feedback on public vs private method exposure.

Signed-off-by: tsanders <[email protected]>
Make workspace filtering conditional based on the analysis mode setting.
This allows users to control whether they want comprehensive analysis
or source-only analysis.

Behavior:
- source-only mode: Filter references to workspace folders only,
  excluding dependency folders (existing behavior preserved)
- full mode: Include all references regardless of location

Changes:
- Added AnalysisMode field to NodeServiceClient
- Store analysis mode from InitConfig during initialization
- Apply workspace filtering only when AnalysisMode is source-only
- Log analysis mode in initialization output

This addresses the code review feedback about making workspace
filtering configurable based on the user's analysis intent.

Addresses shawn-hurley's feedback on workspace filtering.

Signed-off-by: tsanders <[email protected]>
Refactor to follow the same pattern as builtin provider, building results
incrementally during filesystem traversal rather than collecting all file
paths first then searching them.

Changes:
- Extract findImportStatementsInFile to search a single file
- Call findImportStatementsInFile during filepath.Walk for each file
- Accumulate results incrementally as files are processed
- Simplify findImportStatements to delegate to findImportStatementsInFile
  for backward compatibility with tests
- Improve logging to show files processed and imports found

Benefits:
- Single pass through filesystem (more efficient)
- Follows established pattern in builtin provider
- More memory efficient (don't hold all file paths in memory)
- Cleaner separation: one function per file, one function for batches

No functional changes - all 37 import search tests pass.

Addresses shawn-hurley's feedback about following builtin provider pattern.

Signed-off-by: tsanders <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants