@brendan-kellam brendan-kellam commented Jul 24, 2025

We were hitting a bug where the LLM did not include enough information about a reference for it to be resolved accurately (hence "ambiguous reference"). This happened under two circumstances:

  1. The LLM does not include enough of a path to distinguish two files with the same name. This can happen if repo X has two README.md files, a/README.md and b/README.md. If the LLM only uses @file:{README.md} in its reference, it is not clear which README.md it is referencing.

  2. Two files have exactly the same name and path but live in different repos. For example, README.md in repo X and README.md in repo Y have the same path and name, so @file:{README.md} does not make clear which README.md it is referencing.

This PR addresses the second reason by instructing the LLM to include the repository name when referencing a file.
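
To make the two circumstances concrete, here is a minimal sketch of both lookups. The `IndexedFile` type, the sample data, and both helper functions are hypothetical and exist only for illustration; they are not the project's code.

```typescript
// Hypothetical index entry; field names are illustrative, not the project's types.
interface IndexedFile {
    repo: string;
    path: string;
}

const indexedFiles: IndexedFile[] = [
    { repo: "repo-x", path: "README.md" },
    { repo: "repo-x", path: "a/README.md" },
    { repo: "repo-x", path: "b/README.md" },
    { repo: "repo-y", path: "README.md" },
];

// Path-only lookup, as with @file:{README.md}: any file whose path ends
// with the referenced path is a candidate.
function candidatesByPath(path: string): IndexedFile[] {
    return indexedFiles.filter((file) => file.path.endsWith(path));
}

// Repo-qualified lookup: the repository narrows the candidates first.
function candidatesByRepoAndPath(repo: string, path: string): IndexedFile[] {
    return indexedFiles.filter(
        (file) => file.repo === repo && file.path.endsWith(path),
    );
}

const pathOnly = candidatesByPath("README.md"); // four candidates
const repoQualified = candidatesByRepoAndPath("repo-y", "README.md"); // one candidate
const stillAmbiguous = candidatesByRepoAndPath("repo-x", "README.md"); // three candidates
```

The last call shows why the first circumstance is a separate problem: with suffix matching, a repo-qualified reference can still be ambiguous when the path itself is too short.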

Summary by CodeRabbit

  • New Features

    • File references in chat messages now include both repository and file path information for improved clarity and context.
  • Bug Fixes

    • Enhanced matching and parsing of file references to ensure accurate identification based on both repository and path.
  • Tests

    • Updated all related tests to validate the new file reference format with repository and path fields.
  • Style

    • Improved formatting and examples in user-facing instructions to reflect the updated file reference format.

coderabbitai bot commented Jul 24, 2025

Walkthrough

This change refactors the file reference format throughout the chat feature, replacing the single fileName property with a combination of repo and path properties. All related code, tests, schemas, regex patterns, and string formatting functions are updated to support this new repository-qualified file reference format, ensuring consistent parsing, creation, and rendering of references.

Changes

| Files/Group | Change Summary |
|---|---|
| .../chat/types.ts | fileReferenceSchema updated: remove fileName, add repo and path properties. |
| .../chat/constants.ts | FILE_REFERENCE_REGEX updated to capture repo and path separately using double colon syntax. |
| .../chat/utils.ts | All file reference functions updated to use repo and path instead of fileName; signatures and logic changed accordingly. |
| .../chat/utils.test.ts | Test data and expected outputs updated to use repo and path fields, and the new reference format in all relevant tests. |
| .../chat/useExtractReferences.ts | Parsing logic updated to extract and use repo and path from file references. |
| .../chat/useExtractReferences.test.ts | Test cases updated for new file reference format and expected extracted objects. |
| .../chat/components/chatThread/codeFoldingExtension.test.ts | All FileReference objects in tests updated to use repo and path instead of fileName. |
| .../chat/components/chatThread/markdownRenderer.tsx | remarkReferencesPlugin callback updated to accept and use repo and path for file references. |
| .../chat/components/chatThread/referencedSourcesListView.tsx | resolveFileReference logic updated to match both repo and path suffixes for reference resolution. |
| .../chat/agent.ts | Instructions and examples updated to require and illustrate repo and path in code references, replacing fileName. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant ChatUI
    participant ReferenceParser
    participant Utils
    participant Renderer

    User->>ChatUI: Submits message with file reference
    ChatUI->>ReferenceParser: Extract file references
    ReferenceParser->>Utils: Parse repo and path from message
    Utils-->>ReferenceParser: Return FileReference {repo, path, range}
    ReferenceParser-->>ChatUI: Annotated message parts
    ChatUI->>Renderer: Render message with references
    Renderer->>Utils: Format reference with repo and path
    Utils-->>Renderer: Formatted reference string
    Renderer-->>ChatUI: Display message with repository-qualified references

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~18 minutes

Poem

In the warren, code hops anew,
With repo and path, our references grew!
No more lone file names lost in the night,
Each bunny now knows the repo is right.
So let us all cheer, with paws in the air,
For code that’s precise, and changes that care! 🐇✨


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between da8d49f and 0703766.

📒 Files selected for processing (10)
  • packages/web/src/features/chat/agent.ts (2 hunks)
  • packages/web/src/features/chat/components/chatThread/codeFoldingExtension.test.ts (18 hunks)
  • packages/web/src/features/chat/components/chatThread/markdownRenderer.tsx (1 hunks)
  • packages/web/src/features/chat/components/chatThread/referencedSourcesListView.tsx (1 hunks)
  • packages/web/src/features/chat/constants.ts (1 hunks)
  • packages/web/src/features/chat/types.ts (1 hunks)
  • packages/web/src/features/chat/useExtractReferences.test.ts (7 hunks)
  • packages/web/src/features/chat/useExtractReferences.ts (1 hunks)
  • packages/web/src/features/chat/utils.test.ts (3 hunks)
  • packages/web/src/features/chat/utils.ts (5 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*

📄 CodeRabbit Inference Engine (.cursor/rules/style.mdc)

Filenames should always be camelCase. Exception: if there are filenames in the same directory with a format other than camelCase, use that format to keep things consistent.

Files:

  • packages/web/src/features/chat/types.ts
  • packages/web/src/features/chat/useExtractReferences.ts
  • packages/web/src/features/chat/components/chatThread/markdownRenderer.tsx
  • packages/web/src/features/chat/components/chatThread/referencedSourcesListView.tsx
  • packages/web/src/features/chat/components/chatThread/codeFoldingExtension.test.ts
  • packages/web/src/features/chat/constants.ts
  • packages/web/src/features/chat/utils.test.ts
  • packages/web/src/features/chat/useExtractReferences.test.ts
  • packages/web/src/features/chat/agent.ts
  • packages/web/src/features/chat/utils.ts
🧠 Learnings (1)
packages/web/src/features/chat/utils.test.ts (2)

Learnt from: brendan-kellam
PR: #307
File: packages/backend/src/repoCompileUtils.ts:491-503
Timestamp: 2025-05-14T19:20:48.667Z
Learning: SSH/SCP-style Git URLs (like [email protected]:org/repo.git) throw errors when passed directly to JavaScript's new URL() constructor. A fix is to convert them using: new URL(`ssh://${url.replace(':', '/')}`) for non-HTTP URLs.

🧬 Code Graph Analysis (1)
packages/web/src/features/chat/types.ts (1)
packages/web/src/app/[domain]/search/components/codePreviewPanel/codePreview.tsx (1)
  • CodePreviewFile (29-36)
🔇 Additional comments (21)
packages/web/src/features/chat/types.ts (1)

28-29: Well-designed schema update for file reference disambiguation.

The replacement of fileName with separate repo and path properties effectively addresses the ambiguity issue described in the PR objectives. This structured approach enables precise identification of files across different repositories.
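
As a rough illustration of the new shape, the sketch below uses a plain TypeScript interface and a hand-written type guard in place of the project's fileReferenceSchema, which is not shown in this review. The `range` field names (`startLine`, `endLine`) and every identifier ending in `Sketch` are assumptions.

```typescript
// Assumed shape of a repository-qualified reference; not the real schema.
interface FileReferenceSketch {
    repo: string;
    path: string;
    range?: { startLine: number; endLine: number };
}

// Runtime check that a value has the new shape. An object that only
// carries the old fileName property is rejected.
function isFileReferenceSketch(value: unknown): value is FileReferenceSketch {
    if (typeof value !== "object" || value === null) return false;
    const candidate = value as Record<string, unknown>;
    if (typeof candidate["repo"] !== "string") return false;
    if (typeof candidate["path"] !== "string") return false;
    if (candidate["range"] === undefined) return true;
    const range = candidate["range"] as Record<string, unknown> | null;
    return (
        typeof range === "object" &&
        range !== null &&
        typeof range["startLine"] === "number" &&
        typeof range["endLine"] === "number"
    );
}
```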

packages/web/src/features/chat/components/chatThread/referencedSourcesListView.tsx (1)

39-43: Improved file reference matching with repository context.

The updated logic correctly implements dual matching on both repository and path, addressing the disambiguation requirements. The use of endsWith for both comparisons is appropriate given that repository names and paths may have different prefixes in different contexts.
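
The dual suffix match described here could look roughly like the following. This is a sketch under assumptions: `resolveFileReferenceSketch`, `ReferenceSketch`, and `SourceSketch` are hypothetical names, and the real resolveFileReference may differ in signature and details.

```typescript
// Hypothetical shapes for a parsed reference and a retrieved source file.
interface ReferenceSketch {
    repo: string;
    path: string;
}

interface SourceSketch {
    repo: string;
    path: string;
}

// Returns the first source whose repo AND path both end with the values
// from the reference, so "repo-y" can match "github.com/org/repo-y".
function resolveFileReferenceSketch(
    reference: ReferenceSketch,
    sources: SourceSketch[],
): SourceSketch | undefined {
    return sources.find(
        (source) =>
            source.repo.endsWith(reference.repo) &&
            source.path.endsWith(reference.path),
    );
}

const retrievedSources: SourceSketch[] = [
    { repo: "github.com/org/repo-x", path: "README.md" },
    { repo: "github.com/org/repo-y", path: "README.md" },
];

const resolved = resolveFileReferenceSketch(
    { repo: "repo-y", path: "README.md" },
    retrievedSources,
);
```

Because both comparisons are suffix matches, a reference whose path is only a bare file name can still match more than one file inside the same repository; this sketch simply returns the first hit.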

packages/web/src/features/chat/useExtractReferences.ts (1)

21-28: Correct implementation of repository-aware reference extraction.

The extraction logic properly handles the updated file reference format by extracting both repository and path information from the regex match and passing them to createFileReference with the correct parameter mapping.
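
A simplified picture of this extraction step, written as a plain function rather than the actual hook. The pattern, the `MessagePartSketch` type, and `extractReferencesSketch` are assumptions made for illustration; the project's FILE_REFERENCE_REGEX and createFileReference are not reproduced here.

```typescript
// Hypothetical message part; the real message types are not shown in this review.
interface MessagePartSketch {
    type: "text" | "reasoning";
    text: string;
}

interface ExtractedReferenceSketch {
    repo: string;
    path: string;
    range?: { startLine: number; endLine: number };
}

// Assumed pattern for @file:{repo::path} with an optional :start-end suffix.
const REFERENCE_PATTERN_SOURCE =
    "@file:\\{([^:{}]+)::([^:{}]+)(?::(\\d+)-(\\d+))?\\}";

function extractReferencesSketch(
    parts: MessagePartSketch[],
): ExtractedReferenceSketch[] {
    const references: ExtractedReferenceSketch[] = [];
    for (const part of parts) {
        // A fresh global regex per part avoids carrying lastIndex state over.
        const pattern = new RegExp(REFERENCE_PATTERN_SOURCE, "g");
        let match: RegExpExecArray | null;
        while ((match = pattern.exec(part.text)) !== null) {
            const reference: ExtractedReferenceSketch = {
                repo: match[1],
                path: match[2],
            };
            if (match[3] !== undefined && match[4] !== undefined) {
                reference.range = {
                    startLine: Number(match[3]),
                    endLine: Number(match[4]),
                };
            }
            references.push(reference);
        }
    }
    return references;
}
```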

packages/web/src/features/chat/components/chatThread/markdownRenderer.tsx (1)

48-57: Proper integration of repository context in markdown rendering.

The callback signature and createFileReference call have been correctly updated to handle the new repository-qualified file reference format. The display text logic appropriately continues to show only the filename for better user experience.
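
One way to derive a filename-only display label from the path is sketched below. `displayNameSketch` is a hypothetical helper, and the actual renderer may compute its label differently.

```typescript
// Returns the last path segment, e.g. "utils.ts" for "src/chat/utils.ts".
function displayNameSketch(path: string): string {
    const segments = path.split("/");
    const last = segments[segments.length - 1];
    // Fall back to the full path if it ends with a slash.
    return last !== undefined && last.length > 0 ? last : path;
}
```

With a label like this, references to README.md in two different repositories render the same text, while the underlying repo and path keep them distinct.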

packages/web/src/features/chat/components/chatThread/codeFoldingExtension.test.ts (1)

20-27: Comprehensive test data updates for new file reference structure.

The test data has been systematically updated to include the repo property and use path instead of fileName, maintaining consistency with the updated schema while preserving all test logic and assertions.

packages/web/src/features/chat/constants.ts (1)

3-7: LGTM! Regex pattern correctly captures repository and path separately.

The updated FILE_REFERENCE_REGEX properly implements the new format @file:{repoName::fileName:startLine-endLine} with distinct capture groups for repository and file path, which addresses the PR objective of disambiguating file references across repositories.
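
For readers who want to see the format in action, here is a minimal sketch of a pattern with separate capture groups for repository and path. It is an assumed pattern written for this example, not the project's FILE_REFERENCE_REGEX, and it assumes that neither the repo nor the path contains a colon or a brace.

```typescript
// Assumed pattern: @file:{repo::path} with an optional :startLine-endLine suffix.
const FILE_REFERENCE_SKETCH =
    /@file:\{([^:{}]+)::([^:{}]+)(?::(\d+)-(\d+))?\}/;

interface ParsedReferenceSketch {
    repo: string;
    path: string;
    startLine?: number;
    endLine?: number;
}

// Parses the first reference found in the text, or returns null.
function parseFileReferenceSketch(text: string): ParsedReferenceSketch | null {
    const match = FILE_REFERENCE_SKETCH.exec(text);
    if (match === null) return null;
    const parsed: ParsedReferenceSketch = { repo: match[1], path: match[2] };
    if (match[3] !== undefined && match[4] !== undefined) {
        parsed.startLine = Number(match[3]);
        parsed.endLine = Number(match[4]);
    }
    return parsed;
}

// The old, unqualified form has no "::" separator and does not match.
const unqualified = parseFileReferenceSketch("@file:{README.md}"); // null
const qualified = parseFileReferenceSketch(
    "@file:{github.com/org/repo::src/auth.ts:10-20}",
);
```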

packages/web/src/features/chat/agent.ts (3)

203-216: LGTM! Instructions updated to reflect new repository-qualified file reference format.

The code reference requirement section correctly demonstrates the new format using repo and path properties, with clear examples showing both simple file references and those with line ranges.


210-215: Good examples of correct and incorrect usage patterns.

The examples clearly illustrate the required format and common mistakes to avoid, which will help the LLM generate properly formatted file references consistently.


224-224: Example answer structure properly demonstrates the new format.

The example correctly shows how to use the repository-qualified file reference format in practice with both repository name and file path included.

packages/web/src/features/chat/utils.test.ts (3)

15-28: LGTM! Test cases properly updated for new file reference format.

The fileReferenceToString tests correctly verify both simple file references and those with line ranges using the new repo and path properties instead of fileName.


31-45: Good verification that formatted strings match the regex.

These tests ensure that the output from fileReferenceToString is compatible with FILE_REFERENCE_REGEX, maintaining consistency between string generation and parsing.


246-330: Comprehensive repairCitations test coverage for new format.

All repairCitations test cases have been properly updated to include repository prefixes in both input and expected output strings, ensuring the citation repair logic works correctly with the new format.

packages/web/src/features/chat/useExtractReferences.test.ts (3)

14-46: LGTM! Test properly verifies extraction of repository-qualified file references.

The test correctly validates that useExtractReferences can parse the new format and extract separate repo and path properties, with proper handling of line ranges.


55-87: Good coverage for reasoning content extraction.

This test ensures that file reference extraction works correctly for reasoning-type message parts, maintaining consistency across different content types.


96-164: Comprehensive multi-part message testing.

The test thoroughly validates extraction from multiple message parts with different content types, ensuring all repository-qualified references are properly identified and parsed.

packages/web/src/features/chat/utils.ts (6)

213-215: LGTM! File reference ID generation updated for repository format.

The getFileReferenceId function correctly generates unique IDs using the new repo::path format, ensuring proper identification of file references across different repositories.


217-219: Good implementation of repository-qualified string format.

The fileReferenceToString function properly constructs the new @file:{repo::path} format with optional line ranges, maintaining consistency with the updated regex pattern.
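
The string and ID construction described in the two comments above might be sketched as follows. The function names carry a `Sketch` suffix because they are illustrative; in particular, appending the range to the ID is an assumption, since the review only states that IDs use the repo::path format.

```typescript
interface FileReferenceInputSketch {
    repo: string;
    path: string;
    range?: { startLine: number; endLine: number };
}

// Builds "@file:{repo::path}" or "@file:{repo::path:start-end}".
function fileReferenceToStringSketch(ref: FileReferenceInputSketch): string {
    const suffix = ref.range
        ? `:${ref.range.startLine}-${ref.range.endLine}`
        : "";
    return `@file:{${ref.repo}::${ref.path}${suffix}}`;
}

// Builds an identifier from repo and path. Including the range is an
// assumption so that two ranges in the same file get different IDs.
function getFileReferenceIdSketch(ref: FileReferenceInputSketch): string {
    const suffix = ref.range
        ? `:${ref.range.startLine}-${ref.range.endLine}`
        : "";
    return `${ref.repo}::${ref.path}${suffix}`;
}
```

Keeping the formatter and the parser on the same `repo::path` convention is what lets the round-trip tests mentioned for utils.test.ts check that generated strings match the regex.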


221-237: Well-structured createFileReference function update.

The function correctly accepts separate repo and path parameters and properly constructs FileReference objects with the new format, maintaining proper range handling logic.


132-132: Slate content conversion properly updated.

The slateContentToString function correctly uses the new fileReferenceToString signature with repo and path properties from the mention data.


245-258: Portable markdown conversion maintains display logic.

The convertLLMOutputToPortableMarkdown function correctly handles the new regex capture groups while preserving the existing display name logic for markdown links.


298-301: Citation repair patterns updated for new format.

The repairCitations function's regex patterns correctly handle the repository-qualified format while maintaining the same repair logic for common formatting issues.

@github-actions

@brendan-kellam your pull request is missing a changelog!

@brendan-kellam brendan-kellam merged commit dbd8ef7 into main Jul 24, 2025
6 checks passed
@brendan-kellam brendan-kellam deleted the bkellam/fix_ambiguous_references branch July 24, 2025 17:21