Conversation
There was a problem hiding this comment.
Pull request overview
This PR introduces a new searcher that enables finding related documents using nearest neighbor search based on document embeddings. The searcher fetches an embedding from a source document and performs a nearest neighbor query to find similar documents.
Changes:
- Added
RelatedDocumentsByNearestNeighborSearcherto enable "find similar documents" functionality in a single query - Added comprehensive test coverage for the new searcher with various scenarios
- Updated ABI spec to include the new public API
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| container-search/src/main/java/com/yahoo/search/searchers/RelatedDocumentsByNearestNeighborSearcher.java | New searcher that fetches embeddings from source documents and performs nearest neighbor search to find related documents |
| container-search/src/test/java/com/yahoo/search/searchers/RelatedDocumentsByNearestNeighborSearcherTestCase.java | Comprehensive test suite covering error cases, query construction, and parameter handling |
| container-search/abi-spec.json | ABI specification update to expose the new searcher as public API |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| Result result = execution.search(fetchQuery); | ||
| execution.fill(result, summary); | ||
|
|
||
| if (result.hits().size() < 1) { | ||
| return null; | ||
| } |
There was a problem hiding this comment.
The fetchEmbedding method should check for errors in the result from execution.search(fetchQuery) before accessing result.hits(). If the fetch query fails (e.g., due to backend communication errors), the method should handle the error appropriately rather than proceeding to check hit count. Consider adding a check like: if (result.hits().getError() != null) return null; after line 100.
There was a problem hiding this comment.
@copilot open a new pull request to apply changes based on this feedback
| void testDocumentNotFoundReturnsError() { | ||
| var searcher = new RelatedDocumentsByNearestNeighborSearcher(); | ||
| var query = new Query("?relatedTo.id=doc1&relatedTo.embeddingField=embedding&relatedTo.queryTensorName=q"); | ||
| var result = executeWithMockBackend(searcher, query); | ||
|
|
||
| assertNotNull(result.hits().getError(), "Expected error but got none"); | ||
| assertTrue(result.hits().getError().getDetailedMessage().contains("Could not find document"), | ||
| "Error message was: " + result.hits().getError().getDetailedMessage()); | ||
| } |
There was a problem hiding this comment.
There is no test coverage for the case where the embedding field exists but is not a Tensor type (line 109-112 in the main code). Consider adding a test case that verifies the behavior when the field contains a non-Tensor value, ensuring it returns an appropriate error message to the user.
There was a problem hiding this comment.
@copilot open a new pull request to apply changes based on this feedback
I confirm that this contribution is made under the terms of the license found in the root directory of this repository's source tree and that I have the authority necessary to make this contribution on behalf of its copyright owner.