Fixes for a couple of issues discovered with vector support#2433
Fixes for a couple of issues discovered with vector support#2433lvca merged 4 commits intoArcadeData:mainfrom
Conversation
There was a problem hiding this comment.
Summary of Changes
Hello @odysseaspenta, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request addresses minor indexing issues related to vector storage and retrieval within ArcadeDB. It includes corrections to how vector neighbors are identified in SQL functions and ensures that stream parsing limits are correctly applied during text embedding imports. Additionally, new integration tests have been added to validate these vector functionalities, improving the robustness of the vector indexing system.
Highlights
- Vector Neighbor Query Fix: Corrected the method used for finding vector neighbors in SQL functions, switching from
findNeighborsFromVectortofindNeighborsFromIdto ensure correct indexing. - Stream Parsing Limit Correction: Fixed an issue in the text embeddings importer where the
parsingLimitEntrieswas not correctly applied to the input stream, ensuring that the limit is now properly enforced. - New Vector Import Integration Tests: Introduced a new test file (
FastTextVectorImportTest.java) with two integration tests to validate thevectorNeighborsSQL function and theparsingLimitEntriesoption during vector data import.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
There was a problem hiding this comment.
Code Review
This pull request addresses two issues related to vector indexing: one where vectorNeighbors failed with an ID, and another where parsingLimitEntries was ignored during import. The changes correctly fix the issue with parsingLimitEntries. However, the fix for vectorNeighbors introduces a regression, as it no longer supports lookups by vector. I've provided a critical suggestion to handle both ID and vector lookups correctly. Additionally, I've suggested a medium-severity refactoring for the new test class to improve maintainability by using the existing TestHelper base class.
|
|
||
| import static org.assertj.core.api.Assertions.assertThat; | ||
|
|
||
| public class FastTextVectorImportTest |
There was a problem hiding this comment.
This test class contains boilerplate code for database setup and teardown that is duplicated in each test method. To improve maintainability and adhere to the project's testing patterns, consider extending com.arcadedb.TestHelper. This will handle database lifecycle management automatically.
You would need to:
- Extend
TestHelper:public class FastTextVectorImportTest extends TestHelper - Override
getDatabasePath():return "target/databases/test-fasttextsmall"; - Remove the manual database setup/teardown logic from the test methods and use the
databasefield fromTestHelper.
|
That's great! Thanks @odysseaspenta! |
* Fixed a couple of issues with creating a vector index of embeddings and querying the index. * Cleaned-up the test case by using the TestHelper (cherry picked from commit 531fcf6)
What does this PR do?
It fixes a couple of small issues with the indexing of vectors.
Motivation
I was trying out the support for the storage and retrievals of vectors in ArcadeDB and came across a couple of minor issues.
Related issues
It fixes issues #2415 and #2432
Additional Notes
Anything else we should know when reviewing?
Checklist
mvn clean packagecommand