Skip to content

Fixes for a couple of issues discovered with vector support#2433

Merged
lvca merged 4 commits intoArcadeData:mainfrom
odysseaspenta:vectors
Aug 19, 2025
Merged

Fixes for a couple of issues discovered with vector support#2433
lvca merged 4 commits intoArcadeData:mainfrom
odysseaspenta:vectors

Conversation

@odysseaspenta
Copy link
Contributor

What does this PR do?

It fixes a couple of small issues with the indexing of vectors.

Motivation

I was trying out the support for the storage and retrievals of vectors in ArcadeDB and came across a couple of minor issues.

Related issues

It fixes issues #2415 and #2432

Additional Notes

Anything else we should know when reviewing?

Checklist

  • I have run the build using mvn clean package command
  • My unit tests cover both failure and success scenarios

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary of Changes

Hello @odysseaspenta, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses minor indexing issues related to vector storage and retrieval within ArcadeDB. It includes corrections to how vector neighbors are identified in SQL functions and ensures that stream parsing limits are correctly applied during text embedding imports. Additionally, new integration tests have been added to validate these vector functionalities, improving the robustness of the vector indexing system.

Highlights

  • Vector Neighbor Query Fix: Corrected the method used for finding vector neighbors in SQL functions, switching from findNeighborsFromVector to findNeighborsFromId to ensure correct indexing.
  • Stream Parsing Limit Correction: Fixed an issue in the text embeddings importer where the parsingLimitEntries was not correctly applied to the input stream, ensuring that the limit is now properly enforced.
  • New Vector Import Integration Tests: Introduced a new test file (FastTextVectorImportTest.java) with two integration tests to validate the vectorNeighbors SQL function and the parsingLimitEntries option during vector data import.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in issue comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@odysseaspenta odysseaspenta changed the title Vectors Fixes for a couple of issues discovered with vector support Aug 19, 2025
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request addresses two issues related to vector indexing: one where vectorNeighbors failed with an ID, and another where parsingLimitEntries was ignored during import. The changes correctly fix the issue with parsingLimitEntries. However, the fix for vectorNeighbors introduces a regression, as it no longer supports lookups by vector. I've provided a critical suggestion to handle both ID and vector lookups correctly. Additionally, I've suggested a medium-severity refactoring for the new test class to improve maintainability by using the existing TestHelper base class.


import static org.assertj.core.api.Assertions.assertThat;

public class FastTextVectorImportTest
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This test class contains boilerplate code for database setup and teardown that is duplicated in each test method. To improve maintainability and adhere to the project's testing patterns, consider extending com.arcadedb.TestHelper. This will handle database lifecycle management automatically.

You would need to:

  1. Extend TestHelper: public class FastTextVectorImportTest extends TestHelper
  2. Override getDatabasePath(): return "target/databases/test-fasttextsmall";
  3. Remove the manual database setup/teardown logic from the test methods and use the database field from TestHelper.

@lvca lvca self-requested a review August 19, 2025 14:28
@lvca lvca added this to the 25.8.1 milestone Aug 19, 2025
@lvca lvca added the bug label Aug 19, 2025
@lvca
Copy link
Contributor

lvca commented Aug 19, 2025

That's great! Thanks @odysseaspenta!

@lvca lvca merged commit 531fcf6 into ArcadeData:main Aug 19, 2025
3 checks passed
robfrank pushed a commit that referenced this pull request Sep 10, 2025
* Fixed a couple of issues with creating a vector index of embeddings and querying the index.

* Cleaned-up the test case by using the TestHelper

(cherry picked from commit 531fcf6)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants