Skip to content

Add quantization state reader and writer#1997

Merged
jmazanec15 merged 47 commits intoopensearch-project:mainfrom
ryanbogan:quantization_state_writer_reader
Sep 5, 2024
Merged

Add quantization state reader and writer#1997
jmazanec15 merged 47 commits intoopensearch-project:mainfrom
ryanbogan:quantization_state_writer_reader

Conversation

@ryanbogan
Copy link
Member

@ryanbogan ryanbogan commented Aug 21, 2024

Description

Adds a quantization state writer and reader for native engines

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Ryan Bogan <rbogan@amazon.com>
@ryanbogan
Copy link
Member Author

I still need to write unit tests, but raising now for initial feedback

Signed-off-by: Ryan Bogan <rbogan@amazon.com>
Signed-off-by: Ryan Bogan <rbogan@amazon.com>
Signed-off-by: Ryan Bogan <rbogan@amazon.com>
Signed-off-by: Ryan Bogan <rbogan@amazon.com>
Signed-off-by: Ryan Bogan <rbogan@amazon.com>
Signed-off-by: Ryan Bogan <rbogan@amazon.com>
Signed-off-by: Ryan Bogan <rbogan@amazon.com>
@ryanbogan
Copy link
Member Author

ryanbogan commented Aug 22, 2024

I'll add end to end integration tests to fully test the quantization reader and writer once the integration is complete with indexing and query flow. Additionally, I'll add another unit test for the second read method with the PR for integration to the query flow, since the method is currently incomplete.

@ryanbogan ryanbogan requested a review from jmazanec15 August 22, 2024 19:07
Signed-off-by: Ryan Bogan <rbogan@amazon.com>
*
* @param state the read state to read from
*/
public Map<String, byte[]> read(SegmentReadState state) {
Copy link
Collaborator

@Vikasht34 Vikasht34 Aug 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's Reafctor the code between both reads. Most of the things are common

Copy link
Collaborator

@navneet1v navneet1v left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that writer is only added for flush and not for merge. Will there be separate PR for that?

@ryanbogan
Copy link
Member Author

I see that writer is only added for flush and not for merge. Will there be separate PR for that?

I thought that the merge method eventually uses flush based on https://github.com/opensearch-project/k-NN/blob/main/src/main/java/org/opensearch/knn/index/codec/KNN990Codec/NativeEngines990KnnVectorsWriter.java#L90

If that is not the case, I can add in separate PR

Signed-off-by: Ryan Bogan <rbogan@amazon.com>
readQuantizationStateInfos.put(fieldName, stateBytes);
}
} catch (Exception e) {
return readQuantizationStateInfos;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am assuming an exception is expected if the file is not found. In that case the catch all is risky here, change to catch a specific exception and then return Collections.emptyMap(). Creating a map on line 52 is not ideal as it allocates memory, Collections.emptyMap will not. Probably move the new map() to line 55.

For all other exceptions log an error and please throw

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking to return empty map for all type of errors , and let's not load the cache , we really don't want to throw the excpetion and break and query flow. let it run usual.

Collections.emptyMap() also allocates memory in the same way new does , the only differnce is it's singleton and does not allocate every time , but here it does not matter.

Copy link
Collaborator

@shatejas shatejas Sep 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking to return empty map for all type of errors , and let's not load the cache , we really don't want to throw the excpetion and break and query flow. let it run usual.

Why? Silent failures are not good

Will it return the right results if there is an exception while reading and an empty map is returned?

Collections.emptyMap() also allocates memory in the same way new does , the only differnce is it's singleton and does not allocate every time , but here it does not matter.

For every request that is does not need quantization its creating new map which allocates memory, so that singleton is needed

Copy link
Collaborator

@Vikasht34 Vikasht34 Sep 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Read functionality is getting called from NativeReader Constructor for both cases , like with quantization and without , without quantization we are returing empty map and with also in case of any error empty hash map basically don't load the hash map to load cache. Which in my view it's fine ..becuase flow would be same for both. Having said that I don't have any strong preference , I would be fine for both.

Segment reader is opened only once , and it's not going to open for each request , so it does not matter.

Copy link
Collaborator

@shatejas shatejas Sep 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a public function, so the code needs to be defensive

Lets make sure we aren't assigning memory where its not needed

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Vikasht34 and @ryanbogan if we are catching the exception we should have logs too on what was the exception. Lets say if there is genuinely some error, then just eating up the exception is not correct.

More over, we are using IndexInput's exception to say if the file is present or not. Why not just get all the segmentfiles from the segment info and check if the desired file is present or not.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to have another check in both the read functions which check if QS file is present or not. and then safely return an empty map. and then we don't need to catch all the exceptions.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make Sense ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will address in follow-up

String quantizationStateFileName = getQuantizationStateFileName(segmentReadState);
int fieldNumber = segmentReadState.fieldInfos.fieldInfo(field).getFieldNumber();

try (IndexInput input = segmentReadState.directory.openInput(quantizationStateFileName, IOContext.READ)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a try without a catch, The same applies here, return null for a specific exception and throw for others

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we can have it , but this piece of code is getting called from KnnWeight only when QuantizationParm is not null.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for this #1997 (comment)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

File not found excpetion will never be for this case , as in KNN wight we already have check before calling this function

Copy link
Collaborator

@shatejas shatejas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, some minor changes

Signed-off-by: Ryan Bogan <rbogan@amazon.com>
This reverts commit 2accfb1.

Signed-off-by: Ryan Bogan <rbogan@amazon.com>
@ryanbogan ryanbogan force-pushed the quantization_state_writer_reader branch from f00b607 to a6c87e3 Compare September 4, 2024 22:16
Signed-off-by: Ryan Bogan <rbogan@amazon.com>
Signed-off-by: Ryan Bogan <rbogan@amazon.com>
@ryanbogan
Copy link
Member Author

Adding skip-changelog because entry is in the release notes instead


private QuantizationState quantizationState;

private final String NATIVE_ENGINE_SEARCH_ERROR_MESSAGE = "Search functionality using codec is not supported with Native Engine Reader";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use static

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add to follow-up

@jmazanec15 jmazanec15 merged commit a58a9dc into opensearch-project:main Sep 5, 2024
@navneet1v
Copy link
Collaborator

navneet1v commented Sep 5, 2024

Merging the PR to unblock others. The test which is failing will be fixed in next iteration

discussed with @jmazanec15 and @ryanbogan

@opensearch-trigger-bot
Copy link
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-1997-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 a58a9dcc7b7c264693dbf25e2ad37b0807c2f724
# Push it to GitHub
git push --set-upstream origin backport/backport-1997-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-1997-to-2.x.

@opensearch-trigger-bot
Copy link
Contributor

The backport to 2.17 failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.17 2.17
# Navigate to the new working tree
cd .worktrees/backport-2.17
# Create a new branch
git switch --create backport/backport-1997-to-2.17
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 a58a9dcc7b7c264693dbf25e2ad37b0807c2f724
# Push it to GitHub
git push --set-upstream origin backport/backport-1997-to-2.17
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.17

Then, create a pull request where the base branch is 2.17 and the compare/head branch is backport/backport-1997-to-2.17.

opensearch-trigger-bot bot pushed a commit that referenced this pull request Sep 5, 2024
Add quantization state reader and writer

Signed-off-by: Ryan Bogan <rbogan@amazon.com>
(cherry picked from commit a58a9dc)
opensearch-trigger-bot bot pushed a commit that referenced this pull request Sep 5, 2024
Add quantization state reader and writer

Signed-off-by: Ryan Bogan <rbogan@amazon.com>
(cherry picked from commit a58a9dc)
ryanbogan added a commit that referenced this pull request Sep 5, 2024
Add quantization state reader and writer

Signed-off-by: Ryan Bogan <rbogan@amazon.com>
(cherry picked from commit a58a9dc)

Co-authored-by: Ryan Bogan <rbogan@amazon.com>
ryanbogan added a commit that referenced this pull request Sep 5, 2024
Add quantization state reader and writer

Signed-off-by: Ryan Bogan <rbogan@amazon.com>
(cherry picked from commit a58a9dc)

Co-authored-by: Ryan Bogan <rbogan@amazon.com>
akashsha1 pushed a commit to akashsha1/k-NN that referenced this pull request Sep 16, 2024
Add quantization state reader and writer

Signed-off-by: Ryan Bogan <rbogan@amazon.com>
Signed-off-by: Akash Shankaran <akash.shankaran@intel.com>
jingqimao77-spec pushed a commit to jingqimao77-spec/k-NN that referenced this pull request Mar 15, 2026
Add quantization state reader and writer

Signed-off-by: Ryan Bogan <rbogan@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants