
Add changes for AVX-512 support in k-NN. #2110

Merged
naveentatikonda merged 8 commits into opensearch-project:main from akashsha1:intel/avx512faiss_3 on Sep 19, 2024

@akashsha1
Contributor

@akashsha1 akashsha1 commented Sep 16, 2024

Description

This change adds support for speeding up vector search and indexing in Faiss using AVX512 hardware acceleration.

Related Issues

Resolves #2056

Check List

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@akashsha1 akashsha1 mentioned this pull request Sep 16, 2024
5 tasks
@assanedi
Contributor

assanedi commented Sep 18, 2024

The benchmark was run using opensearch-benchmark with the Cohere dataset (768 dimensions).
Here are the configuration details for indexing:
{
  "target_index_name": "target_index",
  "target_field_name": "target_field",
  "target_index_body": "indices/faiss-index.json",
  "target_index_primary_shards": 4,
  "target_index_replica_shards": 1,
  "target_index_dimension": 768,
  "target_index_space_type": "innerproduct",
  "target_index_bulk_size": 100,
  "target_index_bulk_index_data_set_format": "hdf5",
  "target_index_bulk_index_data_set_path": "/mnt/nvme1/documents-1m.hdf5",
  "target_index_bulk_indexing_clients": 20,
  "target_index_max_num_segments": 1,
  "hnsw_ef_search": 256,
  "hnsw_ef_construction": 256
}

Here are the configuration details for search:
{
  "target_index_name": "target_index",
  "target_field_name": "target_field",
  "query_k": 100,
  "query_body": {
    "docvalue_fields": ["_id"],
    "stored_fields": "none"
  },
  "query_data_set_format": "hdf5",
  "query_data_set_path": "/mnt/nvme1/queries-1m-100k.hdf5",
  "query_count": 30000,
  "search_clients": 20
}

A force merge that reduces the number of segments to 1 (max_num_segments=1) is executed via the API before the search.

The OpenSearch cluster was deployed with 2 data nodes (r7i.2xlarge), 1 replica, and 4 shards.
With this setup, AVX512 shows a 15% improvement over AVX2 on indexing and a 7% improvement on search, as shown below:

image

@naveentatikonda
Member

The benchmark was run using opensearch-benchmark with the Cohere dataset (768 dimensions). The OpenSearch cluster was deployed with 2 data nodes (r7i.2xlarge), 1 replica, and 4 shards. With this setup, AVX512 shows a 15% improvement over AVX2 on indexing and a 7% improvement on search, as shown below:

image

@assanedi Can you also please add the other configuration details, such as the indexing clients, query clients, ef_construction, ef_search, etc.?

Member

@naveentatikonda naveentatikonda left a comment

LGTM! Thanks @akashsha1 @assanedi

@naveentatikonda
Member

"target_index_bulk_indexing_clients": 20,
"target_index_max_num_segments": 10,
"hnsw_ef_search": 256,
"hnsw_ef_construction": 256

@assanedi Wasn't max_num_segments set to 1 during the force merge?

@assanedi
Contributor

"target_index_bulk_indexing_clients": 20,
"target_index_max_num_segments": 10,
"hnsw_ef_search": 256,
"hnsw_ef_construction": 256

@assanedi Wasn't max_num_segments set to 1 during the force merge?

Yes, I ran the force merge API; here is the result:
curl -X POST -k --user admin:admin http://10.0.0.80:9200/_forcemerge?max_num_segments=1
{"_shards":{"total":8,"successful":8,"failed":0}}

@naveentatikonda
Member

Yes, I ran the force merge API; here is the result:
curl -X POST -k --user admin:admin http://10.0.0.80:9200/_forcemerge?max_num_segments=1
{"_shards":{"total":8,"successful":8,"failed":0}}

Yes, but in the configuration you listed target_index_max_num_segments as 10 instead of 1.

@naveentatikonda
Member

For FP32, we don’t need to make any changes in Faiss, as it relies on auto-vectorization to achieve the optimization with AVX512. For scalar quantization, however, Intel has raised a PR to Faiss, which is under review:
facebookresearch/faiss#3853

@assanedi
Contributor

Yes, I ran the force merge API; here is the result:
curl -X POST -k --user admin:admin http://10.0.0.80:9200/_forcemerge?max_num_segments=1
{"_shards":{"total":8,"successful":8,"failed":0}}

Yes, but in the configuration you listed target_index_max_num_segments as 10 instead of 1.

I updated the configuration details.


public static boolean isFaissAVX512Disabled() {
try {
return KNNSettings.state().getSettingValue(KNNSettings.KNN_FAISS_AVX512_DISABLED);
Member

Can we do proper null checks here? In general, I think it's best to avoid catching all exceptions.

Contributor Author

It looks like a Java boolean cannot be null, so a null check won't be possible.
Your second point on exceptions is valid, and this code shouldn't throw exceptions because a default value is set. I've removed the try/catch block.

Member

@akashsha1 As users will set this setting manually in opensearch.yml, there is a chance of getting null. To avoid that, shall we change it to:
return Booleans.parseBoolean(KNNSettings.state().getSettingValue(KNNSettings.KNN_FAISS_AVX512_DISABLED).toString(), false);

Member

Booleans.parseBoolean will do the null validation and, if the value is null, return the default value.

Contributor Author

Spoke with Naveen on Slack and updated to:
return Booleans.parseBoolean(KNNSettings.state().getSettingValue(KNNSettings.KNN_FAISS_AVX512_DISABLED).toString(), KNN_DEFAULT_FAISS_AVX512_DISABLED_VALUE);
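
For illustration, the default-on-null parsing discussed in this review thread can be sketched in a self-contained way. Everything below is an assumption made for the example rather than the plugin's code: `parseBooleanOrDefault` is a hypothetical stand-in for OpenSearch's `Booleans.parseBoolean(String, boolean)`, the settings lookup is simulated with a `Map`, and the key name and the default value of `false` are assumed. The sketch converts the raw value to a string only when it is non-null, since calling `toString()` on a null reference throws a `NullPointerException` before any default can be applied.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the plugin's settings access; not the real KNNSettings class.
class Avx512SettingSketch {

    // Assumed key name and default value, used only for this illustration.
    static final String KEY = "knn.faiss.avx512.disabled";
    static final boolean DEFAULT_FAISS_AVX512_DISABLED = false;

    // Stand-in parser: returns the default for null or blank input,
    // accepts only "true" or "false", and rejects anything else.
    static boolean parseBooleanOrDefault(String value, boolean defaultValue) {
        if (value == null || value.trim().isEmpty()) {
            return defaultValue;
        }
        if ("true".equals(value)) {
            return true;
        }
        if ("false".equals(value)) {
            return false;
        }
        throw new IllegalArgumentException("Expected 'true' or 'false' but got [" + value + "]");
    }

    // Reads the flag from a simulated settings map without dereferencing null.
    static boolean isFaissAvx512Disabled(Map<String, Object> settings) {
        Object raw = settings.get(KEY);
        String text = (raw == null) ? null : raw.toString();
        return parseBooleanOrDefault(text, DEFAULT_FAISS_AVX512_DISABLED);
    }

    public static void main(String[] args) {
        Map<String, Object> settings = new HashMap<>();
        System.out.println("unset: " + isFaissAvx512Disabled(settings));
        settings.put(KEY, "true");
        System.out.println("set to true: " + isFaissAvx512Disabled(settings));
    }
}
```

Rejecting unrecognized strings rather than silently treating them as false is a design choice of this sketch; it surfaces typos in a manually edited configuration file early.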

@naveentatikonda added the enhancement, backport 2.x, and Features labels and removed the enhancement label on Sep 18, 2024
Member

@jmazanec15 jmazanec15 left a comment

LGTM thanks

@naveentatikonda naveentatikonda merged commit 5423cc1 into opensearch-project:main Sep 19, 2024
opensearch-trigger-bot bot pushed a commit that referenced this pull request Sep 19, 2024
* changes for AVX-512. Signed-off by: Akash Shankaran <[email protected]>

Signed-off-by: Akash Shankaran <[email protected]>

* add cpu detection logic to security workflow. Signed-off by: Akash Shankaran <[email protected]>

Signed-off-by: Akash Shankaran <[email protected]>

* add cpu detection logic to backward compat test workflow. Signed-off by: Akash Shankaran <[email protected]>

Signed-off-by: Akash Shankaran <[email protected]>

* fix bwc  workflow. Signed-off by: Akash Shankaran <[email protected]>

Signed-off-by: Akash Shankaran <[email protected]>

* address PR feedback. Signed-off by: Akash Shankaran <[email protected]>

Signed-off-by: Akash Shankaran <[email protected]>

* fix a bug in KNNSettings. Signed-off by: Akash Shankaran <[email protected]>

Signed-off-by: Akash Shankaran <[email protected]>

* fix a bug in KNNSettings. Signed-off by: Akash Shankaran <[email protected]>

Signed-off-by: Akash Shankaran <[email protected]>

* update KNNSettings. Signed-off by: Akash Shankaran <[email protected]>

Signed-off-by: Akash Shankaran <[email protected]>

---------

Signed-off-by: Akash Shankaran <[email protected]>
(cherry picked from commit 5423cc1)
ryanbogan pushed a commit that referenced this pull request Sep 20, 2024
(cherry picked from commit 5423cc1)
Signed-off-by: Ryan Bogan <[email protected]>
naveentatikonda pushed a commit that referenced this pull request Sep 23, 2024
(cherry picked from commit 5423cc1)

Signed-off-by: Akash Shankaran <[email protected]>
Signed-off-by: Ryan Bogan <[email protected]>
Co-authored-by: akashsha1 <[email protected]>

Labels

backport 2.x Features Introduces a new unit of functionality that satisfies a requirement

Development

Successfully merging this pull request may close these issues.

[FEATURE] Add support for FAISS AVX512

4 participants