34 commits
33039a3
Allow dynamic HNSW search threshold updates for collaborative search
krickert Feb 7, 2026
63eb03f
Introduce CollaborativeKnnCollector and Manager to core
krickert Feb 7, 2026
323c395
Clarify visibility semantics and apply formatting
krickert Feb 7, 2026
13a5960
Add CHANGES.txt entry for collaborative search
krickert Feb 7, 2026
50c6255
Add High-K and High-Dimension test scenarios for collaborative search
krickert Feb 7, 2026
2f36edd
Commit missing CollaborativeKnnCollector and Manager
krickert Feb 7, 2026
564f878
Cleanup extraneous newlines in TestCollaborativeHnswSearch
krickert Feb 7, 2026
3eb6a8e
Remove extraneous newlines and fix indentation in TestCollaborativeHn…
krickert Feb 7, 2026
5d1019d
Replace AtomicLong with AtomicInteger for global similarity threshold
krickert Feb 7, 2026
d90da2a
Add multi-segment collaborative pruning test for HNSW search
krickert Feb 7, 2026
88a5188
Add multi-index performance tests for collaborative HNSW pruning
krickert Feb 7, 2026
f7c8b63
Comprehensive multi-segment and multi-index collaborative tests
krickert Feb 7, 2026
66c7ad3
Fix forbiddenApis by adding Locale.ROOT to String.format
krickert Feb 7, 2026
f0dc67c
Mark multi-index collaborative tests as @Nightly
krickert Feb 7, 2026
c452d14
Move testHighKPruning and testHighDimensionPruning to @Nightly
krickert Feb 7, 2026
bbc0875
Idiomatic Collaborative HNSW search with LongAccumulator and DocScore…
krickert Feb 7, 2026
2e3c64c
Fix multi-index pruning bug and add recall measurement to tests
krickert Feb 7, 2026
17fba5c
Add definitive scaling and stress tests for collaborative search
krickert Feb 7, 2026
3476365
Cleanup and concurrent simulation for TestCollaborativeHnswSearch
krickert Feb 7, 2026
3c491fe
Accumulate floor score for high-recall pruning and add real-world seg…
krickert Feb 7, 2026
1f8cee1
Merge branch 'apache:main' into feature/collaborative-hnsw-search
krickert Feb 17, 2026
713325a
Feature/HNSW: Refine Collaborative Search for robust recall and distr…
krickert Feb 10, 2026
6d135a9
Feature/HNSW: Refine Collaborative Search for robust recall and distr…
krickert Feb 10, 2026
e5c894e
Lucene: implement Recall-Safe pruning using Global Floor vs Local Max…
krickert Feb 17, 2026
65b27bb
Lucene: implement topology-aware coordination with Hamming Affinity a…
krickert Feb 17, 2026
7677ac2
Restore CollaborativeKnnCollector Golden Logic (earlyTerminated, loca…
krickert Feb 17, 2026
9cc3d4c
Add 4-arg constructor for TestCollaborativeHnswSearch compatibility
krickert Feb 17, 2026
b330268
Fix trailing whitespace in CHANGES.txt
krickert Feb 18, 2026
d4adb69
Apply google-java-format via gradlew tidy
krickert Feb 18, 2026
926e5f4
Remove unused fields and imports to fix ecjLint failures
krickert Feb 18, 2026
7b77d6a
Apply google-java-format to CollaborativeKnnCollector
krickert Feb 18, 2026
56018b4
Fix forbiddenApis by using NamedThreadFactory in TestCollaborativeHns…
krickert Feb 18, 2026
2691faa
Apply google-java-format to TestCollaborativeHnswSearch
krickert Feb 18, 2026
f618bdd
Merge branch 'apache:main' into feature/collaborative-hnsw-search
krickert Feb 19, 2026
7 changes: 5 additions & 2 deletions lucene/CHANGES.txt

@@ -56,11 +56,14 @@ API Changes

 * GITHUB#14615 : Remove unnecessary public methods in FuzzySet (Greg Miller)

-* GITHUB#15295 : Switched to a fixed CFS threshold (Shubham Sharma)
+* GITHUB#15295 : Switched to a fixed CFS threshold(Shubham Sharma)

 New Features
 ---------------------

+* GITHUB#KNN-COLLAB: Introduce Collaborative HNSW search, allowing dynamic threshold
+  updates from collectors to enable cluster-wide search pruning. (ai-pipestream)
+
 * GITHUB#15505: Upgrade snowball to 2d2e312df56f2ede014a4ffb3e91e6dea43c24be. New stemmer: PolishStemmer (and
   PolishSnowballAnalyzer in the stempel package) (Justas Sakalauskas, Dawid Weiss)

@@ -1774,7 +1777,7 @@ Optimizations

 * GITHUB#13184: Make the HitQueue size more appropriate for KNN exact search (Pan Guixin)

-* GITHUB#13199: Speed up dynamic pruning by breaking point estimation when threshold get exceeded. (Guo Feng)
+* GITHUB#13199: Speed up dynamic pruning by breaking point estimation when thresholdget exceeded. (Guo Feng)

 * GITHUB#13203: Speed up writeGroupVInts (Zhang Chao)
@@ -0,0 +1,127 @@ (new file: CollaborativeKnnCollector.java)
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.lucene.search;

import java.util.concurrent.atomic.LongAccumulator;
import java.util.function.IntUnaryOperator;
import org.apache.lucene.search.knn.KnnSearchStrategy;

/**
 * A {@link KnnCollector} that allows for collaborative search across collectors: each collector
 * publishes its local score floor to a shared accumulator, and a collector early-terminates once
 * even its best local score falls below the global floor (global-floor vs. local-max pruning).
 */
public class CollaborativeKnnCollector extends KnnCollector.Decorator {

private static final IntUnaryOperator IDENTITY_MAPPER = docId -> docId;
private static final int GLOBAL_BAR_MIN_VISITS = 100;
private static final float GLOBAL_BAR_TERMINATION_SLACK = 0.0001f;

private final LongAccumulator minScoreAcc;
private final int docBase;
private final IntUnaryOperator docIdMapper;

private float localMaxScore = Float.NEGATIVE_INFINITY;
private float lastSharedScore = Float.NEGATIVE_INFINITY;

/** Convenience constructor for tests. */
public CollaborativeKnnCollector(
int k, int visitLimit, LongAccumulator minScoreAcc, int docBase) {
this(new TopKnnCollector(k, visitLimit), minScoreAcc, docBase, IDENTITY_MAPPER);
}

public CollaborativeKnnCollector(
int k,
int visitLimit,
LongAccumulator minScoreAcc,
int docBase,
IntUnaryOperator docIdMapper) {
this(new TopKnnCollector(k, visitLimit), minScoreAcc, docBase, docIdMapper);
}

public CollaborativeKnnCollector(
int k,
int visitLimit,
KnnSearchStrategy searchStrategy,
LongAccumulator minScoreAcc,
int docBase,
IntUnaryOperator docIdMapper) {
this(new TopKnnCollector(k, visitLimit, searchStrategy), minScoreAcc, docBase, docIdMapper);
}

private CollaborativeKnnCollector(
KnnCollector delegate,
LongAccumulator minScoreAcc,
int docBase,
IntUnaryOperator docIdMapper) {
super(delegate);
this.minScoreAcc = minScoreAcc;
this.docBase = docBase;
this.docIdMapper = docIdMapper;
}

@Override
public float minCompetitiveSimilarity() {
// Pathfinding always uses local bar
return super.minCompetitiveSimilarity();
}

@Override
public boolean earlyTerminated() {
if (super.earlyTerminated()) return true;
if (visitedCount() < GLOBAL_BAR_MIN_VISITS) return false;

long globalFloorCode = minScoreAcc.get();
if (globalFloorCode == Long.MIN_VALUE) return false;

float globalFloorScore = DocScoreEncoder.toScore(globalFloorCode);

// CRITICAL: Only stop if our BEST hit is worse than the global floor.
// If localMax < globalFloor, it's impossible for this shard to make the Top K.
return localMaxScore > Float.NEGATIVE_INFINITY
&& localMaxScore < (globalFloorScore - GLOBAL_BAR_TERMINATION_SLACK);
}

@Override
public boolean collect(int docId, float similarity) {
boolean collected = super.collect(docId, similarity);

if (similarity > localMaxScore) {
localMaxScore = similarity;
}

if (collected) {
float floorScore = super.minCompetitiveSimilarity();
if (floorScore > Float.NEGATIVE_INFINITY && floorScore > lastSharedScore + 0.0001f) {

int absoluteDocId = docId + docBase;
minScoreAcc.accumulate(
DocScoreEncoder.encode(docIdMapper.applyAsInt(absoluteDocId), floorScore));
lastSharedScore = floorScore;
}
}
return collected;
}

public static float toScore(long value) {
return DocScoreEncoder.toScore(value);
}

public static long encode(int docId, float score) {
return DocScoreEncoder.encode(docId, score);
}
}
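The collector shares its floor through a single atomic `long` via `DocScoreEncoder`, whose implementation is not part of this diff. Below is a minimal sketch of a compatible encoding; the class name `DemoDocScoreEncoder` is mine, and it assumes non-negative scores (as Lucene similarity functions produce), so the raw float bits order the same way as the float values and `Long::max` keeps the highest published floor:

```java
import java.util.concurrent.atomic.LongAccumulator;

// Hypothetical stand-in for the DocScoreEncoder used by the PR (not shown in the diff).
// The score goes in the high 32 bits so that numeric comparison of encoded longs
// orders by score first; the doc id rides along in the low 32 bits.
public class DemoDocScoreEncoder {
  public static long encode(int docId, float score) {
    return (((long) Float.floatToIntBits(score)) << 32) | (docId & 0xFFFFFFFFL);
  }

  public static float toScore(long encoded) {
    return Float.intBitsToFloat((int) (encoded >>> 32));
  }

  public static int toDocId(long encoded) {
    return (int) encoded;
  }

  public static void main(String[] args) {
    // Shared accumulator: the identity Long.MIN_VALUE means "no floor published yet".
    LongAccumulator acc = new LongAccumulator(Long::max, Long.MIN_VALUE);
    acc.accumulate(encode(42, 0.5f));
    acc.accumulate(encode(7, 0.9f)); // the higher score wins regardless of doc id
    System.out.println(toScore(acc.get())); // prints 0.9
  }
}
```

Because the score occupies the high bits, one atomic `long` carries both "how high is the floor" and "which doc set it", which is why the collector can publish with a single lock-free `accumulate` call.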
@@ -0,0 +1,64 @@ (new file: CollaborativeKnnCollectorManager.java)
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.lucene.search.knn;

import java.io.IOException;
import java.util.concurrent.atomic.LongAccumulator;
import java.util.function.IntUnaryOperator;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.CollaborativeKnnCollector;
import org.apache.lucene.search.KnnCollector;

/**
* A {@link KnnCollectorManager} that creates {@link CollaborativeKnnCollector} instances sharing a
* single {@link LongAccumulator} for global pruning across segments, gated by topological hints.
*
* @lucene.experimental
*/
public class CollaborativeKnnCollectorManager implements KnnCollectorManager {

private final int k;
private final LongAccumulator minScoreAcc;
private final IntUnaryOperator docIdMapper;

/**
* Create a new CollaborativeKnnCollectorManager
*
* @param k number of neighbors to collect
* @param minScoreAcc shared accumulator for global pruning
*/
public CollaborativeKnnCollectorManager(int k, LongAccumulator minScoreAcc) {
this(k, minScoreAcc, docId -> docId);
}

/** Create a new CollaborativeKnnCollectorManager with a docId mapper */
public CollaborativeKnnCollectorManager(
int k, LongAccumulator minScoreAcc, IntUnaryOperator docIdMapper) {
this.k = k;
this.minScoreAcc = minScoreAcc;
this.docIdMapper = docIdMapper;
}

@Override
public KnnCollector newCollector(
int visitedLimit, KnnSearchStrategy searchStrategy, LeafReaderContext context)
throws IOException {
return new CollaborativeKnnCollector(
k, visitedLimit, searchStrategy, minScoreAcc, context.docBase, docIdMapper);
}
}
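The Global-Floor-vs-Local-Max rule that these collectors implement can be shown outside Lucene with a self-contained toy (the class `DemoFloor` and its values are illustrative, not part of the PR): each shard publishes its current k-th-best score to a shared max accumulator, and a shard stops once even its best hit cannot enter the global top-K.

```java
import java.util.concurrent.atomic.LongAccumulator;

// Toy reproduction of CollaborativeKnnCollector's termination rule.
public class DemoFloor {
  static final float SLACK = 0.0001f; // mirrors GLOBAL_BAR_TERMINATION_SLACK

  // Score in the high bits so Long::max keeps the highest floor
  // (assumes non-negative scores, as Lucene similarities produce).
  static long encode(float score) {
    return ((long) Float.floatToIntBits(score)) << 32;
  }

  static float toScore(long v) {
    return Float.intBitsToFloat((int) (v >>> 32));
  }

  static boolean shouldTerminate(float localMax, LongAccumulator acc) {
    long code = acc.get();
    if (code == Long.MIN_VALUE) return false; // no floor published yet
    return localMax < toScore(code) - SLACK;  // best local hit can't make top-K
  }

  public static void main(String[] args) {
    LongAccumulator acc = new LongAccumulator(Long::max, Long.MIN_VALUE);
    // Shard A finds strong hits and publishes its k-th-best score as the floor.
    acc.accumulate(encode(0.8f));
    // Shard B's best hit so far is 0.3: it cannot crack the top-K, so it stops.
    System.out.println(shouldTerminate(0.3f, acc));  // true
    // Shard C's best hit is still competitive, so it keeps searching.
    System.out.println(shouldTerminate(0.85f, acc)); // false
  }
}
```

Note the asymmetry the PR's comment calls CRITICAL: the comparison is against the shard's *best* score, not its worst, so a shard is only abandoned when it provably cannot contribute any top-K hit.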
@@ -305,6 +305,13 @@ void searchLevel(
     // We should allow exploring equivalent minAcceptedSimilarity values at least once
     boolean shouldExploreMinSim = true;
     while (candidates.size() > 0 && results.earlyTerminated() == false) {
+      // Update the threshold dynamically from the collector to allow external pruning.
+      float liveMinSimilarity = results.minCompetitiveSimilarity();

[Review comment — Contributor]
what does this do when we are not using external pruning?

[Reply — Author (krickert)]
When we're not using external pruning, results.minCompetitiveSimilarity() still returns the minimum competitive similarity of the current top-K results, but only from this shard's collector. So it is the same kind of threshold (used to prune the graph: skip nodes that can't beat the worst of the current top-K), but it is not updated from other shards. The loop and the pruning logic are unchanged; only the source of the threshold is internal (this shard) instead of external (collaborative). In other words: same API, same pruning, no cross-shard updates when external pruning is off.

+      if (liveMinSimilarity > minAcceptedSimilarity) {
+        minAcceptedSimilarity = liveMinSimilarity;
+        shouldExploreMinSim = true;
+      }
+
       // get the best candidate (closest or best scoring)
       float topCandidateSimilarity = candidates.topScore();
       if (topCandidateSimilarity < minAcceptedSimilarity) {
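The effect of re-reading the threshold inside the loop can be seen with a toy walk over a descending candidate queue (class `DemoLiveThreshold` and all values are illustrative, not the PR's code): with a fixed threshold the loop would visit every candidate, but re-reading a floor that "another shard" raises mid-search lets it cut off early.

```java
import java.util.Comparator;
import java.util.PriorityQueue;
import java.util.concurrent.atomic.LongAccumulator;

// Toy version of the patched searchLevel loop: pulling a live threshold each
// iteration lets a concurrently raised global floor prune remaining candidates.
public class DemoLiveThreshold {
  static long encode(float s) { return ((long) Float.floatToIntBits(s)) << 32; }
  static float toScore(long v) { return Float.intBitsToFloat((int) (v >>> 32)); }

  static int explore(float[] candidateScores, LongAccumulator sharedMin) {
    PriorityQueue<Float> candidates = new PriorityQueue<>(Comparator.reverseOrder());
    for (float s : candidateScores) candidates.add(s);

    float minAcceptedSimilarity = Float.NEGATIVE_INFINITY;
    int visited = 0;
    while (!candidates.isEmpty()) {
      // Re-read the threshold dynamically, as the patched searchLevel does.
      long code = sharedMin.get();
      if (code != Long.MIN_VALUE) {
        float live = toScore(code);
        if (live > minAcceptedSimilarity) minAcceptedSimilarity = live;
      }
      float top = candidates.poll();
      if (top < minAcceptedSimilarity) break; // nothing left can compete
      visited++;
      // Simulate another shard publishing a higher floor mid-search.
      if (visited == 2) sharedMin.accumulate(encode(0.6f));
    }
    return visited;
  }

  public static void main(String[] args) {
    LongAccumulator acc = new LongAccumulator(Long::max, Long.MIN_VALUE);
    float[] scores = {0.9f, 0.7f, 0.5f, 0.4f, 0.2f};
    System.out.println(explore(scores, acc)); // 2: the raised floor prunes 0.5 and below
  }
}
```

This matches the reply above: with no external updates the accumulator never rises, the threshold stays purely local, and the loop behaves exactly as before the patch.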