Skip to content

Add inter-query concurrency for knnPerfTest.py#509

Open
abernardi597 wants to merge 4 commits intomikemccand:mainfrom
abernardi597:parallel-query
Open

Add inter-query concurrency for knnPerfTest.py#509
abernardi597 wants to merge 4 commits intomikemccand:mainfrom
abernardi597:parallel-query

Conversation

@abernardi597
Copy link
Contributor

I implemented inter-query concurrency by adding a new thread pool for dispatching queries to the IndexSearcher.

To do so, I made the VectorReader use random reads on its FileChannel to make the object thread-safe.

There are some other small refactors as well.

Closes #505.

Copy link
Owner

@mikemccand mikemccand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @abernardi597! Sorry for the long delay ...

I left a bunch of comments. Net/net I like the change, and this is a badly needed feature, but can we find a way to not add the two new allocations (buffers/arrays) for every doc we index? I think today we reuse a single byte[] or float[] array and ByteBuffer. I like to brutally minimize any "outside of Lucene" cost so that we (as best we can) measure specifically Lucene's cost. I like that the generics for VectorReader simplified some code, but I don't like the added cost, spook methods like interpret, etc. ;)

I also suggested possible names (the hardest part) -- -numSearchThread and -numQueryThread seem like synonyms to me (searching vs querying).

I think we lost some prior code / checks (like vector files containing exactly whole vectors); I tried to catch & add comments but please double check on the deleted/refactored code that it didn't functionally lose something? Or if it's on purpose, just call it out / explain a bit.

Finally, using Java's nice async APIs ... I don't think that's really buying us much here (Lucene's internal IO itself isn't async-aware yet, I think?), and maybe losing things (prompt reporting of clear exception, and exit(1), promptly, if a query fails).

}
break;
case "-numQueryThread":
// 0: single thread mode (not passing a executorService)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this in -help or so as well? Or javadocs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added it to the exception for misusing the flag, but not explicitly anywhere else.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm is there no -help or so existing now?

@github-actions
Copy link

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label Feb 14, 2026
@github-actions github-actions bot removed the Stale label Feb 27, 2026
@abernardi597 abernardi597 force-pushed the parallel-query branch 5 times, most recently from 3ebf955 to 8edb9e0 Compare March 3, 2026 21:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add inter-query concurrency for knnPerfTest.py

2 participants