Coordinate DocValuesSkipper across fields for multi-range conjunctions #15793
Open
sgup432 wants to merge 2 commits into apache:main
Description
See related issue #15770 for more details.
This PR adds MultiFieldDocValuesRangeQuery, which coordinates DocValuesSkipper evaluation across fields. BooleanQuery.rewrite() detects the pattern (2+ required NumericDocValuesRangeQuery clauses on distinct fields) and replaces them with a single coordinated query.
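To make the rewrite rule concrete, here is a minimal sketch of the pattern detection in plain Java. It is not the code in this PR and does not use Lucene classes: `Clause`, `Rewritten`, and `rewrite` are hypothetical names, the sketch assumes "required" means MUST or FILTER, and it assumes that a second range clause on an already-seen field is simply left untouched.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the rewrite rule; all types here are hypothetical stand-ins. */
final class RewriteSketch {
  enum Occur { MUST, FILTER, SHOULD, MUST_NOT }

  /** A clause on one field; isRange marks a doc-values range clause. */
  record Clause(Occur occur, String field, boolean isRange) {
    // Assumption: both MUST and FILTER count as "required".
    boolean isRequired() {
      return occur == Occur.MUST || occur == Occur.FILTER;
    }
  }

  /** Clauses folded into one coordinated query, plus the clauses left as they were. */
  record Rewritten(List<Clause> coordinated, List<Clause> remaining) {}

  static Rewritten rewrite(List<Clause> clauses) {
    Map<String, Clause> rangeByField = new LinkedHashMap<>();
    List<Clause> remaining = new ArrayList<>();
    for (Clause c : clauses) {
      // Collect at most one required range clause per distinct field.
      if (c.isRequired() && c.isRange() && !rangeByField.containsKey(c.field())) {
        rangeByField.put(c.field(), c);
      } else {
        remaining.add(c);
      }
    }
    // Fewer than two distinct fields: nothing to coordinate, keep the query unchanged.
    if (rangeByField.size() < 2) {
      return new Rewritten(List.of(), List.copyOf(clauses));
    }
    return new Rewritten(List.copyOf(rangeByField.values()), List.copyOf(remaining));
  }
}
```

With two FILTER range clauses on different fields plus one SHOULD clause, `rewrite` returns two coordinated clauses and one remaining clause; with a single range clause it returns the input unchanged.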
MultiFieldDocValuesRangeQuery contains a Concatenated iterator, where the main logic lies. It works with the DocValuesSkipper of every queried field and advances them together.
The PR also contains a JMH benchmark to validate this.
Tested across different data patterns, document counts, and numbers of concurrent range fields.
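The idea of moving the skippers together can be illustrated with a simplified model. The sketch below is not the iterator from this PR and does not call Lucene's DocValuesSkipper. It assumes dense fields, a single skip level, and a fixed hypothetical block size, and every name in it (`CoordinatedSkipSketch`, `FieldRange`, `search`) is made up for illustration. The point it shows: if any one field's block-level [min, max] is disjoint from that field's query range, the whole block is skipped for all fields.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of coordinated skipping; not Lucene's DocValuesSkipper API. */
final class CoordinatedSkipSketch {
  // Hypothetical block size, kept small so the example is easy to trace.
  static final int BLOCK_SIZE = 4;

  /** One range clause over a dense field, with per-block min/max precomputed. */
  static final class FieldRange {
    final long[] values;
    final long lower;
    final long upper;
    final long[] blockMin;
    final long[] blockMax;

    FieldRange(long[] values, long lower, long upper) {
      this.values = values;
      this.lower = lower;
      this.upper = upper;
      int blocks = (values.length + BLOCK_SIZE - 1) / BLOCK_SIZE;
      blockMin = new long[blocks];
      blockMax = new long[blocks];
      for (int b = 0; b < blocks; b++) {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        int end = Math.min((b + 1) * BLOCK_SIZE, values.length);
        for (int doc = b * BLOCK_SIZE; doc < end; doc++) {
          min = Math.min(min, values[doc]);
          max = Math.max(max, values[doc]);
        }
        blockMin[b] = min;
        blockMax[b] = max;
      }
    }

    /** True if the block's [min, max] overlaps the query range. */
    boolean blockMayMatch(int block) {
      return blockMax[block] >= lower && blockMin[block] <= upper;
    }

    boolean matches(int doc) {
      return values[doc] >= lower && values[doc] <= upper;
    }
  }

  /** Returns doc IDs matching every range; assumes each field has maxDoc values. */
  static List<Integer> search(List<FieldRange> fields, int maxDoc) {
    List<Integer> hits = new ArrayList<>();
    for (int start = 0; start < maxDoc; start += BLOCK_SIZE) {
      int block = start / BLOCK_SIZE;
      boolean skipBlock = false;
      for (FieldRange f : fields) {
        if (!f.blockMayMatch(block)) {
          skipBlock = true; // one field rules the block out for all fields
          break;
        }
      }
      if (skipBlock) {
        continue;
      }
      int end = Math.min(start + BLOCK_SIZE, maxDoc);
      for (int doc = start; doc < end; doc++) {
        boolean all = true;
        for (FieldRange f : fields) {
          if (!f.matches(doc)) {
            all = false;
            break;
          }
        }
        if (all) {
          hits.add(doc);
        }
      }
    }
    return hits;
  }
}
```

For example, with field values `{0, 1, 2, 3, 4, 5, 6, 7}` queried on [4, 7] and `{10, 20, 30, 40, 50, 60, 70, 80}` queried on [10, 60], the first block is skipped on the first field alone and the second field is never consulted for it; the matching docs are 4 and 5.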
JMH Benchmark Results
Query used
Data Pattern:
clustered: All field values increase with docID (e.g., time-series data where timestamp, sequence number, and sensor readings grow together). Narrow query ranges eliminate most blocks. Best case for coordination (3.2–5.0x).
mixed: Combination of monotonic (timestamp), low-cardinality (20 values, like order status), and random fields (price). Resembles e-commerce order filtering. Moderate gains (1.2–1.7x).
sorted: Index sorted by one field (timestamp), other fields random. Resembles time-series indexed by ingestion time but queried on unsorted metric fields. Similar to mixed (1.1–1.4x).
random: All fields uniformly random with wide query ranges. Worst case, but still gains (1.1–1.7x): when one field eliminates a block, the other fields do not need to be checked for that block.