
Coordinate DocValuesSkipper across fields for multi-range conjunctions #15793

Open
sgup432 wants to merge 2 commits into apache:main from sgup432:multi_field_doc_values_skip

sgup432 commented Mar 4, 2026

Description

See the related issue #15770 for more details.

  • This PR adds MultiFieldDocValuesRangeQuery, which coordinates DocValuesSkipper evaluation across fields. BooleanQuery.rewrite() detects the pattern (2+ required NumericDocValuesRangeQuery clauses on distinct fields) and replaces them with a single coordinated query.

  • MultiFieldDocValuesRangeQuery contains the Concatenated iterator, where the main logic lies. It works with the DocValuesSkipper of each participating field and advances them together.

  • The PR also includes a JMH benchmark to validate this.

  • Tested across different data patterns, document counts, and number of concurrent range fields.
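
The detection step described above (2+ required range clauses on distinct fields) can be sketched roughly as follows. This is an illustration only, not the code in this PR: `Clause`, `Occur`, and `coordinatable` are hypothetical stand-ins rather than Lucene classes, "required" is assumed to mean MUST or FILTER, and a field that appears in more than one range clause is assumed to be left out of the coordinated group.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these types are Lucene classes.
class RangeClauseGrouping {
    enum Occur { MUST, FILTER, SHOULD, MUST_NOT }

    // Simplified stand-in for a boolean clause.
    static final class Clause {
        final Occur occur;
        final boolean docValuesRange; // true if the clause is a doc-values range query
        final String field;

        Clause(Occur occur, boolean docValuesRange, String field) {
            this.occur = occur;
            this.docValuesRange = docValuesRange;
            this.field = field;
        }
    }

    // Assumption: "required" means the clause must match (MUST or FILTER).
    static boolean isRequiredRange(Clause c) {
        return c.docValuesRange && (c.occur == Occur.MUST || c.occur == Occur.FILTER);
    }

    /** Returns the required range clauses on distinct fields, or an empty list if fewer than two. */
    static List<Clause> coordinatable(List<Clause> clauses) {
        // Count required range clauses per field.
        Map<String, Integer> perField = new HashMap<>();
        for (Clause c : clauses) {
            if (isRequiredRange(c)) {
                perField.merge(c.field, 1, Integer::sum);
            }
        }
        // Assumption: a field with several range clauses is not coordinated.
        List<Clause> picked = new ArrayList<>();
        for (Clause c : clauses) {
            if (isRequiredRange(c) && perField.get(c.field) == 1) {
                picked.add(c);
            }
        }
        return picked.size() >= 2 ? picked : List.of();
    }

    public static void main(String[] args) {
        List<Clause> clauses = List.of(
            new Clause(Occur.FILTER, true, "field0"),
            new Clause(Occur.FILTER, true, "field1"),
            new Clause(Occur.FILTER, true, "field2"),
            new Clause(Occur.SHOULD, true, "field3"));
        System.out.println(coordinatable(clauses).size() + " clauses can be coordinated");
    }
}
```

In this sketch the optional clause on `field3` is left untouched, and a query with only one required range clause yields an empty group, so nothing would be rewritten.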

JMH Benchmark Results

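| Pattern | Docs | Fields | Without Optimization | With Optimization | Speedup |
|---|---|---|---|---|---|
| clustered | 1M | 3 | 16,417 | 61,342 | 3.7x |
| clustered | 1M | 5 | 11,523 | 57,487 | 5.0x |
| clustered | 10M | 3 | 16,148 | 55,677 | 3.4x |
| clustered | 10M | 5 | 13,128 | 42,154 | 3.2x |
| mixed | 1M | 3 | 859 | 1,001 | 1.17x |
| mixed | 1M | 5 | 514 | 873 | 1.70x |
| mixed | 10M | 3 | 76 | 79 | 1.03x |
| mixed | 10M | 5 | 50 | 69 | 1.38x |
| random | 1M | 3 | 62 | 68 | 1.10x |
| random | 1M | 5 | 45 | 64 | 1.42x |
| random | 10M | 3 | 4.3 | 6.5 | 1.51x |
| random | 10M | 5 | 3.5 | 5.8 | 1.65x |
| sorted | 1M | 3 | 920 | 841 | 0.91x |
| sorted | 1M | 5 | 611 | 882 | 1.44x |
| sorted | 10M | 3 | 69 | 78 | 1.14x |
| sorted | 10M | 5 | 55 | 68 | 1.22x |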

Query used

{"bool":{"filter":[{"range":{"field0":{"gte":"X","lte":"Y"}}},{"range":{"field1":{"gte":"A","lte":"B"}}},{"range":{"field2":{"gte":"M","lte":"N"}}}]}}

Data patterns:

  • clustered: All field values increase with docID (e.g., time-series data where timestamp, sequence number, and sensor readings grow together). Narrow query ranges eliminate most blocks. Best case for coordination (3.2–5.0x).

  • mixed: Combination of monotonic (timestamp), low-cardinality (20 values, like order status), and random fields (price). Resembles e-commerce order filtering. Gains range from negligible to moderate (1.03–1.70x).

  • sorted: Index sorted by one field (timestamp), other fields random. Resembles time-series indexed by ingestion time but queried on unsorted metric fields. Similar to mixed in most configurations (1.14–1.44x), with one regression (0.91x at 1M docs, 3 fields).

  • random: All fields uniformly random with wide query ranges. Worst case, but still gains (1.1–1.7x): when one field eliminates a block, the other fields do not need to be checked for that block.
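
To illustrate why one field eliminating a block saves work on the others, here is a minimal, self-contained sketch of block-level coordination. It does not use Lucene's DocValuesSkipper and is not the iterator from this PR: the fixed `BLOCK_SIZE`, the in-memory `long[][]` layout, and the `search` method are assumptions made for the example, and block min/max values are recomputed on the fly where a real index would read precomputed skip data.

```java
// Hypothetical sketch of coordinated block skipping across fields.
class CoordinatedSkipSketch {
    static final int BLOCK_SIZE = 4; // docs per block (assumed, fixed for simplicity)

    // True if the [min, max] of the field's values in this block overlaps [lo, hi].
    static boolean blockMayMatch(long[] fieldValues, int blockStart, long lo, long hi) {
        int end = Math.min(blockStart + BLOCK_SIZE, fieldValues.length);
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (int doc = blockStart; doc < end; doc++) {
            min = Math.min(min, fieldValues[doc]);
            max = Math.max(max, fieldValues[doc]);
        }
        return max >= lo && min <= hi;
    }

    /**
     * values[f][doc] holds the value of field f for the doc; lo[f]..hi[f] is its query range.
     * Returns {matching docs, blocks that needed per-doc checks}.
     */
    static int[] search(long[][] values, long[] lo, long[] hi) {
        int numDocs = values[0].length;
        int matches = 0;
        int visitedBlocks = 0;
        for (int start = 0; start < numDocs; start += BLOCK_SIZE) {
            // Stop at the first field whose block range misses its query range.
            boolean candidate = true;
            for (int f = 0; f < values.length && candidate; f++) {
                candidate = blockMayMatch(values[f], start, lo[f], hi[f]);
            }
            if (!candidate) {
                continue; // the whole block is skipped for every field
            }
            visitedBlocks++;
            int end = Math.min(start + BLOCK_SIZE, numDocs);
            for (int doc = start; doc < end; doc++) {
                boolean ok = true;
                for (int f = 0; f < values.length && ok; f++) {
                    ok = values[f][doc] >= lo[f] && values[f][doc] <= hi[f];
                }
                if (ok) {
                    matches++;
                }
            }
        }
        return new int[] {matches, visitedBlocks};
    }

    public static void main(String[] args) {
        // Two "clustered" fields whose values grow with docID.
        long[][] values = new long[2][12];
        for (int doc = 0; doc < 12; doc++) {
            values[0][doc] = doc;
            values[1][doc] = 100 + doc;
        }
        int[] result = search(values, new long[] {4, 106}, new long[] {9, 111});
        System.out.println("matches=" + result[0] + " visitedBlocks=" + result[1]);
    }
}
```

With clustered data, most blocks fail the range check on the first field and are never examined for the remaining fields, which is consistent with the larger speedups reported for that pattern.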

@github-actions github-actions bot added this to the 11.0.0 milestone Mar 4, 2026