Description
Component: core/codecs
Description:
The lucene103 blocktree codec replaced the in-memory FST term index with
an on-disk TrieReader. This causes a significant performance regression
for workloads that perform high-frequency seekExact() calls on the _id
field during document indexing.
Environment
- OpenSearch 3.3 (Lucene 10.x with lucene103 codec) vs OpenSearch 2.19 (Lucene 9.12.0 with lucene90 codec)
- JDK: Amazon Corretto 21.0.8
- Workload: 32 KNN indices, 6 shards each, mixed ingest+query (50/50),
bulk indexing with explicit _id (UUID), ~400 segments per index at
refresh_interval=1s
Problem
Every indexed document with an explicit _id triggers
PerThreadIDVersionAndSeqNoLookup.getDocID() which calls
SegmentTermsEnum.seekExact(BytesRef) on every segment to check for
version conflicts. With ~400 segments per index, each document requires
~400 seekExact calls.
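The per-segment fan-out can be sketched with a simplified, self-contained model. The types and names below are hypothetical stand-ins; the real logic lives in Lucene/OpenSearch's PerThreadIDVersionAndSeqNoLookup:

```java
// Simplified model of the per-segment _id lookup: one seekExact per
// segment until the _id is found. Hypothetical stand-in types, not
// actual Lucene code.
import java.util.ArrayList;
import java.util.List;

public class IdLookupModel {
    interface Segment {
        /** Models SegmentTermsEnum.seekExact: docID for the _id, or -1 if absent. */
        int seekExact(String id);
    }

    static int seekCalls = 0;

    /** Models getDocID(): probes every segment until the _id is found. */
    static int getDocID(List<Segment> segments, String id) {
        for (Segment s : segments) {
            seekCalls++;
            int docID = s.seekExact(id);
            if (docID != -1) return docID;
        }
        return -1;
    }

    public static void main(String[] args) {
        List<Segment> segments = new ArrayList<>();
        for (int i = 0; i < 400; i++) segments.add(id -> -1); // no segment has it
        getDocID(segments, "fresh-uuid");
        // A fresh UUID is a miss everywhere, so all ~400 segments are probed.
        System.out.println(seekCalls); // 400
    }
}
```

Because explicit _ids are random UUIDs, almost every lookup is a miss that falls through all segments, which is the worst case for this loop.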
In lucene90, seekExact navigates an in-memory FST (heap-resident).
In lucene103, seekExact navigates a TrieReader via memory-mapped file
reads, where each read triggers MemorySessionImpl.checkValidStateRaw()
(Panama Foreign Memory API bounds check).
JFR Evidence
Write thread profiling (JFR ExecutionSample) shows:
lucene10.3.1: 10.0% of write thread time in the seekExact path

    DataInput.readVLong()
    SegmentTermsEnumFrame.loadBlock()
    SegmentTermsEnum.lambda$prepareSeekExact$1(BytesRef)
    SegmentTermsEnum.seekExact(BytesRef)
    PerThreadIDVersionAndSeqNoLookup.getDocID()

lucene9.12: 2.6% of write thread time in the seekExact path

    FST$Arc$BitTable.isBitSet()
    FST.findTargetArc()
    SegmentTermsEnum.seekExact(BytesRef)
    PerThreadIDVersionAndSeqNoLookup.getDocID()
Additionally, 6.6% of write thread time is spent in
MemorySessionImpl.checkValidStateRaw() on memory-mapped reads triggered
by the TrieReader navigation.
Combined: 16.6% write thread overhead vs 2.6% = 6.4x regression for
this code path.
Impact
At 256,000 seekExact calls/sec (32 TPS × 20 docs/bulk × 400 segments),
this overhead causes:
- 1.9x per-document indexing latency (577µs vs 303µs)
- Search thread saturation under mixed workload (queries slow down due to CPU contention)
- Ingestion stalls at 297k docs/tenant vs 600k+ on lucene90
Increasing refresh_interval from 1s to 30s (reducing segments from ~400
to ~13) mitigates the issue by cutting seekExact calls roughly 30x,
pushing the stall point from 297k to 497k docs/tenant.
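A back-of-envelope check of the rate figures above (inputs copied from this report; the stall-point numbers are measured, not derived here):

```java
// Sanity-check the seekExact call-rate arithmetic from this report.
public class SeekRate {
    public static void main(String[] args) {
        int bulksPerSec = 32;   // TPS
        int docsPerBulk = 20;
        int segments = 400;     // at refresh_interval=1s

        int seeksPerSec = bulksPerSec * docsPerBulk * segments;
        System.out.println(seeksPerSec); // 256000 seekExact calls/sec

        int segmentsAt30s = 13; // observed at refresh_interval=30s
        System.out.println(segments / segmentsAt30s); // ~30x fewer calls
    }
}
```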
Root Cause
Two compounding factors:

- TrieReader replaces the in-memory FST with on-disk trie navigation.
  The FST was loaded onto the Java heap at segment open time, so
  navigation was pure CPU (BitTable.isBitSet). The TrieReader reads
  from memory-mapped files, adding I/O indirection.
- Each memory-mapped read triggers checkValidStateRaw(), the Panama
  Foreign Memory API liveness check that verifies the Arena is still
  open. It is called on every byte read from the mmapped file.
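The liveness check behind checkValidStateRaw() can be demonstrated directly with the Foreign Memory API. This is a standalone sketch, not Lucene code, and it needs JDK 22+ (or JDK 21 with --enable-preview, since java.lang.foreign was still a preview API there):

```java
// Every access through a MemorySegment verifies that the owning Arena
// is still open; reads against a closed arena are rejected. This is the
// check that shows up as checkValidStateRaw() in the JFR profile.
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class BoundsCheckDemo {
    /** Returns true if a read after arena.close() is rejected. */
    static boolean rejectedAfterClose() {
        Arena arena = Arena.ofConfined();
        MemorySegment seg = arena.allocate(16);
        seg.set(ValueLayout.JAVA_BYTE, 0, (byte) 42);
        byte b = seg.get(ValueLayout.JAVA_BYTE, 0); // arena open: check passes
        arena.close();
        try {
            seg.get(ValueLayout.JAVA_BYTE, 0);      // arena closed: check fails
            return false;
        } catch (IllegalStateException e) {
            return b == 42;                         // earlier read had succeeded
        }
    }

    public static void main(String[] args) {
        System.out.println(rejectedAfterClose()); // true
    }
}
```

The safety check is cheap in isolation, but at hundreds of thousands of per-byte reads per second on the seekExact path it becomes the 6.6% overhead shown in the profile.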
The _id field is special: it is looked up via seekExact on every single
document indexed. It has a random access pattern (UUIDs) that does not
benefit from the TrieReader's sequential access optimizations.
How to Reproduce
- Create an index with many small segments (refresh_interval=1s, continuous ingestion)
- Bulk index documents with explicit _id values (UUIDs)
- Profile write threads with JFR
- Compare seekExact time between the lucene90 and lucene103 codecs