
Improve BytesRefHash.sort performance by retrieve byte directly from the pool. #15775

Closed

tyronecai wants to merge 7 commits into apache:main from tyronecai:patch-2



@tyronecai (Contributor) commented Feb 27, 2026

Description

RadixSort computes a large number of histograms, which means a very large number of byteAt calls.
Currently, each byteAt call first retrieves the BytesRef for position i from the pool
and then uses cmp.byteAt to get the byte of that BytesRef at position k.
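To make the indirection concrete, here is a hypothetical, simplified model in Java; it is not Lucene's code. It assumes every term sits in one flat byte[] behind a 1-byte length prefix, and `TermPool`, `Ref`, `fillRef` and `refByteAt` are illustrative stand-ins for the pool, `BytesRef`, the pool lookup and `cmp.byteAt`. The histogram loop assumes the MSB-radix convention of returning -1 past the end of a term and shifting by one into a 257-entry bucket array.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical model of the current path: materialize a BytesRef-like view, then read one byte.
public class ByteAtViaRef {

  /** Minimal stand-in for BytesRef: a (bytes, offset, length) view over the pool. */
  static final class Ref {
    byte[] bytes;
    int offset;
    int length;
  }

  /** Toy pool: terms stored back to back, each prefixed by a 1-byte length (< 128). */
  static final class TermPool {
    final byte[] pool;
    final int[] starts; // offset of each term's length prefix

    TermPool(String... terms) {
      byte[][] encoded = new byte[terms.length][];
      int total = 0;
      for (int i = 0; i < terms.length; i++) {
        encoded[i] = terms[i].getBytes(StandardCharsets.UTF_8);
        if (encoded[i].length >= 128) {
          throw new IllegalArgumentException("sketch supports terms shorter than 128 bytes only");
        }
        total += 1 + encoded[i].length;
      }
      pool = new byte[total];
      starts = new int[terms.length];
      int pos = 0;
      for (int i = 0; i < terms.length; i++) {
        starts[i] = pos;
        pool[pos++] = (byte) encoded[i].length;
        System.arraycopy(encoded[i], 0, pool, pos, encoded[i].length);
        pos += encoded[i].length;
      }
    }

    /** The "materialize" step: point ref at the term stored in slot i. */
    void fillRef(Ref ref, int i) {
      int start = starts[i];
      ref.bytes = pool;
      ref.length = pool[start];
      ref.offset = start + 1;
    }
  }

  /** Comparator-style byteAt: unsigned byte k of ref, or -1 past the end of the term. */
  static int refByteAt(Ref ref, int k) {
    return k < ref.length ? ref.bytes[ref.offset + k] & 0xFF : -1;
  }

  /** byteAt(i, k): refill the scratch view from the pool, then index into it. */
  static int byteAt(TermPool tp, Ref scratch, int i, int k) {
    tp.fillRef(scratch, i);
    return refByteAt(scratch, k);
  }

  /** Histogram of byte k over slots [from, to): one byteAt call per slot. */
  static int[] histogram(TermPool tp, int from, int to, int k) {
    int[] counts = new int[257]; // bucket 0 = term already ended, 1..256 = byte value + 1
    Ref scratch = new Ref();
    for (int i = from; i < to; i++) {
      counts[byteAt(tp, scratch, i, k) + 1]++;
    }
    return counts;
  }

  public static void main(String[] args) {
    TermPool tp = new TermPool("apple", "apricot", "ban", "b");
    int[] counts = histogram(tp, 0, 4, 1);
    System.out.println("'p': " + counts['p' + 1] + ", 'a': " + counts['a' + 1] + ", ended: " + counts[0]);
  }
}
```

Every slot the histogram visits pays for refilling the scratch view before a single byte is read; that per-call overhead is what the description refers to.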

Because byteAt is called so frequently, this indirection causes an observable performance penalty.
profile_cpu_44438.html


Therefore, we can directly retrieve the byte values corresponding to start and i from the pool.
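A matching sketch of the direct read, under similar toy assumptions: one flat byte[], a 1-byte length prefix, and hypothetical names (`ByteAtDirect`, `byteAtDirect`, `byteAtViaCopy`). The byte is located from the term's start offset alone, with no intermediate object; `byteAtViaCopy` exists only to check the result.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of reading byte k of term i straight from the pool.
public class ByteAtDirect {
  final byte[] pool;
  final int[] starts; // offset of each term's 1-byte length prefix

  ByteAtDirect(String... terms) {
    byte[][] encoded = new byte[terms.length][];
    int total = 0;
    for (int i = 0; i < terms.length; i++) {
      encoded[i] = terms[i].getBytes(StandardCharsets.UTF_8);
      if (encoded[i].length >= 128) {
        throw new IllegalArgumentException("sketch supports terms shorter than 128 bytes only");
      }
      total += 1 + encoded[i].length;
    }
    pool = new byte[total];
    starts = new int[terms.length];
    int pos = 0;
    for (int i = 0; i < terms.length; i++) {
      starts[i] = pos;
      pool[pos++] = (byte) encoded[i].length;
      System.arraycopy(encoded[i], 0, pool, pos, encoded[i].length);
      pos += encoded[i].length;
    }
  }

  /** Direct path: read the length prefix, then the byte; returns -1 past the end of the term. */
  int byteAtDirect(int i, int k) {
    int start = starts[i];
    int length = pool[start];
    return k < length ? pool[start + 1 + k] & 0xFF : -1;
  }

  /** Slow reference used only for checking: copy the term out, then index it. */
  int byteAtViaCopy(int i, int k) {
    int start = starts[i];
    byte[] term = Arrays.copyOfRange(pool, start + 1, start + 1 + pool[start]);
    return k < term.length ? term[k] & 0xFF : -1;
  }

  public static void main(String[] args) {
    ByteAtDirect p = new ByteAtDirect("apple", "apricot", "ban", "b");
    for (int i = 0; i < p.starts.length; i++) {
      for (int k = 0; k < 8; k++) {
        if (p.byteAtDirect(i, k) != p.byteAtViaCopy(i, k)) {
          throw new AssertionError("mismatch at slot " + i + ", position " + k);
        }
      }
    }
    System.out.println("direct and copied reads agree");
  }
}
```

Lucene's real ByteBlockPool is split into fixed-size blocks and its length prefix can take more than one byte, so an actual implementation would also have to locate the right block and decode the length; this sketch leaves both out.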

The benchmark uses the same environment as #15772.

Without #15772, without pool.byteAt:

sort 33554432 unique terms in 4543.19 ms

Without #15772, with pool.byteAt:

sort 33554432 unique terms in 3866.08 ms; time reduction: (4543.19 - 3866.08) / 4543.19 ≈ 0.149

With #15772, without pool.byteAt:

sort 33554432 unique terms in 3385.94 ms

With #15772, with pool.byteAt:

sort 33554432 unique terms in 2937.54 ms; time reduction: (3385.94 - 2937.54) / 3385.94 ≈ 0.132
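The ratios above are the fraction of the baseline time saved, not a speedup factor. A small sketch that recomputes them from the reported timings (the class and method names are illustrative):

```java
// Recomputes the relative time reductions from the reported sort timings (milliseconds).
public class SortTimingDelta {

  /** Fraction of the baseline time saved by the candidate: (baseline - candidate) / baseline. */
  static double reduction(double baselineMs, double candidateMs) {
    return (baselineMs - candidateMs) / baselineMs;
  }

  public static void main(String[] args) {
    System.out.printf("without #15772: %.3f%n", reduction(4543.19, 3866.08));
    System.out.printf("with #15772:    %.3f%n", reduction(3385.94, 2937.54));
  }
}
```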

However, I'm not sure whether this change is appropriate from a code-structure perspective,
even though it does improve performance.

@dweiss @mikemccand, could you please take a look and share your advice?

@tyronecai tyronecai changed the title from "Improve BytesRefHash.sort performance by get byte directly from the pool." to "Improve BytesRefHash.sort performance by retrieve byte directly from the pool." on Feb 27, 2026
@github-actions github-actions bot added this to the 10.5.0 milestone Feb 27, 2026
@tyronecai (Contributor, Author)

@dweiss Could you please take a look and see whether there are any issues with this change?

@dweiss (Contributor) commented Mar 2, 2026

Checks don't pass. Also, refining this class in microbenchmarks is not the same as running it in the wild - there is significant complexity if you add higher-tier code. I'd say that this may not be worth the increased code complexity. You'd need to try macro-benchmarks (luceneutil) and see if this shows any improvement, I don't think you'll see much there.

@tyronecai (Contributor, Author)

> Checks don't pass. Also, refining this class in microbenchmarks is not the same as running it in the wild - there is significant complexity if you add higher-tier code. I'd say that this may not be worth the increased code complexity. You'd need to try macro-benchmarks (luceneutil) and see if this shows any improvement, I don't think you'll see much there.

Okay, I also think this change is a bit strange.


@tyronecai tyronecai closed this Mar 2, 2026