2 changes: 2 additions & 0 deletions lucene/CHANGES.txt
@@ -53,6 +53,8 @@ Optimizations
* GITHUB#14011: Reduce allocation rate in HNSW concurrent merge. (Viliam Durina)
* GITHUB#14022: Optimize DFS marking of connected components in HNSW by reducing stack depth, improving performance and reducing allocations. (Viswanath Kuchibhotla)

+ * GITHUB#14920: Avoid allocations in IndexFileNames.parseGeneration() method. (Dmytro Dumanskiy)
+
Bug Fixes
---------------------
* GITHUB#14049: Randomize KNN codec params in RandomCodec. Fixes scalar quantization div-by-zero
20 changes: 17 additions & 3 deletions lucene/core/src/java/org/apache/lucene/index/IndexFileNames.java
@@ -145,14 +145,28 @@ public static String stripSegmentName(String filename) {
/** Returns the generation from this file name, or 0 if there is no generation. */
public static long parseGeneration(String filename) {
assert filename.startsWith("_");
- String[] parts = stripExtension(filename).substring(1).split("_");
+ int dot = filename.indexOf('.');
+ int end = (dot != -1) ? dot : filename.length();
+ int start = 1; // skip initial '_'
+
+ int first = filename.indexOf('_', start);
+ if (first == -1 || first >= end) {
+   return 0;
+ }
+
+ int second = filename.indexOf('_', first + 1);
+ int third = (second != -1) ? filename.indexOf('_', second + 1) : -1;
+
+ int parts = (second == -1 || second >= end) ? 2 : (third == -1 || third >= end) ? 3 : 4;
+

Member

This is a massive increase in complexity of the code: the tradeoff isn't worth it.

It's a simple string method called once; its cost doesn't even approach that of the system call to actually read the file.

Author

@doom369 doom369 Jul 8, 2025

In our scenario, index updates (and we have ~6 different indexes) can happen every 5 seconds. This is profiling data from a 5-minute production run with async-profiler (one of the best profilers at the moment, IMHO), so if these allocations were captured, they are not rare. I agree about the complexity, but this method should now also be 5-10x faster, if that matters to you. If you want, I can provide a microbenchmark.

// 4 cases:
// segment.ext
// segment_gen.ext
// segment_codec_suffix.ext
// segment_gen_codec_suffix.ext
- if (parts.length == 2 || parts.length == 4) {
-   return Long.parseLong(parts[1], Character.MAX_RADIX);
+ if (parts == 2 || parts == 4) {
+   String gen = filename.substring(first + 1, (second != -1 && second < end) ? second : end);
+   return Long.parseLong(gen, Character.MAX_RADIX);
} else {
return 0;
}
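
To make the four cases in the comment concrete, here is a minimal standalone sketch that places the split-based logic (removed lines) next to the indexOf-based logic (added lines) and runs both on one example name per case. The class name `ParseGenerationSketch`, the method names, and the sample file names are hypothetical and exist only for illustration. `stripExtension` is inlined as "cut at the first dot", which is an assumption about its behavior rather than a copy of the Lucene source. The sketch covers only the four documented shapes; file names with other shapes may be handled differently by the two variants.

```java
final class ParseGenerationSketch {

  // Split-based variant, mirroring the removed lines of the diff.
  // Assumption: stripExtension() drops everything from the first '.' onward.
  static long parseWithSplit(String filename) {
    int dot = filename.indexOf('.');
    String base = (dot != -1) ? filename.substring(0, dot) : filename;
    String[] parts = base.substring(1).split("_"); // allocates an array plus one String per part
    if (parts.length == 2 || parts.length == 4) {
      return Long.parseLong(parts[1], Character.MAX_RADIX);
    }
    return 0;
  }

  // indexOf-based variant, mirroring the added lines of the diff.
  static long parseWithIndexOf(String filename) {
    int dot = filename.indexOf('.');
    int end = (dot != -1) ? dot : filename.length();
    int first = filename.indexOf('_', 1); // skip the initial '_'
    if (first == -1 || first >= end) {
      return 0; // segment.ext: no generation
    }
    int second = filename.indexOf('_', first + 1);
    int third = (second != -1) ? filename.indexOf('_', second + 1) : -1;
    int parts = (second == -1 || second >= end) ? 2 : (third == -1 || third >= end) ? 3 : 4;
    if (parts == 2 || parts == 4) {
      // Only the generation substring is allocated here.
      String gen = filename.substring(first + 1, (second != -1 && second < end) ? second : end);
      return Long.parseLong(gen, Character.MAX_RADIX);
    }
    return 0;
  }

  public static void main(String[] args) {
    // One hypothetical file name per documented case.
    String[] samples = {
      "_0.cfs",              // segment.ext
      "_0_1.liv",            // segment_gen.ext
      "_0_Lucene90_0.dvd",   // segment_codec_suffix.ext
      "_0_a_Lucene90_0.dvd"  // segment_gen_codec_suffix.ext
    };
    for (String name : samples) {
      System.out.println(
          name + " split=" + parseWithSplit(name) + " indexOf=" + parseWithIndexOf(name));
    }
  }
}
```

A check like this only shows that the two variants agree on the documented shapes. It says nothing about the allocation or speed claims discussed in the review comments, which would need a profiler or a proper benchmark harness.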