feat: cache generation and sequence to reduce TSM filename parsing#26798

Closed
davidby-influx wants to merge 9 commits into master-1.x from DSB_cache_generation

@davidby-influx
Contributor

Closes #26794

@davidby-influx davidby-influx marked this pull request as ready for review September 15, 2025 20:26
Comment on lines +276 to +283
if nil != t.parseFileNameFunc {
// If parseFileNameFunc is nil, we are in a test or another TSMReader use
// that does not involve compaction planning!
t.generation, t.sequence, err = t.parseFileNameFunc(t.Path())
if err != nil {
return nil, err
}
}
Member

I have concerns about allowing t.generation and t.sequence to be 0 if WithParseFileNameFunc is not used. How disruptive would it be to require supplying the parsing function? Alternatively, does this function really need to be parameterized? I remember discussion that we never use anything but the default parsing function.

Contributor Author

This is a tough one. We have lots of tests that use unformatted file names (created by Go standard packages) where we don't care about sequence and generation. When I enforced name parsing, all those tests broke, and I couldn't think of a better way to allow unexamined filenames while still caching the sequence and generation efficiently.
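
For readers following the thread, here is a minimal sketch of the trade-off described above: the generation and sequence are parsed once when the reader is opened, and parsing is skipped entirely when no parse function is supplied. All names here (`reader`, `withParse`, `parseTSMName`) are hypothetical stand-ins, not the PR's actual types, and the filename layout `<generation>-<sequence>.tsm` is assumed for illustration.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strconv"
	"strings"
)

// parseFileNameFunc is a hypothetical stand-in for the parser type discussed in the review.
type parseFileNameFunc func(path string) (generation, sequence int, err error)

// parseTSMName is an illustrative parser for names like "000000002-000000003.tsm".
// It is an assumption for this sketch, not the project's DefaultParseFileName.
func parseTSMName(path string) (int, int, error) {
	base := strings.TrimSuffix(filepath.Base(path), filepath.Ext(path))
	genStr, seqStr, ok := strings.Cut(base, "-")
	if !ok {
		return 0, 0, fmt.Errorf("file %q is named incorrectly", path)
	}
	gen, err := strconv.Atoi(genStr)
	if err != nil {
		return 0, 0, fmt.Errorf("bad generation in %q: %w", path, err)
	}
	seq, err := strconv.Atoi(seqStr)
	if err != nil {
		return 0, 0, fmt.Errorf("bad sequence in %q: %w", path, err)
	}
	return gen, seq, nil
}

// reader caches the parsed generation and sequence so callers never re-parse the name.
type reader struct {
	path                 string
	parse                parseFileNameFunc
	generation, sequence int
}

type option func(*reader)

// withParse installs a filename parser; without it the cached values stay zero.
func withParse(fn parseFileNameFunc) option {
	return func(r *reader) { r.parse = fn }
}

func newReader(path string, opts ...option) (*reader, error) {
	r := &reader{path: path}
	for _, o := range opts {
		o(r)
	}
	if r.parse != nil {
		var err error
		r.generation, r.sequence, err = r.parse(path)
		if err != nil {
			return nil, fmt.Errorf("failed parsing filename %q: %w", path, err)
		}
	}
	return r, nil
}

func main() {
	if r, err := newReader("/data/000000002-000000003.tsm", withParse(parseTSMName)); err == nil {
		fmt.Println(r.generation, r.sequence) // 2 3
	}
	// A temp-style name with no parser: accepted, but generation and sequence are 0.
	if r, err := newReader("/tmp/tsm1test12345"); err == nil {
		fmt.Println(r.generation, r.sequence) // 0 0
	}
}
```

The second call shows the reviewer's concern: a reader opened without a parser silently reports generation 0 and sequence 0.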

Comment on lines +447 to +450
t.generation, t.sequence, err = t.parseFileNameFunc(path)
if err != nil {
return fmt.Errorf("failed parsing filename %q for generation and sequence numbers: %w", path, err)
}
Member

There is no check for whether t.parseFileNameFunc is nil. This isn't an issue if we force supplying the parsing function or stop parameterizing it because we always use the default.

Contributor Author

I'll add a check (see above for why we want to allow it to be nil for test compatibility).

}

- group := generations[gen]
+ group := generations[f.Generation]
Member

Should we check that f.Generation and f.Sequence are not 0 before using them? This isn't an issue if we force supplying a parsing function or stop parameterizing it and always use the default.

Contributor Author

I believe we have to parameterize it to preserve tests that use unformatted TSM file names (from various mktemp-style calls).

return nil, err
}

if nil != t.parseFileNameFunc {
Member

Alternatively, we could fall back to using the default parsing function if WithParseFileNameFunc is not used.

Contributor Author

That would break tests.

Contributor Author

That breaks tests. The whole parsing function pointer could now be a boolean flag (parse/no_parse) with the function hard-coded, if that seems cleaner.

Contributor Author

No_parse for some tests, parse by default

devanbenz previously approved these changes Sep 16, 2025

@devanbenz devanbenz left a comment

LGTM, please wait for Geoffrey's re-review.

parseFileName: DefaultParseFileName,
copyFiles: runtime.GOOS == "windows",
- readerOptions: options,
+ readerOptions: append(options, WithParseFileNameFunc(DefaultParseFileName)),
Member

To get the expected behavior of NewFileStore("dirname", tsm1.WithParseFileNameFunc(myParseFunc)), WithParseFileNameFunc(DefaultParseFileName) must be prepended to the options rather than appended. Because it is appended, the default is applied last and DefaultParseFileName is used instead of myParseFunc.
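
A small sketch of the ordering issue, assuming options are applied in slice order so that a later option overwrites an earlier one. The `store`, `option`, and `withParser` names are hypothetical and only record which parser "won"; they are not the tsm1 API.

```go
package main

import "fmt"

// store records the name of the parser that ended up configured.
type store struct{ parserName string }

type option func(*store)

// withParser is a hypothetical option that selects a parser by name.
func withParser(name string) option {
	return func(s *store) { s.parserName = name }
}

// apply runs the options in order, so later options override earlier ones.
func apply(opts ...option) *store {
	s := &store{}
	for _, o := range opts {
		o(s)
	}
	return s
}

// newStoreAppended mirrors the reviewed line: the default is appended,
// so it runs last and overrides whatever the caller passed.
func newStoreAppended(opts ...option) *store {
	return apply(append(opts, withParser("default"))...)
}

// newStorePrepended puts the default first, so caller options take precedence.
func newStorePrepended(opts ...option) *store {
	return apply(append([]option{withParser("default")}, opts...)...)
}

func main() {
	fmt.Println(newStoreAppended(withParser("custom")).parserName)  // default
	fmt.Println(newStorePrepended(withParser("custom")).parserName) // custom
	fmt.Println(newStorePrepended().parserName)                     // default
}
```

With the default prepended, a caller who passes nothing still gets the default parser, and a caller who passes their own gets theirs.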

Contributor Author

Whoops!

@gwossum
Member

gwossum commented Sep 17, 2025

While I'm still not a fan of allowing partially working file stores, most of the practical concerns could be addressed by adding a way for NewDefaultPlanner to determine whether the fileStore it is given is usable for compaction. One option is a FileStore.SupportsCompaction method that tells the caller whether the store can be used for compaction (e.g. it has a non-nil ParseFileNameFunc). Since DefaultPlanner is the only place that currently uses the generation and sequence number provided by the filename parser, the only place this could cause a problem would be protected.

@gwossum
Member

gwossum commented Sep 17, 2025

There's also a "cute" way to check whether FileStore has a parse function, which is to ask it to parse a filename generated by the file formatting function. This feels pretty kludgy, though, and it is more invasive to the code because the format file name function has to be passed in more places for this to work.
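
A rough sketch of that round-trip check, under the assumption that names look like `<generation>-<sequence>.tsm`. The `formatName`, `parseName`, and `canRoundTrip` helpers are hypothetical and stand in for whatever format and parse functions the store is configured with.

```go
package main

import "fmt"

type parseFunc func(name string) (generation, sequence int, err error)
type formatFunc func(generation, sequence int) string

// formatName is an illustrative formatter; the zero-padded layout is an assumption.
func formatName(generation, sequence int) string {
	return fmt.Sprintf("%09d-%09d.tsm", generation, sequence)
}

// parseName is an illustrative inverse of formatName.
func parseName(name string) (int, int, error) {
	var gen, seq int
	if _, err := fmt.Sscanf(name, "%d-%d.tsm", &gen, &seq); err != nil {
		return 0, 0, fmt.Errorf("file %q is named incorrectly: %w", name, err)
	}
	return gen, seq, nil
}

// canRoundTrip reports whether parse recovers the values that format encoded.
// A nil parser or formatter counts as "cannot parse".
func canRoundTrip(parse parseFunc, format formatFunc) bool {
	if parse == nil || format == nil {
		return false
	}
	const gen, seq = 7, 42
	g, s, err := parse(format(gen, seq))
	return err == nil && g == gen && s == seq
}

func main() {
	fmt.Println(canRoundTrip(parseName, formatName)) // true
	fmt.Println(canRoundTrip(nil, formatName))       // false
}
```

The extra `format` parameter is the invasive part mentioned above: every caller of the check needs access to the formatting function as well.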

@gwossum
Member

gwossum commented Sep 17, 2025

Just did a quick PoC of the FileStore.SupportsCompaction idea. It takes about 10 lines of code, all tests pass, and influxd is happy. We can either say that this proves we don't have an issue with the current code misusing FileStore and call it a day, or we can add the code to prevent future issues.

@davidby-influx
Contributor Author

Do you want to add your commit to the branch, or is it too PoC-ish?

@gwossum
Member

gwossum commented Sep 17, 2025

Let me add a few comments and then I can add it. I won't be able to approve it once I do that, so @devanbenz will have to approve the PR.

Add `FileStore.SupportsCompactionPlanning()` which allows compaction
planners to check if a `FileStore` supports compaction planning.
Currently this means the `FileStore` must have a TSM filename parsing
function available.
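
As a sketch of what such a capability check might look like (the actual commit is not shown in this conversation), assuming the store simply holds an optional parse function in a hypothetical `parseFileName` field:

```go
package main

import "fmt"

// parseFileNameFunc is a hypothetical stand-in for the TSM filename parser type.
type parseFileNameFunc func(name string) (generation, sequence int, err error)

// fileStore is a pared-down illustration, not the real tsm1.FileStore.
type fileStore struct {
	parseFileName parseFileNameFunc
}

// SupportsCompactionPlanning reports whether the store can supply the
// generation and sequence numbers a compaction planner relies on.
// In this sketch that means a filename parser is configured.
func (f *fileStore) SupportsCompactionPlanning() bool {
	return f != nil && f.parseFileName != nil
}

func main() {
	configured := &fileStore{
		parseFileName: func(string) (int, int, error) { return 1, 1, nil },
	}
	bare := &fileStore{} // e.g. constructed directly in a test

	fmt.Println(configured.SupportsCompactionPlanning()) // true
	fmt.Println(bare.SupportsCompactionPlanning())       // false
}
```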

@devanbenz devanbenz left a comment

Just one comment about the panic

func NewDefaultPlanner(fs fileStore, writeColdDuration time.Duration) *DefaultPlanner {
if !fs.SupportsCompactionPlanning() {
// This should only happen due to developer mistakes.
panic("fileStore must support compaction planning")

I know this should only occur if a dev messes up, but do we really want a panic in production code? Also, would it be better to put this in a method called MustSupportCompactionPlanning(), similar to other parts of the codebase where we panic?

Member

We had this convo in Slack. Not panicking would be more invasive to the code (mainly test code), and the reason for even allowing a FileStore that didn't support compaction planning was to avoid a lot of test code changes. I'm also confident that you won't get this panic in production: if NewDefaultPlanner panics, influxd won't even start. You also can't get a panic if the code is used properly, because NewFileStore always sets a parse filename function. You would have to either force the parse function to nil (not in the codebase), create a FileStore without using NewFileStore (also not in the codebase), or create various objects directly (only in tests).

Member

With regard to MustSupportCompactionPlanning, SupportsCompactionPlanning isn't where the panic comes from. It leaves the choice of how to handle the issue up to the caller.
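
To illustrate the point that the boolean check leaves the policy to the caller, here is a sketch with two hypothetical constructors over the same check: one panics, as in the diff above, and one returns an error. `stubStore`, `planner`, `newPlannerOrPanic`, and `newPlannerOrError` are invented for this example; the error-returning variant is not something proposed in the PR.

```go
package main

import (
	"errors"
	"fmt"
)

// compactionCapable is the only behavior the constructors need from a store.
type compactionCapable interface {
	SupportsCompactionPlanning() bool
}

// stubStore is a test double whose capability is set directly.
type stubStore struct{ ok bool }

func (s stubStore) SupportsCompactionPlanning() bool { return s.ok }

type planner struct{ fs compactionCapable }

var errNoPlanning = errors.New("fileStore must support compaction planning")

// newPlannerOrPanic follows the approach in the diff: misuse is a developer
// error, so it fails loudly at construction time.
func newPlannerOrPanic(fs compactionCapable) *planner {
	if !fs.SupportsCompactionPlanning() {
		panic(errNoPlanning.Error())
	}
	return &planner{fs: fs}
}

// newPlannerOrError is a hypothetical alternative that reports the problem
// to the caller instead, at the cost of a changed signature.
func newPlannerOrError(fs compactionCapable) (*planner, error) {
	if !fs.SupportsCompactionPlanning() {
		return nil, errNoPlanning
	}
	return &planner{fs: fs}, nil
}

func main() {
	_, err := newPlannerOrError(stubStore{ok: false})
	fmt.Println("error:", err)

	func() {
		defer func() { fmt.Println("recovered:", recover()) }()
		newPlannerOrPanic(stubStore{ok: false})
	}()
}
```

The error-returning form changes the constructor's signature, which is the kind of invasive change to callers and tests that the thread is trying to avoid.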


Sounds good 👍

devanbenz previously approved these changes Sep 17, 2025

@devanbenz devanbenz left a comment

LGTM

@davidby-influx
Contributor Author

davidby-influx commented Oct 14, 2025

Work moved to "feat: cache generation and sequence to reduce TSM filename parsing" for cleanliness.

@davidby-influx davidby-influx deleted the DSB_cache_generation branch October 14, 2025 01:07

3 participants