feat: cache generation and sequence to reduce TSM filename parsing#26798
feat: cache generation and sequence to reduce TSM filename parsing#26798davidby-influx wants to merge 9 commits intomaster-1.xfrom
Conversation
| if nil != t.parseFileNameFunc { | ||
| // If parseFileNameFunc is nil, we are in a test or another TSMReader use | ||
| // that does not involve compaction planning! | ||
| t.generation, t.sequence, err = t.parseFileNameFunc(t.Path()) | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
| } |
There was a problem hiding this comment.
I have concerns about allowing t.generation and t.sequence to be 0 if WithParseFileNameFunc is not used. How disruptive would it be to require supplying the parsing function? Alternatively, does this function really need to be parameterized? I remember discussion that we never use anything but the default parsing function.
There was a problem hiding this comment.
This is a tough one. We have lots of tests that use non-formatted files names (created by Go standard packages) where we don't care about sequence and generation. When I enforced name parsing, all those tests broke, and I couldn't think of a better way to allow unexamined filenames while still caching the sequence and generation efficiently.
tsdb/engine/tsm1/reader.go
Outdated
| t.generation, t.sequence, err = t.parseFileNameFunc(path) | ||
| if err != nil { | ||
| return fmt.Errorf("failed parsing filename %q for generation and sequence numbers: %w", path, err) | ||
| } |
There was a problem hiding this comment.
No check if t.parseFilenameFunc is nil. This isn't an issue if we force supplying the parsing function or stop parameterizing it because we always use the default.
There was a problem hiding this comment.
I'll add a check (see above why we want to allow it to be nil for test compatibility)
| } | ||
|
|
||
| group := generations[gen] | ||
| group := generations[f.Generation] |
There was a problem hiding this comment.
Should we check that f.Generation and f.Sequence are not 0 before using them? This isn't an issue if we force supplying a parsing function or stop parameterizing it and always use the default.
There was a problem hiding this comment.
I believe we have to parameterize it to preserve tests which use unformatted TSM file names (from various mktemp sort of calls).
| return nil, err | ||
| } | ||
|
|
||
| if nil != t.parseFileNameFunc { |
There was a problem hiding this comment.
Alternatively, we could fallback to using the default parsing function if WithParseFileNameFunc is not used.
There was a problem hiding this comment.
That would break tests.
There was a problem hiding this comment.
That breaks tests. The whole parsing function pointer could now be a boolean flag (parse/no_parse) and the function hard-coded, if that seems cleaner
There was a problem hiding this comment.
No_parse for some tests, parse by default
devanbenz
left a comment
There was a problem hiding this comment.
LGTM, please wait for Geoffrey's re-review.
tsdb/engine/tsm1/file_store.go
Outdated
| parseFileName: DefaultParseFileName, | ||
| copyFiles: runtime.GOOS == "windows", | ||
| readerOptions: options, | ||
| readerOptions: append(options, WithParseFileNameFunc(DefaultParseFileName)), |
There was a problem hiding this comment.
To get the expected behavior of NewFileStore("dirname", tsm1.WithParseFileNameFunc(myParseFunc)), the DefaultParseFileName must be prepended to the options. The appended DefaultParseFileName will cause DefaultParseFileName to be used instead of myParseFunc.
|
While I'm still not a fan allowing partially working file stores, most of practical concerns could be addressed by adding a way for |
|
There's also a "cute" way to check if |
|
Just did a quick PoC of the |
|
Do you wan to add your commit to the branch, or is it too PoC-ish? |
|
Let me add a few comments and then I can add it. I won't be able to approve it once I do that, so @devanbenz will have to approve the PR. |
Add `FileStore.SupportsCompactionPlanning()` which allows compaction planners to check if a `FileStore` supports compaction planning. Currently this means the `FileStore` must have a TSM filename parsing function available.
| func NewDefaultPlanner(fs fileStore, writeColdDuration time.Duration) *DefaultPlanner { | ||
| if !fs.SupportsCompactionPlanning() { | ||
| // This should only happen due to developer mistakes. | ||
| panic("fileStore must support compaction planning") |
There was a problem hiding this comment.
I know this should only occur if a dev messes up, but, do we really want a panic in production code? Also, would it be better to have a method called MustSupportCompactionPlanning() to put this in that is similar to other parts of the codebase where we panic?
There was a problem hiding this comment.
We had this convo in slack. Not panicking would be more invasive to the code (mainly test code), and the reason for even allowing a FileStore that didn't support compaction planning was to avoid a lot of test code changes. I'm also confident that you won't get this panic in production. If you panic in NewDefaultPlanner, influxd won't even start. You also can't a panic if the code is used properly. NewFileStore always sets a parse filename function. You have to either either to force the parse function to nil (not in codebase), create a FileStore without using NewFileStore (also not in codebase), or create various objects directly (only in tests).
There was a problem hiding this comment.
With regards to MustSupportCompactionPlanning, SupportsCompactionPlanning isn't where the panic comes from. It leaves the choice of how to handle the issue up to the caller.
(cherry picked from commit 0ec0101)
|
Work moved to feat: cache generation and sequence to reduce TSM filename parsing for cleanliness. |
Closes #26794