PQ: Add support for event-level compression using ZStandard (ZSTD) #18121
Conversation
jsvd
left a comment
first pass, minor annotations, going to test this manually now.
```java
        settings.getQueueMaxBytes(), settings.getMaxUnread(), settings.getCheckpointMaxAcks(),
        settings.getCheckpointMaxWrites(), settings.getCheckpointRetry()
    );
    return new BuilderImpl(settings);
```
Just a suggestion: this Builder refactoring could have been a separate PR, as it doesn't require the compression settings at all and is still a significant part of the changeset in this PR.
Another reason to introduce this change ASAP: yet another parameter is coming in https://github.com/elastic/logstash/pull/18000/files
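For context, the appeal of the Builder here is that each new queue setting arrives as one additional chainable method instead of yet another positional constructor argument. A minimal sketch of the pattern (class and field names are illustrative, not the PR's actual code):

```java
// Illustrative sketch of the Builder pattern under discussion; names are
// hypothetical and do not match the PR's actual classes.
final class QueueSettings {
    private final int checkpointMaxAcks;
    private final int checkpointMaxWrites;

    private QueueSettings(final Builder builder) {
        this.checkpointMaxAcks = builder.checkpointMaxAcks;
        this.checkpointMaxWrites = builder.checkpointMaxWrites;
    }

    static final class Builder {
        private int checkpointMaxAcks = 1024;   // defaults mirror the docs table below
        private int checkpointMaxWrites = 1024;

        Builder checkpointMaxAcks(final int value) { this.checkpointMaxAcks = value; return this; }
        Builder checkpointMaxWrites(final int value) { this.checkpointMaxWrites = value; return this; }

        QueueSettings build() { return new QueueSettings(this); }
    }
}
```

Adding a future setting (such as the one in #18000) then touches only the Builder, not every call site of the constructor.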
pared off as #18180
Resolved review threads (outdated):
- logstash-core/src/main/java/org/logstash/util/CleanerThreadLocal.java
- logstash-core/src/main/java/org/logstash/ackedqueue/AbstractZstdAwareCompressionCodec.java
- logstash-core/src/main/java/org/logstash/ackedqueue/AbstractZstdAwareCompressionCodec.java
Looking at profiling, it seems that with Zstd.compress/decompress the instance spends nearly 9% of the time doing context initializations: profile: profile.html
Something worth investigating is the use of thread locals for the contexts.
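For illustration, one way thread locals could avoid the repeated context initializations is to give each worker thread its own long-lived zstd context. A sketch assuming zstd-jni's `ZstdCompressCtx` API; the wrapper class is hypothetical and native-resource cleanup is elided:

```java
import com.github.luben.zstd.ZstdCompressCtx;

// Hypothetical sketch: reuse one zstd compression context per thread so the
// ~9% context-initialization overhead is paid once per thread, not per event.
// Note: contexts are native resources; real code would also arrange cleanup.
public final class ThreadLocalZstdCompressor {
    private final ThreadLocal<ZstdCompressCtx> contexts;

    public ThreadLocalZstdCompressor(final int level) {
        this.contexts = ThreadLocal.withInitial(() -> new ZstdCompressCtx().setLevel(level));
    }

    public byte[] compress(final byte[] payload) {
        return contexts.get().compress(payload);
    }
}
```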
Adds non-breaking support for event compression to the persisted queue, as
configured by a new per-pipeline setting `queue.compression`, which supports:
- `none` (default): no compression is performed, but if compressed events
are encountered in the queue they will be decompressed
- `speed`: compression optimized for speed
- `balanced`: compression balancing speed against result size
- `size`: compression optimized for maximum reduction of size
- `disabled`: compression support entirely disabled; if a pipeline is run
in this configuration against a PQ that already contains
unacked compressed events, the pipeline WILL crash.
To accomplish this, we provide an abstract base implementation of the
CompressionCodec whose decode method is capable of _detecting_ and decoding
zstd-encoded payloads while letting other payloads through unmodified.
The detection is done with an operation on the first four bytes of the
payload, so no additional context is needed.
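As a sketch of what that four-byte check could look like (the helper name is illustrative, not the PR's actual code; the zstd frame format itself does begin with the little-endian magic number `0xFD2FB528`):

```java
// Illustrative detection helper: every zstd frame starts with the 4-byte
// little-endian magic number 0xFD2FB528, so inspecting the first four bytes
// of a payload is enough to tell compressed events from plain ones.
static boolean isZstdFrame(final byte[] payload) {
    return payload.length >= 4
            && payload[0] == (byte) 0x28
            && payload[1] == (byte) 0xB5
            && payload[2] == (byte) 0x2F
            && payload[3] == (byte) 0xFD;
}
```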
An instance of this zstd-aware compression codec is provided with a
pass-through encode operation when configured with `queue.compression: none`,
which is the default, ensuring that by default logstash is able to decode any
event that had previously been written.
We provide an additional implementation that is capable of _encoding_ events
with a configurable goal: speed, size, or a balance of the two.
Co-authored-by: João Duarte <[email protected]>
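For a sense of how the three encoding goals could map onto zstd's numeric levels, here is a hypothetical mapping; the concrete levels the PR uses are not visible in this thread, and 1, 3, and 19 are merely common choices for speed, balanced, and size:

```java
// Hypothetical goal-to-level mapping; the PR's actual level choices are not
// shown in this conversation. Higher zstd levels trade CPU for smaller output.
static int zstdLevelFor(final String goal) {
    switch (goal) {
        case "speed":    return 1;   // fastest, least compression
        case "balanced": return 3;   // zstd's default level
        case "size":     return 19;  // near-maximum compression, most CPU
        default:
            throw new IllegalArgumentException("unsupported compression goal: " + goal);
    }
}
```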
Force-pushed from f18a7b1 to e90a425
jsvd
left a comment
Two minor doc changes, otherwise LGTM!
| `queue.checkpoint.acks` | The maximum number of ACKed events before forcing a checkpoint when persistent queues are enabled (`queue.type: persisted`). Specify `queue.checkpoint.acks: 0` to set this value to unlimited. | 1024 |
| `queue.checkpoint.writes` | The maximum number of written events before forcing a checkpoint when persistent queues are enabled (`queue.type: persisted`). Specify `queue.checkpoint.writes: 0` to set this value to unlimited. | 1024 |
| `queue.checkpoint.retry` | When enabled, Logstash will retry four times per attempted checkpoint write for any checkpoint writes that fail. Any subsequent errors are not retried. This is a workaround for failed checkpoint writes that have been seen only on the Windows platform and on filesystems with non-standard behavior such as SANs, and it is not recommended except in those specific circumstances. (`queue.type: persisted`) | `true` |
| `queue.compression` | Set a persisted queue compression level, which allows the pipeline to reduce the event size on disk at the cost of CPU usage. Possible values are `speed`, `balanced`, and `size`. | `none` |
Suggested change:
| `queue.compression` | Set a persisted queue compression level, which allows the pipeline to reduce the event size on disk at the cost of CPU usage. Possible values are `none`, `speed`, `balanced`, and `size`. | `none` |
robbavey
left a comment
Couple of doc comments
Co-authored-by: Rob Bavey <[email protected]>
💛 Build succeeded, but was flaky
cc @yaauie
…18121)
* noop: add pq compression-codec with no-op implementation
* pq: add support for event compression using zstd (full commit description quoted above)
* license: add notice for `com.github.luben:zstd-jni`
* pq: log compression encode/decode from the codec
* Apply docs suggestions from code review
* remove CleanerThreadLocal utility
* license: add mapping for com.github.luben:zstd-jni
* Apply suggestions from code review

Co-authored-by: João Duarte <[email protected]>
Co-authored-by: Rob Bavey <[email protected]>






Release notes
Adds support for event compression in the persisted queue, controlled by the per-pipeline `queue.compression` setting, which defaults to `none`.
What does this PR do?
Adds non-breaking support for event compression to the persisted queue, as configured by a new per-pipeline setting `queue.compression`, which supports:
- `none` (default): no compression is performed, but if compressed events are encountered in the queue they will be decompressed
- `speed`: compression optimized for speed (minimal overhead, but less compression)
- `balanced`: compression balancing speed against result size
- `size`: compression optimized for maximum reduction of size (minimal size, but more resource-intensive)
- `disabled`: compression support entirely disabled; if a pipeline is run in this configuration against a PQ that already contains unacked compressed events, the pipeline WILL crash.

This PR does the necessary refactors as no-op stand-alone commits to make reviewing more straightforward. It is best reviewed in commit order.
Why is it important/What is the impact to the user?
Disk I/O is often a performance bottleneck when using the PQ. This feature allows users to spend available CPU resources to reduce the size of events on disk, and therefore also the disk I/O.
Checklist
Author's Checklist
How to test this PR locally
Create `example-input.ndjson` with event contents, run Logstash with `-S` to set `queue.type=persisted`, `queue.drain=true`, and `queue.compression=size`, then inspect the resulting queue pages with `lsq-pagedump`.
Related issues
Use cases