docs(clp-s): Describe more compression options; Update out-of-date description of archive-path option for decompression and search.
#1030
Changes from 3 commits
31a84ac
2dadb00
55c5cc7
12caeee
f4e55e2
7f0fca9
1f0529b
c485a09
ab2e69c
d610612
d3096a5
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -12,11 +12,27 @@ Usage: | |
| ``` | ||
|
|
||
| * `archives-dir` is the directory that archives should be written to. | ||
| * `input-path` is any new-line-delimited JSON (ndjson) log file or directory containing such files. | ||
| * `options` allow you to specify things like which field should be considered as the log event's | ||
| timestamp (`--timestamp-key <field-path>`), or whether to fully parse array entries and encode | ||
| them into dedicated columns (`--structurize-arrays`). | ||
| * For a complete list, run `./clp-s c --help` | ||
| * `input-path` is any new-line-delimited JSON (ndjson) log file, KV-IR file, URL pointing to such | ||
| files, or directory containing such files. | ||
|
||
| * `options` allow you to specify how data gets compressed into an archive, for example: | ||
| * `--single-file-archive` specifies that single-file archives should be produced (i.e. each | ||
| archive is a single file in `archives-dir`). | ||
| * `--file-type <json|kv-ir>` specifies whether the input files are encoded as ndjson or KV-IR. | ||
| * `--timestamp-key <field-path>` specifies which field should be treated as each log event's | ||
| timestamp. | ||
| * `--target-encoded-size` specifies the threshold in bytes for the size of the dictionaries and | ||
| encoded messages at which archives are split. This acts as a soft limit on memory usage for | ||
| compression, decompression, and search, and also has a significant effect on compression ratio. | ||
| * `--structurize-arrays` specifies that arrays should be fully parsed and array entries should be | ||
| encoded into dedicated columns. | ||
| * `--auth <s3|none>` specifies the authentication method that should be used for network requests | ||
| if the input path is a URL. When S3 authentication is enabled we issue a GET request following | ||
| the presigned URL v4 specification. This request draws on the environment variables | ||
| `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and, if it is set, `AWS_SESSION_TOKEN`. | ||
| For more information on usage with S3, see our | ||
| [dedicated guide](#guides-using-object-storage/index). | ||
|
|
||
| For a complete list of options, run `./clp-s c --help`. | ||
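A minimal dry-run sketch of the options listed above, assuming bash or another POSIX shell. The sample log events, the `timestamp` field name, and the paths are made up for illustration; only the flags come from the description above. Because `--target-encoded-size` takes a byte count, the script computes 1 GiB with shell arithmetic, then prints the resulting command rather than running it.

```shell
# Create a tiny ndjson input: one JSON object per line (sample data is made up).
input_path="$(mktemp -d)/sample.ndjson"
cat > "$input_path" <<'EOF'
{"timestamp": 1735689600000, "level": "INFO", "message": "service started"}
{"timestamp": 1735689605000, "level": "ERROR", "message": "connection refused"}
EOF

# --target-encoded-size is given in bytes: 1 GiB = 1073741824 bytes.
target_encoded_size=$((1024 * 1024 * 1024))

# Dry run: print the command. Remove the leading `echo` to execute it
# once ./clp-s and the archives directory (a placeholder here) exist.
echo ./clp-s c --timestamp-key timestamp \
    --target-encoded-size "$target_encoded_size" \
    /mnt/data/archives "$input_path"
```

Printing the command first makes it easy to check the computed size and the argument order (`archives-dir` before `input-path`) before any archive is written.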
|
|
||
| ### Examples | ||
|
|
||
|
|
@@ -37,6 +53,14 @@ Specifying the timestamp-key will create a range-index for the timestamp column | |
| compression ratio and search performance. | ||
| ::: | ||
|
|
||
| **Compress a KV-IR file stored on S3 to a single-file archive:** | ||
|
|
||
| ```shell | ||
| AWS_ACCESS_KEY_ID='...' AWS_SECRET_ACCESS_KEY='...' \ | ||
| ./clp-s c --single-file-archive --file-type kv-ir --auth s3 /mnt/data/archives \ | ||
| 'https://my-bucket.s3.us-east-2.amazonaws.com/kvir-log.clp' | ||
| ``` | ||
|
|
||
| **Set the target encoded size to 1 GiB and the compression level to 6 (3 by default)** | ||
|
|
||
| ```shell | ||
|
|
@@ -52,13 +76,14 @@ compression ratio and search performance. | |
| Usage: | ||
|
|
||
| ```shell | ||
| ./clp-s x [<options>] <archives-dir> <output-dir> | ||
| ./clp-s x [<options>] <archives-path> <output-dir> | ||
| ``` | ||
|
|
||
| * `archives-dir` is a directory containing archives. | ||
| * `archives-path` is a directory containing archives, a path to an archive, or a URL pointing to a | ||
| single-file archive. | ||
| * `output-dir` is the directory that decompressed logs should be written to. | ||
| * `options` allow you to specify things like a specific archive (from within `archives-dir`) to | ||
| decompress (`--archive-id <archive-id>`). | ||
| * `options` allow you to specify things like a specific archive (from within the directory | ||
| `archives-path`, if it is a directory) to decompress (`--archive-id <archive-id>`). | ||
| * For a complete list, run `./clp-s x --help` | ||
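The three forms of `archives-path` can be told apart in a wrapper script before calling `clp-s`. The sketch below is a hypothetical helper, not how `clp-s` itself resolves the argument: `classify_archives_path` and the paths are illustrative names, and treating only `http://` and `https://` prefixes as URLs is an assumption. It reflects the rule above that `--archive-id` applies only when `archives-path` is a directory.

```shell
# Hypothetical helper: report whether an archives-path looks like a URL,
# a directory of archives, or a path to a single archive.
classify_archives_path() {
    case "$1" in
        http://*|https://*) echo "url" ;;
        *)
            if [ -d "$1" ]; then
                echo "directory"
            else
                echo "archive"
            fi
            ;;
    esac
}

archives_path='/mnt/data/archives'   # placeholder path
archive_id='<archive-id>'            # placeholder, left unfilled on purpose

# Dry run: print the decompression command instead of executing it.
# --archive-id is only added when archives-path is an existing directory.
if [ "$(classify_archives_path "$archives_path")" = "directory" ]; then
    echo ./clp-s x --archive-id "$archive_id" "$archives_path" /mnt/data/output
else
    echo ./clp-s x "$archives_path" /mnt/data/output
fi
```

The same check would apply to `./clp-s s`, which takes `archives-path` in the same position.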
|
|
||
| ### Examples | ||
|
|
@@ -74,13 +99,14 @@ Usage: | |
| Usage: | ||
|
|
||
| ```shell | ||
| ./clp-s s [<options>] <archives-dir> <kql-query> | ||
| ./clp-s s [<options>] <archives-path> <kql-query> | ||
| ``` | ||
|
|
||
| * `archives-dir` is a directory containing archives. | ||
| * `archives-path` is a directory containing archives, a path to an archive, or a URL pointing to a | ||
| single-file archive. | ||
| * `kql-query` is a [KQL](reference-json-search-syntax) query. | ||
| * `options` allow you to specify things like a specific archive (from within `archives-dir`) to | ||
| search (`--archive-id <archive-id>`). | ||
| * `options` allow you to specify things like a specific archive (from within the directory | ||
| `archives-path`, if it is a directory) to search (`--archive-id <archive-id>`). | ||
| * For a complete list, run `./clp-s s --help` | ||
|
|
||
| ### Examples | ||
|
|
||
The author of the ndjson spec has expressed willingness to deprecate the spec: https://www.github.com/ndjson/ndjson-spec/issues/35
Instead, JSON Lines (JSONL) was recommended. Shall we rename the references to JSONL?
That's a good point. Right now I think we use ndjson very consistently throughout all of our documentation though, so it might be better to put this up as an issue and change all of the references at once in a separate PR so that the docs stay internally consistent.
Opened as #1034.