Commit 488e45f

Merge pull request #3547 from facebook/seekable_doc
added documentation for the seekable format
2 parents 91f4c23 + dd8cb5a

2 files changed (+55, -4 lines)


contrib/seekable_format/README.md

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
# Zstandard Seekable Format

The seekable format splits the input data into a series of independent "frames", each compressed individually, so that decompressing a section in the middle of an archive only requires zstd to decompress the frames that overlap that section, instead of the entire archive.

The frames are concatenated, so decompressing the entire payload with any compliant zstd decoder still regenerates the original content.

In addition, the seekable format generates a jump table (the seek table), which makes it possible to jump directly to the relevant frame when only a segment of the data is requested. The jump table is stored in a skippable frame, so zstd decoders unaware of the seekable format simply ignore it.
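
To illustrate how a jump table lets a reader find the right frame, here is a minimal standalone sketch in C. It is not the library's implementation. The `seek_entry` struct and `find_frame()` function are hypothetical, and the sketch assumes the table has already been loaded into memory as the decompressed (and compressed) start offset of each frame, sorted by position. Finding the frame that holds a given byte is then a binary search.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory jump table entry: where a frame starts in the
 * decompressed data and where it starts in the compressed archive. */
typedef struct {
    uint64_t decompressed_offset;
    uint64_t compressed_offset;
} seek_entry;

/* Returns the index of the frame containing decompressed byte `pos`.
 * `entries` must be sorted by decompressed_offset with entries[0] at 0,
 * and `total_size` is the full decompressed size.
 * Returns num_frames if `pos` is past the end of the data. */
static size_t find_frame(const seek_entry* entries, size_t num_frames,
                         uint64_t total_size, uint64_t pos)
{
    assert(num_frames > 0 && entries[0].decompressed_offset == 0);
    if (pos >= total_size) return num_frames;

    /* Binary search for the last frame whose start is <= pos. */
    size_t lo = 0, hi = num_frames - 1;
    while (lo < hi) {
        size_t mid = lo + (hi - lo + 1) / 2; /* round up so the loop progresses */
        if (entries[mid].decompressed_offset <= pos)
            lo = mid;
        else
            hi = mid - 1;
    }
    return lo;
}
```

For example, with frames starting at decompressed offsets 0, 4096 and 8192, byte 5000 falls in frame 1; a reader would then seek to that frame's compressed offset and decompress only that frame.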

The format is delivered with an API, declared in `zstd_seekable.h`, to create seekable archives and to retrieve arbitrary segments from them.

### Maximum Frame Size parameter

When creating a seekable archive, the main parameter is the maximum frame size.

At compression time, users can manually select the boundaries between frames, but they don't have to: segments longer than the selected maximum frame size are automatically split.
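
The automatic splitting can be modeled with a little arithmetic. This hypothetical sketch (`num_frames()` and `frame_size()` are illustrative names, not part of the seekable API) assumes no manual boundaries are set and that a new frame starts exactly when the current one reaches `max_frame_size`, so every frame is full except possibly the last. The `maxFrameSize == 0` default is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Number of frames produced for `src_size` bytes when frames are cut only
 * at the maximum size (ceiling division, written to avoid overflow). */
static uint64_t num_frames(uint64_t src_size, uint64_t max_frame_size)
{
    assert(max_frame_size > 0);
    return src_size / max_frame_size + (src_size % max_frame_size != 0);
}

/* Decompressed size of frame `i`: every frame is full except possibly
 * the last one, which holds whatever input remains. */
static uint64_t frame_size(uint64_t src_size, uint64_t max_frame_size, uint64_t i)
{
    assert(max_frame_size > 0);
    uint64_t start = i * max_frame_size;
    assert(start < src_size);
    uint64_t remaining = src_size - start;
    return remaining < max_frame_size ? remaining : max_frame_size;
}
```

For a 10,000-byte input with a 4,096-byte maximum, this gives three frames of 4096, 4096 and 1808 bytes.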

Small frame sizes reduce decompression cost when requesting small segments, because the decoder must decompress an entire frame even to recover a single byte from it.

A good rule of thumb is to select a maximum frame size roughly matching the access granularity, when it is known. For example, if the application tends to request 4 KB blocks, a maximum frame size in the vicinity of 4 KB is a good choice.
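
The cost argument can be made concrete with a small helper that counts how many bytes must be decompressed to serve one request. This is a hypothetical sketch, assuming every frame holds exactly `max_frame_size` bytes (a shorter final frame is ignored); `bytes_to_decompress()` is an illustrative name, not a library function.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes the decoder must decompress to serve a request for `len` bytes at
 * decompressed offset `offset`, assuming fixed-size frames of
 * `max_frame_size` bytes (a shorter final frame is ignored). */
static uint64_t bytes_to_decompress(uint64_t offset, uint64_t len,
                                    uint64_t max_frame_size)
{
    assert(max_frame_size > 0);
    if (len == 0) return 0;
    uint64_t first = offset / max_frame_size;            /* first frame touched */
    uint64_t last = (offset + len - 1) / max_frame_size; /* last frame touched */
    return (last - first + 1) * max_frame_size;          /* whole frames decoded */
}
```

With 4 KB frames, an aligned 4 KB request at offset 40960 costs 4096 bytes of decompression, while 64 KB frames make the same request cost 65536 bytes. An unaligned 4 KB request at offset 2048 straddles two 4 KB frames and costs 8192 bytes, so matching the frame size to the access granularity pays off most when requests are also aligned.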

However, small frame sizes also reduce compression ratio and increase the size of the jump table, so there is a balance to find.

In general, avoid very small frame sizes (< 1 KB), which have a large negative impact on compression ratio.
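
To put a number on the jump table cost, the sketch below estimates the seek table size for a given input size and frame size. The layout constants are our reading of the seekable format specification (an 8-byte skippable frame header, 8 bytes per frame entry plus 4 when checksums are enabled, and a 9-byte footer) and should be checked against it; `seek_table_size()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed seek table layout (verify against the format specification):
 * an 8-byte skippable frame header, one entry per frame holding a 4-byte
 * compressed size and a 4-byte decompressed size (plus a 4-byte checksum
 * when checksums are enabled), and a 9-byte footer. */
#define SKIPPABLE_HEADER_SIZE 8u
#define SEEK_TABLE_FOOTER_SIZE 9u
#define ENTRY_SIZE 8u
#define CHECKSUM_SIZE 4u

/* Size in bytes of the seek table for `src_size` bytes of input split into
 * frames of at most `max_frame_size` bytes (no manual boundaries). */
static uint64_t seek_table_size(uint64_t src_size, uint64_t max_frame_size,
                                int checksum_flag)
{
    assert(max_frame_size > 0);
    uint64_t frames = src_size / max_frame_size + (src_size % max_frame_size != 0);
    uint64_t entry = ENTRY_SIZE + (checksum_flag ? CHECKSUM_SIZE : 0u);
    return SKIPPABLE_HEADER_SIZE + frames * entry + SEEK_TABLE_FOOTER_SIZE;
}
```

For a 1 MiB input, 4 KB frames give 256 entries and a 2065-byte table without checksums (about 0.2% of the input), while 1 KB frames give 1024 entries and 8209 bytes (about 0.8%). The table itself stays modest; the larger cost of tiny frames is the compression ratio, because each frame is compressed independently and cannot reference data from earlier frames.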

contrib/seekable_format/zstd_seekable.h

Lines changed: 13 additions & 4 deletions
@@ -48,10 +48,19 @@ typedef struct ZSTD_seekTable_s ZSTD_seekTable;

      *
      * Use ZSTD_seekable_initCStream() to initialize a ZSTD_seekable_CStream object
      * for a new compression operation.
    - * `maxFrameSize` indicates the size at which to automatically start a new
    - * seekable frame. `maxFrameSize == 0` implies the default maximum size.
    - * `checksumFlag` indicates whether or not the seek table should include frame
    - * checksums on the uncompressed data for verification.
    + * - `maxFrameSize` indicates the size at which to automatically start a new
    + * seekable frame.
    + * `maxFrameSize == 0` implies the default maximum size.
    + * Smaller frame sizes allow faster decompression of small segments,
    + * since retrieving a single byte requires decompression of
    + * the full frame where the byte belongs.
    + * In general, size the frames to roughly correspond to
    + * the access granularity (when it's known).
    + * But small sizes also reduce compression ratio.
    + * Avoid really tiny frame sizes (< 1 KB),
    + * that would hurt compression ratio considerably.
    + * - `checksumFlag` indicates whether or not the seek table should include frame
    + * checksums on the uncompressed data for verification.
      * @return : a size hint for input to provide for compression, or an error code
      * checkable with ZSTD_isError()
      *
