Commit b29c880

1 parent 54efce3 commit b29c880

535 files changed, +22438 -6566 lines changed

CHANGELOG

Lines changed: 60 additions & 0 deletions
@@ -1,3 +1,63 @@
+v1.5.4 (Feb 2023)
+perf: +20% faster huffman decompression for targets that can't compile x64 assembly (#3449, @terrelln)
+perf: up to +10% faster streaming compression at levels 1-2 (#3114, @embg)
+perf: +4-13% for levels 5-12 by optimizing function generation (#3295, @terrelln)
+pref: +3-11% compression speed for `arm` target (#3199, #3164, #3145, #3141, #3138, @JunHe77 and #3139, #3160, @danlark1)
+perf: +5-30% faster dictionary compression at levels 1-4 (#3086, #3114, #3152, @embg)
+perf: +10-20% cold dict compression speed by prefetching CDict tables (#3177, @embg)
+perf: +1% faster compression by removing a branch in ZSTD_fast_noDict (#3129, @felixhandte)
+perf: Small compression ratio improvements in high compression mode (#2983, #3391, @Cyan4973 and #3285, #3302, @daniellerozenblit)
+perf: small speed improvement by better detecting `STATIC_BMI2` for `clang` (#3080, @TocarIP)
+perf: Improved streaming performance when `ZSTD_c_stableInBuffer` is set (#2974, @Cyan4973)
+cli: Asynchronous I/O for improved cli speed (#2975, #2985, #3021, #3022, @yoniko)
+cli: Change `zstdless` behavior to align with `zless` (#2909, @binhdvo)
+cli: Keep original file if `-c` or `--stdout` is given (#3052, @dirkmueller)
+cli: Keep original files when result is concatenated into a single output with `-o` (#3450, @Cyan4973)
+cli: Preserve Permissions and Ownership of regular files (#3432, @felixhandte)
+cli: Print zlib/lz4/lzma library versions with `-vv` (#3030, @terrelln)
+cli: Print checksum value for single frame files with `-lv` (#3332, @Cyan4973)
+cli: Print `dictID` when present with `-lv` (#3184, @htnhan)
+cli: when `stderr` is *not* the console, disable status updates, but preserve final summary (#3458, @Cyan4973)
+cli: support `--best` and `--no-name` in `gzip` compatibility mode (#3059, @dirkmueller)
+cli: support for `posix` high resolution timer `clock_gettime()`, for improved benchmark accuracy (#3423, @Cyan4973)
+cli: improved help/usage (`-h`, `-H`) formatting (#3094, @dirkmueller and #3385, @jonpalmisc)
+cli: Fix better handling of bogus numeric values (#3268, @ctkhanhly)
+cli: Fix input consists of multiple files _and_ `stdin` (#3222, @yoniko)
+cli: Fix tiny files passthrough (#3215, @cgbur)
+cli: Fix for `-r` on empty directory (#3027, @brailovich)
+cli: Fix empty string as argument for `--output-dir-*` (#3220, @embg)
+cli: Fix decompression memory usage reported by `-vv --long` (#3042, @u1f35c, and #3232, @zengyijing)
+cli: Fix infinite loop when empty input is passed to trainer (#3081, @terrelln)
+cli: Fix `--adapt` doesn't work when `--no-progress` is also set (#3354, @terrelln)
+api: Support for Block-Level Sequence Producer (#3333, @embg)
+api: Support for in-place decompression (#3432, @terrelln)
+api: New `ZSTD_CCtx_setCParams()` function, set all parameters defined in a `ZSTD_compressionParameters` structure (#3403, @Cyan4973)
+api: Streaming decompression detects incorrect header ID sooner (#3175, @Cyan4973)
+api: Window size resizing optimization for edge case (#3345, @daniellerozenblit)
+api: More accurate error codes for busy-loop scenarios (#3413, #3455, @Cyan4973)
+api: Fix limit overflow in `compressBound` and `decompressBound` (#3362, #3373, Cyan4973) reported by @nigeltao
+api: Deprecate several advanced experimental functions: streaming (#3408, @embg), copy (#3196, @mileshu)
+bug: Fix corruption that rarely occurs in 32-bit mode with wlog=25 (#3361, @terrelln)
+bug: Fix for block-splitter (#3033, @Cyan4973)
+bug: Fixes for Sequence Compression API (#3023, #3040, @Cyan4973)
+bug: Fix leaking thread handles on Windows (#3147, @animalize)
+bug: Fix timing issues with cmake/meson builds (#3166, #3167, #3170, @Cyan4973)
+build: Allow user to select legacy level for cmake (#3050, @shadchin)
+build: Enable legacy support by default in cmake (#3079, @niamster)
+build: Meson build script improvements (#3039, #3120, #3122, #3327, #3357, @eli-schwartz and #3276, @neheb)
+build: Add aarch64 to supported architectures for zstd_trace (#3054, @ooosssososos)
+build: support AIX architecture (#3219, @qiongsiwu)
+build: Fix `ZSTD_LIB_MINIFY` build macro, which now reduces static library size by half (#3366, @terrelln)
+build: Fix Windows issues with Multithreading translation layer (#3364, #3380, @yoniko) and ARM64 target (#3320, @cwoffenden)
+build: Fix `cmake` script (#3382, #3392, @terrelln and #3252 @Tachi107 and #3167 @Cyan4973)
+doc: Updated man page, providing more details for `--train` mode (#3112, @Cyan4973)
+doc: Add decompressor errata document (#3092, @terrelln)
+misc: Enable Intel CET (#2992, #2994, @hjl-tools)
+misc: Fix `contrib/` seekable format (#3058, @yhoogstrate and #3346, @daniellerozenblit)
+misc: Improve speed of the one-file library generator (#3241, @wahern and #3005, @cwoffenden)
+
+v1.5.3 (dev version, unpublished)
+
 v1.5.2 (Jan, 2022)
 perf: Regain Minimal memset()-ing During Reuse of Compression Contexts (@Cyan4973, #2969)
 build: Build Zstd with `noexecstack` on All Architectures (@felixhandte, #2964)

CONTRIBUTING.md

Lines changed: 10 additions & 10 deletions
@@ -7,7 +7,7 @@ New versions are being developed in the "dev" branch,
 or in their own feature branch.
 When they are deemed ready for a release, they are merged into "release".

-As a consequences, all contributions must stage first through "dev"
+As a consequence, all contributions must stage first through "dev"
 or their own feature branch.

 ## Pull Requests
@@ -134,11 +134,11 @@ It can be useful to look at additional static analyzers once in a while (and we
 - Static analyzers are full of false positive. The signal to noise ratio is actually pretty low.
 - A good CI policy is "zero-warning tolerance". That means that all issues must be solved, including false positives. This quickly becomes a tedious workload.
 - Multiple static analyzers will feature multiple kind of false positives, sometimes applying to the same code but in different ways leading to :
-  + torteous code, trying to please multiple constraints, hurting readability and therefore maintenance. Sometimes, such complexity introduce other more subtle bugs, that are just out of scope of the analyzers.
+  + tortuous code, trying to please multiple constraints, hurting readability and therefore maintenance. Sometimes, such complexity introduce other more subtle bugs, that are just out of scope of the analyzers.
   + sometimes, these constraints are mutually exclusive : if one try to solve one, the other static analyzer will complain, they can't be both happy at the same time.
 - As if that was not enough, the list of false positives change with each version. It's hard enough to follow one static analyzer, but multiple ones with their own update agenda, this quickly becomes a massive velocity reducer.

-This is different from running a static analyzer once in a while, looking at the output, and __cherry picking__ a few warnings that seem helpful, either because they detected a genuine risk of bug, or because it helps expressing the code in a way which is more readable or more difficult to misuse. These kind of reports can be useful, and are accepted.
+This is different from running a static analyzer once in a while, looking at the output, and __cherry picking__ a few warnings that seem helpful, either because they detected a genuine risk of bug, or because it helps expressing the code in a way which is more readable or more difficult to misuse. These kinds of reports can be useful, and are accepted.

 ## Continuous Integration
 CI tests run every time a pull request (PR) is created or updated. The exact tests
@@ -197,7 +197,7 @@ something subtle merged is extensive benchmarking. You will be doing us a great
 take the time to run extensive, long-duration, and potentially cross-(os, platform, process, etc)
 benchmarks on your end before submitting a PR. Of course, you will not be able to benchmark
 your changes on every single processor and os out there (and neither will we) but do that best
-you can:) We've adding some things to think about when benchmarking below in the Benchmarking
+you can:) We've added some things to think about when benchmarking below in the Benchmarking
 Performance section which might be helpful for you.
 3. Optimizing performance for a certain OS, processor vendor, compiler, or network system is a perfectly
 legitimate thing to do as long as it does not harm the overall performance health of Zstd.
@@ -273,7 +273,7 @@ for that options you have just provided. If you want to look at the internals of
 benchmarking script works, you can check out programs/benchzstd.c

 For example: say you have made a change that you believe improves the speed of zstd level 1. The
-very first thing you should use to asses whether you actually achieved any sort of improvement
+very first thing you should use to assess whether you actually achieved any sort of improvement
 is `zstd -b`. You might try to do something like this. Note: you can use the `-i` option to
 specify a running time for your benchmark in seconds (default is 3 seconds).
 Usually, the longer the running time, the more stable your results will be.
@@ -299,7 +299,7 @@ this method of evaluation will not be sufficient.
 ### Profiling
 There are a number of great profilers out there. We're going to briefly mention how you can
 profile your code using `instruments` on mac, `perf` on linux and `visual studio profiler`
-on windows.
+on Windows.

 Say you have an idea for a change that you think will provide some good performance gains
 for level 1 compression on Zstd. Typically this means, you have identified a section of
@@ -315,8 +315,8 @@ might be).

 Most profilers (including the profilers discussed below) will generate a call graph of
 functions for you. Your goal will be to find your function of interest in this call graph
-and then inspect the time spent inside of it. You might also want to to look at the
-annotated assembly which most profilers will provide you with.
+and then inspect the time spent inside of it. You might also want to look at the annotated
+assembly which most profilers will provide you with.

 #### Instruments
 We will once again consider the scenario where you think you've identified a piece of code
@@ -330,7 +330,7 @@ Instruments.
 * You will want a benchmark that runs for at least a few seconds (5 seconds will
 usually be long enough). This way the profiler will have something to work with
 and you will have ample time to attach your profiler to this process:)
-* I will just use benchzstd as my bencharmking script for this example:
+* I will just use benchzstd as my benchmarmking script for this example:
 ```
 $ zstd -b1 -i5 <my-data> # this will run for 5 seconds
 ```
@@ -455,7 +455,7 @@ This design requirement is fundamental to preserve the portability of the code b
 Any variable that can be `const` (aka. read-only) **must** be `const`.
 Any pointer which content will not be modified must be `const`.
 This property is then controlled at compiler level.
-`const` variables are an important signal to readers that this variable isnt modified.
+`const` variables are an important signal to readers that this variable isn't modified.
 Conversely, non-const variables are a signal to readers to watch out for modifications later on in the function.
 * If a function must be inlined, mention it explicitly,
 using project's own portable macros, such as `FORCE_INLINE_ATTR`,

LICENSE

Lines changed: 4 additions & 4 deletions
@@ -2,7 +2,7 @@ BSD License

 For Zstandard software

-Copyright (c) 2016-present, Facebook, Inc. All rights reserved.
+Copyright (c) Meta Platforms, Inc. and affiliates. All rights reserved.

 Redistribution and use in source and binary forms, with or without modification,
 are permitted provided that the following conditions are met:
@@ -14,9 +14,9 @@ are permitted provided that the following conditions are met:
 this list of conditions and the following disclaimer in the documentation
 and/or other materials provided with the distribution.

-* Neither the name Facebook nor the names of its contributors may be used to
-endorse or promote products derived from this software without specific
-prior written permission.
+* Neither the name Facebook, nor Meta, nor the names of its contributors may
+be used to endorse or promote products derived from this software without
+specific prior written permission.

 THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
 ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED

Makefile

Lines changed: 7 additions & 3 deletions
@@ -1,5 +1,5 @@
 # ################################################################
-# Copyright (c) 2015-2021, Yann Collet, Facebook, Inc.
+# Copyright (c) Meta Platforms, Inc. and affiliates.
 # All rights reserved.
 #
 # This source code is licensed under both the BSD-style license (found in the
@@ -123,6 +123,7 @@ contrib: lib
 $(MAKE) -C contrib/seekable_format/examples all
 $(MAKE) -C contrib/seekable_format/tests test
 $(MAKE) -C contrib/largeNbDicts all
+$(MAKE) -C contrib/externalSequenceProducer all
 cd build/single_file_libs/ ; ./build_decoder_test.sh
 cd build/single_file_libs/ ; ./build_library_test.sh

@@ -142,6 +143,7 @@ clean:
 $(Q)$(MAKE) -C contrib/seekable_format/examples $@ > $(VOID)
 $(Q)$(MAKE) -C contrib/seekable_format/tests $@ > $(VOID)
 $(Q)$(MAKE) -C contrib/largeNbDicts $@ > $(VOID)
+$(Q)$(MAKE) -C contrib/externalSequenceProducer $@ > $(VOID)
 $(Q)$(RM) zstd$(EXT) zstdmt$(EXT) tmp*
 $(Q)$(RM) -r lz4
 @echo Cleaning completed
@@ -157,7 +159,7 @@ MKDIR ?= mkdir -p

 HAVE_COLORNEVER = $(shell echo a | egrep --color=never a > /dev/null 2> /dev/null && echo 1 || echo 0)
 EGREP_OPTIONS ?=
-ifeq ($HAVE_COLORNEVER, 1)
+ifeq ($(HAVE_COLORNEVER), 1)
 EGREP_OPTIONS += --color=never
 endif
 EGREP = egrep $(EGREP_OPTIONS)
@@ -334,6 +336,8 @@ tsan-%: clean

 .PHONY: apt-install
 apt-install:
+# TODO: uncomment once issue 3011 is resolved and remove hack from Github Actions .yml
+# sudo apt-get update
 sudo apt-get -yq --no-install-suggests --no-install-recommends --force-yes install $(APT_PACKAGES)

 .PHONY: apt-add-repo
@@ -400,7 +404,7 @@ cmakebuild:

 c89build: clean
 $(CC) -v
-CFLAGS="-std=c89 -Werror -O0" $(MAKE) allmost # will fail, due to missing support for `long long`
+CFLAGS="-std=c89 -Werror -Wno-attributes -Wpedantic -Wno-long-long -Wno-variadic-macros -O0" $(MAKE) lib zstd

 gnu90build: clean
 $(CC) -v
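
Review note on the `ifeq` change in the `-157,7 +159,7` hunk above: in GNU make, a `$` followed by a character other than `$`, `(` or `{` treats only that single character as the variable name. The old spelling `$HAVE_COLORNEVER` is therefore read as `$(H)` followed by the literal text `AVE_COLORNEVER`, so the comparison against `1` presumably never succeeded. The Python sketch below is a simplified model of that expansion rule, not make's actual implementation; the `expand` helper and its regular expression are illustrative assumptions.

```python
import re

# Simplified model of make's reference syntax (an assumption for illustration):
#   $$      -> literal dollar sign
#   $(NAME) -> value of NAME
#   ${NAME} -> value of NAME
#   $X      -> value of the single-character variable X
_REFERENCE = re.compile(r"\$(?:\$|\((\w+)\)|\{(\w+)\}|(.))")


def expand(text, variables):
    """Expand make-style variable references in `text`."""

    def substitute(match):
        if match.group(0) == "$$":
            return "$"
        name = match.group(1) or match.group(2) or match.group(3)
        # Undefined variables expand to the empty string, as in make.
        return variables.get(name, "")

    return _REFERENCE.sub(substitute, text)


variables = {"HAVE_COLORNEVER": "1"}
# Old spelling: only `H` is taken as the variable name.
print(expand("$HAVE_COLORNEVER", variables))
# Fixed spelling: the whole name is looked up.
print(expand("$(HAVE_COLORNEVER)", variables))
```

Under this model the old condition compares `AVE_COLORNEVER` with `1`, while the fixed one compares `1` with `1`.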

README.md

Lines changed: 24 additions & 9 deletions
@@ -8,7 +8,7 @@ Zstandard's format is stable and documented in [RFC8878](https://datatracker.iet
 This repository represents the reference implementation, provided as an open-source dual [BSD](LICENSE) and [GPLv2](COPYING) licensed **C** library,
 and a command line utility producing and decoding `.zst`, `.gz`, `.xz` and `.lz4` files.
 Should your project require another programming language,
-a list of known ports and bindings is provided on [Zstandard homepage](http://www.zstd.net/#other-languages).
+a list of known ports and bindings is provided on [Zstandard homepage](https://facebook.github.io/zstd/#other-languages).

 **Development branch status:**

@@ -39,7 +39,7 @@ compiled with [gcc] 9.3.0,
 on the [Silesia compression corpus].

 [lzbench]: https://github.com/inikep/lzbench
-[Silesia compression corpus]: http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia
+[Silesia compression corpus]: https://sun.aei.polsl.pl//~sdeor/index.php?page=silesia
 [gcc]: https://gcc.gnu.org/

 | Compressor name | Ratio | Compression| Decompress.|
@@ -56,8 +56,8 @@ on the [Silesia compression corpus].
 | lzf 3.6 -1 | 2.077 | 410 MB/s | 830 MB/s |
 | snappy 1.1.9 | 2.073 | 550 MB/s | 1750 MB/s |

-[zlib]: http://www.zlib.net/
-[lz4]: http://www.lz4.org/
+[zlib]: https://www.zlib.net/
+[lz4]: https://lz4.github.io/lz4/

 The negative compression levels, specified with `--fast=#`,
 offer faster compression and decompression speed
@@ -124,14 +124,27 @@ Dictionary gains are mostly effective in the first few KB. Then, the compression

 ## Build instructions

+`make` is the officially maintained build system of this project.
+All other build systems are "compatible" and 3rd-party maintained,
+they may feature small differences in advanced options.
+When your system allows it, prefer using `make` to build `zstd` and `libzstd`.
+
 ### Makefile

 If your system is compatible with standard `make` (or `gmake`),
 invoking `make` in root directory will generate `zstd` cli in root directory.
+It will also create `libzstd` into `lib/`.

 Other available options include:
 - `make install` : create and install zstd cli, library and man pages
-- `make check` : create and run `zstd`, tests its behavior on local platform
+- `make check` : create and run `zstd`, test its behavior on local platform
+
+The `Makefile` follows the [GNU Standard Makefile conventions](https://www.gnu.org/prep/standards/html_node/Makefile-Conventions.html),
+allowing staged install, standard flags, directory variables and command variables.
+
+For advanced use cases, specialized compilation flags which control binary generation
+are documented in [`lib/README.md`](lib/README.md#modular-build) for the `libzstd` library
+and in [`programs/README.md`](programs/README.md#compilation-variables) for the `zstd` CLI.

 ### cmake

@@ -178,13 +191,15 @@ The output binary will be in `buck-out/gen/programs/`.

 ## Testing

-You can run quick local smoke tests by executing the `playTest.sh` script from the `src/tests` directory.
-Two env variables `$ZSTD_BIN` and `$DATAGEN_BIN` are needed for the test script to locate the zstd and datagen binary.
-For information on CI testing, please refer to TESTING.md
+You can run quick local smoke tests by running `make check`.
+If you can't use `make`, execute the `playTest.sh` script from the `src/tests` directory.
+Two env variables `$ZSTD_BIN` and `$DATAGEN_BIN` are needed for the test script to locate the `zstd` and `datagen` binary.
+For information on CI testing, please refer to `TESTING.md`.

 ## Status

-Zstandard is currently deployed within Facebook. It is used continuously to compress large amounts of data in multiple formats and use cases.
+Zstandard is currently deployed within Facebook and many other large cloud infrastructures.
+It is run continuously to compress large amounts of data in multiple formats and use cases.
 Zstandard is considered safe for production environments.

 ## License

TESTING.md

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@ They consist of the following tests:
 - `tests/playTests.sh --test-large-data`
 - Fuzzer tests: `tests/fuzzer.c`, `tests/zstreamtest.c`, and `tests/decodecorpus.c`
 - `tests/zstreamtest.c` under Tsan (streaming mode, including multithreaded mode)
-- Valgrind Test (`make -C tests valgrindTest`) (testing CLI and fuzzer under valgrind)
+- Valgrind Test (`make -C tests test-valgrind`) (testing CLI and fuzzer under `valgrind`)
 - Fuzzer tests (see above) on ARM, AArch64, PowerPC, and PowerPC64

 Long Tests

build/.gitignore

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+# Visual C++
+.vs/
+*Copy
+*.db
+*.opensdf
+*.sdf
+*.suo
+*.user
+*.opendb
+
+VS2008/bin/
+VS2010/bin/
+VS2010/zwrapbench/
+VS2012/
+VS2013/
+VS2015/
+Studio*
+
+# CMake
+cmake/build/
+CMakeCache.txt
+CMakeFiles
+CMakeScripts
+Testing
+Makefile
+cmake_install.cmake
+install_manifest.txt
+compile_commands.json
+CTestTestfile.cmake
+build
+lib
+!cmake/lib
+!meson/lib

build/LICENSE

Whitespace-only changes.
