Using zstd 1.5.5 (the latest version as of writing).
I prepared an int array (each int occupies 4 bytes, little-endian):
[0, 30, 60, 90, ...], 65536 ints, i.e. 65536 * 4 = 262144 bytes.
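For anyone trying to reproduce this, the input described above can be rebuilt with Python's standard `struct` module. This is a minimal sketch, assuming signed 32-bit little-endian ints (`<i`); `build_input` is a hypothetical helper name, not part of zstd or of the original report.

```python
import struct

COUNT = 65536  # number of ints, as stated in the issue
STEP = 30      # constant difference between consecutive values

def build_input(count: int = COUNT, step: int = STEP) -> bytes:
    """Hypothetical helper: pack 0, step, 2*step, ... as 32-bit little-endian ints."""
    values = [i * step for i in range(count)]
    return struct.pack(f"<{count}i", *values)

if __name__ == "__main__":
    data = build_input()
    print(len(data))  # 262144, matching "original size" in the log
```

The resulting `bytes` object is what would be handed to the compressor at each level.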
Then I compressed it at each compression level (simple compression, no dictionary):
2023-10-17 02:14:22.792 TRACE original size: 262144
2023-10-17 02:14:22.851 TRACE level 0 204162
2023-10-17 02:14:22.852 TRACE level 1 204103
2023-10-17 02:14:22.852 TRACE level 2 204118
2023-10-17 02:14:22.853 TRACE level 3 204162
2023-10-17 02:14:22.854 TRACE level 4 204136
2023-10-17 02:14:22.856 TRACE level 5 204147
2023-10-17 02:14:22.858 TRACE level 6 204141
2023-10-17 02:14:22.860 TRACE level 7 204161
2023-10-17 02:14:22.862 TRACE level 8 204161
2023-10-17 02:14:22.863 TRACE level 9 204161
2023-10-17 02:14:22.865 TRACE level 10 204161
2023-10-17 02:14:22.868 TRACE level 11 204165
2023-10-17 02:14:22.871 TRACE level 12 204161
2023-10-17 02:14:22.877 TRACE level 13 204143
2023-10-17 02:14:22.893 TRACE level 14 83240
2023-10-17 02:14:22.907 TRACE level 15 83240
2023-10-17 02:14:22.923 TRACE level 16 83242
2023-10-17 02:14:22.940 TRACE level 17 83242
2023-10-17 02:14:22.958 TRACE level 18 142849
2023-10-17 02:14:22.976 TRACE level 19 142849
2023-10-17 02:14:22.998 TRACE level 20 142849
2023-10-17 02:14:23.017 TRACE level 21 142849
2023-10-17 02:14:23.035 TRACE level 22 142849
As the output above shows, starting at level 18 the higher compression levels produce larger compressed data.
- Is that in line with expectations? I thought a higher compression level should result in smaller compressed data, but here the output is over 70% larger (142849 bytes at level 18 vs. 83242 bytes at level 17). How can I produce the smallest possible output (ignoring compression time and memory consumption)?
-- Searching the issues for "compression level size" turned up nothing relevant on the first page, and neither did Google :( Sorry if this has already been brought up.
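To make the "over 70% larger" figure concrete, the arithmetic can be checked directly against a few sizes copied from the log. This is only a sketch over the logged numbers; `ratio` and `growth` are hypothetical helper names.

```python
ORIGINAL_SIZE = 262144
# Compressed sizes copied from the log above (level -> bytes).
SIZES = {3: 204162, 14: 83240, 17: 83242, 18: 142849, 22: 142849}

def ratio(level: int) -> float:
    """Compression ratio at a level: original bytes / compressed bytes."""
    return ORIGINAL_SIZE / SIZES[level]

def growth(from_level: int, to_level: int) -> float:
    """Relative change in compressed size between two levels."""
    return SIZES[to_level] / SIZES[from_level] - 1.0

if __name__ == "__main__":
    for lvl in sorted(SIZES):
        print(f"level {lvl:2d}: {SIZES[lvl]:6d} bytes, ratio {ratio(lvl):.2f}")
    print(f"level 17 -> 18: {growth(17, 18):+.1%}")
```

Going from level 17 to 18 the ratio drops from roughly 3.15 to roughly 1.84, i.e. the output grows by about 71.6%.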
There's also a related question I'm putting into the same issue (forgive me :)
- The input data is relatively simple (low entropy), so why isn't it compressed more? Are there any tweaks or flags I should enable? The original data is a TSDB timestamp series, and I'd prefer not to change it (rearranging bytes or manually applying delta compression). Is there a recommended way to handle a semi-arithmetic progression like this? (N.B. the delta is not always the same; it may be 30, 30, 30, 300, 300, 30, 3600, 3600, 86400, 30, 30)
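To illustrate why such a series is low entropy even when the delta varies, the sketch below rebuilds timestamps from the example deltas and counts the distinct differences. It assumes a start value of 0 and uses hypothetical helper names; it is not a zstd feature and does not answer whether zstd can exploit this structure without transforming the bytes.

```python
from collections import Counter

# Example delta pattern quoted in the issue.
DELTAS = [30, 30, 30, 300, 300, 30, 3600, 3600, 86400, 30, 30]

def timestamps_from_deltas(start: int, deltas: list[int]) -> list[int]:
    """Hypothetical helper: accumulate deltas into absolute timestamps."""
    out = [start]
    for d in deltas:
        out.append(out[-1] + d)
    return out

def deltas_of(ts: list[int]) -> list[int]:
    """Differences between consecutive timestamps."""
    return [b - a for a, b in zip(ts, ts[1:])]

if __name__ == "__main__":
    ts = timestamps_from_deltas(0, DELTAS)
    print(ts)                     # every raw value is distinct
    print(Counter(deltas_of(ts)))  # but only 4 distinct deltas
```

Every raw timestamp differs from the others, while the differences come from a set of only four values, which is the redundancy the question hopes a compressor can find.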