@@ -16,7 +16,7 @@ Distribution of this document is unlimited.
 
 ### Version
 
-0.3.7 (2020-12-09)
+0.3.8 (2023-02-18)
 
 
 Introduction
@@ -470,6 +470,7 @@ This field uses 2 lowest bits of first byte, describing 4 different block types
 repeated `Regenerated_Size` times.
 - `Compressed_Literals_Block` - This is a standard Huffman-compressed block,
 starting with a Huffman tree description.
+In this mode, there are at least 2 different literals represented in the Huffman tree description.
 See details below.
 - `Treeless_Literals_Block` - This is a Huffman-compressed block,
 using Huffman tree _from previous Huffman-compressed literals block_.
@@ -566,6 +567,7 @@ or from a dictionary.
 
 ### `Huffman_Tree_Description`
 This section is only present when `Literals_Block_Type` type is `Compressed_Literals_Block` (`2`).
+The tree describes the weights of all literal symbols that can be present in the literals block, at least 2 and up to 256.
 The format of the Huffman tree description can be found at [Huffman Tree description](#huffman-tree-description).
 The size of `Huffman_Tree_Description` is determined during decoding process,
 it must be used to determine where streams begin.
@@ -1197,7 +1199,7 @@ Huffman Coding
 --------------
 Zstandard Huffman-coded streams are read backwards,
 similar to the FSE bitstreams.
-Therefore, to find the start of the bitstream, it is therefore to
+Therefore, to find the start of the bitstream, it is required to
 know the offset of the last byte of the Huffman-coded stream.
 
 After writing the last bit containing information, the compressor
@@ -1239,9 +1241,15 @@ Transformation from `Weight` to `Number_of_Bits` follows this formula :
 ```
 Number_of_Bits = Weight ? (Max_Number_of_Bits + 1 - Weight) : 0
 ```
-The last symbol's `Weight` is deduced from previously decoded ones,
-by completing to the nearest power of 2.
-This power of 2 gives `Max_Number_of_Bits`, the depth of the current tree.
+When a literal value is not present, it receives a `Weight` of 0.
+The least frequent symbol receives a `Weight` of 1.
+Consequently, the `Weight` 1 is necessarily present.
+The most frequent symbol receives a `Weight` anywhere between 1 and 11 (max).
+The last symbol's `Weight` is deduced from previously retrieved Weights,
+by completing to the nearest power of 2. It is necessarily non-zero.
+If it's not possible to reach a clean power of 2 with a single `Weight` value,
+the Huffman Tree Description is considered invalid.
+This final power of 2 gives `Max_Number_of_Bits`, the depth of the current tree.
 `Max_Number_of_Bits` must be <= 11,
 otherwise the representation is considered corrupted.
 
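The completion rule added in this hunk can be sketched in Python as follows. This is a minimal illustration, not the reference decoder: the function name `deduce_last_weight`, the constant name, and the use of `ValueError` are assumptions made for the example, and a list with no non-zero `Weight` is rejected on the assumption that such a description is invalid.

```python
MAX_NUMBER_OF_BITS_LIMIT = 11  # upper bound on Max_Number_of_Bits stated in the text


def deduce_last_weight(weights):
    """Return (last_weight, max_number_of_bits) for the listed weights.

    `weights` holds the Weight of every symbol except the last one.
    Hypothetical helper, for illustration only.
    """
    # Each non-zero Weight contributes 2^(Weight-1); Weight 0 contributes nothing.
    total = sum(1 << (w - 1) for w in weights if w > 0)
    if total == 0:
        # Assumption: a description with no non-zero Weight is treated as invalid.
        raise ValueError("no non-zero Weight listed")

    # Exponent of the smallest power of 2 strictly greater than `total`
    # (strictly greater, because the last Weight must be non-zero).
    max_number_of_bits = total.bit_length()
    if max_number_of_bits > MAX_NUMBER_OF_BITS_LIMIT:
        raise ValueError("Max_Number_of_Bits exceeds 11")
    rest = (1 << max_number_of_bits) - total

    # The gap must be filled by a single Weight, so it must itself be a power of 2.
    if rest & (rest - 1):
        raise ValueError("cannot reach a power of 2 with a single Weight")

    # rest == 2^(last_weight - 1), hence last_weight == log2(rest) + 1.
    return rest.bit_length(), max_number_of_bits
```

For example, `deduce_last_weight([4, 3, 2, 0, 1])` returns `(1, 4)`, while `deduce_last_weight([3, 1])` raises because the remaining gap (3) is not a power of 2.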
@@ -1254,7 +1262,7 @@ Let's presume the following Huffman tree must be described :
 
 The tree depth is 4, since its longest elements use 4 bits
 (the longest elements are the ones with the smallest frequency).
-Value `5` will not be listed, as it can be determined from values for 0-4,
+Literal value `5` will not be listed, as it can be determined from previous values 0-4,
 nor will values above `5` as they are all 0.
 Values from `0` to `4` will be listed using `Weight` instead of `Number_of_Bits`.
 Weight formula is :
@@ -1274,7 +1282,7 @@ The `Weight` of `5` can be determined by advancing to the next power of 2.
 The sum of `2^(Weight-1)` (excluding 0's) is :
 `8 + 4 + 2 + 0 + 1 = 15`.
 Nearest larger power of 2 value is 16.
-Therefore, `Max_Number_of_Bits = 4` and `Weight[5] = 16-15 = 1`.
+Therefore, `Max_Number_of_Bits = 4` and `Weight[5] = log_2(16 - 15) + 1 = 1`.
 
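As a cross-check of this example, the completed weights can be converted to code lengths with the `Number_of_Bits` formula from the specification. The short Python sketch below does that; the function and variable names are illustrative assumptions, and the weights `4, 3, 2, 0, 1` for literals 0 to 4 are read off the terms of the sum above.

```python
from fractions import Fraction


def number_of_bits(weight, max_number_of_bits):
    # Number_of_Bits = Weight ? (Max_Number_of_Bits + 1 - Weight) : 0
    return max_number_of_bits + 1 - weight if weight else 0


listed_weights = [4, 3, 2, 0, 1]  # literals 0-4, as listed in the description
weights = listed_weights + [1]    # Weight[5] = 1, deduced by completion
max_number_of_bits = 4

bits = [number_of_bits(w, max_number_of_bits) for w in weights]
print(bits)  # [1, 2, 3, 0, 4, 4]

# A complete prefix code satisfies sum(2^-Number_of_Bits) == 1 over present symbols.
kraft_sum = sum(Fraction(1, 1 << b) for b in bits if b)
print(kraft_sum)  # 1
```

The sum over present symbols coming out as exactly 1 is the same condition as the weights completing to a clean power of 2.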
 #### Huffman Tree header
 
@@ -1683,6 +1691,7 @@ or at least provide a meaningful error code explaining for which reason it canno
 
 Version changes
 ---------------
+- 0.3.8 : clarifications for Huffman Blocks and Huffman Tree descriptions.
 - 0.3.7 : clarifications for Repeat_Offsets, matching RFC8878
 - 0.3.6 : clarifications for Dictionary_ID
 - 0.3.5 : clarifications for Block_Maximum_Size