Commit 4b3cc06
huff0: Pass a single bitReader pointer to asm (#634)
This makes the context object smaller and frees up three registers,
which we can use to replace the limitPtr and bufferOrigin stack
variables.
Benchmark results show a tiny win (Go 1.19beta, Core i7-3770K):
name old speed new speed delta
Decompress1XTable/digits-8 347MB/s ± 0% 347MB/s ± 0% ~ (p=0.650 n=8+10)
Decompress1XTable/gettysburg-8 268MB/s ± 0% 268MB/s ± 0% ~ (p=0.400 n=9+9)
Decompress1XTable/twain-8 327MB/s ± 0% 327MB/s ± 1% ~ (p=0.339 n=7+9)
Decompress1XTable/low-ent.10k-8 385MB/s ± 0% 385MB/s ± 1% ~ (p=0.510 n=9+10)
Decompress1XTable/superlow-ent-10k-8 376MB/s ± 0% 376MB/s ± 0% ~ (p=0.712 n=8+10)
Decompress1XTable/crash2-8 17.3MB/s ± 1% 17.3MB/s ± 1% ~ (p=0.926 n=10+10)
Decompress1XTable/endzerobits-8 52.9MB/s ± 1% 52.4MB/s ± 0% -0.94% (p=0.000 n=10+10)
Decompress1XTable/endnonzero-8 11.4MB/s ± 0% 11.4MB/s ± 1% ~ (p=0.343 n=10+10)
Decompress1XTable/case1-8 22.0MB/s ± 0% 22.0MB/s ± 0% ~ (p=0.618 n=9+9)
Decompress1XTable/case2-8 18.1MB/s ± 0% 18.1MB/s ± 0% ~ (p=0.348 n=9+9)
Decompress1XTable/case3-8 19.1MB/s ± 0% 19.1MB/s ± 0% +0.21% (p=0.048 n=10+10)
Decompress1XTable/pngdata.001-8 374MB/s ± 0% 374MB/s ± 0% ~ (p=0.861 n=9+10)
Decompress1XTable/normcount2-8 54.3MB/s ± 1% 54.5MB/s ± 1% ~ (p=0.093 n=10+10)
Decompress1XNoTable/digits/100-8 279MB/s ± 0% 280MB/s ± 0% +0.30% (p=0.003 n=10+9)
Decompress1XNoTable/digits/10000-8 366MB/s ± 0% 365MB/s ± 0% ~ (p=0.113 n=10+9)
Decompress1XNoTable/digits/262143-8 347MB/s ± 0% 347MB/s ± 1% ~ (p=0.739 n=10+10)
Decompress1XNoTable/gettysburg/100-8 278MB/s ± 1% 277MB/s ± 1% ~ (p=0.676 n=10+9)
Decompress1XNoTable/gettysburg/10000-8 363MB/s ± 1% 362MB/s ± 0% -0.50% (p=0.001 n=10+9)
Decompress1XNoTable/gettysburg/262143-8 350MB/s ± 0% 347MB/s ± 0% -0.90% (p=0.000 n=10+8)
Decompress1XNoTable/twain/100-8 268MB/s ± 0% 267MB/s ± 0% ~ (p=0.384 n=9+8)
Decompress1XNoTable/twain/10000-8 363MB/s ± 0% 362MB/s ± 0% -0.32% (p=0.000 n=9+9)
Decompress1XNoTable/twain/262143-8 328MB/s ± 0% 329MB/s ± 0% ~ (p=0.063 n=9+10)
Decompress1XNoTable/low-ent.10k/100-8 180MB/s ± 0% 181MB/s ± 0% ~ (p=0.225 n=10+10)
Decompress1XNoTable/low-ent.10k/10000-8 385MB/s ± 0% 385MB/s ± 0% ~ (p=0.289 n=10+10)
Decompress1XNoTable/low-ent.10k/262143-8 389MB/s ± 1% 389MB/s ± 1% ~ (p=0.971 n=10+10)
Decompress1XNoTable/superlow-ent-10k/262143-8 389MB/s ± 0% 390MB/s ± 0% +0.27% (p=0.017 n=9+10)
Decompress1XNoTable/crash2/100-8 278MB/s ± 0% 279MB/s ± 1% ~ (p=0.163 n=9+10)
Decompress1XNoTable/crash2/10000-8 373MB/s ± 1% 373MB/s ± 0% ~ (p=0.370 n=10+8)
Decompress1XNoTable/crash2/262143-8 375MB/s ± 0% 375MB/s ± 0% ~ (p=0.604 n=9+10)
Decompress1XNoTable/endzerobits/100-8 180MB/s ± 0% 181MB/s ± 0% +0.26% (p=0.005 n=10+9)
Decompress1XNoTable/endzerobits/10000-8 384MB/s ± 0% 385MB/s ± 0% ~ (p=0.914 n=8+10)
Decompress1XNoTable/endzerobits/262143-8 389MB/s ± 0% 390MB/s ± 0% ~ (p=0.739 n=10+10)
Decompress1XNoTable/endnonzero/100-8 180MB/s ± 1% 180MB/s ± 1% ~ (p=0.926 n=10+10)
Decompress1XNoTable/endnonzero/10000-8 384MB/s ± 0% 384MB/s ± 0% ~ (p=0.965 n=10+8)
Decompress1XNoTable/endnonzero/262143-8 390MB/s ± 0% 390MB/s ± 0% ~ (p=0.633 n=8+10)
Decompress1XNoTable/case1/100-8 282MB/s ± 0% 283MB/s ± 0% +0.34% (p=0.005 n=10+10)
Decompress1XNoTable/case1/10000-8 372MB/s ± 0% 373MB/s ± 0% ~ (p=0.113 n=9+9)
Decompress1XNoTable/case1/262143-8 374MB/s ± 0% 374MB/s ± 0% ~ (p=0.448 n=10+10)
Decompress1XNoTable/case2/100-8 274MB/s ± 1% 274MB/s ± 0% ~ (p=0.927 n=10+10)
Decompress1XNoTable/case2/10000-8 376MB/s ± 0% 376MB/s ± 0% ~ (p=0.408 n=10+8)
Decompress1XNoTable/case2/262143-8 376MB/s ± 1% 377MB/s ± 0% ~ (p=1.000 n=10+10)
Decompress1XNoTable/case3/100-8 266MB/s ± 0% 265MB/s ± 0% ~ (p=0.113 n=9+10)
Decompress1XNoTable/case3/10000-8 372MB/s ± 0% 372MB/s ± 0% ~ (p=0.075 n=10+9)
Decompress1XNoTable/case3/262143-8 374MB/s ± 0% 374MB/s ± 0% ~ (p=0.172 n=10+10)
Decompress1XNoTable/pngdata.001/100-8 238MB/s ± 0% 238MB/s ± 0% ~ (p=0.438 n=9+8)
Decompress1XNoTable/pngdata.001/10000-8 384MB/s ± 0% 384MB/s ± 0% ~ (p=0.448 n=10+10)
Decompress1XNoTable/pngdata.001/262143-8 378MB/s ± 0% 378MB/s ± 0% ~ (p=0.836 n=10+10)
Decompress1XNoTable/normcount2/100-8 281MB/s ± 0% 282MB/s ± 1% ~ (p=0.122 n=8+10)
Decompress1XNoTable/normcount2/10000-8 369MB/s ± 1% 369MB/s ± 0% ~ (p=0.912 n=10+10)
Decompress1XNoTable/normcount2/262143-8 370MB/s ± 0% 370MB/s ± 1% ~ (p=0.342 n=10+10)
Decompress4XNoTable/digits/100-8 197MB/s ± 0% 197MB/s ± 1% ~ (p=0.764 n=10+9)
Decompress4XNoTable/digits/10000-8 594MB/s ± 0% 602MB/s ± 1% +1.35% (p=0.000 n=10+10)
Decompress4XNoTable/digits/262143-8 570MB/s ± 1% 578MB/s ± 0% +1.30% (p=0.000 n=10+8)
Decompress4XNoTable/gettysburg/100-8 258MB/s ± 1% 260MB/s ± 0% +0.59% (p=0.001 n=10+10)
Decompress4XNoTable/gettysburg/10000-8 638MB/s ± 0% 641MB/s ± 0% +0.44% (p=0.000 n=9+9)
Decompress4XNoTable/gettysburg/262143-8 573MB/s ± 1% 574MB/s ± 0% ~ (p=0.353 n=10+10)
Decompress4XNoTable/twain/100-8 214MB/s ± 2% 214MB/s ± 2% ~ (p=0.853 n=10+10)
Decompress4XNoTable/twain/10000-8 634MB/s ± 1% 638MB/s ± 0% +0.62% (p=0.000 n=10+10)
Decompress4XNoTable/twain/262143-8 513MB/s ± 1% 517MB/s ± 0% +0.85% (p=0.000 n=10+10)
Decompress4XNoTable/low-ent.10k/100-8 195MB/s ± 0% 194MB/s ± 0% ~ (p=0.130 n=9+9)
Decompress4XNoTable/low-ent.10k/10000-8 635MB/s ± 0% 642MB/s ± 0% +1.19% (p=0.000 n=10+10)
Decompress4XNoTable/low-ent.10k/262143-8 675MB/s ± 0% 685MB/s ± 0% +1.51% (p=0.000 n=10+10)
Decompress4XNoTable/superlow-ent-10k/262143-8 673MB/s ± 1% 684MB/s ± 0% +1.70% (p=0.000 n=10+10)
Decompress4XNoTable/case1/100-8 206MB/s ± 1% 206MB/s ± 0% ~ (p=0.189 n=10+9)
Decompress4XNoTable/case1/10000-8 593MB/s ± 0% 601MB/s ± 0% +1.47% (p=0.000 n=10+10)
Decompress4XNoTable/case1/262143-8 603MB/s ± 0% 613MB/s ± 0% +1.64% (p=0.000 n=10+10)
Decompress4XNoTable/case2/100-8 201MB/s ± 0% 202MB/s ± 1% ~ (p=0.053 n=9+10)
Decompress4XNoTable/case2/10000-8 610MB/s ± 0% 618MB/s ± 0% +1.30% (p=0.000 n=9+10)
Decompress4XNoTable/case2/262143-8 622MB/s ± 1% 634MB/s ± 0% +1.90% (p=0.000 n=9+8)
Decompress4XNoTable/case3/100-8 197MB/s ± 1% 198MB/s ± 0% +0.53% (p=0.001 n=9+10)
Decompress4XNoTable/case3/10000-8 606MB/s ± 0% 615MB/s ± 0% +1.49% (p=0.000 n=8+10)
Decompress4XNoTable/case3/262143-8 613MB/s ± 1% 622MB/s ± 0% +1.48% (p=0.000 n=10+10)
Decompress4XNoTable/pngdata.001/100-8 212MB/s ± 1% 211MB/s ± 0% ~ (p=0.136 n=9+9)
Decompress4XNoTable/pngdata.001/10000-8 645MB/s ± 1% 649MB/s ± 1% +0.65% (p=0.000 n=9+10)
Decompress4XNoTable/pngdata.001/262143-8 640MB/s ± 1% 649MB/s ± 0% +1.44% (p=0.000 n=10+10)
Decompress4XNoTable/normcount2/100-8 260MB/s ± 1% 261MB/s ± 1% ~ (p=0.211 n=10+9)
Decompress4XNoTable/normcount2/10000-8 584MB/s ± 1% 591MB/s ± 0% +1.33% (p=0.000 n=9+9)
Decompress4XNoTable/normcount2/262143-8 588MB/s ± 1% 596MB/s ± 1% +1.39% (p=0.000 n=10+9)
Decompress4XNoTableTableLog8/digits-8 583MB/s ± 1% 592MB/s ± 0% +1.48% (p=0.000 n=10+10)
Decompress4XTable/digits-8 580MB/s ± 0% 588MB/s ± 0% +1.33% (p=0.000 n=8+10)
Decompress4XTable/gettysburg-8 368MB/s ± 1% 370MB/s ± 0% +0.59% (p=0.017 n=10+9)
Decompress4XTable/twain-8 510MB/s ± 0% 515MB/s ± 0% +0.99% (p=0.000 n=9+10)
Decompress4XTable/low-ent.10k-8 657MB/s ± 0% 665MB/s ± 0% +1.24% (p=0.000 n=10+10)
Decompress4XTable/superlow-ent-10k-8 608MB/s ± 0% 617MB/s ± 1% +1.48% (p=0.000 n=8+10)
Decompress4XTable/case1-8 21.1MB/s ± 1% 21.0MB/s ± 2% ~ (p=0.223 n=10+10)
Decompress4XTable/case2-8 17.6MB/s ± 0% 17.6MB/s ± 0% ~ (p=0.199 n=9+10)
Decompress4XTable/case3-8 18.7MB/s ± 0% 18.7MB/s ± 0% ~ (p=0.557 n=10+8)
Decompress4XTable/pngdata.001-8 633MB/s ± 1% 645MB/s ± 0% +1.90% (p=0.000 n=9+10)
Decompress4XTable/normcount2-8 49.9MB/s ± 1% 49.5MB/s ± 1% -0.64% (p=0.002 n=10+10)
[Geo mean] 270MB/s 271MB/s +0.36%
1 parent b16a9af commit 4b3cc06
3 files changed: +382 -422 lines changed (file tree: huff0, _generate)