-
-
Notifications
You must be signed in to change notification settings - Fork 887
Webp encoding: Reduce allocations. #1799
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1799 +/- ##
==========================================
+ Coverage 87.17% 87.18% +0.01%
==========================================
Files 936 936
Lines 47848 47920 +72
Branches 6010 6012 +2
==========================================
+ Hits 41710 41778 +68
- Misses 5145 5148 +3
- Partials 993 994 +1
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
JimBobSquarePants
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice improvement 👍
| Span<HuffmanTree> treeSlice = tree.AsSpan().Slice(0, treeSize); | ||
| treeSlice.Sort(HuffmanTree.Compare); | ||
| #else | ||
| HuffmanTree[] treeCopy = tree.AsSpan().Slice(0, treeSize).ToArray(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's unfortunate that .NET Core 3.1 doesn't give us sort here. We could copy ArraySortHelper or even use something like SortUtility from Drawing but I don't know if it's worth the effort. @antonfirsov what do you think?
| /// <summary> | ||
| /// Scratch buffer to reduce allocations. | ||
| /// </summary> | ||
| private readonly int[] scratch = new int[256]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of the int[] one could use fixed buffer wrapped in a private struct. See this example.
(Note: the encapsulating class must be unsafe in order to make it work)
So scratch becomes a strong type and can have methods too 😄
It avoids the extra array allocation, but at the cost of a bigger Vp8LEncoder -- I don't know what is better.
For code elegance I'd prefer the fixed buffer / strong type approach, as there could be a method like Span<int> AsSpan(int start, int length, bool clear = true) and that like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or rent this array from an array pool / memory pool?
(Maybe that overkill here?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the array is too small to utilize the MemoryAllocator here.
On the fixed buffer wrapped in a private struct idea, I am undecided. @JimBobSquarePants what do you think should we go that way or should i leave it as it is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use that trick in the jpeg decoder so there's precedence. The input to the enclosing type is always sanitized since your encoding so I'd go for it if you have the time.
| [StructLayout(LayoutKind.Sequential)] | |
| internal unsafe struct HuffmanTable | |
| { | |
| private bool isConfigured; | |
| /// <summary> | |
| /// Derived from the DHT marker. Sizes[k] = # of symbols with codes of length k bits; Sizes[0] is unused. | |
| /// </summary> | |
| public fixed byte Sizes[17]; | |
| /// <summary> | |
| /// Derived from the DHT marker. Contains the symbols, in order of incremental code length. | |
| /// </summary> | |
| public fixed byte Values[256]; | |
| /// <summary> | |
| /// Contains the largest code of length k (0 if none). MaxCode[17] is a sentinel to | |
| /// ensure <see cref="HuffmanScanBuffer.DecodeHuffman"/> terminates. | |
| /// </summary> | |
| public fixed ulong MaxCode[18]; | |
| /// <summary> | |
| /// Values[] offset for codes of length k ValOffset[k] = Values[] index of 1st symbol of code length | |
| /// k, less the smallest code of length k; so given a code of length k, the corresponding symbol is | |
| /// Values[code + ValOffset[k]]. | |
| /// </summary> | |
| public fixed int ValOffset[19]; | |
| /// <summary> | |
| /// Contains the length of bits for the given k value. | |
| /// </summary> | |
| public fixed byte LookaheadSize[JpegConstants.Huffman.LookupSize]; | |
| /// <summary> | |
| /// Lookahead table: indexed by the next <see cref="JpegConstants.Huffman.LookupBits"/> bits of | |
| /// the input data stream. If the next Huffman code is no more | |
| /// than <see cref="JpegConstants.Huffman.LookupBits"/> bits long, we can obtain its length and | |
| /// the corresponding symbol directly from this tables. | |
| /// | |
| /// The lower 8 bits of each table entry contain the number of | |
| /// bits in the corresponding Huffman code, or <see cref="JpegConstants.Huffman.LookupBits"/> + 1 | |
| /// if too long. The next 8 bits of each entry contain the symbol. | |
| /// </summary> | |
| public fixed byte LookaheadValue[JpegConstants.Huffman.LookupSize]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mhm MemoryMarshal.CreateSpan does not exist in net472, netstandard1.3 and netstandard2.0. What would be the right way to get the Span there? I think, i miss something obvious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, that's too bad. One would need to create the span via the pointer overload, but that needs a fixed-block which isn't feasible here (? or is it).
But: do we need to create a span at all? The span is needed to call Clear, and use indexed access to the elements. This can be done in the Scratch-type / buffer directly. So the buffer could be passed around instead of the span.
Note: the buffer is now a type, so can have methods too.
In order to clear effeciently for .NET Core+ one could create a span and call clear (as it's vectorized), for older runtimes just use a manual loop (maybe unrolled).
Or even better just rely on the runtime to zero-init (iif the [SkipLocalsInit] isn't applied, which isn't at the moment) the fixed sized buffer, so instead of clearing create a new buffer.
I'll think about this a bit more, if there is maybe any better approach to that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to change it to a struct, but without being able to create a Span from MemoryMarshal.CreateSpan it seems to make things more complicated then before. For example all methods in PredictorEncoder operate on Spans.
Maybe there is a way of doing it without overcomplicating things, but i dont see it.
I would like to keep it as it is for now.
| Span<int> symbols = this.scratch.AsSpan(0, 2); | ||
| symbols.Clear(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E.g. with the comment above this could be
Span<int> symbols = this.scratch.AsSpan(0, 2, clear: true);| var huffmanTables = new HuffmanCode[numHTreeGroups * tableSize]; | ||
| var hTreeGroups = new HTreeGroup[numHTreeGroups]; | ||
| Span<HuffmanCode> huffmanTable = huffmanTables.AsSpan(); | ||
| int[] codeLengths = new int[maxAlphabetSize]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rent from pool?
| this.BitCount = new long[4, 3]; | ||
| this.Scratch = new byte[WebpConstants.Bps * 16]; | ||
| this.Scratch2 = new short[17 * 16]; | ||
| this.Scratch3 = new int[16]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm kind of fan a fixed sized buffers -- so would try to use these here too.
Yeah they make the object bigger (by the size of the buffer), but it avoids further objects that the GC needs to track, so most of the time it's advantageous.
Note: it's unsafe, so there are no bound checks emitted!
| /// <summary> | ||
| /// Scratch buffer to reduce allocations. | ||
| /// </summary> | ||
| private readonly int[] scratch = new int[16]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here too.
(I'll stop commenting on these buffer for the rest of the code).
Co-authored-by: Günther Foidl <[email protected]>
|
@JimBobSquarePants any objections that we merge this PR as it is? I really would like to see how it does in combination with the other optimizations. I think the reduced allocations can be beneficial to performance, too. |
|
@brianpopow no objections, rock on🤘 |
Prerequisites
Description
This PR is meant to reduce the allocations during webp encoding significantly.
Related to #1786
Before:
After