
@brianpopow
Collaborator

@brianpopow brianpopow commented Oct 31, 2021

Prerequisites

  • I have written a descriptive pull-request title
  • I have verified that there are no overlapping pull-requests open
  • I have verified that I am following the existing coding patterns and practice as demonstrated in the repository. These follow strict StyleCop rules 👮.
  • I have provided test coverage for my change (where applicable)

Description

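This PR significantly reduces allocations during WebP encoding. For example, for Png/Bike.png on .NET 5.0, allocations drop from 552,714 KB to 16,704 KB for lossy encoding and from 161,675 KB to 61,556 KB for lossless encoding.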

Related to #1786

Before:

BenchmarkDotNet=v0.13.0, OS=Windows 10.0.19043.1288 (21H1/May2021Update)
Intel Core i7-6700K CPU 4.00GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
.NET SDK=6.0.100-rc.2.21505.57
  [Host]     : .NET 5.0.11 (5.0.1121.47308), X64 RyuJIT
  Job-OKVXPJ : .NET 5.0.11 (5.0.1121.47308), X64 RyuJIT
  Job-PBVPYV : .NET Core 3.1.20 (CoreCLR 4.700.21.47003, CoreFX 4.700.21.47101), X64 RyuJIT
  Job-SLAEGF : .NET Framework 4.8 (4.8.4420.0), X64 RyuJIT

IterationCount=3  LaunchCount=1  WarmupCount=3
|                     Method |        Job |              Runtime |             Arguments |    TestImage |      Mean |      Error |    StdDev | Ratio | RatioSD |       Gen 0 |     Gen 1 |     Gen 2 |  Allocated |
|--------------------------- |----------- |--------------------- |---------------------- |------------- |----------:|-----------:|----------:|------:|--------:|------------:|----------:|----------:|-----------:|
|        'Magick Webp Lossy' | Job-OKVXPJ |             .NET 5.0 | /p:DebugType=portable | Png/Bike.png |  26.78 ms |   4.110 ms |  0.225 ms |  0.15 |    0.00 |           - |         - |         - |      68 KB |
|    'ImageSharp Webp Lossy' | Job-OKVXPJ |             .NET 5.0 | /p:DebugType=portable | Png/Bike.png | 285.95 ms |  33.692 ms |  1.847 ms |  1.56 |    0.03 | 135000.0000 |         - |         - | 552,714 KB |
|     'Magick Webp Lossless' | Job-OKVXPJ |             .NET 5.0 | /p:DebugType=portable | Png/Bike.png | 183.18 ms |  34.303 ms |  1.880 ms |  1.00 |    0.00 |           - |         - |         - |     520 KB |
| 'ImageSharp Webp Lossless' | Job-OKVXPJ |             .NET 5.0 | /p:DebugType=portable | Png/Bike.png | 347.68 ms |  67.105 ms |  3.678 ms |  1.90 |    0.02 |  34000.0000 | 5000.0000 | 2000.0000 | 161,675 KB |
|                            |            |                      |                       |              |           |            |           |       |         |             |           |           |            |
|        'Magick Webp Lossy' | Job-PBVPYV |        .NET Core 3.1 |               Default | Png/Bike.png |  26.77 ms |   8.193 ms |  0.449 ms |  0.15 |    0.00 |           - |         - |         - |      68 KB |
|    'ImageSharp Webp Lossy' | Job-PBVPYV |        .NET Core 3.1 |               Default | Png/Bike.png | 294.26 ms | 156.747 ms |  8.592 ms |  1.62 |    0.08 | 135000.0000 |         - |         - | 552,713 KB |
|     'Magick Webp Lossless' | Job-PBVPYV |        .NET Core 3.1 |               Default | Png/Bike.png | 181.80 ms |  69.170 ms |  3.791 ms |  1.00 |    0.00 |           - |         - |         - |     520 KB |
| 'ImageSharp Webp Lossless' | Job-PBVPYV |        .NET Core 3.1 |               Default | Png/Bike.png | 357.09 ms | 433.254 ms | 23.748 ms |  1.97 |    0.16 |  34000.0000 | 5000.0000 | 2000.0000 | 161,668 KB |
|                            |            |                      |                       |              |           |            |           |       |         |             |           |           |            |
|        'Magick Webp Lossy' | Job-SLAEGF | .NET Framework 4.7.2 |               Default | Png/Bike.png |  26.63 ms |   2.530 ms |  0.139 ms |  0.14 |    0.00 |           - |         - |         - |      68 KB |
|    'ImageSharp Webp Lossy' | Job-SLAEGF | .NET Framework 4.7.2 |               Default | Png/Bike.png | 429.41 ms |  38.772 ms |  2.125 ms |  2.33 |    0.01 | 135000.0000 |         - |         - | 554,351 KB |
|     'Magick Webp Lossless' | Job-SLAEGF | .NET Framework 4.7.2 |               Default | Png/Bike.png | 183.96 ms |  33.354 ms |  1.828 ms |  1.00 |    0.00 |           - |         - |         - |     523 KB |
| 'ImageSharp Webp Lossless' | Job-SLAEGF | .NET Framework 4.7.2 |               Default | Png/Bike.png | 428.93 ms |  60.292 ms |  3.305 ms |  2.33 |    0.04 |  34000.0000 | 5000.0000 | 2000.0000 | 162,125 KB |

After:

|                     Method |        Job |              Runtime |    TestImage |      Mean |      Error |    StdDev | Ratio | RatioSD |     Gen 0 |     Gen 1 |     Gen 2 | Allocated |
|--------------------------- |----------- |--------------------- |------------- |----------:|-----------:|----------:|------:|--------:|----------:|----------:|----------:|----------:|
|        'Magick Webp Lossy' | Job-TRNEBU |             .NET 5.0 | Png/Bike.png |  26.38 ms |   4.031 ms |  0.221 ms |  0.14 |    0.00 |         - |         - |         - |     68 KB |
|    'ImageSharp Webp Lossy' | Job-TRNEBU |             .NET 5.0 | Png/Bike.png | 232.16 ms |  50.122 ms |  2.747 ms |  1.27 |    0.02 | 4000.0000 |         - |         - | 16,704 KB |
|     'Magick Webp Lossless' | Job-TRNEBU |             .NET 5.0 | Png/Bike.png | 182.79 ms |   8.574 ms |  0.470 ms |  1.00 |    0.00 |         - |         - |         - |    520 KB |
| 'ImageSharp Webp Lossless' | Job-TRNEBU |             .NET 5.0 | Png/Bike.png | 350.50 ms |  66.007 ms |  3.618 ms |  1.92 |    0.02 | 8000.0000 | 3000.0000 | 1000.0000 | 61,556 KB |
|                            |            |                      |              |           |            |           |       |         |           |           |           |           |
|        'Magick Webp Lossy' | Job-XLXDYZ |        .NET Core 3.1 | Png/Bike.png |  25.11 ms |   8.152 ms |  0.447 ms |  0.14 |    0.01 |         - |         - |         - |     68 KB |
|    'ImageSharp Webp Lossy' | Job-XLXDYZ |        .NET Core 3.1 | Png/Bike.png | 241.75 ms | 245.192 ms | 13.440 ms |  1.36 |    0.11 | 4000.0000 |         - |         - | 16,703 KB |
|     'Magick Webp Lossless' | Job-XLXDYZ |        .NET Core 3.1 | Png/Bike.png | 178.20 ms | 102.407 ms |  5.613 ms |  1.00 |    0.00 |         - |         - |         - |    520 KB |
| 'ImageSharp Webp Lossless' | Job-XLXDYZ |        .NET Core 3.1 | Png/Bike.png | 332.18 ms | 112.166 ms |  6.148 ms |  1.87 |    0.07 | 8000.0000 | 3000.0000 | 1000.0000 | 61,555 KB |
|                            |            |                      |              |           |            |           |       |         |           |           |           |           |
|        'Magick Webp Lossy' | Job-XXCPPX | .NET Framework 4.7.2 | Png/Bike.png |  26.47 ms |   1.311 ms |  0.072 ms |  0.14 |    0.00 |         - |         - |         - |     68 KB |
|    'ImageSharp Webp Lossy' | Job-XXCPPX | .NET Framework 4.7.2 | Png/Bike.png | 473.98 ms |  87.433 ms |  4.793 ms |  2.59 |    0.02 | 4000.0000 |         - |         - | 16,753 KB |
|     'Magick Webp Lossless' | Job-XXCPPX | .NET Framework 4.7.2 | Png/Bike.png | 182.86 ms |  26.630 ms |  1.460 ms |  1.00 |    0.00 |         - |         - |         - |    523 KB |
| 'ImageSharp Webp Lossless' | Job-XXCPPX | .NET Framework 4.7.2 | Png/Bike.png | 518.88 ms | 128.756 ms |  7.058 ms |  2.84 |    0.05 | 8000.0000 | 3000.0000 | 1000.0000 | 61,697 KB |

@brianpopow brianpopow linked an issue Oct 31, 2021 that may be closed by this pull request
@codecov

codecov bot commented Oct 31, 2021

Codecov Report

Merging #1799 (e97c364) into master (5a7c1f4) will increase coverage by 0.01%.
The diff coverage is 98.97%.

@@            Coverage Diff             @@
##           master    #1799      +/-   ##
==========================================
+ Coverage   87.17%   87.18%   +0.01%     
==========================================
  Files         936      936              
  Lines       47848    47920      +72     
  Branches     6010     6012       +2     
==========================================
+ Hits        41710    41778      +68     
- Misses       5145     5148       +3     
- Partials      993      994       +1     

| Flag | Coverage Δ |
|---|---|
| unittests | 87.18% <98.97%> (+0.01%) ⬆️ |

| Impacted Files | Coverage Δ |
|---|---|
| src/ImageSharp/Formats/Webp/WebpCommonUtils.cs | 94.93% <95.23%> (-0.13%) ⬇️ |
| src/ImageSharp/Formats/Webp/Lossy/QuantEnc.cs | 96.69% <96.07%> (+0.25%) ⬆️ |
| .../Formats/Webp/Lossless/BackwardReferenceEncoder.cs | 92.89% <100.00%> (+0.07%) ⬆️ |
| ...ageSharp/Formats/Webp/Lossless/HistogramEncoder.cs | 96.57% <100.00%> (+0.12%) ⬆️ |
| ...rc/ImageSharp/Formats/Webp/Lossless/HuffmanTree.cs | 100.00% <100.00%> (ø) |
| ...c/ImageSharp/Formats/Webp/Lossless/HuffmanUtils.cs | 96.61% <100.00%> (ø) |
| .../ImageSharp/Formats/Webp/Lossless/LosslessUtils.cs | 90.81% <100.00%> (ø) |
| src/ImageSharp/Formats/Webp/Lossless/PixOrCopy.cs | 100.00% <100.00%> (ø) |
| ...ageSharp/Formats/Webp/Lossless/PredictorEncoder.cs | 92.81% <100.00%> (+0.08%) ⬆️ |
| ...rc/ImageSharp/Formats/Webp/Lossless/Vp8LEncoder.cs | 97.39% <100.00%> (+0.02%) ⬆️ |

... and 12 more

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5a7c1f4...e97c364.

@brianpopow brianpopow requested a review from a team October 31, 2021 23:09
Member

@JimBobSquarePants JimBobSquarePants left a comment

Nice improvement 👍

Span<HuffmanTree> treeSlice = tree.AsSpan().Slice(0, treeSize);
treeSlice.Sort(HuffmanTree.Compare);
#else
HuffmanTree[] treeCopy = tree.AsSpan().Slice(0, treeSize).ToArray();
Member

It's unfortunate that .NET Core 3.1 doesn't give us a span Sort here (MemoryExtensions.Sort only arrived in .NET 5). We could copy ArraySortHelper or even use something like SortUtility from Drawing, but I don't know if it's worth the effort. @antonfirsov, what do you think?
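One option the thread doesn't mention: Array.Sort has a range overload (array, index, length, comparer) that predates span sorting. Assuming the trees already live in an array (tree.AsSpan() suggests they do), the prefix can be sorted in place without the ToArray copy. This is a hypothetical sketch, not code from the PR; Node and its ordering (count descending, then value ascending) merely stand in for HuffmanTree and HuffmanTree.Compare.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for HuffmanTree; only the fields used for ordering.
public struct Node
{
    public int TotalCount;
    public int Value;

    // Assumed ordering for illustration: higher counts first, ties broken by smaller value.
    public static int Compare(Node a, Node b)
    {
        if (a.TotalCount != b.TotalCount)
        {
            return b.TotalCount.CompareTo(a.TotalCount);
        }

        return a.Value.CompareTo(b.Value);
    }
}

public static class RangeSort
{
    // Created once, so sorting does not allocate a delegate or comparer per call.
    private static readonly IComparer<Node> NodeComparer = Comparer<Node>.Create(Node.Compare);

    // Sorts tree[0 .. treeSize) in place; no copy of the slice is made.
    public static void SortPrefix(Node[] tree, int treeSize)
        => Array.Sort(tree, 0, treeSize, NodeComparer);
}
```

Like the span-based sort, Array.Sort is not stable, so elements that compare equal may end up in any order; with a comparison that breaks ties on a unique value this does not matter.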

/// <summary>
/// Scratch buffer to reduce allocations.
/// </summary>
private readonly int[] scratch = new int[256];
Contributor

Instead of the int[], one could use a fixed buffer wrapped in a private struct. See this example.
(Note: the encapsulating class must be unsafe in order to make it work.)

So scratch becomes its own strongly typed struct and can have methods too 😄
It avoids the extra array allocation, but at the cost of a bigger Vp8LEncoder object; I don't know which is better.

For code elegance I'd prefer the fixed buffer / strong type approach, as there could be a method like Span<int> AsSpan(int start, int length, bool clear = true) and the like.

Contributor

Or rent this array from an array pool / memory pool?
(Maybe that's overkill here?)

Collaborator Author

I think the array is too small to be worth renting from the MemoryAllocator here.

I am undecided on the idea of a fixed buffer wrapped in a private struct. @JimBobSquarePants, what do you think: should we go that way, or should I leave it as it is?

Member

We use that trick in the JPEG decoder, so there's precedent. Since you're encoding, the input to the enclosing type is always sanitized, so I'd go for it if you have the time.

[StructLayout(LayoutKind.Sequential)]
internal unsafe struct HuffmanTable
{
    private bool isConfigured;

    /// <summary>
    /// Derived from the DHT marker. Sizes[k] = # of symbols with codes of length k bits; Sizes[0] is unused.
    /// </summary>
    public fixed byte Sizes[17];

    /// <summary>
    /// Derived from the DHT marker. Contains the symbols, in order of incremental code length.
    /// </summary>
    public fixed byte Values[256];

    /// <summary>
    /// Contains the largest code of length k (0 if none). MaxCode[17] is a sentinel to
    /// ensure <see cref="HuffmanScanBuffer.DecodeHuffman"/> terminates.
    /// </summary>
    public fixed ulong MaxCode[18];

    /// <summary>
    /// Values[] offset for codes of length k ValOffset[k] = Values[] index of 1st symbol of code length
    /// k, less the smallest code of length k; so given a code of length k, the corresponding symbol is
    /// Values[code + ValOffset[k]].
    /// </summary>
    public fixed int ValOffset[19];

    /// <summary>
    /// Contains the length of bits for the given k value.
    /// </summary>
    public fixed byte LookaheadSize[JpegConstants.Huffman.LookupSize];

    /// <summary>
    /// Lookahead table: indexed by the next <see cref="JpegConstants.Huffman.LookupBits"/> bits of
    /// the input data stream. If the next Huffman code is no more
    /// than <see cref="JpegConstants.Huffman.LookupBits"/> bits long, we can obtain its length and
    /// the corresponding symbol directly from this tables.
    ///
    /// The lower 8 bits of each table entry contain the number of
    /// bits in the corresponding Huffman code, or <see cref="JpegConstants.Huffman.LookupBits"/> + 1
    /// if too long. The next 8 bits of each entry contain the symbol.
    /// </summary>
    public fixed byte LookaheadValue[JpegConstants.Huffman.LookupSize];

Collaborator Author

Hmm, MemoryMarshal.CreateSpan does not exist in net472, netstandard1.3, and netstandard2.0. What would be the right way to get the Span there? I think I'm missing something obvious.

Contributor

Ah, that's too bad. One would need to create the span via the pointer overload, but that needs a fixed block, which isn't feasible here (or is it?).

But do we need to create a span at all? The span is needed to call Clear and for indexed access to the elements, and both can be done on the scratch type / buffer directly. So the buffer could be passed around instead of the span.
Note: the buffer is now a type, so it can have methods too.

To clear efficiently on .NET Core+, one could create a span and call Clear (as it's vectorized); for older runtimes, just use a manual loop (maybe unrolled).
Or, even better, just rely on the runtime to zero-initialize the fixed-size buffer (as long as [SkipLocalsInit] isn't applied, which it currently isn't), so instead of clearing, create a new buffer.

I'll think about this a bit more, in case there is a better approach.
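A minimal sketch of the span-free variant described above, assuming the scratch buffer only needs indexed access and range clearing: a fixed-size buffer inside a struct, with an indexer and a loop-based Clear, so neither a span nor MemoryMarshal.CreateSpan is required. Scratch256 and EncoderSketch are hypothetical names, not types from the PR, and the code has to be compiled with AllowUnsafeBlocks enabled.

```csharp
using System;

// Hypothetical scratch type: a fixed-size buffer wrapped in a struct.
// Requires compiling with AllowUnsafeBlocks.
public unsafe struct Scratch256
{
    public const int Length = 256;

    // Inline storage inside the owning object: no separate int[] on the heap.
    private fixed int data[Length];

    // Indexed access without creating a Span. Fixed buffers are not bounds checked,
    // so the index is validated explicitly.
    public int this[int index]
    {
        get
        {
            if ((uint)index >= Length)
            {
                throw new ArgumentOutOfRangeException(nameof(index));
            }

            return this.data[index];
        }

        set
        {
            if ((uint)index >= Length)
            {
                throw new ArgumentOutOfRangeException(nameof(index));
            }

            this.data[index] = value;
        }
    }

    // Portable range clear using a plain loop, so it also works on targets
    // without MemoryMarshal.CreateSpan.
    public void Clear(int start, int count)
    {
        if ((uint)start > Length || (uint)count > (uint)(Length - start))
        {
            throw new ArgumentOutOfRangeException(nameof(count));
        }

        for (int i = start; i < start + count; i++)
        {
            this.data[i] = 0;
        }
    }
}

// Hypothetical owner standing in for an encoder class.
public sealed class EncoderSketch
{
    // Not readonly: mutating calls such as Clear would otherwise run on a defensive copy.
    private Scratch256 scratch;

    // Counts zero and non-zero values using scratch[0] and scratch[1] as counters.
    public int CountNonZero(ReadOnlySpan<int> input)
    {
        this.scratch.Clear(0, 2);
        foreach (int value in input)
        {
            this.scratch[value == 0 ? 0 : 1]++;
        }

        return this.scratch[1];
    }
}
```

The trade-off matches the discussion: the 1 KB buffer now lives inline in every EncoderSketch instance instead of in a separate array, and any API that expects a Span<int> (such as the PredictorEncoder methods) would still need a span from somewhere.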

Collaborator Author

I tried to change it to a struct, but without being able to create a Span via MemoryMarshal.CreateSpan, it seems to make things more complicated than before. For example, all methods in PredictorEncoder operate on spans.
Maybe there is a way of doing it without overcomplicating things, but I don't see it.

I would like to keep it as it is for now.

Comment on lines +796 to +797
Span<int> symbols = this.scratch.AsSpan(0, 2);
symbols.Clear();
Contributor

For example, with the approach from the comment above, this could be:

Span<int> symbols = this.scratch.AsSpan(0, 2, clear: true);
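The PR kept the int[] field, but the same call shape can be approximated for plain arrays with a small extension method. ScratchExtensions is a hypothetical helper, not part of ImageSharp, shown only to illustrate the proposed clear-on-slice convenience.

```csharp
using System;

public static class ScratchExtensions
{
    // Hypothetical helper mirroring the proposed call shape:
    // returns array[start .. start + length) and optionally zeroes it first.
    public static Span<int> AsSpan(this int[] array, int start, int length, bool clear)
    {
        // MemoryExtensions.AsSpan performs the range validation and throws on bad input.
        Span<int> span = MemoryExtensions.AsSpan(array, start, length);
        if (clear)
        {
            span.Clear();
        }

        return span;
    }
}
```

clear is deliberately a required parameter, so ordinary two-argument AsSpan calls on int[] keep resolving to MemoryExtensions.AsSpan and never clear the array implicitly.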

var huffmanTables = new HuffmanCode[numHTreeGroups * tableSize];
var hTreeGroups = new HTreeGroup[numHTreeGroups];
Span<HuffmanCode> huffmanTable = huffmanTables.AsSpan();
int[] codeLengths = new int[maxAlphabetSize];
Contributor

Rent from pool?
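A minimal sketch of what renting could look like, using the BCL's ArrayPool<int>.Shared rather than ImageSharp's MemoryAllocator (a deliberate substitution to keep the example self-contained). PooledCodeLengths and FillCodeLengths are hypothetical; FillCodeLengths stands in for the actual decoding work. ArrayPool is built into .NET Core; on .NET Framework and .NET Standard it comes from the System.Buffers package.

```csharp
using System;
using System.Buffers;

public static class PooledCodeLengths
{
    // Uses a rented buffer instead of 'new int[maxAlphabetSize]' on every call.
    // Returns the sum of the code lengths so the result is observable.
    public static int Process(int maxAlphabetSize)
    {
        int[] rented = ArrayPool<int>.Shared.Rent(maxAlphabetSize);
        try
        {
            // The rented array may be longer than requested and hold stale data,
            // so work on an exactly sized slice and clear it first.
            Span<int> codeLengths = rented.AsSpan(0, maxAlphabetSize);
            codeLengths.Clear();

            FillCodeLengths(codeLengths);

            int sum = 0;
            foreach (int length in codeLengths)
            {
                sum += length;
            }

            return sum;
        }
        finally
        {
            // Always hand the array back, even if decoding throws.
            ArrayPool<int>.Shared.Return(rented);
        }
    }

    // Hypothetical stand-in for reading code lengths from the bitstream.
    private static void FillCodeLengths(Span<int> codeLengths)
    {
        for (int i = 0; i < codeLengths.Length; i++)
        {
            codeLengths[i] = i % 8;
        }
    }
}
```

Whether this pays off depends on the array size and call frequency; for small, long-lived buffers the discussion above leans toward a plain field or a fixed buffer instead.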

this.BitCount = new long[4, 3];
this.Scratch = new byte[WebpConstants.Bps * 16];
this.Scratch2 = new short[17 * 16];
this.Scratch3 = new int[16];
Contributor

I'm kind of a fan of fixed-size buffers, so I would try to use them here too.
Yes, they make the object bigger (by the size of the buffer), but they avoid extra objects that the GC needs to track, so most of the time it's advantageous.

Note: it's unsafe, so no bounds checks are emitted!

/// <summary>
/// Scratch buffer to reduce allocations.
/// </summary>
private readonly int[] scratch = new int[16];
Contributor

Here too.
(I'll stop commenting on these buffers for the rest of the code.)

@brianpopow
Collaborator Author

@JimBobSquarePants, any objections to merging this PR as it is? I'd really like to see how it performs in combination with the other optimizations. I think the reduced allocations can benefit performance, too.

@JimBobSquarePants
Member

@brianpopow no objections, rock on🤘

@brianpopow brianpopow merged commit a06011a into master Nov 7, 2021
@brianpopow brianpopow deleted the bp/reduceallocations branch November 7, 2021 13:27

5 participants