Dither and Quantize API and Performance Improvements. #1138
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1138 +/- ##
==========================================
+ Coverage 82.23% 82.27% +0.03%
==========================================
Files 678 678
Lines 29192 29216 +24
Branches 3284 3278 -6
==========================================
+ Hits 24007 24038 +31
+ Misses 4488 4483 -5
+ Partials 697 695 -2
Is the "all memory in for speed" approach really what we want with Gif? Just pushed a change to also run the benchmark against Note that the theoretical maximum There are alternative approaches that would sacrifice some of the speed but allocate much less. The most naive thing I can think of: binary search, hopefully less than 10x slower, could be done without going down the rabbit hole researching literature. Do you remember how much slower was the linear search without caching? Can't find the PR introducing the cache, must be older than history itself :) |
Ok, I realized binary search won't be enough, so I guess we can't find a good solution in a reasonable time :(
Looks like this could be the perfect use case scenario for a [...]
These micro-optimizations are not worth the effort right now IMO; we should rather focus on finding a better algorithm. An efficient k-d tree implementation may do the job. This one looks promising. We can actually combine it with other techniques. I have an idea now, and I may be able to code it in a couple of hours. But in this PR I'll focus on API review from now on.
Regarding locking vs ConcurrentDictionary: we can get rid of locking entirely; we just need a first pass to fill the dictionary.
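A rough sketch of that two-pass idea; the helper and parameter names are hypothetical, not code from this PR:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using SixLabors.ImageSharp.PixelFormats;

// Hypothetical helper illustrating the two-pass approach; findClosestPaletteIndex
// stands in for whatever palette search is used (linear, k-d tree, ...).
public static class TwoPassRemapper
{
    public static void Remap(Rgba32[] sourcePixels, byte[] indexedPixels, Func<Rgba32, byte> findClosestPaletteIndex)
    {
        // First pass (single-threaded): resolve every distinct color once.
        var map = new Dictionary<Rgba32, byte>();
        foreach (Rgba32 pixel in sourcePixels)
        {
            if (!map.ContainsKey(pixel))
            {
                map[pixel] = findClosestPaletteIndex(pixel);
            }
        }

        // Second pass: the dictionary is no longer mutated, so concurrent reads
        // are safe and the remapping can run in parallel without any locking.
        Parallel.For(0, sourcePixels.Length, i => indexedPixels[i] = map[sourcePixels[i]]);
    }
}
```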
@antonfirsov @Sergio0694 I would personally love to see a better implementation. I deliberately kept the [...]. Looking forward to seeing what you can come up with!
antonfirsov left a comment:
Mostly good, only one remark
    [Theory]
    [WithFile(TestImages.Bmp.Car, PixelTypes.Rgba32)]
    public void EncodeAllocationCheck<TPixel>(TestImageProvider<TPixel> provider)
I usually prefer to put these under ImageSharp.Tests/ProfilingBenchmarks, and keep them skipped.
I didn't actually mean to check that in!
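For reference, a sketch of how such a profiling test is typically kept skipped; the Skip reason text and generic constraint here are assumptions:

```csharp
// Illustrative only: xunit's Skip property keeps the theory out of normal test runs,
// so it can be re-enabled locally when profiling the encoder.
[Theory(Skip = "Profiling benchmark, enable manually.")]
[WithFile(TestImages.Bmp.Car, PixelTypes.Rgba32)]
public void EncodeAllocationCheck<TPixel>(TestImageProvider<TPixel> provider)
    where TPixel : struct, IPixel<TPixel>
{
    // ... encode with the GifEncoder and inspect allocations ...
}
```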
    - /// <returns>The <see cref="ReadOnlyMemory{TPixel}"/> palette.</returns>
    - ReadOnlyMemory<TPixel> BuildPalette(ImageFrame<TPixel> source, Rectangle bounds);
    + /// <returns>The <see cref="ReadOnlySpan{TPixel}"/> palette.</returns>
    + ReadOnlySpan<TPixel> BuildPalette(ImageFrame<TPixel> source, Rectangle bounds);
Can't find any related guideline point, but I have a bad feeling about returning a Span as the "result" of an operation, because the ownership of the backing memory is not clear from the API.
I think we should return void here, and publish a separate Palette property instead.
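A minimal sketch of the shape being suggested here, showing only the relevant members; this is illustrative, not a final API, and the generic constraint and namespaces are assumptions:

```csharp
using System;
using SixLabors.ImageSharp;
using SixLabors.ImageSharp.PixelFormats;

// Illustrative alternative shape: BuildPalette does the work, and the result is
// exposed via a property, so the quantizer clearly owns the backing memory.
public interface IFrameQuantizer<TPixel>
    where TPixel : struct, IPixel<TPixel>
{
    // Populated by BuildPalette; empty (or throwing) before it has run.
    ReadOnlyMemory<TPixel> Palette { get; }

    void BuildPalette(ImageFrame<TPixel> source, Rectangle bounds);
}
```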
You've hit upon a point that led to me coding about 5 different versions of this.
I struggled with a Palette property for a couple of reasons:
- The result is temporally coupled since it is not assigned on construction. The palette could be null (if using ReadOnlyMemory<TPixel>) or an empty ReadOnlySpan<TPixel>.
- Using ReadOnlyMemory<TPixel> can lead to a larger palette length than actually required.
The OctreeFrameQuantizer doesn't return the final number of entries until you build the palette. You have to build it, then slice the result, reducing the length by the determined index count; returning ReadOnlySpan allowed this.
If you don't do this you can end up with a larger Gif than you actually need. The WuFrameQuantizer is similar, but at least you know the length early on.
I'll give everything one last look over and see if I can improve things.
> The palette could be null (if using ReadOnlyMemory<TPixel>) or an empty ReadOnlySpan<TPixel>.
We can also throw an InvalidOperationException if the palette has not yet been built.
> Using ReadOnlyMemory<TPixel> can lead to a larger palette length than actually required. [...] You have to build it then slice the result, reducing the length by the determined index count; returning ReadOnlySpan allowed this.
You can also slice a ReadOnlyMemory<T>.
This is not a strong opinion, however. Alternatively we can clarify the ownership model in the docs of BuildPalette() without changing the API.
> You can also slice a ReadOnlyMemory.
Why oh why has that never occurred to me before?
Quick question...
Would you consider it OK, API-wise, for the QuantizedFrame to reference the FrameQuantizer palette rather than copying it? Currently I pass the span, copy it, and use it for the lifetime of the frame.
Scratch that. It leads to messy code in the Gif Encoder.
@antonfirsov I gave it another pass and cleaned it up quite a lot. Still not happy with perf and allocations but it's incrementally improving.
antonfirsov left a comment:
Looks good now! Let's just be more careful with stackalloc, maybe.
    // Bulk convert our palette to RGBA to allow assignment to tables.
    // Palette length maxes out at 256 so safe to stackalloc.
    Span<Rgba32> rgbaPaletteSpan = stackalloc Rgba32[paletteLength];
In theory, yes, but I can't spot any safety mechanism to enforce this. We should at least Guard against it; stackallocking a non-const buffer is super unsafe.
Another option is to stackalloc when paletteLength < 64 and create an array otherwise. We should not stackalloc 256 * sizeof(Rgba32) == 1KB of data anyway.
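A sketch of the suggested guard, assuming C# 8 expression-level stackalloc is available; the threshold, method shape and names are illustrative rather than taken from the PR:

```csharp
using System;
using SixLabors.ImageSharp.PixelFormats;

// Illustrative only: small palettes stay on the stack, larger ones fall back to a
// heap array, so an unexpected paletteLength can never blow the stack.
static void WritePalette<TPixel>(ReadOnlySpan<TPixel> palette)
    where TPixel : struct, IPixel<TPixel>
{
    const int StackAllocThreshold = 64;
    int paletteLength = palette.Length;

    Span<Rgba32> rgbaPaletteSpan = paletteLength <= StackAllocThreshold
        ? stackalloc Rgba32[StackAllocThreshold] // constant-sized stack buffer
        : new Rgba32[paletteLength];             // heap fallback for larger palettes
    rgbaPaletteSpan = rgbaPaletteSpan.Slice(0, paletteLength);

    // ... bulk-convert 'palette' into rgbaPaletteSpan and write it out ...
}
```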
Also, a small sidenote: keep in mind that stackalloc zeroes out memory right now, and the current implementation is quite inefficient since it's basically just a busy loop with a series of push 0x0, plus additional code to handle stack overflows etc.
Eg. doing stackalloc int[256] produces (among other things) this loop:

    L001c: mov eax, 0x40
    L0021: push 0x0
    L0023: push 0x0
    L0025: dec rax
    L0028: jnz L0021

Check out the complete sample code here, it actually doesn't look that great right now: https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AXEBDAzgWwB8ABAJgEYBYAKGIGYACMhgYQYG8aHunHjykDAJYA7DAwCyACgCUXHp2o9lDAMoAHbCIA8ojAD4GuTSIYBeIxmxgA1tgA29iGGFiA2qQCsSALoBueRUGQJViAHYjEzcABn8QgF8aeKA==
You might want to consider just renting an array (which also skips the memory clear), especially if using stackalloc could introduce other security/safety concerns.
There's a proposal for a [SkipLocalsInit] attribute for C# 9.0 I think, but it's still under prototyping. And the zeroing optimization Ben Adams is doing is still an open PR possibly coming with .NET 5.
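And a sketch of the renting alternative, again with illustrative names; ArrayPool.Rent does not clear the returned buffer, so the zeroing cost disappears as well:

```csharp
using System;
using System.Buffers;
using SixLabors.ImageSharp.PixelFormats;

// Illustrative only: rent a buffer at least paletteLength long, slice it to the
// exact size, and return it when done. Rent does not clear the array.
static void WritePaletteRented(int paletteLength)
{
    Rgba32[] rented = ArrayPool<Rgba32>.Shared.Rent(paletteLength);
    try
    {
        Span<Rgba32> rgbaPaletteSpan = rented.AsSpan(0, paletteLength);
        // ... bulk-convert the palette into rgbaPaletteSpan and write it out ...
    }
    finally
    {
        ArrayPool<Rgba32>.Shared.Return(rented);
    }
}
```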
Palette size is limited in a couple of places...
- In the setter for QuantizerOptions.
- In the constructor for IndexedImageFrame.

I can use the constant again and slice, though, if you want...
I thought 1Kb was the general rule for a maximum; should we be more conservative?
From the linked blog post:
> I won’t prescribe anything specific, but anything larger than a kilobyte is a point of concern.
Ok, I see now that the IndexedImageFrame constructor should prevent anything from going wrong.
1Kb is the maximum, but probably not an optimum, or at least not always. But it doesn't really matter in WritePaletteChunk, it's not a hot path, just converting & copying a short slice of data.
I think I'll just revert stackalloc for now looking at the codegen.
I knew that would've scared you 🤣
You can always make things even more scary with the Legacy JIT 😆
I've run that legacy asm through a decompiler, and I got this:

    public static int M()
    {
        Span<int> span = stackalloc int[256];
        CalculateMassOfTheSun();
        return span[0];
    }

    private void BuildCube()
    {
    -   Span<double> vv = stackalloc double[this.colors];
    +   Span<double> vv = stackalloc double[this.maxColors];
See previous comment.
Ah yes, that's too big! 2048!!
We should probably merge this as-is, and continue further experiments from here. It's likely that we can introduce all necessary improvements without breaking the current API.
Description
The IDither interface contained implementation-specific parameters in the method signatures. This has been simplified to the absolute bare minimum of parameters. I also managed to remove boxing from the palette dither type.

In addition I simplified the IFrameQuantizer interface to remove palette reference ambiguity in the API.

I noticed some performance problems with the GifEncoder during development, and as part of my refactoring I tackled some of those issues. Mean performance is around 3-4x faster with 25% fewer allocations on 2 out of 3 targets. (I have no idea what is so different about .NET Core 2.1 memory usage.)

Note: Allocations are due to the usage of ConcurrentDictionary<TPixel, int> in the pixel map, which cannot be avoided. I normalized the concurrency count across target frameworks to at least ensure allocations do not go wild on NETFX.
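For illustration only (the exact values the PR uses are not shown here, so these are assumptions), pinning the concurrency level and capacity explicitly is the usual way to keep ConcurrentDictionary's per-framework defaults from diverging:

```csharp
using System;
using System.Collections.Concurrent;
using SixLabors.ImageSharp.PixelFormats;

// Assumed values for illustration: an explicit concurrencyLevel and capacity keep
// the ConcurrentDictionary's internal lock/bucket allocations consistent across
// .NET Framework and .NET Core instead of relying on per-framework defaults.
var cache = new ConcurrentDictionary<Rgba32, int>(
    concurrencyLevel: Environment.ProcessorCount,
    capacity: 64);
```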
I've introduced some Unsafe.Add in the dithering and quantization loops, but safe in the knowledge that bounds are carefully constrained long before those methods are called.

Before

After
A slight fix to the color matching code was also implemented, which led to more accurate matching and distribution of error.
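For context on the Unsafe.Add remark above, here is a hedged sketch, not the actual ImageSharp loop, of the pattern: bounds are validated once up front, after which indexing through a reference avoids per-element bounds checks.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

public static class RowOps
{
    // Illustrative only: the real dithering/quantization loops do more work per
    // pixel, but the bounds-check-once-then-Unsafe.Add shape is the same.
    public static void CopyRow(ReadOnlySpan<byte> source, Span<byte> destination, int width)
    {
        if ((uint)width > (uint)source.Length || (uint)width > (uint)destination.Length)
        {
            throw new ArgumentOutOfRangeException(nameof(width));
        }

        ref byte sourceBase = ref MemoryMarshal.GetReference(source);
        ref byte destinationBase = ref MemoryMarshal.GetReference(destination);

        for (int x = 0; x < width; x++)
        {
            // Safe because width was constrained against both spans above.
            Unsafe.Add(ref destinationBase, x) = Unsafe.Add(ref sourceBase, x);
        }
    }
}
```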