Skip to content

Optimize LodePNG#106

Open
woot000 wants to merge 10 commits intofhanau:masterfrom
woot000:master
Open

Optimize LodePNG#106
woot000 wants to merge 10 commits intofhanau:masterfrom
woot000:master

Conversation

@woot000
Copy link

@woot000 woot000 commented Sep 10, 2022

Refactor and optimize existing LodePNG functions

These changes decrease the size of the compiled LodePNG code and speed up optimizing PNGs using the --allfilters switch

On MSYS2 MINGW64 gcc 12.2.0, LodePNG code is

  • 10.62% smaller (170386 bytes -> 152288 bytes, 17.67kB smaller)
  • ~11.17% faster, using the command ect -4096 -strip --allfilters "neat image.png" (47.2513s -> 41.9741s)

On MSYS2 CLANG64 clang 15.0.0, LodePNG code is

  • 6.67% smaller (169356 bytes -> 158052 bytes, 11.04kB smaller)
  • ~11.83% faster, using the command ect -4096 -strip --allfilters "neat image.png" (42.8687s -> 37.7980s)

There's a ~0.5% speed increase for the --allfilters-b switch, otherwise there is no measurable speed difference optimizing PNGs not using --allfilters or --allfilters-b

Small speed boost in PNG filtering, 2kB reduction in LodePNG code size using Clang 15.0.0
Very small speed boost, 272 byte reduction in LodePNG code size using Clang 15.0.0
Small speed boost, 1.5kB reduction in LodePNG code size using Clang 15.0.0
10% speedup using --allfilters, 4.2kB reduction in LodePNG code size using Clang 15.0.0
Small speed boost, 3.3kB reduction in LodePNG code size using Clang 15.0.0
else { /*check if image is white only if no error is detected in previous function*/
unsigned char r = 0, g = 0, b = 0, a = 0;
getPixelColorRGBA8(&r, &g, &b, &a, image, 0, &state->info_raw);
stats.white = stats.numcolors == 1 && stats.colored == 0 && r == 255 && w > 20 && h > 20
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this get moved?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reduces the number of variables passed to lodepng_compute_color_stats by one, this change also gets around calculating numpixels twice

for(i = bytewidth; i < length; ++i) out[i] = scanline[i] - scanline[i - bytewidth];
case 1: { /*Sub*/
size_t j = 0;
memcpy(out, scanline, bytewidth);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since bytewidth will be at most 8, memcpy shouldn't make a difference here

#if defined(LODEPNG_COMPILE_DECODER) || defined(LODEPNG_COMPILE_PNG)
static unsigned lodepng_read32bitInt(const unsigned char* buffer) {
return (((unsigned)buffer[0] << 24u) | ((unsigned)buffer[1] << 16u) |
((unsigned)buffer[2] << 8u) | (unsigned)buffer[3]);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This (and a few other lines) are how it is formatted right now in mainline lodepng, so I will leave it as so that this lodepng does not differ more from it just based on formatting.

short pa = abs(b - c);
short pb = abs(a - c);
short pc = abs(a + b - c - c);
short pa = LODEPNG_ABS(b - c);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK abs is sometimes faster than doing a direct comparison depending on how smart the compiler is, so I will leave it as-is.


static unsigned getNumColorChannels(LodePNGColorType colortype) {
switch(colortype) {
case LCT_GREY: return 1;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, going to keep lodepng's formatting.

@fhanau
Copy link
Owner

fhanau commented Sep 11, 2022

Most other changes look fine, I'll try to go through them in the coming days, might take longer though due to other obligations

lodepng_compute_color_stats doesn't return any errors as is, so the function type is changed to void to reflect this behavior
--allfilters is ~1.5% faster from previous commit, --pal_sort=120 is ~2.5% faster from previous commit, LodePNG code size is reduced by 3kB using Clang 15.0.0
~2% faster decoding of RGB images, LodePNG code size decreases by 0.75kB using Clang 15.0.0
*r = *g = *b = in[i];
if(mode->key_defined && *r == mode->key_r) *a = 0;
else *a = 255;
*a = 255 * !(mode->key_defined && *r == mode->key_r);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of these changes are not a good fit for ECT as I'm trying to keep the differences with mainline lodepng small. Since lodepng is not responsible for much of ECT’s runtime for most use cases, small improvements to lodepng code size or run time will not make much of a difference while making it harder to maintain.
In particular, the changes where you introduce multiplication may be more difficult to read/understand and adding const statements should not really affect code generation, so I will keep the upstream version. 
If you feel like a change makes a meaningful difference, you can also try to submit a pull request to mainline lodepng. 
I will review the changes to the ECT-specific code (e.g. the changes to filtering) later, although I’m currently swamped with school work so it might be a while.

If multiple PNGs are being processed, returning errors for bad PNG files only would be better than exiting the program entirely
~0.25s faster PNG encoding using Clang 15.0.0
@fhanau
Copy link
Owner

fhanau commented Dec 16, 2022

Worked on this a while ago but only recently managed to clean it up given the large scale of changes within one pull request: With the latest commit, many of the changes are integrated. This includes all changes I find useful outside of small cosmetic changes and changes to filter() and optimize_palette().
While there are several changes in these functions that may be useful, reviewing them would be lengthy and complex work while only promising to gain small code size improvements and a tiny speedup in some scenarios. For now, I will leave this pull request open and might integrate filter/palette changes later on, but it is not a priority.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants