Small speed boost in PNG filtering, 2kB reduction in LodePNG code size using Clang 15.0.0
Very small speed boost, 272 byte reduction in LodePNG code size using Clang 15.0.0
Small speed boost, 1.5kB reduction in LodePNG code size using Clang 15.0.0
10% speedup using --allfilters, 4.2kB reduction in LodePNG code size using Clang 15.0.0
Small speed boost, 3.3kB reduction in LodePNG code size using Clang 15.0.0
| else { /*check if image is white only if no error is detected in previous function*/ | ||
| unsigned char r = 0, g = 0, b = 0, a = 0; | ||
| getPixelColorRGBA8(&r, &g, &b, &a, image, 0, &state->info_raw); | ||
| stats.white = stats.numcolors == 1 && stats.colored == 0 && r == 255 && w > 20 && h > 20 |
This reduces the number of variables passed to lodepng_compute_color_stats by one and also avoids calculating numpixels twice.
src/lodepng/lodepng.cpp
Outdated
| for(i = bytewidth; i < length; ++i) out[i] = scanline[i] - scanline[i - bytewidth]; | ||
| case 1: { /*Sub*/ | ||
| size_t j = 0; | ||
| memcpy(out, scanline, bytewidth); |
Since bytewidth will be at most 8, memcpy shouldn't make a difference here
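For reference, a minimal sketch of the two forms of the Sub filter being compared here. The function names `filter_sub_loop` and `filter_sub_memcpy` are hypothetical and not taken from lodepng; the only difference is how the first `bytewidth` bytes (which have no left neighbour) are copied.

```c
#include <stddef.h>
#include <string.h>

/* Sub filter, head copied with a plain byte loop (hypothetical name). */
static void filter_sub_loop(unsigned char* out, const unsigned char* scanline,
                            size_t length, size_t bytewidth) {
  size_t i;
  /* The first bytewidth bytes have no pixel to their left: copy unchanged. */
  for(i = 0; i < bytewidth && i < length; ++i) out[i] = scanline[i];
  /* Every later byte stores the difference to the byte one pixel to the left. */
  for(i = bytewidth; i < length; ++i) {
    out[i] = (unsigned char)(scanline[i] - scanline[i - bytewidth]);
  }
}

/* Same filter, head copied with memcpy (hypothetical name). */
static void filter_sub_memcpy(unsigned char* out, const unsigned char* scanline,
                              size_t length, size_t bytewidth) {
  size_t i;
  size_t head = bytewidth < length ? bytewidth : length;
  memcpy(out, scanline, head);
  for(i = bytewidth; i < length; ++i) {
    out[i] = (unsigned char)(scanline[i] - scanline[i - bytewidth]);
  }
}
```

Both produce identical output; with a head of at most 8 bytes, any speed difference would come from the main loop rather than the copy.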
| #if defined(LODEPNG_COMPILE_DECODER) || defined(LODEPNG_COMPILE_PNG) | ||
| static unsigned lodepng_read32bitInt(const unsigned char* buffer) { | ||
| return (((unsigned)buffer[0] << 24u) | ((unsigned)buffer[1] << 16u) | | ||
| ((unsigned)buffer[2] << 8u) | (unsigned)buffer[3]); |
This (and a few other lines) matches the current formatting in mainline lodepng, so I will leave it as-is to keep this copy of lodepng from diverging further on formatting alone.
src/lodepng/lodepng.cpp
Outdated
| short pa = abs(b - c); | ||
| short pb = abs(a - c); | ||
| short pc = abs(a + b - c - c); | ||
| short pa = LODEPNG_ABS(b - c); |
AFAIK abs is sometimes faster than doing a direct comparison depending on how smart the compiler is, so I will leave it as-is.
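To make the comparison concrete, here is a sketch of the Paeth predictor written both ways. `SKETCH_ABS` is a hypothetical comparison-based macro standing in for `LODEPNG_ABS`, whose actual definition is not shown in this thread, and the function names are illustrative only.

```c
#include <stdlib.h>

/* Hypothetical comparison-based absolute value; an assumption, not lodepng's macro. */
#define SKETCH_ABS(x) ((x) < 0 ? -(x) : (x))

/* Paeth predictor using the standard library abs(). */
static unsigned char paeth_with_abs(short a, short b, short c) {
  short pa = (short)abs(b - c);         /* distance of estimate from a (left) */
  short pb = (short)abs(a - c);         /* distance of estimate from b (above) */
  short pc = (short)abs(a + b - c - c); /* distance of estimate from c (upper left) */
  if(pb < pa) { a = b; pa = pb; }       /* keep the closer of a and b, ties go to a */
  return (unsigned char)((pc < pa) ? c : a);
}

/* Same predictor using the comparison-based macro. */
static unsigned char paeth_with_macro(short a, short b, short c) {
  short pa = (short)SKETCH_ABS(b - c);
  short pb = (short)SKETCH_ABS(a - c);
  short pc = (short)SKETCH_ABS(a + b - c - c);
  if(pb < pa) { a = b; pa = pb; }
  return (unsigned char)((pc < pa) ? c : a);
}
```

The results are the same for all byte inputs, so the choice only affects what the compiler generates; which one is faster would need to be measured per compiler.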
| static unsigned getNumColorChannels(LodePNGColorType colortype) { | ||
| switch(colortype) { | ||
| case LCT_GREY: return 1; |
Same here, going to keep lodepng's formatting.
Most other changes look fine. I'll try to go through them in the coming days, though it might take longer due to other obligations.
lodepng_compute_color_stats never returns an error as written, so its return type is changed to void to reflect this behavior
--allfilters is ~1.5% faster than the previous commit, --pal_sort=120 is ~2.5% faster than the previous commit, and LodePNG code size is reduced by 3kB using Clang 15.0.0
~2% faster decoding of RGB images, LodePNG code size decreases by 0.75kB using Clang 15.0.0
| *r = *g = *b = in[i]; | ||
| if(mode->key_defined && *r == mode->key_r) *a = 0; | ||
| else *a = 255; | ||
| *a = 255 * !(mode->key_defined && *r == mode->key_r); |
Some of these changes are not a good fit for ECT as I'm trying to keep the differences with mainline lodepng small. Since lodepng is not responsible for much of ECT’s runtime for most use cases, small improvements to lodepng code size or run time will not make much of a difference while making it harder to maintain.
In particular, the changes where you introduce multiplication may be more difficult to read and understand, and adding const qualifiers should not really affect code generation, so I will keep the upstream version.
If you feel like a change makes a meaningful difference, you can also try to submit a pull request to mainline lodepng.
I will review the changes to the ECT-specific code (e.g. the changes to filtering) later, although I’m currently swamped with school work so it might be a while.
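For readers following along, a small sketch of the two alpha computations under discussion. The helper names are hypothetical, and the parameters stand in for `mode->key_defined`, `*r`, and `mode->key_r` from the diff above.

```c
/* Upstream-style branch: alpha is 0 when the pixel matches the colour key. */
static unsigned char alpha_branch(unsigned key_defined, unsigned char r, unsigned key_r) {
  if(key_defined && r == key_r) return 0;
  return 255;
}

/* Proposed form: the logical NOT yields 0 or 1, which is scaled to 0 or 255. */
static unsigned char alpha_multiply(unsigned key_defined, unsigned char r, unsigned key_r) {
  return (unsigned char)(255 * !(key_defined && r == key_r));
}
```

The two are equivalent for every input; the trade-off is purely readability against whatever the compiler does with each form.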
If multiple PNGs are being processed, returning errors for bad PNG files only would be better than exiting the program entirely
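A rough sketch of that idea, assuming a per-file worker that returns 0 on success and a nonzero error code otherwise. `process_all`, `process_fn`, and `demo_process` are hypothetical names for illustration and are not ECT's actual functions.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical per-file worker: 0 on success, nonzero error code on failure. */
typedef unsigned (*process_fn)(const char* path);

/* Process every file; report failures and continue instead of exiting. */
static size_t process_all(const char* const* paths, size_t count, process_fn process) {
  size_t failures = 0, i;
  for(i = 0; i < count; ++i) {
    unsigned error = process(paths[i]);
    if(error) {
      fprintf(stderr, "%s: error %u, skipping\n", paths[i], error);
      ++failures;
    }
  }
  return failures; /* the caller can turn this into the process exit status */
}

/* Stand-in worker for demonstration only: fails on paths starting with 'b'. */
static unsigned demo_calls = 0;
static unsigned demo_process(const char* path) {
  ++demo_calls;
  return path[0] == 'b' ? 1u : 0u;
}
```

With this shape, one bad file costs a single diagnostic line, and the remaining files in the batch are still optimized.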
~0.25s faster PNG encoding using Clang 15.0.0
I worked on this a while ago but only recently managed to clean it up, given the large scale of changes within one pull request. With the latest commit, many of the changes are integrated. This includes all changes I find useful, apart from small cosmetic changes and changes to filter() and optimize_palette().
Refactor and optimize existing LodePNG functions
These changes decrease the size of the compiled LodePNG code and speed up optimizing PNGs using the --allfilters switch
On MSYS2 MINGW64 gcc 12.2.0, LodePNG code is
On MSYS2 CLANG64 clang 15.0.0, LodePNG code is
There's a ~0.5% speed increase for the --allfilters-b switch; otherwise, there is no measurable speed difference when optimizing PNGs without --allfilters or --allfilters-b