Summary
I am seeing reproducible segmentation faults in c-blosc 1.21.6 when building with GCC 15.2.0 and GCC auto-vectorization enabled. The failures are in the BloscLZ + shuffle/bitshuffle paths.
The problem disappears when building with -fno-tree-vectorize.
This appears to be CPU-target/codegen-dependent. In my environment, the failures occur on some CPU targets, while other CPU targets pass the full test suite without disabling vectorization.
Environment
Software:
- c-blosc: 1.21.6
- Compiler: GCC 15.2.0
- OS: RHEL9
- Build system: CMake
- Build wrapper/environment: EasyBuild, but the failure is reproducible with a small standalone C program linked against the built libblosc.so
Relevant C flags in the failing build:
-O2 -ftree-vectorize -march=native -fno-math-errno -fPIC -std=gnu99
c-blosc/CMake also appends its own release flags, so compile lines include something like:
-O2 -ftree-vectorize -march=native -fno-math-errno -fPIC -std=gnu99 -O3 -DNDEBUG ... -msse2
Workaround flag that avoids the failure:
-fno-tree-vectorize
In EasyBuild terms, this corresponds to:
toolchainopts = {
    'pic': True,
    'cstd': 'gnu99',
    'vectorize': False,
}
CPU targets tested
After removing any pre-existing installation directories and rebuilding cleanly before each test, I observed the following pattern:
Affected: tests fail unless auto-vectorization is disabled
- Intel Xeon Gold 6240, Cascade Lake
- Intel Xeon Gold 6242, Cascade Lake
- AMD EPYC 9654, Zen 4 / Genoa
Unaffected in my testing: tests pass even with auto-vectorization enabled
- AMD EPYC 7552, Rome
- Intel Alder Lake system reported by a colleague
So this does not seem to be a universal GCC 15.2.0 failure; it depends on the CPU target and the code generated for it.
For the Cascade Lake systems, -march=native resolves to:
-march=cascadelake
-mtune=cascadelake
Example CPU details for an affected Intel node:
Model name: Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz
CPU family: 6
Model: 85
Stepping: 7
Microcode: 0x5003901
The relevant CPU flags include:
avx avx2 avx512f avx512dq avx512cd avx512bw avx512vl avx512_vnni
Failing tests
With auto-vectorization enabled, the full c-blosc test suite reports failures such as:
The following tests FAILED:
1 - test_api (SEGFAULT)
672 - test_noinit (SEGFAULT)
673 - test_nolock (SEGFAULT)
674 - test_nthreads (SEGFAULT)
1631 - fuzz_compress (SEGFAULT)
1633 - test_blosclz_shuffle_1 (SEGFAULT)
1635 - test_blosclz_shuffle_n (SEGFAULT)
1640 - test_blosclz_bitshuffle_1 (SEGFAULT)
1642 - test_blosclz_bitshuffle_n (SEGFAULT)
The most consistent minimal failing area is:
BloscLZ + shuffle/bitshuffle
For example, this upstream benchmark command segfaults on affected builds:
cd build/bench
./bench blosclz shuffle test 1
Output ends with:
Blosc version: 1.21.6 ($Date:: 2024-06-24 #$)
List of supported compressors in this build: blosclz,lz4,lz4hc,zlib,zstd
Supported compression libraries:
BloscLZ: 2.5.1
LZ4: 1.9.4
Zlib: 1.3.1
Zstd: 1.5.6
Using compressor: blosclz
Using shuffle type: shuffle
Running suite: test
--> 1, 4194304, 8, 19, blosclz, shuffle
********************** Run info ******************************
Blosc version: 1.21.6 ($Date:: 2024-06-24 #$)
Using synthetic data with 19 significant bits (out of 32)
Dataset size: 4194304 bytes Type size: 8 bytes
Working set: 128.0 MB Number of threads: 1
********************** Running benchmarks *********************
memcpy(write): 923.2 us, 4332.8 MB/s
memcpy(read): 454.8 us, 8795.1 MB/s
Compression level: 0
comp(write): 699.5 us, 5718.7 MB/s Final bytes: 4194320 Ratio: 1.00
decomp(read): 514.8 us, 7770.3 MB/s OK
Compression level: 1
Segmentation fault (core dumped)
Minimal C reproducer
The following standalone reproducer segfaults on affected builds when linked against the affected libblosc.so.
It uses:
compressor = blosclz
shuffle = BLOSC_SHUFFLE
clevel = 1
typesize = 2
nbytes = 4194304
nthreads = 1
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <blosc.h>

int main(void) {
    const size_t nbytes = 4194304;
    const size_t typesize = 2;

    uint8_t *src = malloc(nbytes);
    uint8_t *compressed = malloc(nbytes + BLOSC_MAX_OVERHEAD);
    uint8_t *dest = malloc(nbytes);
    if (!src || !compressed || !dest) {
        fprintf(stderr, "malloc failed\n");
        return 2;
    }

    for (size_t i = 0; i < nbytes / sizeof(uint16_t); i++) {
        ((uint16_t *)src)[i] = (uint16_t)(i & 0xffff);
    }

    blosc_init();
    printf("compressing: blosclz + shuffle, clevel=1, typesize=2, nbytes=4194304\n");

    int cbytes = blosc_compress_ctx(
        1,              /* clevel */
        BLOSC_SHUFFLE,  /* shuffle */
        typesize,
        nbytes,
        src,
        compressed,
        nbytes + BLOSC_MAX_OVERHEAD,
        "blosclz",
        0,              /* blocksize */
        1               /* nthreads */
    );
    printf("cbytes = %d\n", cbytes);
    if (cbytes <= 0) {
        fprintf(stderr, "compression failed: %d\n", cbytes);
        return 3;
    }

    int dbytes = blosc_decompress(compressed, dest, nbytes);
    printf("dbytes = %d\n", dbytes);
    if (dbytes != (int)nbytes) {
        fprintf(stderr, "decompression failed: %d\n", dbytes);
        return 4;
    }

    if (memcmp(src, dest, nbytes) != 0) {
        fprintf(stderr, "roundtrip mismatch\n");
        return 5;
    }

    blosc_destroy();
    printf("OK\n");
    return 0;
}
Example build command:
gcc blosc_bench_like_t2.c \
-I/path/to/c-blosc-1.21.6/blosc \
-L/path/to/build/blosc \
-Wl,-rpath,/path/to/build/blosc \
-lblosc \
-o blosc_bench_like_t2
On an affected build, the output is:
compressing: blosclz + shuffle, clevel=1, typesize=2, nbytes=4194304
Segmentation fault (core dumped)
The same reproducer passes when the library is built with -fno-tree-vectorize:
compressing: blosclz + shuffle, clevel=1, typesize=2, nbytes=4194304
cbytes = 75792
dbytes = 4194304
OK
Additional narrowing
The issue is sensitive to typesize.
On an affected build:
blosclz + shuffle, clevel=1, nbytes=4194304, nthreads=1
typesize=1 -> passes
typesize=2 -> segfaults
typesize=4 -> segfaults
typesize=8 -> segfaults
The bitshuffle path also fails:
blosclz + bitshuffle, clevel=1, typesize=2, nbytes=4194304, nthreads=1 -> segfaults
Other compressor/filter combinations are not affected in the same way in my tests. For example:
blosclz + noshuffle -> passes
lz4 + shuffle -> passes
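A sweep like the one above is easier to run if each case executes in its own process, so that one segfault does not stop the remaining cases. The following is a minimal POSIX sketch of that idea, not the harness I actually used. The names run_isolated, sweep_typesizes, demo_ok, and demo_crash are hypothetical, and the two demo callbacks are placeholders: a real callback would wrap the blosc_compress_ctx / blosc_decompress round trip from the reproducer above and return 0 on success.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical callback type: run one case for a typesize, return 0 on success. */
typedef int (*case_fn)(int typesize);

#define RUN_ERROR (-1000) /* fork() or waitpid() itself failed */

/* Run fn(typesize) in a forked child so a crash cannot kill the sweep.
 * Returns 0 if the case passed, 1 if it returned non-zero,
 * -SIGNUM if the child was killed by a signal, RUN_ERROR otherwise. */
int run_isolated(case_fn fn, int typesize) {
    assert(fn != NULL);
    fflush(NULL); /* do not duplicate buffered output in the child */
    pid_t pid = fork();
    if (pid < 0) return RUN_ERROR;
    if (pid == 0) _exit(fn(typesize) == 0 ? 0 : 1);

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return RUN_ERROR;
    if (WIFSIGNALED(status)) return -WTERMSIG(status);
    if (WIFEXITED(status)) return WEXITSTATUS(status);
    return RUN_ERROR;
}

/* Report one line per typesize, in the style of the table above. */
void sweep_typesizes(case_fn fn) {
    static const int sizes[] = {1, 2, 4, 8};
    for (size_t i = 0; i < sizeof sizes / sizeof sizes[0]; i++) {
        int rc = run_isolated(fn, sizes[i]);
        if (rc == 0)
            printf("typesize=%d -> passes\n", sizes[i]);
        else if (rc < 0 && rc != RUN_ERROR)
            printf("typesize=%d -> killed by signal %d\n", sizes[i], -rc);
        else
            printf("typesize=%d -> fails (rc=%d)\n", sizes[i], rc);
    }
}

/* Placeholder cases, only to show the two outcomes without linking libblosc. */
int demo_ok(int typesize) { (void)typesize; return 0; }
int demo_crash(int typesize) { (void)typesize; raise(SIGSEGV); return 1; }
```

Calling sweep_typesizes(my_case) from main, where my_case is the blosc round trip parameterized by typesize, would then print one result line per typesize even when some cases crash.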
Workaround
Building with auto-vectorization disabled avoids the issue:
-fno-tree-vectorize
In EasyBuild this is:
toolchainopts = {
    'pic': True,
    'cstd': 'gnu99',
    'vectorize': False,
}
The cstd='gnu99' part is mainly to avoid GCC 15's default C23/GNU23 mode. There is already a separate GCC 15/C23 compile-side issue around bool, but this report is about a runtime segfault that remains after forcing GNU99.
Expected behavior
The BloscLZ + shuffle/bitshuffle tests should not segfault when built with GCC 15.2.0 and auto-vectorization enabled.
Actual behavior
On affected CPU targets/builds, the BloscLZ + shuffle/bitshuffle paths segfault with GCC 15.2.0 auto-vectorization enabled. The failures disappear with -fno-tree-vectorize.
This could be either:
- a GCC 15.2.0 miscompilation, or
- undefined behavior in c-blosc (for example an aliasing, alignment, or bounds issue) that GCC 15.2.0 auto-vectorization exposes.
I am reporting this here first because the failure is reproducible through c-blosc’s own tests and a small c-blosc API reproducer.
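To make the second possibility concrete, here is a generic sketch of the alignment/aliasing class of problem. It is not taken from the c-blosc source, and I have not confirmed that c-blosc does this anywhere. Dereferencing a byte pointer cast to uint32_t * is undefined when the address is not suitably aligned, and a vectorizing compiler may rely on that, for example by emitting aligned vector loads. Going through memcpy is the well-defined alternative. The names load_u32, store_u32, and match_length are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Well-defined 32-bit load from any address: memcpy has no alignment
 * requirement and does not violate strict aliasing. */
uint32_t load_u32(const void *p) {
    uint32_t v;
    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}

/* Matching well-defined 32-bit store to any address. */
void store_u32(void *p, uint32_t v) {
    assert(p != NULL);
    memcpy(p, &v, sizeof v);
}

/* Length of the common prefix of a and b (at most n bytes), comparing four
 * bytes at a time and finishing byte by byte. The risky variant of this loop
 * would read *(const uint32_t *)(a + i), which is undefined for unaligned
 * a + i and never reads past n here because of the i + 4 <= n bound. */
size_t match_length(const uint8_t *a, const uint8_t *b, size_t n) {
    size_t i = 0;
    while (i + 4 <= n && load_u32(a + i) == load_u32(b + i)) {
        i += 4;
    }
    while (i < n && a[i] == b[i]) {
        i++;
    }
    return i;
}
```

If the crash does turn out to be in a loop of this shape, rebuilding the library with -fsanitize=undefined,address should report the offending access directly.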