c-blosc 1.21.6: BloscLZ shuffle/bitshuffle tests segfault with GCC 15.2.0 auto-vectorization on some CPU targets #402

@pavelToman

Summary

I am seeing reproducible segmentation faults in c-blosc 1.21.6 when it is built with GCC 15.2.0 and auto-vectorization enabled. The failures are in the BloscLZ + shuffle/bitshuffle paths.

The problem disappears when building with -fno-tree-vectorize.

This appears to be CPU-target/codegen-dependent. In my environment, the failures occur on some CPU targets, while other CPU targets pass the full test suite without disabling vectorization.

Environment

Software:

  • c-blosc: 1.21.6
  • Compiler: GCC 15.2.0
  • OS: RHEL9
  • Build system: CMake
  • Build wrapper/environment: EasyBuild, but the failure is reproducible with a small standalone C program linked against the built libblosc.so

Relevant C flags in the failing build:

-O2 -ftree-vectorize -march=native -fno-math-errno -fPIC -std=gnu99

c-blosc's CMake build also appends its own release flags, so the actual compile lines look something like this (with GCC, the later -O3 takes precedence over the earlier -O2):

-O2 -ftree-vectorize -march=native -fno-math-errno -fPIC -std=gnu99 -O3 -DNDEBUG ... -msse2

Workaround flags that avoid the failure:

-fno-tree-vectorize

In EasyBuild terms, this corresponds to:

toolchainopts = {
    'pic': True,
    'cstd': 'gnu99',
    'vectorize': False,
}

CPU targets tested

After removing any pre-existing installation directories and rebuilding cleanly before each test, I observed the following pattern:

Affected: tests fail unless auto-vectorization is disabled

  • Intel Xeon Gold 6240, Cascade Lake
  • Intel Xeon Gold 6242, Cascade Lake
  • AMD EPYC 9654, Zen 4 / Genoa

Unaffected in my testing: tests pass even with auto-vectorization enabled

  • AMD EPYC 7552, Rome
  • Intel Alder Lake system reported by a colleague

So this does not appear to be a universal GCC 15.2.0 failure. It depends on the CPU target and on the code generated for it.

For the Cascade Lake systems, -march=native resolves to:

-march=cascadelake
-mtune=cascadelake

Example CPU details for an affected Intel node:

Model name: Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz
CPU family: 6
Model: 85
Stepping: 7
Microcode: 0x5003901

The relevant CPU flags include:

avx avx2 avx512f avx512dq avx512cd avx512bw avx512vl avx512_vnni

Failing tests

With auto-vectorization enabled, the full c-blosc test suite reports failures such as:

The following tests FAILED:
          1 - test_api (SEGFAULT)
        672 - test_noinit (SEGFAULT)
        673 - test_nolock (SEGFAULT)
        674 - test_nthreads (SEGFAULT)
       1631 - fuzz_compress (SEGFAULT)
       1633 - test_blosclz_shuffle_1 (SEGFAULT)
       1635 - test_blosclz_shuffle_n (SEGFAULT)
       1640 - test_blosclz_bitshuffle_1 (SEGFAULT)
       1642 - test_blosclz_bitshuffle_n (SEGFAULT)

The most consistent minimal failing area is:

BloscLZ + shuffle/bitshuffle

For example, this upstream benchmark command segfaults on affected builds:

cd build/bench
./bench blosclz shuffle test 1

Output ends with:

Blosc version: 1.21.6 ($Date:: 2024-06-24 #$)
List of supported compressors in this build: blosclz,lz4,lz4hc,zlib,zstd
Supported compression libraries:
  BloscLZ: 2.5.1
  LZ4: 1.9.4
  Zlib: 1.3.1
  Zstd: 1.5.6
Using compressor: blosclz
Using shuffle type: shuffle
Running suite: test
--> 1, 4194304, 8, 19, blosclz, shuffle
********************** Run info ******************************
Blosc version: 1.21.6 ($Date:: 2024-06-24 #$)
Using synthetic data with 19 significant bits (out of 32)
Dataset size: 4194304 bytes     Type size: 8 bytes
Working set: 128.0 MB           Number of threads: 1
********************** Running benchmarks *********************
memcpy(write):            923.2 us, 4332.8 MB/s
memcpy(read):             454.8 us, 8795.1 MB/s
Compression level: 0
comp(write):      699.5 us, 5718.7 MB/s   Final bytes: 4194320  Ratio: 1.00
decomp(read):     514.8 us, 7770.3 MB/s   OK
Compression level: 1
Segmentation fault (core dumped)

Minimal C reproducer

The following standalone reproducer segfaults when linked against an affected libblosc.so.

It uses:

compressor = blosclz
shuffle    = BLOSC_SHUFFLE
clevel     = 1
typesize   = 2
nbytes     = 4194304
nthreads   = 1

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <blosc.h>

int main(void) {
    const size_t nbytes = 4194304;
    const size_t typesize = 2;

    uint8_t *src = malloc(nbytes);
    uint8_t *compressed = malloc(nbytes + BLOSC_MAX_OVERHEAD);
    uint8_t *dest = malloc(nbytes);

    if (!src || !compressed || !dest) {
        fprintf(stderr, "malloc failed\n");
        return 2;
    }

    for (size_t i = 0; i < nbytes / sizeof(uint16_t); i++) {
        ((uint16_t *)src)[i] = (uint16_t)(i & 0xffff);
    }

    blosc_init();

    printf("compressing: blosclz + shuffle, clevel=1, typesize=2, nbytes=4194304\n");

    int cbytes = blosc_compress_ctx(
        1,                  /* clevel */
        BLOSC_SHUFFLE,      /* shuffle */
        typesize,
        nbytes,
        src,
        compressed,
        nbytes + BLOSC_MAX_OVERHEAD,
        "blosclz",
        0,                  /* blocksize */
        1                   /* nthreads */
    );

    printf("cbytes = %d\n", cbytes);

    if (cbytes <= 0) {
        fprintf(stderr, "compression failed: %d\n", cbytes);
        return 3;
    }

    int dbytes = blosc_decompress(compressed, dest, nbytes);
    printf("dbytes = %d\n", dbytes);

    if (dbytes != (int)nbytes) {
        fprintf(stderr, "decompression failed: %d\n", dbytes);
        return 4;
    }

    if (memcmp(src, dest, nbytes) != 0) {
        fprintf(stderr, "roundtrip mismatch\n");
        return 5;
    }

    blosc_destroy();

    printf("OK\n");
    return 0;
}

Example build command:

gcc blosc_bench_like_t2.c \
    -I/path/to/c-blosc-1.21.6/blosc \
    -L/path/to/build/blosc \
    -Wl,-rpath,/path/to/build/blosc \
    -lblosc \
    -o blosc_bench_like_t2

On an affected build, the output is:

compressing: blosclz + shuffle, clevel=1, typesize=2, nbytes=4194304
Segmentation fault (core dumped)

The same reproducer passes when the library is built with -fno-tree-vectorize:

compressing: blosclz + shuffle, clevel=1, typesize=2, nbytes=4194304
cbytes = 75792
dbytes = 4194304
OK

Additional narrowing

The issue is sensitive to typesize.

On an affected build:

blosclz + shuffle, clevel=1, nbytes=4194304, nthreads=1

typesize=1  -> passes
typesize=2  -> segfaults
typesize=4  -> segfaults
typesize=8  -> segfaults
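
One possible reading of the typesize dependence is that a byte shuffle has nothing to rearrange when every element is a single byte. The following is a minimal scalar sketch of the byte transposition a shuffle filter performs, written for illustration only and not taken from c-blosc's sources; the names shuffle_sketch and unshuffle_sketch are hypothetical. Simple nested loops of this shape are typical candidates for auto-vectorization.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative scalar byte shuffle (NOT c-blosc's implementation).
 * Byte j of every element is gathered into the j-th contiguous run:
 *   dest[j * nelem + i] = src[i * typesize + j]
 * Trailing bytes that do not fill a whole element are copied unchanged. */
void shuffle_sketch(size_t typesize, size_t nbytes,
                    const uint8_t *src, uint8_t *dest) {
    assert(typesize > 0);
    const size_t nelem = nbytes / typesize;
    for (size_t j = 0; j < typesize; j++) {
        for (size_t i = 0; i < nelem; i++) {
            dest[j * nelem + i] = src[i * typesize + j];
        }
    }
    for (size_t k = nelem * typesize; k < nbytes; k++) {
        dest[k] = src[k];
    }
}

/* Inverse of shuffle_sketch, so that a roundtrip can be checked. */
void unshuffle_sketch(size_t typesize, size_t nbytes,
                      const uint8_t *src, uint8_t *dest) {
    assert(typesize > 0);
    const size_t nelem = nbytes / typesize;
    for (size_t j = 0; j < typesize; j++) {
        for (size_t i = 0; i < nelem; i++) {
            dest[i * typesize + j] = src[j * nelem + i];
        }
    }
    for (size_t k = nelem * typesize; k < nbytes; k++) {
        dest[k] = src[k];
    }
}
```

For the eight input bytes 0..7 with typesize 2, shuffle_sketch produces 0 2 4 6 1 3 5 7. With typesize 1 the output equals the input, so in this sketch the transposition is a no-op for single-byte elements.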

The bitshuffle path also fails:

blosclz + bitshuffle, clevel=1, typesize=2, nbytes=4194304, nthreads=1 -> segfaults

Other compressor/filter combinations are not affected in the same way in my tests. For example:

blosclz + noshuffle -> passes
lz4 + shuffle       -> passes
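
Sweeping combinations like these is awkward when a single segfault kills the whole test program. A minimal POSIX sketch of running each case in a forked child follows. The names run_isolated, case_fn, case_ok, and case_crash are hypothetical; the two demo cases are stand-ins, and in a real sweep the callback would wrap one blosc_compress_ctx call for a given compressor, typesize, and shuffle mode.

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical callback type: returns 0 on success, nonzero on failure. */
typedef int (*case_fn)(void *arg);

/* Run one case in a child process so that a crash cannot abort the sweep.
 * Returns 0 or 1 (the normalized result of fn) when the child exits normally.
 * Returns -1 with *signo set when the child is killed by a signal, and
 * -1 with *signo == 0 when fork() or waitpid() fails. */
int run_isolated(case_fn fn, void *arg, int *signo) {
    assert(fn != NULL && signo != NULL);
    *signo = 0;
    fflush(NULL); /* avoid duplicating buffered output in the child */
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        _exit(fn(arg) == 0 ? 0 : 1);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    if (WIFSIGNALED(status)) {
        *signo = WTERMSIG(status);
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Stand-in for a passing case. */
int case_ok(void *arg) {
    (void)arg;
    return 0;
}

/* Stand-in for a crashing case: terminates the child with SIGSEGV. */
int case_crash(void *arg) {
    (void)arg;
    raise(SIGSEGV);
    return 1;
}
```

With this shape, the parent can loop over typesize and shuffle values, print "passes" or "segfaults" per case based on the returned signal number, and still reach the end of the sweep.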

Workaround

Building with auto-vectorization disabled avoids the issue:

-fno-tree-vectorize

In EasyBuild this is:

toolchainopts = {
    'pic': True,
    'cstd': 'gnu99',
    'vectorize': False,
}

The cstd='gnu99' part is mainly to avoid GCC 15's default C23/GNU23 mode. There is already a separate GCC 15/C23 compile-side issue around bool, but this report is about a runtime segfault that remains after forcing GNU99.

Expected behavior

The BloscLZ + shuffle/bitshuffle tests should not segfault when built with GCC 15.2.0 and auto-vectorization enabled.

Actual behavior

On affected CPU targets/builds, the BloscLZ + shuffle/bitshuffle paths segfault with GCC 15.2.0 auto-vectorization enabled. The failures disappear with -fno-tree-vectorize.

This could be either:

  1. a GCC 15.2.0 miscompilation, or
  2. an undefined-behavior, aliasing, alignment, or bounds issue in c-blosc that GCC 15.2.0 auto-vectorization exposes.
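
As a generic illustration of the class of problem in item 2, and not a diagnosis of c-blosc, the sketch below contrasts a pointer-cast load with a memcpy-based load. On x86 a scalar load through a misaligned pointer usually works in practice, but it is undefined behavior in C, and once a loop is vectorized the compiler may emit instructions that rely on the type's alignment and fault when it does not hold. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Shown for contrast only. Undefined behavior if p is not aligned for
 * uint32_t, and a strict-aliasing violation if the bytes at p are not
 * actually a uint32_t object. */
uint32_t load_u32_cast(const uint8_t *p) {
    assert(p != NULL);
    return *(const uint32_t *)p;
}

/* Well defined for any alignment. Compilers typically lower this
 * fixed-size memcpy to a single unaligned load on x86. */
uint32_t load_u32_memcpy(const uint8_t *p) {
    assert(p != NULL);
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

If code of the first kind exists on a hot path, replacing it with the memcpy form and rebuilding with vectorization enabled would be one way to test hypothesis 2 without changing compiler flags.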

I am reporting this here first because the failure is reproducible through c-blosc’s own tests and a small c-blosc API reproducer.
