versioned-hdf5 Linux wheels #25

@crusaderky

versioned-hdf5 is a Cython module on top of h5py that also directly links to libhdf5 for performance reasons.

This is not problematic when compiling from sources, nor for conda binaries, where both h5py and versioned-hdf5 link against the hdf5 conda package.
Releasing Linux wheels, however, is a major pain point: the h5py wheels already embed libhdf5.so, and only a single version of it can be loaded in memory at any one time.

versioned-hdf5 currently solves this by using the same cibuildwheel images as h5py:

manylinux-x86_64-image = "ghcr.io/h5py/manylinux_2_28_x86_64-hdf5"
manylinux-aarch64-image = "ghcr.io/h5py/manylinux_2_28_aarch64-hdf5"

versioned-hdf5 CI just broke, and will remain broken (i.e. we cannot produce new releases that include Linux wheels) until the next h5py release. The images above were just overwritten with a different version of libhdf5, but the latest published h5py release was built from the previous images and still ships their libhdf5.

versioned-hdf5 is now built against hdf5 2.0 (as shipped by the new images), and then links at runtime against 1.14 (as shipped by h5py-3.15.0 and 3.15.1), which causes it to crash.
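
The mismatch described above could in principle be caught with a guard at import time instead of a crash inside libhdf5. The following is only a minimal sketch, not what versioned-hdf5 actually does: the function names are hypothetical, and treating "same major.minor series" as the compatibility rule is a simplifying assumption.

```python
def parse_version(text):
    """Turn '1.14.6' into (1, 14, 6); drops any suffix after a dash."""
    core = text.split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def same_hdf5_series(build_version, runtime_version):
    """Return True when both versions share the same major.minor series.

    Assumption: libhdf5 builds are only interchangeable within one
    major.minor series. The real compatibility rules may be stricter.
    """
    return parse_version(build_version)[:2] == parse_version(runtime_version)[:2]


def check_hdf5_match(build_version, runtime_version):
    """Raise ImportError with a readable message on a series mismatch."""
    if not same_hdf5_series(build_version, runtime_version):
        raise ImportError(
            f"built against libhdf5 {build_version} but "
            f"libhdf5 {runtime_version} is loaded at runtime"
        )
```

At runtime, the second argument could come from h5py.version.hdf5_version, while the first would have to be recorded when the wheel is built.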

Solution 1

Don't ship versioned-hdf5 Linux wheels and tell the users to either use conda or compile both h5py and versioned-hdf5 from sources. The latter can be very painful if you didn't install hdf5 with conda.

Solution 2

The h5py team could

  1. tag the images in a way that removes ambiguity on which libhdf5 binaries they contain,
    e.g. call them ghcr.io/h5py/manylinux_2_28_x86_64-hdf5-2_0_0-0 (where the final -0 is the hdf5-manylinux incremental build number, like in conda-forge recipes)
  2. preserve old images, so that one doesn't need to wait for the next h5py release to produce working downstream releases again,
  3. give a heads up to the versioned-hdf5 team whenever the next release of h5py will ship a different libhdf5.so. Typically you'd do that whenever you update h5py/pyproject.toml to point to the new images.

The versioned-hdf5 team would respond by releasing a new version of versioned-hdf5, with Linux wheels pinned to be >= the first h5py release with the new binaries and <= the latest released h5py version at the time of building the versioned-hdf5 wheels.

We recently had to introduce a system where the Linux wheels depend on a higher minimum version of h5py (https://github.com/deshaw/versioned-hdf5/blob/1c1814cb139f3c7cee07761cb6dc5d7779590dd8/.github/workflows/wheels.yml#L42-L54) because of your recent change from manylinux2014 to manylinux_2_28.
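
The pinning rule proposed in Solution 2 could be sketched as follows. This is an illustration only: the helper names and the version numbers in the usage note are hypothetical, and it assumes plain numeric release strings (no pre-release or post-release suffixes).

```python
def _as_tuple(version):
    """Turn a plain release string such as '3.15.1' into (3, 15, 1)."""
    return tuple(int(part) for part in version.split("."))


def h5py_requirement(first_with_new_hdf5, latest_at_build_time):
    """Build the h5py requirement string for the Linux wheels.

    first_with_new_hdf5: first h5py release shipping the new libhdf5.so.
    latest_at_build_time: newest h5py release when the wheels are built.
    """
    if _as_tuple(first_with_new_hdf5) > _as_tuple(latest_at_build_time):
        raise ValueError("lower bound is newer than upper bound")
    return f"h5py>={first_with_new_hdf5},<={latest_at_build_time}"
```

For example, h5py_requirement("3.16.0", "3.16.2") would give "h5py>=3.16.0,<=3.16.2", which the wheel build could then write into the Linux-only dependency metadata.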

Solution 3

versioned-hdf5 could avoid using the hdf5-manylinux images. Instead, it could extract libhdf5.so from the latest released h5py wheels, find the libhdf5 version, and figure out how to get the matching .h files from there. This is also painful.
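
The first two steps of Solution 3 (extract libhdf5.so from a wheel and find its version) might look roughly like the sketch below. It relies on two assumptions that would need checking against real h5py wheels: that the vendored library is an archive member whose file name starts with libhdf5 and contains .so, and that the binary embeds a banner of the form "HDF5 library version: X.Y.Z". The function names are hypothetical. Getting the matching .h files is not covered.

```python
import io
import re
import zipfile

# Assumption: libhdf5 embeds a banner such as "HDF5 library version: 1.14.6".
_BANNER = re.compile(rb"HDF5 library version: (\d+\.\d+\.\d+)")


def find_libhdf5_members(wheel):
    """List archive members that look like a vendored libhdf5 shared object."""
    return [
        name
        for name in wheel.namelist()
        # Skip the high-level library (libhdf5_hl), which is a separate file.
        if re.search(r"(^|/)libhdf5[^/]*\.so", name) and "libhdf5_hl" not in name
    ]


def libhdf5_version_from_wheel(wheel_file):
    """Return the libhdf5 version found in a wheel, or None if not found.

    wheel_file may be a path or a binary file-like object; a wheel is a
    plain zip archive, so the standard zipfile module can read it.
    """
    with zipfile.ZipFile(wheel_file) as wheel:
        for name in find_libhdf5_members(wheel):
            match = _BANNER.search(wheel.read(name))
            if match:
                return match.group(1).decode("ascii")
    return None
```

Calling libhdf5_version_from_wheel on a downloaded h5py wheel would return a string such as "1.14.6" if both assumptions hold, and None otherwise.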

Solution 4 (rejected)

versioned-hdf5 could vendor h5py as versioned_hdf5.h5py.
This would be problematic in practice: most versioned-hdf5 users switch freely between versioned-hdf5 and h5py, so it is very realistic that someone will run import h5py, versioned_hdf5, which would pull conflicting versions of libhdf5.so into memory at the same time.

Solution 5 (rejected)

versioned-hdf5 wheels could include both h5py/ and the versioned_hdf5/ top level directories, both built by versioned-hdf5 CI from sources.
This would not work either, as I am not aware of any way to tell pip that the versioned-hdf5 wheel is mutually exclusive with h5py; users would overwrite the h5py/ directory installed by versioned-hdf5 as soon as, for whatever reason, they pip install h5py.
