@Flamefire
Contributor

@Flamefire Flamefire commented Apr 25, 2025

(created using eb --new-pr)

Fixes #22764

I tested that the CMake config files indeed reference the static libs instead of the shared libs, so the static libs will be used with CMake's find_package(absl).
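
One way to repeat that check is to scan the exported CMake targets file for its `IMPORTED_LOCATION` entries and count how many point at `.a` versus `.so` files. The following is only a sketch: the helper name `imported_library_kinds` is hypothetical, and it assumes the installed targets file (for example something like `abslTargets-release.cmake`, a name not confirmed in this PR) uses the usual `IMPORTED_LOCATION_<CONFIG> "<path>"` layout that CMake generates for exported targets.

```python
import re

# Matches lines such as:
#   IMPORTED_LOCATION_RELEASE "${_IMPORT_PREFIX}/lib/libabsl_strings.a"
# The optional suffix is the build configuration (RELEASE, NOCONFIG, ...).
_IMPORTED_LOCATION = re.compile(r'IMPORTED_LOCATION(?:_[A-Z]+)?\s+"([^"]+)"')


def imported_library_kinds(targets_cmake_text):
    """Count static (.a) and shared (.so*) imported locations in a targets file.

    `targets_cmake_text` is the text of an exported CMake targets file.
    This is a hypothetical helper for illustration, not part of EasyBuild.
    """
    counts = {"static": 0, "shared": 0, "other": 0}
    for path in _IMPORTED_LOCATION.findall(targets_cmake_text):
        name = path.rsplit("/", 1)[-1]  # keep only the file name
        if name.endswith(".a"):
            counts["static"] += 1
        elif ".so" in name:
            counts["shared"] += 1
        else:
            counts["other"] += 1
    return counts
```

If the result shows only `static` entries, consumers using the exported targets would be expected to link the static archives.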

We cannot remove the shared libraries as that would break existing modules linked against the shared libraries.
This changes all Abseil easyconfigs at once so that if anyone updates any of them to a new version, the information will be present for them and on CI. The intention is that new Abseil easyconfigs should use only the static variant. See the issue for the motivation.

After rebuilding protobuf I don't see any runtime dependencies on Abseil for either protobuf or PyTorch.
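
A runtime dependency of this kind shows up as a `NEEDED` entry in the ELF dynamic section, which `readelf -d` prints. The sketch below filters that output for Abseil libraries. The function names are hypothetical, and it assumes the shared Abseil libraries are named `libabsl_*.so*` and that `readelf` prints its usual English-language output.

```python
import re
import subprocess

# Matches lines such as:
#  0x0000000000000001 (NEEDED)   Shared library: [libabsl_strings.so.2301.0.0]
_NEEDED = re.compile(r"\(NEEDED\)\s+Shared library: \[([^\]]+)\]")


def needed_libraries(readelf_output):
    """Return the NEEDED library names found in `readelf -d` output."""
    return _NEEDED.findall(readelf_output)


def abseil_runtime_deps(readelf_output):
    """Return only the NEEDED entries that look like Abseil libraries.

    Assumption: shared Abseil libraries are named libabsl_*.so*.
    """
    return [lib for lib in needed_libraries(readelf_output)
            if lib.startswith("libabsl")]


def readelf_dynamic(path):
    """Run `readelf -d` on a library and return its text output."""
    result = subprocess.run(["readelf", "-d", path],
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `abseil_runtime_deps(readelf_dynamic("libprotobuf.so"))` returning an empty list would be consistent with the statement above.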

@Flamefire
Contributor Author

Test report by @Flamefire
SUCCESS
Build succeeded for 6 out of 6 (6 easyconfigs in total)
login1.barnard.hpc.tu-dresden.de - Linux RHEL 8.9 (Ootpa), x86_64, Intel(R) Xeon(R) Platinum 8470 (icelake), Python 3.8.17
See https://gist.github.com/Flamefire/316a8ec9bffece10dd466d2bd7d2a539 for a full test report.

@Flamefire
Contributor Author

Test report by @Flamefire
SUCCESS
Build succeeded for 6 out of 6 (6 easyconfigs in total)
i7025 - Linux Rocky Linux 8.9 (Green Obsidian), x86_64, AMD EPYC 7702 64-Core Processor (zen2), Python 3.8.17
See https://gist.github.com/Flamefire/ccea672551069f140a55f83aa2566942 for a full test report.

@lexming
Contributor

lexming commented Apr 25, 2025

@boegelbot please test @ jsc-zen3

@boegelbot
Collaborator

@lexming: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=22805 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_22805 --ntasks=8 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 6241

Test results coming soon (I hope)...

Details

- notification for comment with ID 2830488500 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 6 out of 6 (6 easyconfigs in total)
jsczen3c1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.5, x86_64, AMD EPYC-Milan Processor (zen3), Python 3.9.21
See https://gist.github.com/boegelbot/a1d6a9ff75c029d0e032c40f5ef7fbc1 for a full test report.

Contributor

@lexming lexming left a comment

LGTM

@lexming
Contributor

lexming commented Apr 25, 2025

Merging, thanks @Flamefire !

@lexming lexming merged commit 7a16ce8 into easybuilders:develop Apr 25, 2025
8 checks passed
@lexming lexming added this to the 5.0.1 milestone Apr 25, 2025
@Flamefire Flamefire deleted the 20250425104115_new_pr_Abseil202103242 branch April 25, 2025 14:36
@stevenvdb
Contributor

@Flamefire Did you by any chance test the impact on PyTorch-bundle? What I see is that builds of PyTorch-bundle-2.1.2-foss-2023a-CUDA-12.1.1.eb fail when the underlying Abseil build includes static libraries. Importing torchtext gives missing-symbol errors (for example _ZN4absl12lts_2023012519str_format_internal13FormatArgImpl8DispatchIlEEb), and indeed those symbols are missing from libprotobuf.so in protobuf/24.0-GCCcore-12.3.0 when Abseil includes static libraries, whereas they are present when Abseil includes only dynamic libraries.

@Flamefire
Contributor Author

Can you add more information on how to reproduce this? I guess you rebuilt Abseil, protobuf and PyTorch-bundle, and the last one fails to build?
Does any library still link to the dynamic Abseil libs?

> those symbols are missing from libprotobuf.so in protobuf/24.0-GCCcore-12.3.0 when Abseil includes static libraries

That is weird: how can the symbols be there only when using dynamic libraries? In that case they should be in Abseil, not protobuf, shouldn't they?

Any hint as to what exactly uses that symbol? It could be an issue with torchtext specifically, or a missing definition making the symbol private/hidden.

@stevenvdb
Contributor

At the moment I can't reproduce it anymore. I still see fewer symbols in libprotobuf.so when Abseil includes static libraries, but this does not seem to cause problems. Perhaps the problem originally occurred because I switched EasyBuild versions partway through building the stack.

@Flamefire
Contributor Author

> I still see fewer symbols in libprotobuf.so when Abseil includes static libraries

Can you post those, or a comparison, so we can verify that the diff is OK and/or come back to this via search if it becomes an issue later?

@boegel boegel changed the title from "Build static libraries of Abseil" to "also build static libraries of Abseil" Aug 27, 2025
@stevenvdb
Contributor

> > I still see fewer symbols in libprotobuf.so when Abseil includes static libraries
>
> Can you post those, or a comparison, so we can verify that the diff is OK and/or come back to this via search if it becomes an issue later?

I'll attach the output of nm -D protobuf/24.0-GCCcore-12.3.0/lib64/libprotobuf.so for both cases. If you search for symbols containing str_format_internal, you can see that some of them are not in the dynamic symbol table when the Abseil dependency has static libraries.
libprotobuf_with_static_abseil.txt
libprotobuf_without_static_abseil.txt
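
For anyone who wants to repeat the comparison, a minimal sketch in Python follows. It assumes the two attachments contain plain `nm -D` output (an optional address, a one-letter type, then the symbol name); the function names `parse_nm` and `compare_symbols` are hypothetical.

```python
def parse_nm(nm_output):
    """Map symbol name -> type letter from `nm -D` text.

    Defined symbols have three fields (address, type, name);
    undefined ones have two (type, name). Other lines are skipped.
    """
    symbols = {}
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) == 3:
            _, kind, name = fields
        elif len(fields) == 2:
            kind, name = fields
        else:
            continue
        symbols[name] = kind
    return symbols


def compare_symbols(with_static, without_static, needle="str_format_internal"):
    """Return symbols containing `needle` that appear in only one listing.

    Each result is a sorted list of (name, type) tuples:
    first those only in `with_static`, then those only in `without_static`.
    """
    a = parse_nm(with_static)
    b = parse_nm(without_static)
    only_with = sorted((n, k) for n, k in a.items() if needle in n and n not in b)
    only_without = sorted((n, k) for n, k in b.items() if needle in n and n not in a)
    return only_with, only_without
```

Reading the two attached files and passing their contents to `compare_symbols` lists the `str_format_internal` entries that differ, together with their `nm` type letter.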

@Flamefire
Contributor Author

Flamefire commented Aug 29, 2025

Those are listed as e.g.

                  U _ZN4absl12lts_2023012519str_format_internal13FormatArgImpl8DispatchIjEEbNS2_4DataENS1_24FormatConversionSpecImplEPv

IIRC the "U" means that this is a reference to an external symbol to be provided by a shared library at runtime.

So this is expected as far as I can tell: Linking the static Abseil libraries resolves the symbols during linking.

What I can imagine:

  • protobuf links static abseil
  • software X builds against Abseil as if it were dynamic but doesn't link it, e.g. because it is a static library linked into something else and the Abseil dependency is not propagated

When protobuf linked the dynamic Abseil, the dependencies of X were resolved because libprotobuf pulled in libabseil at runtime. With protobuf no longer depending on libabseil due to static linking, the dependencies of X are not resolved anymore.

In any case this is a bug in the build process of X, so we need to find the exact component that breaks this.
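
The scenario can be sketched as a toy model in Python. This is a simplification and not how the dynamic loader is implemented: each library is described by its NEEDED list plus its defined and undefined dynamic symbols, and it is assumed that a libprotobuf linked against static Abseil does not export the Abseil symbol. All library and symbol names are made up for illustration.

```python
def unresolved_symbols(root, libs):
    """Return undefined symbols that nothing in root's NEEDED closure defines.

    `libs` maps a library name to a dict with keys
    "needed" (list of names), "defined" (set) and "undefined" (set).
    """
    loaded, seen, queue = [], set(), [root]
    while queue:  # breadth-first walk over NEEDED entries
        name = queue.pop(0)
        if name in seen:
            continue
        seen.add(name)
        loaded.append(name)
        queue.extend(libs[name]["needed"])
    defined = set().union(*(libs[n]["defined"] for n in loaded))
    missing = set()
    for n in loaded:
        missing |= libs[n]["undefined"] - defined
    return sorted(missing)


# X uses an Abseil symbol but only lists libprotobuf as a dependency.
lib_x = {"needed": ["libprotobuf"], "defined": set(), "undefined": {"absl_sym"}}

# protobuf linked against shared Abseil: Abseil is pulled in transitively.
dynamic_case = {
    "libX": lib_x,
    "libprotobuf": {"needed": ["libabseil"], "defined": set(), "undefined": {"absl_sym"}},
    "libabseil": {"needed": [], "defined": {"absl_sym"}, "undefined": set()},
}

# protobuf linked against static Abseil: no NEEDED entry, symbol not exported.
static_case = {
    "libX": lib_x,
    "libprotobuf": {"needed": [], "defined": set(), "undefined": set()},
}

print(unresolved_symbols("libX", dynamic_case))  # []
print(unresolved_symbols("libX", static_case))   # ['absl_sym']
```

In the dynamic case X only works by accident, because libprotobuf happens to load Abseil; once that transitive dependency disappears, the missing link in X becomes visible.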

@Flamefire
Contributor Author

Flamefire commented Sep 9, 2025

Turns out it was how we patch torchtext to use our prebuilt RE2. PR to fix it:

And it was indeed the case that torchtext ended up linking a library that depends on Abseil without also linking the (static) Abseil libs.

Development

Successfully merging this pull request may close these issues.

Build static Abseil libs
