Conversation

@trz42
Collaborator

@trz42 trz42 commented Mar 29, 2025

This PR needs to be reviewed carefully.

It contains three changes:

  1. Adds the capability to pass arguments through to the container launch (new arg --pass-through for eessi_container.sh).
  2. Adds extra bind paths for $software_layer_dir and /dev when running eessi_container.sh. Also makes use of --pass-through to run the container with --contain. The latter is needed to prevent system-level scripts on the host from being executed when the container is launched.
  3. Adds easyconfigs originally built with EB 4.9.2 with the toolchain foss/2023b (see {2023.06}[2023b,sapphirerapids] Add EB 4.9.2 easystack for 2023b #939). The [include-easyblocks-]from-{pr,commit} options used at the time have been removed, since all of them were included in EB 4.9.3.
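
To make the first two changes more concrete, here is a minimal Python sketch of how a wrapper could assemble a container launch command from bind paths and pass-through arguments. This is not the code of eessi_container.sh (which is a shell script). The function name build_launch_command, the use of "apptainer shell", the image name and the example directory are assumptions for illustration; only --pass-through, --contain, $software_layer_dir and /dev come from the description above.

```python
from typing import List, Sequence


def build_launch_command(
    image: str,
    bind_paths: Sequence[str],
    pass_through: Sequence[str] = (),
) -> List[str]:
    """Assemble an argument list for launching a container (illustrative only)."""
    cmd = ["apptainer", "shell"]
    # Pass-through arguments are forwarded unchanged to the container runtime.
    cmd.extend(pass_through)
    # Each bind path becomes its own --bind option.
    for path in bind_paths:
        cmd.extend(["--bind", path])
    cmd.append(image)
    return cmd


# Hypothetical values; the real script derives them from its own options.
software_layer_dir = "/tmp/software-layer"
command = build_launch_command(
    image="container.sif",
    bind_paths=[software_layer_dir, "/dev"],
    pass_through=["--contain"],
)
print(" ".join(command))
```

Keeping the command as a list rather than a single string avoids quoting problems if it is later handed to something like subprocess.run.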

IMPORTANT: The (ReFrame) test step is not functioning as expected. Even after some changes to the bot configuration (adding 'processor' information to reframe_config.py to work around non-working CPU auto-detection) and using --contain, the test step seems to block/stall. Thus, for the time being, we skip the test step by using exportvariable:SKIP_TESTS=yes in bot build commands.

@trz42 trz42 added 2023.06-software.eessi.io 2023.06 version of software.eessi.io grace NVIDIA Grace CPU labels Mar 29, 2025
@eessi-bot

eessi-bot bot commented Mar 29, 2025

Instance eessi-bot-mc-aws is configured to build for:

  • architectures: x86_64/generic, x86_64/intel/haswell, x86_64/intel/sapphirerapids, x86_64/intel/skylake_avx512, x86_64/amd/zen2, x86_64/amd/zen3, aarch64/generic, aarch64/neoverse_n1, aarch64/neoverse_v1
  • repositories: eessi.io-2023.06-software, eessi.io-2023.06-compat

@eessi-bot

eessi-bot bot commented Mar 29, 2025

Instance eessi-bot-mc-azure is configured to build for:

  • architectures: x86_64/amd/zen4
  • repositories: eessi.io-2023.06-compat, eessi.io-2023.06-software

@eessi-bot-trz42

Instance trz42-GH200-jr is configured to build for:

  • architectures: aarch64/nvidia/grace
  • repositories: eessi.io-2023.06-software

@eessi-bot-toprichard

Instance rt-Grace-jr is configured to build for:

  • architectures: aarch64/nvidia/grace
  • repositories: eessi.io-2023.06-software

@trz42
Collaborator Author

trz42 commented Mar 29, 2025

Note the additional arg to skip running tests...
bot: build instance:trz42-GH200-jr repository:eessi.io-2023.06-software architecture:aarch64/nvidia/grace exportvariable:SKIP_TESTS=yes
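
For readers unfamiliar with the command format, the following Python sketch splits such a line into its action, its key:value arguments and its exported variables. It is not the bot's actual parser; the function name parse_build_command and the split into "filters" and "exports" are assumptions made for illustration.

```python
from typing import Dict, Tuple


def parse_build_command(line: str) -> Tuple[str, Dict[str, str], Dict[str, str]]:
    """Split a 'bot: <action> key:value ...' line into action, filters and exports."""
    prefix = "bot:"
    if not line.startswith(prefix):
        raise ValueError("not a bot command")
    tokens = line[len(prefix):].split()
    if not tokens:
        raise ValueError("missing action")
    action = tokens[0]
    filters: Dict[str, str] = {}
    exports: Dict[str, str] = {}
    for token in tokens[1:]:
        key, sep, value = token.partition(":")
        if not sep or not value:
            raise ValueError(f"malformed argument: {token!r}")
        if key == "exportvariable":
            # exportvariable carries NAME=VALUE, meant for the job environment.
            name, eq, val = value.partition("=")
            if not eq or not name:
                raise ValueError(f"malformed export: {value!r}")
            exports[name] = val
        else:
            filters[key] = value
    return action, filters, exports


action, filters, exports = parse_build_command(
    "bot: build instance:trz42-GH200-jr "
    "repository:eessi.io-2023.06-software "
    "architecture:aarch64/nvidia/grace "
    "exportvariable:SKIP_TESTS=yes"
)
# A job script could then decide whether to run the test step.
skip_tests = exports.get("SKIP_TESTS") == "yes"
```

With the command above, skip_tests ends up True, which corresponds to skipping the test step as described in the PR description.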

@eessi-bot

eessi-bot bot commented Mar 29, 2025

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • parsing the bot command build instance:trz42-GH200-jr repository:eessi.io-2023.06-software architecture:aarch64/nvidia/grace exportvariable:SKIP_TESTS=yes, received from sender trz42, failed

@eessi-bot

eessi-bot bot commented Mar 29, 2025

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • parsing the bot command build instance:trz42-GH200-jr repository:eessi.io-2023.06-software architecture:aarch64/nvidia/grace exportvariable:SKIP_TESTS=yes, received from sender trz42, failed

@eessi-bot-trz42

eessi-bot-trz42 bot commented Mar 29, 2025

Updates by the bot instance trz42-GH200-jr (click for details)
  • received bot command build instance:trz42-GH200-jr repository:eessi.io-2023.06-software architecture:aarch64/nvidia/grace exportvariable:SKIP_TESTS=yes from trz42

    • expanded format: build instance:trz42-GH200-jr repository:eessi.io-2023.06-software architecture:aarch64/nvidia/grace exportvariable:SKIP_TESTS=yes
  • handling command build instance:trz42-GH200-jr repository:eessi.io-2023.06-software architecture:aarch64/nvidia/grace exportvariable:SKIP_TESTS=yes resulted in:

@eessi-bot-toprichard

Updates by the bot instance rt-Grace-jr (click for details)
  • account trz42 has NO permission to send commands to the bot

@eessi-bot-trz42

eessi-bot-trz42 bot commented Mar 29, 2025

New job on instance trz42-GH200-jr for CPU micro-architecture aarch64-nvidia-grace for repository eessi.io-2023.06-software in job dir /p/project1/ceasybuilders/bot-trz42/jobs/2025.03/pr_989/13545294

date | job status | comment
Mar 29 08:30:26 UTC 2025 | submitted | job id 13545294 awaits release by job manager
Mar 29 08:30:39 UTC 2025 | released | job awaits launch by Slurm scheduler
Mar 29 08:31:42 UTC 2025 | running | job 13545294 is running
Mar 29 09:26:08 UTC 2025 | finished | 😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-13545294.out
✅ no message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-aarch64-nvidia-grace-1743240191.tar.gz
size: 77 MiB (81004850 bytes)
entries: 9027
modules under 2023.06/software/linux/aarch64/nvidia/grace/modules/all
Extrae/4.2.0-gompi-2023b.lua
GMP/6.3.0-GCCcore-13.2.0.lua
IPython/8.17.2-GCCcore-13.2.0.lua
MPC/1.3.1-GCCcore-13.2.0.lua
MPFR/4.2.1-GCCcore-13.2.0.lua
OpenPGM/5.2.122-GCCcore-13.2.0.lua
PAPI/7.1.0-GCCcore-13.2.0.lua
Pint/0.24-GCCcore-13.2.0.lua
PyYAML/6.0.1-GCCcore-13.2.0.lua
ZeroMQ/4.3.5-GCCcore-13.2.0.lua
dlb/3.4-gompi-2023b.lua
elfutils/0.190-GCCcore-13.2.0.lua
gmpy2/2.1.5-GCC-13.2.0.lua
jedi/0.19.1-GCCcore-13.2.0.lua
libdwarf/0.9.2-GCCcore-13.2.0.lua
libsodium/1.0.19-GCCcore-13.2.0.lua
libxslt/1.1.38-GCCcore-13.2.0.lua
libyaml/0.2.5-GCCcore-13.2.0.lua
lxml/4.9.3-GCCcore-13.2.0.lua
pystencils/1.3.4-gfbf-2023b.lua
sympy/1.12-gfbf-2023b.lua
software under 2023.06/software/linux/aarch64/nvidia/grace/software
Extrae/4.2.0-gompi-2023b
GMP/6.3.0-GCCcore-13.2.0
IPython/8.17.2-GCCcore-13.2.0
MPC/1.3.1-GCCcore-13.2.0
MPFR/4.2.1-GCCcore-13.2.0
OpenPGM/5.2.122-GCCcore-13.2.0
PAPI/7.1.0-GCCcore-13.2.0
Pint/0.24-GCCcore-13.2.0
PyYAML/6.0.1-GCCcore-13.2.0
ZeroMQ/4.3.5-GCCcore-13.2.0
dlb/3.4-gompi-2023b
elfutils/0.190-GCCcore-13.2.0
gmpy2/2.1.5-GCC-13.2.0
jedi/0.19.1-GCCcore-13.2.0
libdwarf/0.9.2-GCCcore-13.2.0
libsodium/1.0.19-GCCcore-13.2.0
libxslt/1.1.38-GCCcore-13.2.0
libyaml/0.2.5-GCCcore-13.2.0
lxml/4.9.3-GCCcore-13.2.0
pystencils/1.3.4-gfbf-2023b
sympy/1.12-gfbf-2023b
other under 2023.06/software/linux/aarch64/nvidia/grace
no other files in tarball
Mar 29 09:26:08 UTC 2025 | test result | 😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-13545294.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
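
The build checks in the report above amount to looking for certain messages in the job output file. The Python sketch below shows one way such checks could be evaluated. It is an illustration, not the bot's implementation: the function names are hypothetical, the patterns are copied from the report, and the sample text is made up rather than taken from slurm-13545294.out.

```python
from typing import Dict

# Patterns copied from the report; the split into "must be absent" and
# "must be present" mirrors the check marks shown there.
MUST_BE_ABSENT = ["FATAL:", "ERROR:", "FAILED:", "required modules missing:"]
MUST_BE_PRESENT = ["No missing installations", ".tar.gz created!"]


def check_job_output(text: str) -> Dict[str, bool]:
    """Return True per pattern if the corresponding check passes."""
    results = {}
    for pattern in MUST_BE_ABSENT:
        results[pattern] = pattern not in text
    for pattern in MUST_BE_PRESENT:
        results[pattern] = pattern in text
    return results


def overall_status(results: Dict[str, bool]) -> str:
    """A build counts as successful only if every single check passes."""
    return "SUCCESS" if all(results.values()) else "FAILURE"


# Made-up output that would produce the same pattern of check marks as above.
sample = (
    "ERROR: build step failed\n"
    "FAILED: one installation\n"
    "required modules missing: example\n"
    "/tmp/example.tar.gz created!\n"
)
results = check_job_output(sample)
print(overall_status(results))
```

Under this scheme a tarball can still be created, as in the job above, while the overall build is reported as a failure.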

@trz42
Collaborator Author

trz42 commented Mar 29, 2025

Next attempt, using from-commit for Boost.MPI...
bot: build instance:trz42-GH200-jr repository:eessi.io-2023.06-software architecture:aarch64/nvidia/grace exportvariable:SKIP_TESTS=yes

@eessi-bot

eessi-bot bot commented Mar 29, 2025

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • parsing the bot command build instance:trz42-GH200-jr repository:eessi.io-2023.06-software architecture:aarch64/nvidia/grace exportvariable:SKIP_TESTS=yes, received from sender trz42, failed

@eessi-bot

eessi-bot bot commented Mar 29, 2025

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • parsing the bot command build instance:trz42-GH200-jr repository:eessi.io-2023.06-software architecture:aarch64/nvidia/grace exportvariable:SKIP_TESTS=yes, received from sender trz42, failed

@eessi-bot-toprichard

Updates by the bot instance rt-Grace-jr (click for details)
  • account trz42 has NO permission to send commands to the bot

@eessi-bot-trz42

eessi-bot-trz42 bot commented Mar 29, 2025

Updates by the bot instance trz42-GH200-jr (click for details)
  • received bot command build instance:trz42-GH200-jr repository:eessi.io-2023.06-software architecture:aarch64/nvidia/grace exportvariable:SKIP_TESTS=yes from trz42

    • expanded format: build instance:trz42-GH200-jr repository:eessi.io-2023.06-software architecture:aarch64/nvidia/grace exportvariable:SKIP_TESTS=yes
  • handling command build instance:trz42-GH200-jr repository:eessi.io-2023.06-software architecture:aarch64/nvidia/grace exportvariable:SKIP_TESTS=yes resulted in:

@eessi-bot-trz42

eessi-bot-trz42 bot commented Mar 29, 2025

New job on instance trz42-GH200-jr for CPU micro-architecture aarch64-nvidia-grace for repository eessi.io-2023.06-software in job dir /p/project1/ceasybuilders/bot-trz42/jobs/2025.03/pr_989/13545363

date | job status | comment
Mar 29 09:34:22 UTC 2025 | submitted | job id 13545363 awaits release by job manager
Mar 29 09:35:13 UTC 2025 | released | job awaits launch by Slurm scheduler
Mar 29 09:36:16 UTC 2025 | running | job 13545363 is running
Mar 29 10:27:38 UTC 2025 | finished | 🤷 UNKNOWN (click triangle for detailed information)
  • Job results file _bot_job13545363.result does not exist in job directory, or parsing it failed.
  • No artefacts were found/reported.
Mar 29 10:27:38 UTC 2025 | test result | 🤷 UNKNOWN (click triangle for detailed information)
  • Job test file _bot_job13545363.test does not exist in job directory, or parsing it failed.

@trz42
Collaborator Author

trz42 commented Mar 29, 2025

Next try after removing the wrong sources for Boost.MPI from the caching directory...
bot: build instance:trz42-GH200-jr repository:eessi.io-2023.06-software architecture:aarch64/nvidia/grace exportvariable:SKIP_TESTS=yes

@eessi-bot

eessi-bot bot commented Mar 29, 2025

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • parsing the bot command build instance:trz42-GH200-jr repository:eessi.io-2023.06-software architecture:aarch64/nvidia/grace exportvariable:SKIP_TESTS=yes, received from sender trz42, failed

@eessi-bot

eessi-bot bot commented Mar 29, 2025

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • parsing the bot command build instance:trz42-GH200-jr repository:eessi.io-2023.06-software architecture:aarch64/nvidia/grace exportvariable:SKIP_TESTS=yes, received from sender trz42, failed

@eessi-bot-trz42

eessi-bot-trz42 bot commented Mar 29, 2025

Updates by the bot instance trz42-GH200-jr (click for details)
  • received bot command build instance:trz42-GH200-jr repository:eessi.io-2023.06-software architecture:aarch64/nvidia/grace exportvariable:SKIP_TESTS=yes from trz42

    • expanded format: build instance:trz42-GH200-jr repository:eessi.io-2023.06-software architecture:aarch64/nvidia/grace exportvariable:SKIP_TESTS=yes
  • handling command build instance:trz42-GH200-jr repository:eessi.io-2023.06-software architecture:aarch64/nvidia/grace exportvariable:SKIP_TESTS=yes resulted in:

@eessi-bot-toprichard

Updates by the bot instance rt-Grace-jr (click for details)
  • account trz42 has NO permission to send commands to the bot

@eessi-bot-trz42

eessi-bot-trz42 bot commented Mar 29, 2025

New job on instance trz42-GH200-jr for CPU micro-architecture aarch64-nvidia-grace for repository eessi.io-2023.06-software in job dir /p/project1/ceasybuilders/bot-trz42/jobs/2025.03/pr_989/13545377

date | job status | comment
Mar 29 10:28:40 UTC 2025 | submitted | job id 13545377 awaits release by job manager
Mar 29 10:29:43 UTC 2025 | released | job awaits launch by Slurm scheduler
Mar 29 10:30:46 UTC 2025 | running | job 13545377 is running
Mar 29 16:08:05 UTC 2025 | finished | 😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-13545377.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-aarch64-nvidia-grace-1743264090.tar.gz
size: 2012 MiB (2110692995 bytes)
entries: 117265
modules under 2023.06/software/linux/aarch64/nvidia/grace/modules/all
ATK/2.38.0-GCCcore-13.2.0.lua
Boost.MPI/1.83.0-gompi-2023b.lua
Brunsli/0.1-GCCcore-13.2.0.lua
ESPResSo/4.2.2-foss-2023b.lua
Extrae/4.2.0-gompi-2023b.lua
FLAC/1.4.3-GCCcore-13.2.0.lua
FriBidi/1.0.13-GCCcore-13.2.0.lua
GDAL/3.9.0-foss-2023b.lua
GEOS/3.12.1-GCC-13.2.0.lua
GLPK/5.0-GCCcore-13.2.0.lua
GMP/6.3.0-GCCcore-13.2.0.lua
GTK3/3.24.39-GCCcore-13.2.0.lua
Gdk-Pixbuf/2.42.10-GCCcore-13.2.0.lua
Ghostscript/10.02.1-GCCcore-13.2.0.lua
HDF/4.2.16-2-GCCcore-13.2.0.lua
HPL/2.3-foss-2023b.lua
HTSlib/1.19.1-GCC-13.2.0.lua
IPython/8.17.2-GCCcore-13.2.0.lua
ImageMagick/7.1.1-34-GCCcore-13.2.0.lua
Imath/3.1.9-GCCcore-13.2.0.lua
LAME/3.100-GCCcore-13.2.0.lua
LERC/4.0.0-GCCcore-13.2.0.lua
MPC/1.3.1-GCCcore-13.2.0.lua
MPFR/4.2.1-GCCcore-13.2.0.lua
NLopt/2.7.1-GCCcore-13.2.0.lua
OpenEXR/3.2.0-GCCcore-13.2.0.lua
OpenPGM/5.2.122-GCCcore-13.2.0.lua
PAPI/7.1.0-GCCcore-13.2.0.lua
Pango/1.51.0-GCCcore-13.2.0.lua
Pint/0.24-GCCcore-13.2.0.lua
PostgreSQL/16.1-GCCcore-13.2.0.lua
PyYAML/6.0.1-GCCcore-13.2.0.lua
R-bundle-CRAN/2024.06-foss-2023b.lua
R/4.4.1-gfbf-2023b.lua
STAR/2.7.11b-GCC-13.2.0.lua
SWIG/4.1.1-GCCcore-13.2.0.lua
Xerces-C++/3.2.5-GCCcore-13.2.0.lua
Xvfb/21.1.9-GCCcore-13.2.0.lua
ZeroMQ/4.3.5-GCCcore-13.2.0.lua
at-spi2-atk/2.38.0-GCCcore-13.2.0.lua
at-spi2-core/2.50.0-GCCcore-13.2.0.lua
dlb/3.4-gompi-2023b.lua
elfutils/0.190-GCCcore-13.2.0.lua
gmpy2/2.1.5-GCC-13.2.0.lua
jedi/0.19.1-GCCcore-13.2.0.lua
json-c/0.17-GCCcore-13.2.0.lua
libdwarf/0.9.2-GCCcore-13.2.0.lua
libepoxy/1.5.10-GCCcore-13.2.0.lua
libgeotiff/1.7.3-GCCcore-13.2.0.lua
libgit2/1.7.2-GCCcore-13.2.0.lua
libogg/1.3.5-GCCcore-13.2.0.lua
libopus/1.5.2-GCCcore-13.2.0.lua
libsndfile/1.2.2-GCCcore-13.2.0.lua
libsodium/1.0.19-GCCcore-13.2.0.lua
libtirpc/1.3.4-GCCcore-13.2.0.lua
libvorbis/1.3.7-GCCcore-13.2.0.lua
libxslt/1.1.38-GCCcore-13.2.0.lua
libyaml/0.2.5-GCCcore-13.2.0.lua
lxml/4.9.3-GCCcore-13.2.0.lua
nettle/3.9.1-GCCcore-13.2.0.lua
pyMBE/0.8.0-foss-2023b.lua
pystencils/1.3.4-gfbf-2023b.lua
sympy/1.12-gfbf-2023b.lua
xxd/9.1.0307-GCCcore-13.2.0.lua
software under 2023.06/software/linux/aarch64/nvidia/grace/software
ATK/2.38.0-GCCcore-13.2.0
Boost.MPI/1.83.0-gompi-2023b
Brunsli/0.1-GCCcore-13.2.0
ESPResSo/4.2.2-foss-2023b
Extrae/4.2.0-gompi-2023b
FLAC/1.4.3-GCCcore-13.2.0
FriBidi/1.0.13-GCCcore-13.2.0
GDAL/3.9.0-foss-2023b
GEOS/3.12.1-GCC-13.2.0
GLPK/5.0-GCCcore-13.2.0
GMP/6.3.0-GCCcore-13.2.0
GTK3/3.24.39-GCCcore-13.2.0
Gdk-Pixbuf/2.42.10-GCCcore-13.2.0
Ghostscript/10.02.1-GCCcore-13.2.0
HDF/4.2.16-2-GCCcore-13.2.0
HPL/2.3-foss-2023b
HTSlib/1.19.1-GCC-13.2.0
IPython/8.17.2-GCCcore-13.2.0
ImageMagick/7.1.1-34-GCCcore-13.2.0
Imath/3.1.9-GCCcore-13.2.0
LAME/3.100-GCCcore-13.2.0
LERC/4.0.0-GCCcore-13.2.0
MPC/1.3.1-GCCcore-13.2.0
MPFR/4.2.1-GCCcore-13.2.0
NLopt/2.7.1-GCCcore-13.2.0
OpenEXR/3.2.0-GCCcore-13.2.0
OpenPGM/5.2.122-GCCcore-13.2.0
PAPI/7.1.0-GCCcore-13.2.0
Pango/1.51.0-GCCcore-13.2.0
Pint/0.24-GCCcore-13.2.0
PostgreSQL/16.1-GCCcore-13.2.0
PyYAML/6.0.1-GCCcore-13.2.0
R-bundle-CRAN/2024.06-foss-2023b
R/4.4.1-gfbf-2023b
STAR/2.7.11b-GCC-13.2.0
SWIG/4.1.1-GCCcore-13.2.0
Xerces-C++/3.2.5-GCCcore-13.2.0
Xvfb/21.1.9-GCCcore-13.2.0
ZeroMQ/4.3.5-GCCcore-13.2.0
at-spi2-atk/2.38.0-GCCcore-13.2.0
at-spi2-core/2.50.0-GCCcore-13.2.0
dlb/3.4-gompi-2023b
elfutils/0.190-GCCcore-13.2.0
gmpy2/2.1.5-GCC-13.2.0
jedi/0.19.1-GCCcore-13.2.0
json-c/0.17-GCCcore-13.2.0
libdwarf/0.9.2-GCCcore-13.2.0
libepoxy/1.5.10-GCCcore-13.2.0
libgeotiff/1.7.3-GCCcore-13.2.0
libgit2/1.7.2-GCCcore-13.2.0
libogg/1.3.5-GCCcore-13.2.0
libopus/1.5.2-GCCcore-13.2.0
libsndfile/1.2.2-GCCcore-13.2.0
libsodium/1.0.19-GCCcore-13.2.0
libtirpc/1.3.4-GCCcore-13.2.0
libvorbis/1.3.7-GCCcore-13.2.0
libxslt/1.1.38-GCCcore-13.2.0
libyaml/0.2.5-GCCcore-13.2.0
lxml/4.9.3-GCCcore-13.2.0
nettle/3.9.1-GCCcore-13.2.0
pyMBE/0.8.0-foss-2023b
pystencils/1.3.4-gfbf-2023b
sympy/1.12-gfbf-2023b
xxd/9.1.0307-GCCcore-13.2.0
other under 2023.06/software/linux/aarch64/nvidia/grace
no other files in tarball
Mar 29 16:08:05 UTC 2025 | test result | 😢 FAILURE (click triangle for details)
Reason
Failed for unknown reason
Details
✅ job output file slurm-13545377.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
Mar 31 09:07:41 UTC 2025 | uploaded | transfer of eessi-2023.06-software-linux-aarch64-nvidia-grace-1743264090.tar.gz to S3 bucket succeeded
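
The artefact name and size reported above can be decoded with a few lines of Python. This sketch assumes the name follows the layout eessi-<version>-<layer>-<os>-<architecture>-<epoch seconds>.tar.gz, which is inferred from the two tarball names in this thread rather than from any specification; the function name describe_artefact and the field names are hypothetical.

```python
import re
from datetime import datetime, timezone

# Assumed layout, inferred from the tarball names in this thread.
ARTEFACT_RE = re.compile(
    r"^eessi-(?P<version>[\d.]+)-(?P<layer>[a-z]+)-(?P<os>[a-z]+)"
    r"-(?P<arch>.+)-(?P<timestamp>\d+)\.tar\.gz$"
)


def describe_artefact(name: str, size_bytes: int) -> dict:
    """Split an artefact name into fields and derive creation time and size in MiB."""
    match = ARTEFACT_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected artefact name: {name!r}")
    info = dict(match.groupdict())
    # Interpret the trailing number as seconds since the Unix epoch (UTC).
    info["created"] = datetime.fromtimestamp(int(info["timestamp"]), tz=timezone.utc)
    # Whole MiB, rounded down (1 MiB = 2**20 bytes).
    info["size_mib"] = size_bytes // 2**20
    return info


info = describe_artefact(
    "eessi-2023.06-software-linux-aarch64-nvidia-grace-1743264090.tar.gz",
    2110692995,
)
print(info["arch"], info["created"].isoformat(), info["size_mib"])
```

Read this way, the trailing number 1743264090 corresponds to 2025-03-29 16:01:30 UTC, a few minutes before the job is reported as finished, and 2110692995 bytes rounds down to the 2012 MiB shown in the report.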

Collaborator

@TopRichard TopRichard left a comment

Ran some tests and the PR LGTM. We have to be cautious about unforeseen effects of --contain.

@trz42 trz42 added the bot:deploy Ask bot to deploy missing software installations to EESSI label Mar 31, 2025
@eessi-bot-toprichard

Label bot:deploy has been set by user trz42, but this person does not have permission to trigger deployments

@trz42
Collaborator Author

trz42 commented Mar 31, 2025

Smee client crashed. Re-setting label.

@trz42 trz42 added bot:deploy Ask bot to deploy missing software installations to EESSI and removed bot:deploy Ask bot to deploy missing software installations to EESSI labels Mar 31, 2025
@eessi-bot-toprichard

Label bot:deploy has been set by user trz42, but this person does not have permission to trigger deployments

@trz42
Collaborator Author

trz42 commented Mar 31, 2025

Tarball ingested and software available via /cvmfs

@TopRichard TopRichard merged commit 7a0f96e into EESSI:2023.06-software.eessi.io Mar 31, 2025
52 of 60 checks passed
@eessi-bot

eessi-bot bot commented Mar 31, 2025

PR merged! Moved [] to /project/def-users/SHARED/trash_bin/EESSI/software-layer/2025.03.31

@eessi-bot

eessi-bot bot commented Mar 31, 2025

PR merged! Moved [] to /project/def-users/SHARED/trash_bin/EESSI/software-layer/2025.03.31

@eessi-bot-trz42

PR merged! Moved ['/p/project1/ceasybuilders/bot-trz42/jobs/2025.03/pr_989/13545363', '/p/project1/ceasybuilders/bot-trz42/jobs/2025.03/pr_989/13545294', '/p/project1/ceasybuilders/bot-trz42/jobs/2025.03/pr_989/13545377'] to /p/project1/ceasybuilders/bot-trz42/trash_bin/EESSI/software-layer/2025.03.31

Labels

2023.06-software.eessi.io 2023.06 version of software.eessi.io bot:deploy Ask bot to deploy missing software installations to EESSI grace NVIDIA Grace CPU

3 participants