@boegel
Member

@boegel boegel commented Sep 23, 2020

(created using eb --new-pr)

This is a candidate for intel/2020b...

@boegel boegel added the update label Sep 23, 2020
@boegel boegel added this to the next release (4.3.1) milestone Sep 23, 2020
@boegel
Member Author

boegel commented Sep 23, 2020

@boegelbot please test @ generoso

@easybuilders easybuilders deleted a comment from boegelbot Sep 23, 2020
@easybuilders easybuilders deleted a comment from boegelbot Sep 23, 2020
@boegelbot
Collaborator

@boegel: Request for testing this PR well received on generoso

PR test command 'EB_PR=11337 EB_ARGS= /apps/slurm/default/bin/sbatch --job-name test_PR_11337 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 7877

Test results coming soon (I hope)...

- notification for comment with ID 697345114 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

boegelbot commented Sep 23, 2020

Test report by @boegelbot
FAILED
Build succeeded for 1 out of 6 (6 easyconfigs in this PR)
generoso-x-5 - Linux centos linux 8.2.2004, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/95e4fe185f22b9baf64958f52eaf4ae9 for a full test report.

edit (by @boegel): the error setting up the bootstrap proxies was caused by srun not being available via $PATH, see also #11425 (comment)

@boegel
Member Author

boegel commented Sep 23, 2020

Test report by @boegel
SUCCESS
Build succeeded for 6 out of 6 (6 easyconfigs in this PR)
node2633.swalot.os - Linux centos linux 7.8.2003, x86_64, Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz (haswell), Python 2.7.5
See https://gist.github.com/d2bf8fcb45f6233ad636b1c6d017f099 for a full test report.

@lexming
Contributor

lexming commented Sep 24, 2020

Test report by @lexming
SUCCESS
Build succeeded for 6 out of 6 (6 easyconfigs in this PR)
node128.hydra.os - Linux centos linux 7.7.1908, x86_64, Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz, Python 2.7.5
See https://gist.github.com/2b0051c80f8b9bfd47ce6e4f2ca315e4 for a full test report.

@lexming
Contributor

lexming commented Sep 24, 2020

Test report by @lexming
SUCCESS
Build succeeded for 6 out of 6 (6 easyconfigs in this PR)
node376.hydra.os - Linux centos linux 7.7.1908, x86_64, Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz, Python 2.7.5
See https://gist.github.com/361d250b9c5df62ceb1b04cbc0f2f5f6 for a full test report.

@boegel boegel added the 2020b issues & PRs related to 2020b label Sep 24, 2020
…und impi bug, no longer relevant for impi 2019 update 8

Co-authored-by: Alex Domingo <[email protected]>
@boegel
Member Author

boegel commented Oct 5, 2020

@boegelbot please test @ generoso

@boegelbot
Collaborator

@boegel: Request for testing this PR well received on generoso

PR test command 'EB_PR=11337 EB_ARGS= /apps/slurm/default/bin/sbatch --job-name test_PR_11337 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 8031

Test results coming soon (I hope)...

- notification for comment with ID 703803892 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

boegelbot commented Oct 5, 2020

Test report by @boegelbot
FAILED
Build succeeded for 1 out of 6 (6 easyconfigs in this PR)
generoso-x-3 - Linux centos linux 8.2.2004, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/ad9264af3c8e83ca7e353947fb5b5e09 for a full test report.

edit (by @boegel):

[1601922345.426920] [generoso-x-3:3774850:0]         select.c:444  UCX  ERROR no active messages transport to <no debug data>: self/memory - Destination is unreachable
Abort(1091215) on node 1 (rank 1 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(136)........: 
MPID_Init(1138)..............: 
MPIDI_OFI_mpi_init_hook(1541): OFI get address vector map failed

@jhein32
Collaborator

jhein32 commented Oct 9, 2020

Hmm, doing a dry-run, it wants two versions of bison

-bash-4.2$ eb HPL-2.3-intel-2020.09.eb --robot --dry-run --use-existing-modules --from-pr=11337
== temporary log file in case of crash /tmp/eb-7Oh6BJ/easybuild-QdalMh.log
== found valid index for /sw/easybuild/software/EasyBuild/4.3.0/easybuild/easyconfigs, so using it...
Dry run: printing build status of easyconfigs and dependencies
 * [x] /sw/easybuild/software/EasyBuild/4.3.0/easybuild/easyconfigs/m/M4/M4-1.4.18.eb (module: Core | M4/1.4.18)
 * [ ] /sw/easybuild/software/EasyBuild/4.3.0/easybuild/easyconfigs/b/Bison/Bison-3.7.1.eb (module: Core | Bison/3.7.1)
 * [x] /sw/easybuild/software/EasyBuild/4.3.0/easybuild/easyconfigs/b/Bison/Bison-3.3.2.eb (module: Core | Bison/3.3.2)
 * [x] /sw/easybuild/software/EasyBuild/4.3.0/easybuild/easyconfigs/z/zlib/zlib-1.2.11.eb (module: Core | zlib/1.2.11)
 * [x] /sw/easybuild/software/EasyBuild/4.3.0/easybuild/easyconfigs/h/help2man/help2man-1.47.4.eb (module: Core | help2man/1.47.4)
 * [x] /sw/easybuild/software/EasyBuild/4.3.0/easybuild/easyconfigs/f/flex/flex-2.6.4.eb (module: Core | flex/2.6.4)
 * [ ] /sw/easybuild/software/EasyBuild/4.3.0/easybuild/easyconfigs/b/binutils/binutils-2.35.eb (module: Core | binutils/2.35)
 * [ ] /sw/easybuild/software/EasyBuild/4.3.0/easybuild/easyconfigs/g/GCCcore/GCCcore-10.2.0.eb (module: Core | GCCcore/10.2.0)
 * [ ] /sw/easybuild/software/EasyBuild/4.3.0/easybuild/easyconfigs/z/zlib/zlib-1.2.11-GCCcore-10.2.0.eb (module: Compiler/GCCcore/10.2.0 | zlib/1.2.11)
 * [ ] /sw/easybuild/software/EasyBuild/4.3.0/easybuild/easyconfigs/h/help2man/help2man-1.47.16-GCCcore-10.2.0.eb (module: Compiler/GCCcore/10.2.0 | help2man/1.47.16)
 * [ ] /sw/easybuild/software/EasyBuild/4.3.0/easybuild/easyconfigs/m/M4/M4-1.4.18-GCCcore-10.2.0.eb (module: Compiler/GCCcore/10.2.0 | M4/1.4.18)
 * [ ] /sw/easybuild/software/EasyBuild/4.3.0/easybuild/easyconfigs/b/Bison/Bison-3.7.1-GCCcore-10.2.0.eb (module: Compiler/GCCcore/10.2.0 | Bison/3.7.1)
 * [ ] /sw/easybuild/software/EasyBuild/4.3.0/easybuild/easyconfigs/f/flex/flex-2.6.4-GCCcore-10.2.0.eb (module: Compiler/GCCcore/10.2.0 | flex/2.6.4)
 * [ ] /sw/easybuild/software/EasyBuild/4.3.0/easybuild/easyconfigs/b/binutils/binutils-2.35-GCCcore-10.2.0.eb (module: Compiler/GCCcore/10.2.0 | binutils/2.35)
 * [ ] /tmp/eb-7Oh6BJ/files_pr11337/i/iccifort/iccifort-2020.3.275.eb (module: Core | iccifort/2020.3.275)
 * [ ] /tmp/eb-7Oh6BJ/files_pr11337/p/pkg-config/pkg-config-0.29.2-GCCcore-10.2.0.eb (module: Compiler/GCCcore/10.2.0 | pkg-config/0.29.2)
 * [ ] /tmp/eb-7Oh6BJ/files_pr11337/l/libtool/libtool-2.4.6-GCCcore-10.2.0.eb (module: Compiler/GCCcore/10.2.0 | libtool/2.4.6)
 * [ ] /sw/easybuild/software/EasyBuild/4.3.0/easybuild/easyconfigs/e/expat/expat-2.2.9-GCCcore-10.2.0.eb (module: Compiler/GCCcore/10.2.0 | expat/2.2.9)
 * [ ] /sw/easybuild/software/EasyBuild/4.3.0/easybuild/easyconfigs/n/ncurses/ncurses-6.2-GCCcore-10.2.0.eb (module: Compiler/GCCcore/10.2.0 | ncurses/6.2)
 * [ ] /sw/easybuild/software/EasyBuild/4.3.0/easybuild/easyconfigs/l/libreadline/libreadline-8.0-GCCcore-10.2.0.eb (module: Compiler/GCCcore/10.2.0 | libreadline/8.0)
 * [ ] /sw/easybuild/software/EasyBuild/4.3.0/easybuild/easyconfigs/p/Perl/Perl-5.32.0-GCCcore-10.2.0.eb (module: Compiler/GCCcore/10.2.0 | Perl/5.32.0)
 * [ ] /tmp/eb-7Oh6BJ/files_pr11337/a/Autoconf/Autoconf-2.69-GCCcore-10.2.0.eb (module: Compiler/GCCcore/10.2.0 | Autoconf/2.69)
 * [ ] /tmp/eb-7Oh6BJ/files_pr11337/a/Automake/Automake-1.16.2-GCCcore-10.2.0.eb (module: Compiler/GCCcore/10.2.0 | Automake/1.16.2)
 * [ ] /tmp/eb-7Oh6BJ/files_pr11337/a/Autotools/Autotools-20200321-GCCcore-10.2.0.eb (module: Compiler/GCCcore/10.2.0 | Autotools/20200321)
 * [ ] /tmp/eb-7Oh6BJ/files_pr11337/n/numactl/numactl-2.0.13-GCCcore-10.2.0.eb (module: Compiler/GCCcore/10.2.0 | numactl/2.0.13)
 * [ ] /tmp/eb-7Oh6BJ/files_pr11337/u/UCX/UCX-1.9.0-GCCcore-10.2.0.eb (module: Compiler/GCCcore/10.2.0 | UCX/1.9.0)
 * [ ] /tmp/eb-7Oh6BJ/files_pr11337/i/impi/impi-2019.8.254-iccifort-2020.3.275.eb (module: Compiler/intel/2020.3.275 | impi/2019.8.254)
 * [ ] /tmp/eb-7Oh6BJ/files_pr11337/i/iimpi/iimpi-2020.09.eb (module: Core | iimpi/2020.09)
 * [ ] /tmp/eb-7Oh6BJ/files_pr11337/i/imkl/imkl-2020.3.279-iimpi-2020.09.eb (module: MPI/intel/2020.3.275/impi/2019.8.254 | imkl/2020.3.279)
 * [ ] /tmp/eb-7Oh6BJ/files_pr11337/i/intel/intel-2020.09.eb (module: Core | intel/2020.09)
 * [ ] /tmp/eb-7Oh6BJ/files_pr11337/h/HPL/HPL-2.3-intel-2020.09.eb (module: MPI/intel/2020.3.275/impi/2019.8.254 | HPL/2.3)

@boegel
Member Author

boegel commented Oct 9, 2020

> Hmm, doing a dry-run, it wants two versions of bison

@jhein32 That's because of the bootstrapping mechanism that is done for binutils+GCCcore. We can consider bumping the Bison version that is used as an indirect build dep for the initial binutils, but that'll only fix the issue for recent toolchains.
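
To see at a glance which software shows up in more than one version in a dry run, the output can be scanned with a small script. This is a minimal sketch, assuming the dry-run lines keep the `(module: <level> | <name>/<version>)` suffix seen above; the function names and the regular expression are illustrative and not part of EasyBuild.

```python
import re
from collections import defaultdict

# Matches the "(module: <hierarchy level> | <name>/<version>)" suffix of a dry-run line.
# Assumption: this suffix format is stable across the lines being parsed.
MODULE_RE = re.compile(
    r"\(module: (?P<level>\S+) \| (?P<name>[^/\s]+)/(?P<version>[^)\s]+)\)"
)


def versions_per_software(dry_run_lines):
    """Map each software name to the set of versions listed in the dry-run output."""
    versions = defaultdict(set)
    for line in dry_run_lines:
        match = MODULE_RE.search(line)
        if match:
            versions[match.group("name")].add(match.group("version"))
    return versions


def multi_version_software(dry_run_lines):
    """Return only the software for which more than one version is listed."""
    found = versions_per_software(dry_run_lines)
    return {name: sorted(vers) for name, vers in found.items() if len(vers) > 1}
```

Fed the Bison lines from the dry run above, `multi_version_software` reports both 3.3.2 (the already installed bootstrap dependency) and 3.7.1.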

@jhein32
Collaborator

jhein32 commented Oct 12, 2020

I reported in #10899 that this setup allows multi-node runs without any UCX issues. HPL performance is very good.

@zao
Contributor

zao commented Oct 16, 2020

Test report by @zao
SUCCESS
Build succeeded for 6 out of 6 (6 easyconfigs in total)
freja - Linux Ubuntu 20.04, x86_64, Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz (skylake), Python 3.8.5
See https://gist.github.com/ed1de9ab3adf96a59f6ee68ba47690f6 for a full test report.

@jhein32
Collaborator

jhein32 commented Oct 26, 2020

One of our users reported a floating point exception in MKL with v2020 u1. The issue is still present in v2020 u3. She was advised by Intel that this is fixed in v2020 u4. We did a test install of that version, and v2020 u4 resolved her issues.

I am preparing a PR to move MKL to 2020 u4.

jhein32 and others added 2 commits October 26, 2020 16:25
MKL component released in Oct 2020
move imkl to v2020.4.304 + bump version to intel/2020.10
@boegel boegel changed the title from "{toolchain} intel/2020.09" to "{toolchain} intel/2020.10 (candidate for intel/2020b) [WIP]" Oct 26, 2020
@boegel
Member Author

boegel commented Oct 26, 2020

@boegelbot please test @ generoso

@boegelbot
Collaborator

@boegel: Request for testing this PR well received on generoso

PR test command 'EB_PR=11337 EB_ARGS= /apps/slurm/default/bin/sbatch --job-name test_PR_11337 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 8202

Test results coming soon (I hope)...

- notification for comment with ID 716797442 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
FAILED
Build succeeded for 1 out of 6 (6 easyconfigs in total)
generoso-x-1 - Linux centos linux 8.2.2004, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/dc99f0a7684c755269bed5baef93a698 for a full test report.

@boegel
Member Author

boegel commented Oct 27, 2020

Test report by @boegel
SUCCESS
Build succeeded for 6 out of 6 (6 easyconfigs in total)
node3163.skitty.os - Linux centos linux 7.8.2003, x86_64, Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz, Python 3.6.8
See https://gist.github.com/e9a89aec64220f4c4fe4d241e2df1b1e for a full test report.

@boegel
Member Author

boegel commented Oct 27, 2020

The problem that is causing the failing test on generoso can be fixed by defining $UCX_TLS, as follows:

export UCX_TLS=rc,ud,sm,self

See also the discussion in #10899 .

@bartoldeman Any idea if it's safe to always set this, or should I implement a hook on generoso to only inject that variable for specific UCX versions?

@akesandgren
Contributor

That's something that will be highly site specific, so please don't always set it.

@bartoldeman
Contributor

It's safe but not good for performance

@jhein32
Collaborator

jhein32 commented Oct 27, 2020

When does it fail? We had issues with multi-node runs of MPI codes. Everything built fine.

I think UCX_TLS should be set in the Intel MPI module, since e.g. OpenMPI seems to handle this fine when `UCX_TLS` is unset. So setting this in the UCX modules for all MPI libraries does not seem like the right thing to me.

@akesandgren
Contributor

It should not be set upstream at all. As I said before, this is site configuration and must remain so; if it is set at all, it should be handled in the site hooks.

@boegel
Member Author

boegel commented Oct 28, 2020

@akesandgren Don't worry, I wasn't going to set it for everyone, just implement a hook on generoso to inject it into the impi module there...

generoso is a VM cluster that is only used for testing EasyBuild, MPI performance doesn't matter much there.
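
A minimal sketch of what such a site hook could look like, assuming EasyBuild's `parse_hook` entry point and the `modextravars` easyconfig parameter are used to inject the variable. The constant name `UCX_TLS_VALUE` is illustrative, and the hook actually deployed on generoso may differ.

```python
# Sketch of a site-specific hooks file, passed to EasyBuild via --hooks=/path/to/hooks.py.
# Assumption: adding UCX_TLS through 'modextravars' is how the site wants the
# variable to end up in the generated impi module file.

UCX_TLS_VALUE = 'rc,ud,sm,self'  # transports suggested earlier in this thread


def parse_hook(ec, *args, **kwargs):
    """Inject $UCX_TLS into the impi module; leave all other easyconfigs untouched."""
    if ec.name == 'impi':
        # Copy the existing extra variables so nothing already set is lost.
        extra_vars = dict(ec['modextravars'] or {})
        # Do not override a value that the easyconfig already defines.
        extra_vars.setdefault('UCX_TLS', UCX_TLS_VALUE)
        ec['modextravars'] = extra_vars
```

Keeping this in a site hooks file rather than in the easyconfig means the setting stays local to the site that needs it.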

@jhein32 The problem occurs during the sanity check for impi, which is an mpirun of a trivial MPI hello world C program (the test/test.c that is included in the impi installation).

@akesandgren
Contributor

@boegel I know, it was more a comment to @jhein32

@jhein32
Collaborator

jhein32 commented Oct 28, 2020

> @boegel I know, it was more a comment to @jhein32

I understood that disabling dc in general is a bad idea. We are singing from the same hymn sheet.

But for some of these things there are hints in easyconfigs about what one might want to do if things don't work. I suggested putting this into impi instead of the UCX module. But I realise that others here think differently.

@jhein32
Collaborator

jhein32 commented Oct 28, 2020

> @jhein32 The problem occurs during the sanity check for impi, which is an mpirun of a trivial MPI hello world C program (the test/test.c that is included in the impi installation).

@boegel Hmm, I never had that fail when we had issues with intel/2020a. I assume that test was in there already. On our system this test runs inside the build node and will not use our stone-age IB. So if the absence of dc makes it trip, I am wondering whether it is going out of the (virtual) node. Does your "virtual cluster" have something resembling multiple nodes?

@lexming
Contributor

lexming commented Oct 29, 2020

Test report by @lexming
SUCCESS
Build succeeded for 8 out of 8 (6 easyconfigs in total)
node101.hydra.os - Linux centos linux 7.8.2003, x86_64, Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz, Python 2.7.5
See https://gist.github.com/f8468568e84b6b65cedd55a819ce6113 for a full test report.

@boegel boegel changed the title from "{toolchain} intel/2020.10 (candidate for intel/2020b) [WIP]" to "{toolchain} intel/2020b" Nov 6, 2020
@boegel
Member Author

boegel commented Nov 6, 2020

I tested OpenFOAM 8 on top of intel/2020b after updating impi to 2019 update 9, works fine.

CP2K 7.1 test installation is still under way...

@boegel
Member Author

boegel commented Nov 7, 2020

CP2K/7.1-intel-2020b worked fine on Intel Cascade Lake (CentOS 7), with good results for the regression test (correct: 3253 / 3270; new: 8; wrong: 8; failed: 1)

@boegel
Member Author

boegel commented Nov 8, 2020

Also got WRF 3.9.1.1 to install with intel/2020b.

@boegel
Member Author

boegel commented Nov 8, 2020

Test report by @boegel
SUCCESS
Build succeeded for 6 out of 6 (6 easyconfigs in total)
node3404.kirlia.os - Linux centos linux 7.8.2003, x86_64, Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz (cascadelake), Python 2.7.5
See https://gist.github.com/2d2567a33db9ad87f83f2b28970e0cde for a full test report.

@boegel
Member Author

boegel commented Nov 8, 2020

Test report by @boegel
SUCCESS
Build succeeded for 6 out of 6 (6 easyconfigs in total)
node3108.skitty.os - Linux centos linux 7.8.2003, x86_64, Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz, Python 3.6.8
See https://gist.github.com/4a9101ca528d0069c3d70c50a0ec5a58 for a full test report.

@Micket
Contributor

Micket commented Nov 8, 2020

Test report by @Micket
SUCCESS
Build succeeded for 6 out of 6 (6 easyconfigs in total)
vera-c1 - Linux centos linux 7.8.2003, x86_64, Intel Xeon Processor (Skylake), Python 2.7.5
See https://gist.github.com/c93a2ef57c8e521481e966529e56bb31 for a full test report.

Contributor

@Micket Micket left a comment

lgtm

@Micket
Contributor

Micket commented Nov 9, 2020

Going in, thanks @boegel!

@Micket Micket merged commit 3ee8e5a into easybuilders:develop Nov 9, 2020
@boegel boegel deleted the 20200923141429_new_pr_HPL23 branch November 9, 2020 10:08