in-place update of TensorFlow 2.3.0 easyconfigs to version 2.3.1 #11375
Contains security fixes. Removes the superfluous keras-applications and scipy packages.
|
Test report by @Flamefire |
|
Test report by @Flamefire |
|
@Flamefire Again trouble on POWER? |
|
Yes -.- Although it looks like a flake in the filesystem. Restarted the build but it's been running for 2 hrs now |
|
Test report by @lexming |
|
@boegelbot please test @ generoso |
|
@boegel: Request for testing this PR well received on generoso PR test command '
Test results coming soon (I hope)... - notification for comment with ID 699013665 processed Message to humans: this is just bookkeeping information for me, |
|
@lexming please rebuild double-conversion |
|
Test report by @Flamefire |
|
Test report by @boegel |
|
Test report by @boegel |
|
Test report by @boegel |
|
Test report by @boegelbot |
|
Test report by @lexming |
That is strange. You can try loading the build environment and check that this works? If it doesn't there is likely another Check each of those for a |
|
@Flamefire Is this related to #11143? @lexming Try re-installing |
|
Ah, yes exactly. That was the reason for the breaking change with the downloaded archive |
|
@Flamefire @boegel thanks for the feedback, testing again |
lexming left a comment
Horovod fails to detect NCCL in our system
-- Linking against static NCCL library
CMake Error at /theia/home/apps/CO7/skylake/software/CMake/3.15.3-GCCcore-8.3.0/share/cmake-3.15/Modules/FindPackageHandleStandardArgs.cmake:137 (message):
Could NOT find NCCL (missing: NCCL_INCLUDE_DIR NCCL_LIBRARY)
Call Stack (most recent call first):
/theia/home/apps/CO7/skylake/software/CMake/3.15.3-GCCcore-8.3.0/share/cmake-3.15/Modules/FindPackageHandleStandardArgs.cmake:378 (_FPHSA_FAILURE_MESSAGE)
cmake/Modules/FindNCCL.cmake:42 (find_package_handle_standard_args)
CMakeLists.txt:174 (find_package)
-- Configuring incomplete, errors occurred!
I fixed it by adding pkg-config as a build dependency of Horovod
|
Hm, that doesn't make sense: that module doesn't use pkg-config at all, so adding it shouldn't change anything. Can you check PS: Just seen that there have been 2 bugfix releases of Horovod by now. @boegel Can I update that too in this PR, or in a follow-up? In particular it includes horovod/horovod#2272, which seems to make certain use cases work again; otherwise an import of a horovod file will fail |
|
Test report by @lexming |
|
Test report by @lexming |
|
@Flamefire I would update Horovod in a follow-up PR, let's try and get this one merged first... |
|
@lexming The failing test report on node375 is due to a missing |
|
@Flamefire @boegel the real issue is indeed the lack of |
|
Test report by @lexming |
lexming left a comment
LGTM
|
Going in, thanks @Flamefire ! |
Contains security fixes
Removes the superfluous keras-applications package and scipy package
edit (@boegel): OK because TensorFlow 2.3.0 easyconfigs have only been merged very recently into develop via #11040...