Correct the install command, static library name and typo in nccl.cmake. #5048

Merged: Xreki merged 2 commits into PaddlePaddle:develop from Xreki:fix_nccl_typo on Oct 25, 2017

Xreki commented on Oct 24, 2017

I tried building nccl manually with make CUDA_HOME=$CUDA_ROOT and got the following output:

$ make CUDA_HOME=$CUDA_ROOT
Compiling src/libwrap.cu                      > /home/liuyiqun01/github/Paddle/build_paddle/third_party/nccl/src/extern_nccl/build/obj/libwrap.o
Compiling src/core.cu                         > /home/liuyiqun01/github/Paddle/build_paddle/third_party/nccl/src/extern_nccl/build/obj/core.o
Compiling src/all_gather.cu                   > /home/liuyiqun01/github/Paddle/build_paddle/third_party/nccl/src/extern_nccl/build/obj/all_gather.o
Compiling src/all_reduce.cu                   > /home/liuyiqun01/github/Paddle/build_paddle/third_party/nccl/src/extern_nccl/build/obj/all_reduce.o
Compiling src/broadcast.cu                    > /home/liuyiqun01/github/Paddle/build_paddle/third_party/nccl/src/extern_nccl/build/obj/broadcast.o
Compiling src/reduce.cu                       > /home/liuyiqun01/github/Paddle/build_paddle/third_party/nccl/src/extern_nccl/build/obj/reduce.o
Compiling src/reduce_scatter.cu               > /home/liuyiqun01/github/Paddle/build_paddle/third_party/nccl/src/extern_nccl/build/obj/reduce_scatter.o
Linking   libnccl.so.1.3.4                    > /home/liuyiqun01/github/Paddle/build_paddle/third_party/nccl/src/extern_nccl/build/lib/libnccl.so.1.3.4
Archiving libnccl_static.a                    > /home/liuyiqun01/github/Paddle/build_paddle/third_party/nccl/src/extern_nccl/build/lib/libnccl_static.a

Note that the static library is named libnccl_static.a.
I then ran make install to install the library and got these errors:

$ make install
`/home/liuyiqun01/github/Paddle/build_paddle/third_party/nccl/src/extern_nccl/build/lib/libnccl.so' -> `/usr/local/lib/libnccl.so'
cp: cannot create symbolic link `/usr/local/lib/libnccl.so': Permission denied
`/home/liuyiqun01/github/Paddle/build_paddle/third_party/nccl/src/extern_nccl/build/lib/libnccl.so.1' -> `/usr/local/lib/libnccl.so.1'
cp: cannot create symbolic link `/usr/local/lib/libnccl.so.1': Permission denied
`/home/liuyiqun01/github/Paddle/build_paddle/third_party/nccl/src/extern_nccl/build/lib/libnccl.so.1.3.4' -> `/usr/local/lib/libnccl.so.1.3.4'
cp: cannot create regular file `/usr/local/lib/libnccl.so.1.3.4': Permission denied
`/home/liuyiqun01/github/Paddle/build_paddle/third_party/nccl/src/extern_nccl/build/lib/libnccl_static.a' -> `/usr/local/lib/libnccl_static.a'
cp: cannot create regular file `/usr/local/lib/libnccl_static.a': Permission denied
make: *** [install] Error 1

The default prefix, /usr/local, is not writable without root privileges, so the install directory has to be passed explicitly, for example make install PREFIX=install:

$ make install PREFIX=install
`/home/liuyiqun01/github/Paddle/build_paddle/third_party/nccl/src/extern_nccl/build/lib/libnccl.so' -> `install/lib/libnccl.so'
`/home/liuyiqun01/github/Paddle/build_paddle/third_party/nccl/src/extern_nccl/build/lib/libnccl.so.1' -> `install/lib/libnccl.so.1'
`/home/liuyiqun01/github/Paddle/build_paddle/third_party/nccl/src/extern_nccl/build/lib/libnccl.so.1.3.4' -> `install/lib/libnccl.so.1.3.4'
`/home/liuyiqun01/github/Paddle/build_paddle/third_party/nccl/src/extern_nccl/build/lib/libnccl_static.a' -> `install/lib/libnccl_static.a'
`/home/liuyiqun01/github/Paddle/build_paddle/third_party/nccl/src/extern_nccl/build/include/nccl.h' -> `install/include/nccl.h'
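
As a rough sketch of the same idea (not part of this PR), a wrapper script could check whether the target prefix is writable before calling make install and fall back to a local directory otherwise. The helper name pick_prefix and the choice of fallback are hypothetical; the command is only printed, not run.

```shell
#!/bin/sh
# pick_prefix (hypothetical helper): print the preferred prefix if it is a
# writable directory, otherwise print the fallback.
pick_prefix() {
  preferred=$1
  fallback=$2
  if [ -d "$preferred" ] && [ -w "$preferred" ]; then
    printf '%s\n' "$preferred"
  else
    printf '%s\n' "$fallback"
  fi
}

# /usr/local is the prefix that failed in the log above; "install" is the
# directory used in this PR description.
PREFIX=$(pick_prefix /usr/local install)

# Print the command instead of running it, so the sketch is safe to try.
echo "make install PREFIX=$PREFIX"
```

Removing the echo would run the install for real; a relative PREFIX such as install is resolved against the directory where make is invoked.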

@luotao1 luotao1 left a comment

LGTM

else()
# otherwise, we build nccl and link it.
set(NCCL_INSTALL_DIR ${THIRD_PARTY_PATH}/install/nccl)
# Note: cuda 8.0 is needed to make nccl

Contributor:

Why is cuda 8.0 strictly required here?

Contributor Author:

Because nccl's Makefile sets the NVCC_GENCODE variable, and -gencode=arch=compute_60,code=sm_60 and above are only supported starting with cuda 8.0.

https://github.com/NVIDIA/nccl/blob/master/Makefile#L20-L25

That said, NVCC_GENCODE should also be overridable by passing it to make, for example:

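# Leave out compute_60 and above, which need cuda 8.0.
# A single space-separated string avoids the ";" that a CMake list would insert.
set(NVCC_GENCODE "-gencode=arch=compute_35,code=sm_35 -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_52,code=sm_52")
# Keep the command a list (make plus one argument) rather than one quoted string.
set(NCCL_BUILD_COMMAND make "NVCC_GENCODE=${NVCC_GENCODE}")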
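
The same override can be tried directly from the shell, since a variable given on the make command line takes precedence over an ordinary assignment in the Makefile. A minimal sketch, assuming the compute capabilities 35, 50 and 52 from the comment above; gencode_flags is a hypothetical helper, and the command is only printed, not run.

```shell
#!/bin/sh
# gencode_flags (hypothetical helper): turn a list of compute capabilities
# such as "35 50 52" into the matching nvcc -gencode options.
gencode_flags() {
  flags=""
  for cc in "$@"; do
    # Append with a separating space, except before the first option.
    flags="${flags:+$flags }-gencode=arch=compute_${cc},code=sm_${cc}"
  done
  printf '%s\n' "$flags"
}

NVCC_GENCODE=$(gencode_flags 35 50 52)

# Print the resulting command; the quotes keep all options in one make variable.
echo "make CUDA_HOME=\$CUDA_ROOT NVCC_GENCODE=\"$NVCC_GENCODE\""
```

Building the option string from a list keeps the set of target architectures in one place, so dropping or adding an architecture is a one-word change.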

Xreki merged commit 288ffdd into PaddlePaddle:develop on Oct 25, 2017
Xreki deleted the fix_nccl_typo branch on November 14, 2018 02:37