port allocation from majel to paddle #2217
QiJune wants to merge 23 commits into PaddlePaddle:develop
Conversation
paddle/majel/allocation.cc (Outdated)

#include "allocation.h"
#include "hl_cuda.h"
#include "paddle/utils/Logging.h"
I think we made an agreement yesterday that Majel shouldn't depend on Paddle? If so, for logging, we should just use google log directly?
cmake/generic.cmake (Outdated)

endif()
add_dependencies(${TARGET_NAME} ${cc_library_DEPS} ${external_project_dependencies})
if(cc_library_DEPS)
target_link_libraries(${TARGET_NAME} ${cc_library_DEPS})
Do we need to call target_link_libraries when we build a library? Or is all we need here add_dependencies, so that linking happens when we build shared objects or executable binaries?
Done. Removed the DEPS from add_test.
paddle/majel/allocation.cc (Outdated)

#include <boost/variant.hpp>
#include "allocation.h"
#include "hl_cuda.h"
I think we should create malloc.{h,cc}, which calls C runtime malloc and cudaMalloc, but doesn't depend on hl_cuda. In that way, we don't make Majel depend on Paddle.
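A hedged sketch of what such a malloc.{h,cc} could look like; the names malloc_cpu/free_cpu/malloc_cuda/free_cuda and the PADDLE_ONLY_CPU guard are illustrative assumptions, not the final API:

```cpp
// Hypothetical paddle/majel/malloc.h: wraps C runtime malloc (and, in a GPU
// build, cudaMalloc) so Majel never needs hl_cuda from Paddle.
#pragma once
#include <cstddef>
#include <cstdlib>

namespace majel {

// CPU path: plain C runtime allocation.
inline void* malloc_cpu(size_t size) { return std::malloc(size); }
inline void free_cpu(void* ptr) { std::free(ptr); }

#ifndef PADDLE_ONLY_CPU
// GPU path: defined in malloc.cc, where it calls cudaMalloc/cudaFree directly.
void* malloc_cuda(size_t size);
void free_cuda(void* ptr);
#endif

}  // namespace majel
```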
inline bool operator!=(const GpuPlace& o) const { return !(*this == o); }
GpuPlace() : GpuPlace(0) {}
int device;
I don't get it -- why should we make GpuPlace no longer distinguish GPUs? Is that because we want to use the CUDA context to determine the current GPU?
If so, I think what we need is not removing int device; from GpuPlace, but redefining get_place to call cuCtxGetDevice?
One CPU thread is bound to one GPU card. Every CPU thread sets the GPU card first, using cudaSetDevice. There is no need for the tensor to hold the place identifying the specific GPU card.
I am afraid that we cannot assume this. What if we are going to support OpenCL/FPGA other than CUDA? Would this assumption become a bug?
I think communications between devices are costly, so one neural network will run on one device. If we are going to support OpenCL/FPGA, we can just define Place as follows:
typedef boost::variant<CpuPlace, CudaPlace, OpenclPlace, FpgaPlace> Place;
Then, we implement methods, such as malloc/free of corresponding device.
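A minimal sketch of such a variant-dispatched malloc. Here std::variant/std::visit stand in for boost::variant/apply_visitor so the example is self-contained, the GPU branch is stubbed out, and all names are illustrative:

```cpp
#include <cstdlib>
#include <variant>

struct CpuPlace {};
struct GpuPlace { int device; };

// Stand-in for boost::variant<CpuPlace, GpuPlace>.
using Place = std::variant<CpuPlace, GpuPlace>;

// Visitor that picks the allocation routine for each kind of place.
struct Allocator {
  size_t size;
  void* operator()(const CpuPlace&) const { return std::malloc(size); }
  void* operator()(const GpuPlace&) const {
    return nullptr;  // a GPU build would call cudaMalloc here
  }
};

// Global malloc that dispatches on the place.
inline void* majel_malloc(const Place& place, size_t size) {
  return std::visit(Allocator{size}, place);
}
```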
I agree that we can run a neural network on one device, but when we aggregate gradients/parameters from these devices, it seems that we need to copy data from/to exact places?
We will create one Context for one device, and then set the device id on that specific Context. The device id will be handled in Paddle, not in the tensor library. We can certainly copy data between different GPU cards. Following is an example:
cudaSetDevice(1);
cudaDeviceEnablePeerAccess(2,flags); //flags=0
cudaSetDevice(2);
cudaDeviceEnablePeerAccess(1,flags); //flags=0
// Allocate some data
float *gpu1data, *gpu2data;
cudaSetDevice(1);
cudaMalloc(&gpu1data, nbytes);
cudaSetDevice(2);
cudaMalloc(&gpu2data, nbytes);
// Do the p2p copy!
cudaMemcpy(gpu1data, gpu2data, nbytes, cudaMemcpyDefault);
The gpu data block does not hold device id information, but the Context does.
paddle/majel/allocation.cc (Outdated)

}
void* operator()(const GpuPlace& p) const {
void* address = hl_malloc_device(size_);
I think we can move the definition of hl_malloc_device to paddle/majel/malloc.cc. The definition has only 4 lines of code. In this way, Majel doesn't rely on paddle/cuda.
paddle/majel/malloc.cc (Outdated)

return dest_d;
}
void free_mem_device(void* dest_d) {
It seems that this function is the counterpart of malloc_cuda, so it should be named free_cuda?
paddle/majel/malloc.cc (Outdated)

return cudaGetErrorString((cudaError_t)err);
}
void* malloc_device(size_t size) {
How about renaming malloc_device to malloc_cuda? We have a plan to support other device interfaces like OpenCL and FPGA.
}
#endif
class DefaultAllocator {
According to its name, class DefaultAllocator should be in allocation.{h,cc} instead of malloc.{h,cc}?
Majel provides various memory management policies. Every memory management policy can be abstracted into an allocator class. Here, we just implement a simple one first, DefaultAllocator.
malloc is a global method responsible for memory allocation; it chooses a specific memory allocation policy. Allocation is a memory block handled by Array, and it calls the malloc method.
I agree with every sentence in your comment. And it seems that's the reason we should move class Allocation to allocation.{h,cc}?
paddle/majel/allocation.h (Outdated)

#pragma once
#include <memory>
#include "place.h"
Let us use full include path name -- #include "paddle/majel/place.h". I know everyone might have a different idea about this, but let us just unify. Thanks.
paddle/majel/allocation.h (Outdated)

@@ -0,0 +1,37 @@
#pragma once
#include <memory>
Is <memory> mandatory in this header file? Should we move it to allocation.cc?
paddle/majel/malloc.cc (Outdated)

#include <cuda_runtime.h>
#endif
#define CHECK_CUDA(cudaFunc) \
I remember that @reyoung and @helinwang both warned some time ago that PaddlePaddle as a library mustn't fatal on error, but needs to return the error and make sure it can be handled by the caller. I think it's worth confirming with them.
Agree, I think in general a library should never fatal, with the exception of an unrecoverable state. malloc failure is a good example. Maybe CUDA failure is (almost) unrecoverable as well? If CHECK_CUDA fails, can the client do something to overcome the problem?
At first, I think that if the malloc method gets an error, the client can hardly do anything to overcome the problem. So, just let it fatal.
Second, if we check the result state of malloc, then we have to check the result state when we construct an array. It will be quite fussy.
In terms of C library design principles, how about we just follow C's convention for malloc -- return NULL when the allocation fails?
In majel's code, when allocation fails, a bad allocation exception will be thrown:
if (ptr_ == nullptr) {
  throw std::bad_alloc();
}
If we do not throw an exception or CHECK FATAL, we have to check whether the Array has been constructed correctly.
I suggest that we return error state for most other operations, except malloc/free and some CUDA operations, because the client can hardly do anything to recover if these system-related APIs go down.
Let us not CHECK_EQ here, and remove the definition of CHECK_CUDA.
It is not hard to handle the error returned by cudaMalloc. Majel did the following in src/gpu/detail/cuda.cu:
void* malloc(size_t size) {
  void* ptr = 0;
  cudaError_t result = cudaMalloc(&ptr, size);
  if (result == cudaSuccess) {
    return ptr;
  }
  // clear last error
  cudaGetLastError();
  return nullptr;
}
Yes, but in allocation.cu, we will find:
Allocation::Allocation(size_t size, Place place)
    : place_(place), size_(size), owned_(true) {
  if (size > 0) {
    majel::detail::Allocator allocator(size_);
    ptr_ = boost::apply_visitor(allocator, place_);
    if (ptr_ == nullptr) {
      throw std::bad_alloc();
    }
  }
}
So, I think that a CHECK fatal and throwing a bad_alloc exception amount to much the same thing: the server will go down, and the client cannot recover either.
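For illustration, a hedged sketch combining the two conventions discussed above: the low-level allocator follows C malloc and returns nullptr, while only the owning handle converts failure into std::bad_alloc. This is a CPU-only sketch with illustrative names, not the code under review:

```cpp
#include <cstdlib>
#include <new>

// Low-level allocator: C convention, returns nullptr on failure.
inline void* malloc_cpu(size_t size) { return std::malloc(size); }

// Owning handle: converts allocation failure into an exception, so callers
// never observe a half-constructed object.
class Allocation {
 public:
  explicit Allocation(size_t size) : size_(size), ptr_(nullptr) {
    if (size_ > 0) {
      ptr_ = malloc_cpu(size_);
      if (ptr_ == nullptr) {
        throw std::bad_alloc();
      }
    }
  }
  ~Allocation() { std::free(ptr_); }
  void* ptr() const { return ptr_; }

 private:
  size_t size_;
  void* ptr_;
};
```

This keeps the library layer recoverable (a caller of malloc_cpu can retry or fall back) while preserving the exception behavior the Array code relies on.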
class DefaultAllocator {
public:
static void* malloc(majel::Place place, size_t size);
Error __must_check malloc(majel::Place place, size_t size, void** ptr);
At first, I think that if the malloc method gets an error, the client can hardly do anything to overcome the problem. So, just let it fatal.
Second, if we check the result state of malloc, then we have to check the result state when we construct an array. It will be quite fussy.
I think it is common to expose the out-of-memory error to the client code, and it is the client code's responsibility to recover from the error by either printing some error message or trying another device.
But I'd make the conclusion here to keep @QiJune's original function definition because it follows C malloc's signature.
typedef boost::variant<CpuPlace, GpuPlace> Place;
#else
typedef boost::variant<CpuPlace> Place;
#endif
If we build the CPU-only version, Place accepts only CpuPlace; passing a GpuPlace to an Array will then fail at compile time.
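A compilable sketch of this point, with std::variant standing in for boost::variant; the guard macro PADDLE_ONLY_CPU is an assumption. In a CPU-only build the GpuPlace alternative disappears from Place, so assigning a GpuPlace becomes a compile-time error rather than a runtime one:

```cpp
#include <variant>

struct CpuPlace {};
struct GpuPlace { int device; };

#ifndef PADDLE_ONLY_CPU
using Place = std::variant<CpuPlace, GpuPlace>;
#else
// CPU-only build: `Place p = GpuPlace{0};` would not compile here.
using Place = std::variant<CpuPlace>;
#endif
```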
Closed because Paddle uses Eigen instead of Majel.
No description provided.