port allocation from majel to paddle #2217
QiJune wants to merge 23 commits into PaddlePaddle:develop
Conversation
paddle/majel/allocation.cc (Outdated)

#include "allocation.h"
#include "hl_cuda.h"
#include "paddle/utils/Logging.h"
I think we made an agreement yesterday that Majel shouldn't depend on Paddle? If so, for logging, we should just use google log directly?
cmake/generic.cmake (Outdated)

endif()
add_dependencies(${TARGET_NAME} ${cc_library_DEPS} ${external_project_dependencies})
if(cc_library_DEPS)
target_link_libraries(${TARGET_NAME} ${cc_library_DEPS})
Do we need to call target_link_libraries when we build a library? Or is all we need here add_dependencies, so that linking happens when we build shared objects or executable binaries?
Done. Removed the DEPS from add_test.
paddle/majel/allocation.cc (Outdated)

#include <boost/variant.hpp>
#include "allocation.h"
#include "hl_cuda.h"
I think we should create malloc.{h,cc}, which calls C runtime malloc and cudaMalloc, but doesn't depend on hl_cuda. In that way, we don't make Majel depend on Paddle.
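A hedged sketch of what such a malloc.{h,cc} could look like; the names malloc_cpu/free_cpu/malloc_cuda/free_cuda and the PADDLE_ONLY_CPU guard are illustrative assumptions, not the final API:

```cpp
// Hypothetical paddle/majel/malloc.h: wraps C runtime malloc (and, in a GPU
// build, cudaMalloc) so Majel never needs hl_cuda from Paddle.
#pragma once
#include <cstddef>
#include <cstdlib>

namespace majel {

// CPU path: plain C runtime allocation.
inline void* malloc_cpu(size_t size) { return std::malloc(size); }
inline void free_cpu(void* ptr) { std::free(ptr); }

#ifndef PADDLE_ONLY_CPU
// GPU path: defined in malloc.cc, where it calls cudaMalloc/cudaFree directly.
void* malloc_cuda(size_t size);
void free_cuda(void* ptr);
#endif

}  // namespace majel
```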
inline bool operator!=(const GpuPlace& o) const { return !(*this == o); }
GpuPlace() : GpuPlace(0) {}
int device;
I don't get it -- why should we make GpuPlace no longer distinguish GPUs? Is that because we want to use the CUDA context to determine the current GPU?
If so, I think what we need is not removing int device; from GpuPlace, but redefining get_place to call cuCtxGetDevice?
One CPU thread is bound to one GPU card. Every CPU thread sets the GPU card first, using cudaSetDevice. There is no need for the tensor to hold the place identifying the specific GPU card.
I am afraid that we cannot assume this. What if we are going to support OpenCL/FPGA other than CUDA? Would this assumption become a bug?
I think communications between devices are costly, so one neural network will run on one device. If we are going to support OpenCL/FPGA, we can just define Place as follows:
typedef boost::variant<CpuPlace, CudaPlace, OpenclPlace, FpgaPlace> Place;
Then, we implement methods, such as malloc/free of corresponding device.
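A minimal sketch of such a variant-dispatched malloc. Here std::variant/std::visit stand in for boost::variant/apply_visitor so the example is self-contained, the GPU branch is stubbed out, and all names are illustrative:

```cpp
#include <cstdlib>
#include <variant>

struct CpuPlace {};
struct GpuPlace { int device; };

// Stand-in for boost::variant<CpuPlace, GpuPlace>.
using Place = std::variant<CpuPlace, GpuPlace>;

// Visitor that picks the allocation routine for each kind of place.
struct Allocator {
  size_t size;
  void* operator()(const CpuPlace&) const { return std::malloc(size); }
  void* operator()(const GpuPlace&) const {
    return nullptr;  // a GPU build would call cudaMalloc here
  }
};

// Global malloc that dispatches on the place.
inline void* majel_malloc(const Place& place, size_t size) {
  return std::visit(Allocator{size}, place);
}
```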
I agree that we can run a neural network on one device, but when we aggregate gradients/parameters from these devices, it seems that we need to copy data from/to exact places?
We will create one Context for one device, and then set the device id on that specific Context. The device id will be handled in Paddle, not in the tensor library. We can certainly copy data between different GPU cards. Following is an example:
cudaSetDevice(1);
cudaDeviceEnablePeerAccess(2,flags); //flags=0
cudaSetDevice(2);
cudaDeviceEnablePeerAccess(1,flags); //flags=0
// Allocate some data
float *gpu1data, *gpu2data;
cudaSetDevice(1);
cudaMalloc(&gpu1data, nbytes);
cudaSetDevice(2);
cudaMalloc(&gpu2data, nbytes);
// Do the p2p copy!
cudaMemcpy(gpu1data, gpu2data, nbytes, cudaMemcpyDefault);
The gpu data block does not hold device id information, but the Context does.
paddle/majel/allocation.cc (Outdated)

}
void* operator()(const GpuPlace& p) const {
void* address = hl_malloc_device(size_);
I think we can move the definition of hl_malloc_device to paddle/majel/malloc.cc. The definition has only 4 lines of code. In this way, Majel doesn't rely on paddle/cuda.
paddle/majel/malloc.cc (Outdated)

return dest_d;
}
void free_mem_device(void* dest_d) {
It seems that this function is the counterpart of malloc_cuda, so it should be named free_cuda?
paddle/majel/malloc.cc (Outdated)

return cudaGetErrorString((cudaError_t)err);
}
void* malloc_device(size_t size) {
How about renaming malloc_device to malloc_cuda? We have a plan to support other device interfaces like OpenCL and FPGA.
}
#endif
class DefaultAllocator {
According to its name, class DefaultAllocator should be in allocation.{h,cc} instead of malloc.{h,cc}?
Majel provides various memory management policies. Every memory management policy can be abstracted into an allocator class. Here, we just implement a simple one first, DefaultAllocator.
malloc is a global method responsible for memory allocation; it chooses a specific memory allocation policy. Allocation is a memory block handled by Array, and it calls the malloc method.
I agree with every sentence in your comment. And it seems that's the reason we should move class Allocation to allocation.{h,cc}?
paddle/majel/allocation.h (Outdated)

#pragma once
#include <memory>
#include "place.h"
Let us use full include path name -- #include "paddle/majel/place.h". I know everyone might have a different idea about this, but let us just unify. Thanks.
paddle/majel/allocation.h (Outdated)

@@ -0,0 +1,37 @@
#pragma once
#include <memory>
Is <memory> mandatory in this header file? Should we move it to allocation.cc?
paddle/majel/malloc.cc (Outdated)

#include <cuda_runtime.h>
#endif
#define CHECK_CUDA(cudaFunc) \
I remember that @reyoung and @helinwang both warned some time ago that PaddlePaddle as a library mustn't fatal on error, but needs to return the error and make sure it can be handled by the caller. I think it's worth confirming with them.
Agree, I think in general a library should never fatal, with the exception of an unrecoverable state. malloc failure is a good example. Maybe CUDA failure is (almost) unrecoverable as well? If CHECK_CUDA fails, can the client do something to overcome the problem?
At first, I think that if the malloc method gets an error, the client can hardly do anything to overcome the problem. So, just let it fatal.
Second, if we check the result state of malloc, then we have to check the result state when we construct an array. It will be quite fussy.
In terms of C library design principles, how about we just follow C's convention for malloc -- return NULL when the allocation fails?
In majel's code, when allocation fails, a bad allocation exception will be thrown:
if (ptr_ == nullptr) {
  throw std::bad_alloc();
}
If we do not throw an exception or CHECK FATAL, we have to check whether the Array has been constructed correctly.
I suggest that we return error state for most other operations, except malloc/free and some CUDA operations, because the client can hardly do anything to recover if these system-related APIs go down.
Let us not CHECK_EQ here, and remove the definition of CHECK_CUDA.
It is not hard to handle the error returned by cudaMalloc. Majel did the following in src/gpu/detail/cuda.cu:
void* malloc(size_t size) {
  void* ptr = 0;
  cudaError_t result = cudaMalloc(&ptr, size);
  if (result == cudaSuccess) {
    return ptr;
  }
  // clear last error
  cudaGetLastError();
  return nullptr;
}
Yes, but in allocation.cu, we will find:
Allocation::Allocation(size_t size, Place place)
    : place_(place), size_(size), owned_(true) {
  if (size > 0) {
    majel::detail::Allocator allocator(size_);
    ptr_ = boost::apply_visitor(allocator, place_);
    if (ptr_ == nullptr) {
      throw std::bad_alloc();
    }
  }
}
So, I think that a CHECK fatal and throwing a bad_alloc exception amount to much the same thing: the server will go down, and the client cannot recover either.
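For illustration, a hedged sketch combining the two conventions discussed above: the low-level allocator follows C malloc and returns nullptr, while only the owning handle converts failure into std::bad_alloc. This is a CPU-only sketch with illustrative names, not the code under review:

```cpp
#include <cstdlib>
#include <new>

// Low-level allocator: C convention, returns nullptr on failure.
inline void* malloc_cpu(size_t size) { return std::malloc(size); }

// Owning handle: converts allocation failure into an exception, so callers
// never observe a half-constructed object.
class Allocation {
 public:
  explicit Allocation(size_t size) : size_(size), ptr_(nullptr) {
    if (size_ > 0) {
      ptr_ = malloc_cpu(size_);
      if (ptr_ == nullptr) {
        throw std::bad_alloc();
      }
    }
  }
  ~Allocation() { std::free(ptr_); }
  void* ptr() const { return ptr_; }

 private:
  size_t size_;
  void* ptr_;
};
```

This keeps the library layer recoverable (a caller of malloc_cpu can retry or fall back) while preserving the exception behavior the Array code relies on.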
class DefaultAllocator {
public:
static void* malloc(majel::Place place, size_t size);
Error __must_check malloc(majel::Place place, size_t size, void** ptr);
At first, I think that if the malloc method gets an error, the client can hardly do anything to overcome the problem. So, just let it fatal.
Second, if we check the result state of malloc, then we have to check the result state when we construct an array. It will be quite fussy.
I think it is common to expose the out-of-memory error to the client code, and it is the client code's responsibility to recover from the error by either printing some error message or trying another device.
But I'd make the conclusion here to keep @QiJune's original function definition because it follows C malloc's signature.
typedef boost::variant<CpuPlace, GpuPlace> Place;
#else
typedef boost::variant<CpuPlace> Place;
#endif
If we build the CPU-only version, Place accepts only CpuPlace; passing a GpuPlace to an Array will then fail at compile time.
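A compilable sketch of this point, with std::variant standing in for boost::variant; the guard macro PADDLE_ONLY_CPU is an assumption. In a CPU-only build the GpuPlace alternative disappears from Place, so assigning a GpuPlace becomes a compile-time error rather than a runtime one:

```cpp
#include <variant>

struct CpuPlace {};
struct GpuPlace { int device; };

#ifndef PADDLE_ONLY_CPU
using Place = std::variant<CpuPlace, GpuPlace>;
#else
// CPU-only build: `Place p = GpuPlace{0};` would not compile here.
using Place = std::variant<CpuPlace>;
#endif
```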
Closed because Paddle uses Eigen instead of Majel.
No description provided.