Conversation
reyoung
left a comment
BTW, maybe thrust::sort can be used for the GPU implementation.
https://thrust.github.io/doc/group__sorting.html#ga1099d781e06c43805be06a918f7b7499
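The linked sorting group also contains thrust::sort_by_key, which sorts a key array and permutes a companion value array in lock-step — a natural fit for argsort, with the original indices as values. As a hedged illustration, here is a host-side analogue using plain std::sort (not thrust itself); SortByKey is a hypothetical name used only for this sketch:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Host-side analogue of thrust::sort_by_key (illustrative only, not thrust):
// sort the keys and permute the companion value array in the same order.
void SortByKey(std::vector<float>* keys, std::vector<int64_t>* values) {
  std::vector<int64_t> order(keys->size());
  std::iota(order.begin(), order.end(), 0);
  // Sort a permutation of positions by key, then apply it to both arrays.
  std::stable_sort(order.begin(), order.end(), [&](int64_t a, int64_t b) {
    return (*keys)[a] < (*keys)[b];
  });
  std::vector<float> sorted_keys(keys->size());
  std::vector<int64_t> sorted_values(values->size());
  for (size_t i = 0; i < order.size(); ++i) {
    sorted_keys[i] = (*keys)[order[i]];
    sorted_values[i] = (*values)[order[i]];
  }
  keys->swap(sorted_keys);
  values->swap(sorted_values);
}
```

On the device, the keys and values would live in GPU memory and a call such as thrust::sort_by_key(thrust::cuda::par.on(stream), ...) would replace the host sort.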
paddle/fluid/operators/argsort_op.cc
Outdated
                   "Output(Indices) of ArgsortOp should not be null.");
    auto in_dims = ctx->GetInputDim("X");
    int axis = static_cast<int>(ctx->Attrs().Get<int>("axis"));
Remove static_cast<int>().
paddle/fluid/operators/argsort_op.cc
Outdated
    AddInput("X", "(Tensor) The input of Argsort op.");
    AddOutput("Out", "(Tensor) The sorted tensor of Argsort op.");
    AddOutput("Indices",
              "(Tensor) The indices of a tensor giving the sorted order.");
Give the shape for Out and Indices.
paddle/fluid/operators/argsort_op.h
Outdated
    auto* input = ctx.Input<framework::Tensor>("X");
    auto* output = ctx.Output<framework::Tensor>("Out");
    auto* indices = ctx.Output<framework::Tensor>("Indices");
    int axis = static_cast<int>(ctx.Attr<int>("axis"));
Remove static_cast<int>().
paddle/fluid/operators/argsort_op.cc
Outdated
    PADDLE_ENFORCE(axis >= 0 || axis == -1,
                   "Attr(axis) %d of ArgsortOp must be nonnegative or equal to "
                   "-1.",
                   axis);
If axis < 0, we can reset it as axis = in_dims.size() + axis, instead of limiting negative values to -1.
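A minimal sketch of that suggestion; NormalizeAxis is a hypothetical helper name, not part of the PR:

```cpp
#include <cassert>

// Fold any negative axis into the valid range [0, rank),
// rather than allowing only -1 as a negative value.
// NormalizeAxis is a hypothetical name used for illustration.
int NormalizeAxis(int axis, int rank) {
  if (axis < 0) axis += rank;  // e.g. axis = -1 with rank 4 becomes 3
  assert(axis >= 0 && axis < rank);
  return axis;
}
```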
        out_data[index] = in_vec[j].first;
        idx_data[index] = in_vec[j].second;
      }
    }
Lines 40-73 can be changed to be more efficient and to save memory:
int64_t part_dims_prod = input->numel() / in_dims[axis];
int64_t step = 1;
for (int64_t i = in_dims.size() - 1; i > axis; --i) step *= in_dims[i];
std::vector<int64_t> org_index_vec(in_dims[axis]);
std::vector<int64_t> idx_vec(in_dims.size());
idx_vec[axis] = 0;
for (int64_t i = 0; i < part_dims_prod; ++i) {
  int64_t idx = i;  // decode the multi-index of this slice
  for (int64_t dim = in_dims.size() - 1; dim >= 0; --dim) {
    if (dim != axis) {
      idx_vec[dim] = idx % in_dims[dim];
      idx /= in_dims[dim];
    }
  }
  int64_t start_index = idx_vec[0];
  for (int64_t dim = 1; dim < in_dims.size(); ++dim) {
    start_index = start_index * in_dims[dim] + idx_vec[dim];
  }
  for (int64_t j = 0; j < in_dims[axis]; ++j) {
    org_index_vec[j] = start_index + j * step;
  }
  std::sort(org_index_vec.begin(), org_index_vec.end(),
            [in_data](int64_t idx1, int64_t idx2) {
              return in_data[idx1] < in_data[idx2];
            });
  for (size_t j = 0; j < org_index_vec.size(); ++j) {
    int64_t org_index = org_index_vec[j];
    int64_t ret_index = start_index + j * step;
    out_data[ret_index] = in_data[org_index];
    idx_data[ret_index] = org_index;
  }
}
Thanks! It is a good idea to only sort the index, and I made the change. Please take a look.
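To make the index-sort idea concrete, here is a self-contained host-side sketch of the same scheme, assuming a row-major flat buffer; the function and variable names are illustrative, not the PR's:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Sort each 1-D slice along `axis` of a row-major tensor by sorting flat
// indices, instead of materializing (value, index) pairs per slice.
// ArgsortAlongAxis is a hypothetical name used for illustration.
void ArgsortAlongAxis(const std::vector<float>& in,
                      const std::vector<int64_t>& dims, int64_t axis,
                      std::vector<float>* out, std::vector<int64_t>* ids) {
  int64_t numel = 1;
  for (int64_t d : dims) numel *= d;
  int64_t step = 1;  // stride between consecutive elements along `axis`
  for (int64_t i = static_cast<int64_t>(dims.size()) - 1; i > axis; --i)
    step *= dims[i];
  out->assign(numel, 0.f);
  ids->assign(numel, 0);
  int64_t part = numel / dims[axis];  // number of 1-D slices to sort
  std::vector<int64_t> idx_vec(dims.size(), 0);
  std::vector<int64_t> org(dims[axis]);
  for (int64_t i = 0; i < part; ++i) {
    // Decode the multi-index of this slice, skipping `axis`.
    int64_t rem = i;
    for (int64_t d = static_cast<int64_t>(dims.size()) - 1; d >= 0; --d) {
      if (d == axis) continue;
      idx_vec[d] = rem % dims[d];
      rem /= dims[d];
    }
    idx_vec[axis] = 0;
    int64_t start = idx_vec[0];
    for (size_t d = 1; d < dims.size(); ++d)
      start = start * dims[d] + idx_vec[d];
    // Flat indices of the slice's elements; sort them by value.
    for (int64_t j = 0; j < dims[axis]; ++j) org[j] = start + j * step;
    std::sort(org.begin(), org.end(),
              [&in](int64_t a, int64_t b) { return in[a] < in[b]; });
    for (int64_t j = 0; j < dims[axis]; ++j) {
      (*out)[start + j * step] = in[org[j]];
      (*ids)[start + j * step] = org[j];
    }
  }
}
```

Note that, as in the snippet above, the stored indices here are flat offsets; the actual operator would likely convert each one to a position along `axis`, e.g. (org_index - start_index) / step.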
@reyoung Yes, we can use thrust::sort.
paddle/fluid/operators/argsort_op.cu
Outdated
    auto stream = reinterpret_cast<const platform::CUDADeviceContext&>(
                      ctx.device_context())
                      .stream();
auto stream = ctx.cuda_device_context().stream();
paddle/fluid/operators/argsort_op.cu
Outdated
                             int64_t* med_ids) {
  int64_t index = threadIdx.x + blockDim.x * blockIdx.x;
  if (index < n) {
    const int max_rank = 9;  // Max rank of a tensor allow in Fluid
Move this constant variable before line 19:
const int kMaxRank = 6;
Do you mean outside the kernel function? Then done.
class TestArgsortOp(OpTest):
    def setUp(self):
        self.init_axis()
        x = np.random.random((2, 3, 4, 5)).astype("float32")
This unit test has no gradient checking, so it is better to use a larger shape here to cover more cases, since PADDLE_CUDA_NUM_THREADS is large.
python/paddle/fluid/layers/tensor.py
Outdated
    return out


def argsort(input, axis=-1):
- Need unit testing in https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/tests/unittests/test_layers.py
- Better to add name:
  def argsort(input, axis=-1, name=None):
Add the argsort operator
Resolve #11399