randint utils for samplers #26

Merged: ZenoTan merged 8 commits into master from rand on Apr 27, 2022

@ZenoTan (Member) commented Apr 26, 2022

Adds a randint utility for fast random number generation. It can be integrated into the random samplers.
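To illustrate what "integrated into random samplers" could look like, here is a minimal sketch of uniform neighbor sampling (with replacement) over a CSR graph. `StandInRandint`, `sample_neighbors`, `rowptr` and `col` are hypothetical names chosen for illustration, and `std::mt19937_64` stands in for the PR's prefetched randint; this is not the pyg_lib sampler API.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical stand-in for the fast randint utility: returns a uniform
// integer in [0, range). Here it simply wraps std::mt19937_64.
class StandInRandint {
 public:
  explicit StandInRandint(uint64_t seed) : gen_(seed) {}
  uint64_t operator()(uint64_t range) {
    std::uniform_int_distribution<uint64_t> dist(0, range - 1);
    return dist(gen_);
  }

 private:
  std::mt19937_64 gen_;
};

// Uniformly samples `k` neighbors (with replacement) of node `v` from a
// CSR graph given by `rowptr` and `col`. Returns an empty vector for
// nodes without neighbors.
template <typename Randint>
std::vector<int64_t> sample_neighbors(const std::vector<int64_t>& rowptr,
                                      const std::vector<int64_t>& col,
                                      int64_t v, int k, Randint& randint) {
  std::vector<int64_t> out;
  const int64_t begin = rowptr[v];
  const int64_t deg = rowptr[v + 1] - begin;
  if (deg == 0) return out;
  out.reserve(k);
  for (int i = 0; i < k; ++i) {
    // One randint call per sampled edge: this is the hot path that a
    // faster randint is meant to speed up.
    const auto offset = static_cast<int64_t>(randint(static_cast<uint64_t>(deg)));
    out.push_back(col[begin + offset]);
  }
  return out;
}
```

Because the sampler only needs "a uniform integer in [0, deg)", swapping in a faster randint changes nothing else in the loop.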

@ZenoTan requested review from rusty1s and yaoyaowd on April 26, 2022 17:00
@ZenoTan self-assigned this on Apr 26, 2022
@codecov-commenter commented Apr 26, 2022

Codecov Report

Merging #26 (7f600c9) into master (7ebaabb) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##            master       #26   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            4         5    +1     
  Lines           32        65   +33     
=========================================
+ Hits            32        65   +33     

Impacted Files                                Coverage Δ
pyg_lib/csrc/random/cpu/randint_engine.h      100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7ebaabb...7f600c9.

@yaoyaowd left a comment

Could you add some documentation for the class as well?
Also, eyeballing the code, bits_ always seems to be 64, so it is always larger than B and never triggers the code in the if condition.

Comment threads on pyg_lib/csrc/random/cpu/randint_engine.h (four outdated)
}
}
int64_t* prefetch_ptr = prefetch_randint_.data_ptr<int64_t>();
int64_t res = (prefetch_ptr[size_ - 1] % range) & ((1ULL << B) - 1);

Why do we want ((1ULL << B) - 1)? To make sure we return non-negative numbers? I think we could instead make sure that whatever is in prefetch_ptr is unsigned.

Member Author

Now the random numbers are treated as plain unsigned bits via a pointer cast.
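A minimal sketch of why the order of operations matters, assuming 64-bit raw values as in the excerpts: with a negative operand, C++ `%` returns a negative remainder, so masking after the modulo (the earlier expression) can yield values outside [0, range), while reinterpreting the bits as unsigned and masking before the modulo (the later expression) cannot. The function names are hypothetical. The pointer cast mirrors the `reinterpret_cast<uint64_t*>` in the updated code; reading an `int64_t` through a `uint64_t` pointer is permitted because they are signed/unsigned counterparts.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Low-`bits` mask; shifting a 64-bit value by 64 is undefined behavior,
// so the full-width case is handled separately.
inline uint64_t low_mask(int bits) {
  return bits >= 64 ? std::numeric_limits<uint64_t>::max()
                    : (uint64_t{1} << bits) - 1;
}

// Earlier order: modulo on a *signed* value, then mask.
// A negative remainder turns into a large value after masking,
// which can exceed `range`.
inline int64_t signed_then_mask(int64_t raw, int64_t range, int bits) {
  return static_cast<int64_t>(static_cast<uint64_t>(raw % range) &
                              low_mask(bits));
}

// Later order: view the same bits as unsigned, mask, then take the
// modulo, so the result always lies in [0, range).
inline uint64_t unsigned_mask_then_mod(int64_t raw, uint64_t range, int bits) {
  const uint64_t* as_unsigned = reinterpret_cast<const uint64_t*>(&raw);
  return (*as_unsigned & low_mask(bits)) % range;
}
```

For example, with raw = -3, range = 10 and 8 mask bits, the first function returns 253 and the second returns 3.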

#include <vector>

#include "../../../pyg_lib/csrc/random/cpu/randint_engine.h"


Could you add a simple test that draws maybe 1000 random numbers from a fixed seed and checks that there are no duplicates?

Member Author

Added one. It sometimes failed with 10000 draws, so I kept it at 1000. Good guess :)
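One plausible reason that 10000 draws fail more often than 1000 is the birthday effect: even a perfectly uniform generator repeats a value with probability roughly 1 - exp(-n(n-1)/(2m)) for n draws from a range of size m. The sketch below computes the exact probability in log space. The range the test actually draws from is not stated in this thread, so the 32-bit range in the usage note is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Probability that n independent uniform draws from a range of size m
// contain at least one repeated value (the "birthday problem").
// Exact form: 1 - prod_{i=1}^{n-1} (1 - i/m), accumulated in log space
// to avoid underflow for large n.
inline double collision_probability(uint64_t n, double m) {
  if (static_cast<double>(n) > m) return 1.0;  // pigeonhole principle
  double log_no_collision = 0.0;
  for (uint64_t i = 1; i < n; ++i) {
    log_no_collision += std::log1p(-static_cast<double>(i) / m);
  }
  return 1.0 - std::exp(log_no_collision);
}
```

Assuming m = 2^32, `collision_probability(1000, 4294967296.0)` is about 0.0001, while `collision_probability(10000, 4294967296.0)` is about 0.012, so roughly 1.2% of seeds would produce a duplicate at 10000 draws.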

Comment threads on pyg_lib/csrc/random/cpu/randint_engine.h (three outdated)
reinterpret_cast<uint64_t*>(prefetched_randint_.data_ptr<int64_t>());
uint64_t mask = (needed == 64) ? std::numeric_limits<uint64_t>::max()
: (1ULL << needed) - 1;
uint64_t res = (prefetch_ptr[size_ - 1] & mask) % range;

I suspect that if needed=16 and bits_=64, the next 4 calls of rand(1<<15) will return the same number, because size_, mask, and range are all unchanged.

Member Author

The prefetched element is shifted accordingly after that.
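The author's answer can be sketched as follows, under the assumption that consumed bits are shifted out to the right (the thread only says "shifted accordingly"): each call takes the low `needed` bits of the current 64-bit word and then shifts them out, so four calls with needed = 16 see four different 16-bit slices. `BitBuffer` is a hypothetical name, and refilling from the prefetched tensor is omitted. The `needed == 64` branches mirror the mask in the excerpt above, since shifting a 64-bit integer by 64 is undefined behavior in C++.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical sketch: split one 64-bit random word into several smaller
// draws by taking the low bits and shifting them out after each call.
class BitBuffer {
 public:
  explicit BitBuffer(uint64_t word) : word_(word), bits_left_(64) {}

  // Number of unused random bits remaining in the current word.
  int bits_left() const { return bits_left_; }

  // Returns a value in [0, range) built from the low `needed` bits.
  // `needed` must be in [1, bits_left()].
  uint64_t take(int needed, uint64_t range) {
    assert(needed >= 1 && needed <= bits_left_);
    const uint64_t mask = (needed == 64) ? std::numeric_limits<uint64_t>::max()
                                         : (uint64_t{1} << needed) - 1;
    const uint64_t res = (word_ & mask) % range;
    // Shift the consumed bits out so the next call sees fresh bits.
    word_ = (needed == 64) ? 0 : (word_ >> needed);
    bits_left_ -= needed;
    return res;
  }

 private:
  uint64_t word_;
  int bits_left_;
};
```

For example, a buffer holding 0x0004000300020001 returns 1, 2, 3 and 4 on four successive `take(16, 1 << 15)` calls and then has no bits left.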

@yaoyaowd left a comment

A few minor comments. The idea of using a 64-bit buffer to get four 16-bit random numbers is brilliant. Has this been benchmarked?

namespace pyg {
namespace random {

const int RAND_PREFETCH_THRESHOLD = 128;

nit: I think this should be called RAND_PREFETCH_SIZE instead of THRESHOLD.

public:
PrefetchedRandint()
: PrefetchedRandint(RAND_PREFETCH_THRESHOLD, RAND_PREFETCH_BITS) {}
PrefetchedRandint(int size, int bits) : size_(size), bits_(bits) {

I think we don't need size_(size), bits_(bits) here, since they are initialized in the prefetch function.

torch::randint(std::numeric_limits<int64_t>::min(),
std::numeric_limits<int64_t>::max(), {size},
torch::TensorOptions().dtype(torch::kInt64));
size_ = size;

Thought: what if we set size_ = size - 1? Then in the function above you could use prefetch_ptr[size_] directly instead of prefetch_ptr[size_ - 1] everywhere.

@rusty1s (Member) left a comment

Overall, this looks good to me, and we can definitely merge this in. I feel that PrefetchedRandint needs more documentation and comments though.

Any reason why we go with this prefetching approach in the first place? It seems hard to extend if we think about weighted sampling in later stages. What speaks against using CPUGeneratorImpl directly from PyTorch?

CPUGeneratorImpl* generator = getDefaultCPUGenerator();
generator->random();

@ZenoTan (Member Author) commented Apr 27, 2022

Compared with the original randint approach (creating a randint tensor for every single random number), prefetching reduces the tensor-creation and checking overheads, which makes it more than 10x faster. Our implementation still uses the high-level torch::randint API, which internally manages its random generator, mutex, and so on, so the performance is similar to calling the raw torch/STL random generators directly. In addition, we keep unused bits for later calls, which reduces how many random numbers need to be generated.
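The amortization argument can be illustrated with a minimal sketch, assuming a buffer of 64-bit words refilled in one batch: each `next()` call is just an array read, and the expensive generation step runs only once per `batch_size` calls. `PrefetchedWords` is a hypothetical name, `std::mt19937_64` stands in for `torch::randint`, and the splitting of each word into smaller draws is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical sketch of batch prefetching: generate random words in
// batches and hand them out one at a time, so the per-call cost is an
// array read rather than a fresh generator (or tensor) call.
class PrefetchedWords {
 public:
  PrefetchedWords(std::size_t batch_size, uint64_t seed)
      : buffer_(batch_size), gen_(seed) {
    assert(batch_size > 0);
    refill();
  }

  // Returns the next prefetched 64-bit word, refilling when exhausted.
  uint64_t next() {
    if (remaining_ == 0) refill();
    // Consume from the back, like prefetch_ptr[size_ - 1] in the excerpts.
    return buffer_[--remaining_];
  }

  // Number of batch generations performed so far.
  std::size_t refills() const { return refills_; }

 private:
  void refill() {
    for (auto& w : buffer_) w = gen_();
    remaining_ = buffer_.size();
    ++refills_;
  }

  std::vector<uint64_t> buffer_;
  std::mt19937_64 gen_;
  std::size_t remaining_ = 0;
  std::size_t refills_ = 0;
};
```

With a batch size of 128 (the value of RAND_PREFETCH_THRESHOLD in the excerpt), 300 calls trigger only 3 batch generations instead of 300 separate ones.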

@@ -0,0 +1,118 @@
#pragma once

#include <torch/torch.h>
Member

Suggested change
- #include <torch/torch.h>
+ #include <ATen/ATen.h>

and replace all torch:: with at::. I just found out that this significantly improves compilation time.

@ZenoTan merged commit a745921 into master on Apr 27, 2022
@ZenoTan deleted the rand branch on April 27, 2022 14:19
@ZenoTan mentioned this pull request on Apr 27, 2022
4 participants