`randint` utils for samplers by ZenoTan · Pull Request #26 · pyg-team/pyg-lib

ZenoTan · 2022-04-26T17:00:40Z

Added a `randint` utility to enable fast random integer generation. It can be integrated into the random samplers.

codecov-commenter · 2022-04-26T17:03:17Z

Codecov Report

Merging #26 (7f600c9) into master (7ebaabb) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##            master       #26   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            4         5    +1     
  Lines           32        65   +33     
=========================================
+ Hits            32        65   +33

Impacted Files	Coverage Δ
pyg_lib/csrc/random/cpu/randint_engine.h	`100.00% <100.00%> (ø)`

yaoyaowd

Could you add some documentation for the class as well?
Also, from eyeballing the code, bits_ seems to always be 64, so it is always larger than B and the code inside the if condition is never triggered.

yaoyaowd · 2022-04-26T17:24:50Z

+      }
+    }
+    int64_t* prefetch_ptr = prefetch_randint_.data_ptr<int64_t>();
+    int64_t res = (prefetch_ptr[size_ - 1] % range) & ((1ULL << B) - 1);


Why do we want ((1ULL << B) - 1)? Is it to make sure we return positive numbers? I think we could instead make sure that whatever is in prefetch_ptr is unsigned.

Changed to treat the random numbers as plain unsigned bits via pointer casting.

yaoyaowd · 2022-04-26T19:55:48Z

+#include <vector>
+
+#include "../../../pyg_lib/csrc/random/cpu/randint_engine.h"
+


Could you add a simple test that draws random numbers, maybe 1000 times from one seed, and checks that there are no duplicates?

Added one. It sometimes failed with 10000 draws, so I kept it at 1000. You made a good guess :)

yaoyaowd · 2022-04-26T20:23:39Z

+        reinterpret_cast<uint64_t*>(prefetched_randint_.data_ptr<int64_t>());
+    uint64_t mask = (needed == 64) ? std::numeric_limits<uint64_t>::max()
+                                   : (1ULL << needed) - 1;
+    uint64_t res = (prefetch_ptr[size_ - 1] & mask) % range;


I feel if needed=16, and bits_=64, the next 4 calls of rand(1<<15) will return the same number because size_ is the same, mask is the same, and range is the same.

The prefetched element is shifted accordingly after that.

yaoyaowd

A few minor comments. The idea of using a 64-bit buffer word to get four 16-bit random numbers is brilliant. Has this been benchmarked?

yaoyaowd · 2022-04-27T04:50:20Z

+namespace pyg {
+namespace random {
+
+const int RAND_PREFETCH_THRESHOLD = 128;


nit: I think this should be called RAND_PREFETCH_SIZE instead of THRESHOLD.

yaoyaowd · 2022-04-27T04:50:54Z

+ public:
+  PrefetchedRandint()
+      : PrefetchedRandint(RAND_PREFETCH_THRESHOLD, RAND_PREFETCH_BITS) {}
+  PrefetchedRandint(int size, int bits) : size_(size), bits_(bits) {


I think we don't need size_(size), bits_(bits) here, since they are initialized in the prefetch function.

yaoyaowd · 2022-04-27T04:52:46Z

+        torch::randint(std::numeric_limits<int64_t>::min(),
+                       std::numeric_limits<int64_t>::max(), {size},
+                       torch::TensorOptions().dtype(torch::kInt64));
+    size_ = size;


A thought: what if we set size_ = size - 1, so that the function above can use prefetch_ptr[size_] directly instead of size_ - 1 everywhere?

rusty1s

Overall, this looks good to me, and we can definitely merge this in. I feel that PrefetchedRandint needs more documentation and comments though.

Any reason why we go with this prefetching approach in the first place? It seems hard to extend if we think about weighted sampling in later stages. What speaks against using CPUGeneratorImpl directly from PyTorch?

CPUGeneratorImpl* generator = getDefaultCPUGenerator();
generator->random();

ZenoTan · 2022-04-27T13:32:24Z

Compared with the original randint usage (creating a randint tensor for every single random number), we avoid the per-call tensor creation and checking overheads, which makes it more than 10x faster. Our implementation in fact uses the high-level torch::randint API, which internally manages its own random generator, mutex, and so on, so the performance is similar to calling the raw torch/STL random generators, and we reuse otherwise unused bits to reduce the amount of random number generation.

rusty1s · 2022-04-27T13:32:40Z

@@ -0,0 +1,118 @@
+#pragma once
+
+#include <torch/torch.h>


Suggested change

-#include <torch/torch.h>
+#include <ATen/ATen.h>

and replace all `torch::` with `at::`. I just found out that this significantly improves compilation time.

randint

e404549

ZenoTan requested review from rusty1s and yaoyaowd April 26, 2022 17:00

ZenoTan self-assigned this Apr 26, 2022

yaoyaowd reviewed Apr 26, 2022

address some comments

f8455a6

ZenoTan added the 0 - Priority P0, feature, and sampler labels Apr 26, 2022

fix test cov

8583458

yaoyaowd reviewed Apr 26, 2022

ZenoTan added 2 commits April 26, 2022 23:25

address comments

a0f700c

relax random test and more comments

213053b

yaoyaowd approved these changes Apr 27, 2022

rusty1s reviewed Apr 27, 2022

finalize

9e2f09e

rusty1s reviewed Apr 27, 2022

ZenoTan added 2 commits April 27, 2022 14:00

update

7b91c37

Merge branch 'master' of https://github.com/pyg-team/pyg-lib into rand

7f600c9

ZenoTan merged commit a745921 into master Apr 27, 2022

ZenoTan deleted the rand branch April 27, 2022 14:19

ZenoTan mentioned this pull request Apr 27, 2022

[Roadmap] 0.1.0 Release 🚀 #23

Open

37 tasks
