nickjong left a comment:
Aside: is there any prospect of controlling this behavior with a random seed? What random seed does it use?
In order to get an O(n) running time without random access, we had to rely on … The idea of having a random seed for shuffle is related to #3122; there I was planning on using the SFrame shuffle only if the user did not pass a random seed. Something I've considered doing instead is moving that sort-based shuffle functionality from the object detector into SFrame shuffle. The end result would be that SArray/SFrame shuffle would have a seed option: shuffle would run in O(n) time if no seed is given and O(n log n) if a seed is given. This shouldn't add too much additional work to #3122. Let me know if you think that is worth doing.
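To make the O(n) vs. O(n log n) distinction concrete, here is a minimal sketch of a sort-based seeded shuffle: each row gets a pseudo-random key derived from the seed, and the rows are sorted by that key. The function name and shape are illustrative, not the actual SFrame implementation.

```python
import random

def seeded_shuffle(items, seed):
    """Sort-based shuffle: O(n log n), but fully reproducible for a
    given seed (hypothetical sketch, not the SFrame internals)."""
    rng = random.Random(seed)
    keyed = [(rng.random(), x) for x in items]
    keyed.sort(key=lambda kv: kv[0])  # the sort dominates the cost
    return [x for _, x in keyed]
```

The seedless O(n) path would instead stream rows into buckets in a single pass, trading reproducibility for speed.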
One wrinkle here is that even when the user does not pass in a random seed, we choose one (er, at random) and record it in the model attributes. The user can reproduce the model by passing that seed along with the same data and parameters (on the same platform). But all this means is that the Object Detection case would always use the seeded version, since there's always a seed by the time you get to that part of the implementation.
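The choose-and-record behavior described above can be sketched as follows; `train` and the returned dict layout are hypothetical names for illustration, not the turicreate API.

```python
import random

def train(rows, seed=None):
    # If the caller passes no seed, pick one at random and record it in
    # the model attributes, so the exact run can be reproduced later by
    # passing that seed back in. (Illustrative sketch only.)
    if seed is None:
        seed = random.randrange(2**31)
    rng = random.Random(seed)
    order = list(rows)
    rng.shuffle(order)
    return {"seed": seed, "training_order": order}
```

Because a seed always exists by training time, any downstream shuffle inside the model would take the seeded code path.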
I've suggested in #3060 that an irreproducible model is not really useful for model development. I have worked on optimizing
@guihao-liang - I've created #3187 to track adding a seed parameter to SArray/SFrame shuffle. |
A model that relies on the order of its input isn't robust. For example, SGD in matrix factorization with multiple threads is non-deterministic, and generally most computation on the GPU isn't very deterministic anyway. In small test cases determinism may make sense, but a fast non-deterministic shuffle is definitely useful.
SGD uses a seed too, so for the same seed SGD is also deterministic, because it relies on a pseudo-random generator. I don't know about matrix factorization, but for standard back-propagation (RNN, CNN) everything is deterministic. The non-determinism comes from floating-point rounding errors and from approximating partial derivatives of non-smooth functions such as ReLU (which is not differentiable at zero). Determinism is important for educational purposes: I took courses that teach DL, and it's important for students to reproduce what's taught in the lectures. The most important use is randomly splitting the data set into training and testing sets. If the input is different every time, how could students check whether each step they take is right, given that DL is a complicated end-to-end process? I know turicreate is also used for educational purposes, and I think determinism is important at least for this case.
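The train/test split use case above can be sketched as a seeded split, so that every student with the same seed and data gets the same partition. The function is an illustrative sketch, not an actual turicreate API.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    # Reproducible random split: the same seed and data always produce
    # the same training/testing partition. (Hypothetical helper.)
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]
```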
The non-determinism in our inner SGD routines (not the deep learning ones) comes from parallelism, so it's inherently non-deterministic. Determinism can be a trade-off against speed and accuracy in a lot of cases (such as this one with group-by), so it should be up to the user what the balance should be for their particular use case. I don't see a single right answer here.
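A small self-contained example of why parallelism alone breaks determinism: floating-point addition is not associative, so the same partial sums combined in a different (thread-dependent) order can give different results, even with no randomness involved.

```python
# Floating-point addition is not associative: reordering the same
# operands changes the result, which is one way a multithreaded
# reduction becomes non-deterministic.
vals = [1e16, 1.0, -1e16, 1.0]
left_to_right = ((vals[0] + vals[1]) + vals[2]) + vals[3]  # the 1.0 is absorbed by 1e16
reordered = ((vals[0] + vals[2]) + vals[1]) + vals[3]      # the big terms cancel first
print(left_to_right, reordered)  # 1.0 2.0
```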
Implements: #3123.