-
-
Notifications
You must be signed in to change notification settings - Fork 5.7k
Expand documentation of custom random samplers. #31787
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 1 commit
cac6d62
ad865ce
49961bf
daf11a5
9902f8d
e5a48a1
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -80,18 +80,48 @@ in order to support usual types of generated values. | |
|
|
||
| ### Generating random values of custom types | ||
|
|
||
| There are two categories: generating values from a type (e.g. `rand(Int)`), or from a collection (e.g. `rand(1:3)`). | ||
| The simple cases are explained first, and more advanced usage is presented later. | ||
| We assume here that the choice of algorithm is independent of the RNG, so we use `AbstractRNG` in our signatures. | ||
| Generating random values for some distributions may involve various trade-offs. *Pre-computed* values, such as an [alias table](https://en.wikipedia.org/wiki/Alias_method) for discrete distributions, or [“squeezing” functions](https://en.wikipedia.org/wiki/Rejection_sampling) for univariate distributions, can speed up sampling considerably. How much information should be pre-computed can depend on the number of values we plan to draw from a distribution. Also, some random number generators can have certain properties that various algorithms may want to exploit. | ||
|
|
||
| The `Random` module defines a customizable framework for obtaining random values that can address these issues. Each invocation of `rand` generates a *sampler* which can be customized with the above trade-offs in mind, by adding methods to `Sampler`, which in turn can dispatch on the random number generator, the object that characterizes the distribution, and a suggestion for the number of repetitions. Currently, for the latter, `Val{1}` (for a single sample) and `Val{Inf}` (for an arbitrary number) are used, with `Random.Repetition` an alias for both. | ||
|
|
||
| The object returned by `Sampler` is then used to generate the random values, by a method of `rand` defined for this purpose. Samplers can be arbitrary values, but for most applications the following predefined samplers may be sufficient: | ||
|
|
||
| 1. `SamplerType{T}()` can be used for implementing samplers that draw from `T` (e.g. `rand(Int)`), | ||
|
|
||
| 2. `SamplerTrivial(self)` is a simple wrapper for `self`, which can be accessed with `[]`. This is the recommended sampler when no pre-computed information is needed (e.g. `rand(1:3)`) | ||
tpapp marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
|
|
||
| 3. `SamplerSimple(self, data)` also contains the additional `data` field, which can be used to store arbitrary pre-computed values. | ||
|
|
||
| In addition, [`Random.gentype`](@ref) should be defined so that element types can be computed for containers. | ||
tpapp marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
|
|
||
| We provide examples for each of these. We assume here that the choice of algorithm is independent of the RNG, so we use `AbstractRNG` in our signatures. | ||
|
|
||
| ```@docs | ||
| Random.Sampler | ||
| Random.SamplerType | ||
| Random.SamplerTrivial | ||
| Random.SamplerSimple | ||
| Random.gentype | ||
| ``` | ||
|
|
||
| Decoupling pre-computation from actually generating the values is part of the API, and is also available to the user. As an example, assume that `rand(rng, 1:20)` has to be called repeatedly in a loop: the way to take advantage of this decoupling is as follows: | ||
|
|
||
| ```julia | ||
| rng = MersenneTwister() | ||
| sp = Random.Sampler(rng, 1:20) # or Random.Sampler(MersenneTwister,1:20) | ||
tpapp marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| for x in X | ||
| n = rand(rng, sp) # similar to n = rand(rng, 1:20) | ||
| # use n | ||
| end | ||
| ``` | ||
|
|
||
| This is the mechanism that is also used in the standard library, e.g. by the default implementation of random array generation (like in `rand(1:20, 10)`). | ||
|
|
||
| #### Generating values from a type | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Even if it was more of less implied, a clear statement could be made, right before this paragraph or earlier, along those lines: for implementors, two methods must in principle be defined: |
||
|
|
||
| Given a type `T`, it's currently assumed that if `rand(T)` is defined, an object of type `T` will be produced. | ||
| In order to define random generation of values of type `T`, the following method can be defined: | ||
| `rand(rng::AbstractRNG, ::Random.SamplerType{T})` (this should return what `rand(rng, T)` is expected to return). | ||
| Given a type `T`, it's currently assumed that if `rand(T)` is defined, an object of type `T` will be produced. `SamplerType` is the default sampler for types. In order to define random generation of values of type `T`, the `rand(rng::AbstractRNG, ::Random.SamplerType{T})` method should be defined, and should return values what `rand(rng, T)` is expected to return. `Random.gentype` does not need to be defined explicitly for this case, as the fallback method works. | ||
|
|
||
rfourquet marked this conversation as resolved.
Show resolved
Hide resolved
|
||
| Let's take the following example: we implement a `Die` type, with a variable number `n` of sides, numbered from `1` to `n`. | ||
| We want `rand(Die)` to produce a die with a random number of up to 20 sides (and at least 4): | ||
| Let's take the following example: we implement a `Die` type, with a variable number `n` of sides, numbered from `1` to `n`. We want `rand(Die)` to produce a `Die` with a random number of up to 20 sides (and at least 4): | ||
|
|
||
| ```jldoctest Die | ||
| struct Die | ||
|
|
@@ -126,12 +156,11 @@ julia> a = Vector{Die}(undef, 3); rand!(a) | |
| Die(8) | ||
| ``` | ||
|
|
||
| #### Generating values from a collection | ||
| #### A simple sampler without pre-computed data | ||
|
|
||
| Here we define a sampler for a collection. If no specialized algorithm or pre-computed data is required, it may be simplest to just return a random element, which can be implemented with a `SamplerTrivial` sampler, which is in fact the *default* fallback for values. | ||
|
|
||
tpapp marked this conversation as resolved.
Show resolved
Hide resolved
|
||
| Given a collection type `S`, it's currently assumed that if `rand(::S)` is defined, an object of type `eltype(S)` will be produced. | ||
| In order to define random generation out of objects of type `S`, the following method can be defined: | ||
| `rand(rng::AbstractRNG, sp::Random.SamplerTrivial{S})`. Here, `sp` simply wraps an object of type `S`, which can be accessed via `sp[]`. | ||
| Continuing the `Die` example, we want now to define `rand(d::Die)` to produce an `Int` corresponding to one of `d`'s sides: | ||
| In order to define random generation out of objects of type `S`, the following method should be defined: `rand(rng::AbstractRNG, sp::Random.SamplerTrivial{S})`. Here, `sp` simply wraps an object of type `S`, which can be accessed via `sp[]`. Continuing the `Die` example, we want now to define `rand(d::Die)` to produce an `Int` corresponding to one of `d`'s sides: | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is there a particular reason why you merge lines?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No, this was accidental. Should I revert this in the next PR?
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No I think better to leave it like this now, unless you touch again this part. (I have a slight personal preference for split lines, as it's easier to read on my editor and makes it sometimes easier for git diffs and for commenting on github). |
||
|
|
||
| ```jldoctest Die; setup = :(Random.seed!(1)) | ||
| julia> Random.rand(rng::AbstractRNG, d::Random.SamplerTrivial{Die}) = rand(rng, 1:d[].nsides); | ||
|
|
@@ -146,38 +175,46 @@ julia> rand(Die(4), 3) | |
| 2 | ||
| ``` | ||
|
|
||
| In the last example, a `Vector{Any}` is produced; the reason is that `eltype(Die) == Any`. The remedy is to define | ||
| `Base.eltype(::Type{Die}) = Int`. | ||
| Given a collection type `S`, it's currently assumed that if `rand(::S)` is defined, an object of type `eltype(S)` will be produced. In the last example, a `Vector{Any}` is produced; the reason is that `eltype(Die) == Any`. The remedy is to define `Base.eltype(::Type{Die}) = Int`. | ||
|
|
||
| A `SamplerTrivial` does not have to wrap the original object. For example, in `Random`, `AbstractFloat` types are special-cased, because by default random values are not produced in the whole type domain, but rather in `[0,1)`. | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this can be a bit confusing. In order to "forward" a sampler, you have to write a |
||
|
|
||
| #### Generating values for an `AbstractFloat` type | ||
|
|
||
| `AbstractFloat` types are special-cased, because by default random values are not produced in the whole type domain, but rather | ||
| in `[0,1)`. The following method should be implemented for `T <: AbstractFloat`: | ||
| `Random.rand(::AbstractRNG, ::Random.SamplerTrivial{Random.CloseOpen01{T}})` | ||
| Consequently, a method | ||
| ```julia | ||
| Sampler(::Type{RNG}, ::Type{T}, n::Repetition) where {RNG<:AbstractRNG,T<:AbstractFloat} = | ||
| Sampler(RNG, CloseOpen01(T), n) | ||
| ``` | ||
| is defined to return `SamplerTrivial` with a `Random.CloseOpen01{T}}` type defined for this purpose, which has an appropriate `rand` method defined for it. | ||
|
|
||
| #### An optimized sampler with pre-computed data | ||
|
|
||
| #### Optimizing generation with cached computation between calls | ||
| Consider a discrete distribution, where numbers `1:n` are drawn with given probabilities that some to one. When many values are needed from this distribution, the fastest method if using an [alias table](https://en.wikipedia.org/wiki/Alias_method). We don't provide the algorithm for building such a table here, but suppose it is available in `make_alias_table(probabilities)` instead, and `draw_number(rng, alias_table)` can be used to draw a random number from it. | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. "with given probabilities that some to one" : is there a typo?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fixing these in the upcoming PR. The alias table method is only needed for non-equal probabilities from a discrete/categorical distribution. It is used in |
||
|
|
||
| When repeatedly generating random values (with the same `rand` parameters), it happens for some types | ||
| that the result of a computation is used for each call. In this case, the computation can be decoupled | ||
| from actually generating the values. This is the case for example with the default implementation for | ||
| `AbstractArray`. Assume that `rand(rng, 1:20)` has to be called repeatedly in a loop: the way to take advantage | ||
| of this decoupling is as follows: | ||
| Suppose that the distribution is described by | ||
| ```julia | ||
| struct DiscreteDistribution{V <: AbstractVector} | ||
| probabilities::V | ||
| end | ||
| ``` | ||
| and that we *always* want to build an a alias table, regardless of the number of values needed (we learn how to customize this below). The methods | ||
| ```julia | ||
| Random.gentype(::Type{<:DiscreteDistribution}) = Int | ||
|
|
||
| function Random.Sampler(::AbstractRng, distribution::DiscreteDistribution, ::Repetition) | ||
| SamplerSimple(disribution, make_alias_table(probabilities)) | ||
|
||
| end | ||
| ``` | ||
| should be defined to return a sampler with pre-computed data, then | ||
| ```julia | ||
| rng = MersenneTwister() | ||
| sp = Random.Sampler(rng, 1:20) # or Random.Sampler(MersenneTwister,1:20) | ||
| for x in X | ||
| n = rand(rng, sp) # similar to n = rand(rng, 1:20) | ||
| # use n | ||
| function rand(rng::AbstractRNG, sampler::SamplerSimple{<:DiscreteDistribution}) | ||
tpapp marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| draw_number(rng, sampler.data) | ||
| end | ||
| ``` | ||
| will be used to draw the values. | ||
|
|
||
| #### Custom sampler types | ||
|
|
||
| This mechanism is of course used by the default implementation of random array generation (like in `rand(1:20, 10)`). | ||
| In order to implement this decoupling for a custom type, a helper type can be used. | ||
| Going back to our `Die` example: `rand(::Die)` uses random generation from a range, so | ||
| there is an opportunity for this optimization: | ||
| In order to implement this decoupling for a custom type, a custom helper type can be used. Going back to our `Die` example: `rand(::Die)` uses random generation from a range, so there is an opportunity for this optimization: | ||
|
||
|
|
||
| ```julia | ||
| import Random: Sampler, rand | ||
|
|
@@ -194,10 +231,9 @@ Sampler(RNG::Type{<:AbstractRNG}, die::Die, r::Random.Repetition) = | |
| rand(rng::AbstractRNG, sp::SamplerDie) = rand(rng, sp.sp) | ||
| ``` | ||
|
|
||
| It's now possible to get a sampler with `sp = Sampler(rng, die)`, and use `sp` instead of `die` in any `rand` call involving `rng`. | ||
| In the simplistic example above, `die` doesn't need to be stored in `SamplerDie` but this is often the case in practice. | ||
| It's now possible to get a sampler with `sp = Sampler(rng, die)`, and use `sp` instead of `die` in any `rand` call involving `rng`. In the simplistic example above, `die` doesn't need to be stored in `SamplerDie` but this is often the case in practice. | ||
|
|
||
| This pattern is so frequent that a helper type named `Random.SamplerSimple` is available, | ||
| This pattern is so frequent that a helper type used above, named `Random.SamplerSimple`, is available, | ||
| saving us the definition of `SamplerDie`: we could have implemented our decoupling with: | ||
|
|
||
| ```julia | ||
|
|
@@ -228,8 +264,7 @@ Sampler(RNG::Type{<:AbstractRNG}, die::Die, ::Val{1}) = SamplerDie1(...) | |
| Sampler(RNG::Type{<:AbstractRNG}, die::Die, ::Val{Inf}) = SamplerDieMany(...) | ||
| ``` | ||
|
|
||
| Of course, `rand` must also be defined on those types (i.e. `rand(::AbstractRNG, ::SamplerDie1)` | ||
| and `rand(::AbstractRNG, ::SamplerDieMany)`). | ||
| Of course, `rand` must also be defined on those types (i.e. `rand(::AbstractRNG, ::SamplerDie1)` and `rand(::AbstractRNG, ::SamplerDieMany)`). Alternatively, `SamplerTrivial` and `SamplerSimple` can be used if custom types are not necessary. | ||
tpapp marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
|
|
||
| Note: `Sampler(rng, x)` is simply a shorthand for `Sampler(rng, x, Val(Inf))`, and | ||
| `Random.Repetition` is an alias for `Union{Val{1}, Val{Inf}}`. | ||
|
|
||
Uh oh!
There was an error while loading. Please reload this page.