Skip to content
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
115 changes: 75 additions & 40 deletions stdlib/Random/docs/src/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -80,18 +80,48 @@ in order to support usual types of generated values.

### Generating random values of custom types

There are two categories: generating values from a type (e.g. `rand(Int)`), or from a collection (e.g. `rand(1:3)`).
The simple cases are explained first, and more advanced usage is presented later.
We assume here that the choice of algorithm is independent of the RNG, so we use `AbstractRNG` in our signatures.
Generating random values for some distributions may involve various trade-offs. *Pre-computed* values, such as an [alias table](https://en.wikipedia.org/wiki/Alias_method) for discrete distributions, or [“squeezing” functions](https://en.wikipedia.org/wiki/Rejection_sampling) for univariate distributions, can speed up sampling considerably. How much information should be pre-computed can depend on the number of values we plan to draw from a distribution. Also, some random number generators can have certain properties that various algorithms may want to exploit.

The `Random` module defines a customizable framework for obtaining random values that can address these issues. Each invocation of `rand` generates a *sampler* which can be customized with the above trade-offs in mind, by adding methods to `Sampler`, which in turn can dispatch on the random number generator, the object that characterizes the distribution, and a suggestion for the number of repetitions. Currently, for the latter, `Val{1}` (for a single sample) and `Val{Inf}` (for an arbitrary number) are used, with `Random.Repetition` an alias for both.

The object returned by `Sampler` is then used to generate the random values, by a method of `rand` defined for this purpose. Samplers can be arbitrary values, but for most applications the following predefined samplers may be sufficient:

1. `SamplerType{T}()` can be used for implementing samplers that draw from `T` (e.g. `rand(Int)`),

2. `SamplerTrivial(self)` is a simple wrapper for `self`, which can be accessed with `[]`. This is the recommended sampler when no pre-computed information is needed (e.g. `rand(1:3)`)

3. `SamplerSimple(self, data)` also contains the additional `data` field, which can be used to store arbitrary pre-computed values.

In addition, [`Random.gentype`](@ref) should be defined so that element types can be computed for containers.

We provide examples for each of these. We assume here that the choice of algorithm is independent of the RNG, so we use `AbstractRNG` in our signatures.

```@docs
Random.Sampler
Random.SamplerType
Random.SamplerTrivial
Random.SamplerSimple
Random.gentype
```

Decoupling pre-computation from actually generating the values is part of the API, and is also available to the user. As an example, assume that `rand(rng, 1:20)` has to be called repeatedly in a loop: the way to take advantage of this decoupling is as follows:

```julia
rng = MersenneTwister()
sp = Random.Sampler(rng, 1:20) # or Random.Sampler(MersenneTwister,1:20)
for x in X
n = rand(rng, sp) # similar to n = rand(rng, 1:20)
# use n
end
```

This is the mechanism that is also used in the standard library, e.g. by the default implementation of random array generation (like in `rand(1:20, 10)`).

#### Generating values from a type
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if it was more of less implied, a clear statement could be made, right before this paragraph or earlier, along those lines: for implementors, two methods must in principle be defined: Sampler and rand. But when the default is sufficient (SamplerType and SamplerTrivial), only rand has to be defined (I trust you will find a nicer way to put it ;-) )


Given a type `T`, it's currently assumed that if `rand(T)` is defined, an object of type `T` will be produced.
In order to define random generation of values of type `T`, the following method can be defined:
`rand(rng::AbstractRNG, ::Random.SamplerType{T})` (this should return what `rand(rng, T)` is expected to return).
Given a type `T`, it's currently assumed that if `rand(T)` is defined, an object of type `T` will be produced. `SamplerType` is the default sampler for types. In order to define random generation of values of type `T`, the `rand(rng::AbstractRNG, ::Random.SamplerType{T})` method should be defined, and should return values what `rand(rng, T)` is expected to return. `Random.gentype` does not need to be defined explicitly for this case, as the fallback method works.

Let's take the following example: we implement a `Die` type, with a variable number `n` of sides, numbered from `1` to `n`.
We want `rand(Die)` to produce a die with a random number of up to 20 sides (and at least 4):
Let's take the following example: we implement a `Die` type, with a variable number `n` of sides, numbered from `1` to `n`. We want `rand(Die)` to produce a `Die` with a random number of up to 20 sides (and at least 4):

```jldoctest Die
struct Die
Expand Down Expand Up @@ -126,12 +156,11 @@ julia> a = Vector{Die}(undef, 3); rand!(a)
Die(8)
```

#### Generating values from a collection
#### A simple sampler without pre-computed data

Here we define a sampler for a collection. If no specialized algorithm or pre-computed data is required, it may be simplest to just return a random element, which can be implemented with a `SamplerTrivial` sampler, which is in fact the *default* fallback for values.

Given a collection type `S`, it's currently assumed that if `rand(::S)` is defined, an object of type `eltype(S)` will be produced.
In order to define random generation out of objects of type `S`, the following method can be defined:
`rand(rng::AbstractRNG, sp::Random.SamplerTrivial{S})`. Here, `sp` simply wraps an object of type `S`, which can be accessed via `sp[]`.
Continuing the `Die` example, we want now to define `rand(d::Die)` to produce an `Int` corresponding to one of `d`'s sides:
In order to define random generation out of objects of type `S`, the following method should be defined: `rand(rng::AbstractRNG, sp::Random.SamplerTrivial{S})`. Here, `sp` simply wraps an object of type `S`, which can be accessed via `sp[]`. Continuing the `Die` example, we want now to define `rand(d::Die)` to produce an `Int` corresponding to one of `d`'s sides:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a particular reason why you merge lines?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this was accidental. Should I revert this in the next PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I think better to leave it like this now, unless you touch again this part. (I have a slight personal preference for split lines, as it's easier to read on my editor and makes it sometimes easier for git diffs and for commenting on github).


```jldoctest Die; setup = :(Random.seed!(1))
julia> Random.rand(rng::AbstractRNG, d::Random.SamplerTrivial{Die}) = rand(rng, 1:d[].nsides);
Expand All @@ -146,38 +175,46 @@ julia> rand(Die(4), 3)
2
```

In the last example, a `Vector{Any}` is produced; the reason is that `eltype(Die) == Any`. The remedy is to define
`Base.eltype(::Type{Die}) = Int`.
Given a collection type `S`, it's currently assumed that if `rand(::S)` is defined, an object of type `eltype(S)` will be produced. In the last example, a `Vector{Any}` is produced; the reason is that `eltype(Die) == Any`. The remedy is to define `Base.eltype(::Type{Die}) = Int`.

A `SamplerTrivial` does not have to wrap the original object. For example, in `Random`, `AbstractFloat` types are special-cased, because by default random values are not produced in the whole type domain, but rather in `[0,1)`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be a bit confusing. In order to "forward" a sampler, you have to write a Sampler method (even if it returns a SamplerTrivial). As at this point we showed only the defaults, I'm afraid that "does not have to wrap the original object." doesn't make sense to the reader who may not have grasped the "define a sampler" part yet (it would be interesting if someone not familiar with the framework could comment on that). Moreover this is a very marginal use-case (even in the example of floats, we don't call SamplerTrivial directly, but the generic Sampler which itself (only) defaults to SamplerTrivial: e.g. Sampler(rng, BigFloat) doesn't return a SamplerTrivial).


#### Generating values for an `AbstractFloat` type

`AbstractFloat` types are special-cased, because by default random values are not produced in the whole type domain, but rather
in `[0,1)`. The following method should be implemented for `T <: AbstractFloat`:
`Random.rand(::AbstractRNG, ::Random.SamplerTrivial{Random.CloseOpen01{T}})`
Consequently, a method
```julia
Sampler(::Type{RNG}, ::Type{T}, n::Repetition) where {RNG<:AbstractRNG,T<:AbstractFloat} =
Sampler(RNG, CloseOpen01(T), n)
```
is defined to return `SamplerTrivial` with a `Random.CloseOpen01{T}}` type defined for this purpose, which has an appropriate `rand` method defined for it.

#### An optimized sampler with pre-computed data

#### Optimizing generation with cached computation between calls
Consider a discrete distribution, where numbers `1:n` are drawn with given probabilities that some to one. When many values are needed from this distribution, the fastest method if using an [alias table](https://en.wikipedia.org/wiki/Alias_method). We don't provide the algorithm for building such a table here, but suppose it is available in `make_alias_table(probabilities)` instead, and `draw_number(rng, alias_table)` can be used to draw a random number from it.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"with given probabilities that some to one" : is there a typo?
"the fastest method if using" -> "is"
I don't know the alias table method, but if it's the fastest method, I would wonder why it's not implemented in Random...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixing these in the upcoming PR.

The alias table method is only needed for non-equal probabilities from a discrete/categorical distribution. It is used in Distributions. Random has no such sampler AFAIK.


When repeatedly generating random values (with the same `rand` parameters), it happens for some types
that the result of a computation is used for each call. In this case, the computation can be decoupled
from actually generating the values. This is the case for example with the default implementation for
`AbstractArray`. Assume that `rand(rng, 1:20)` has to be called repeatedly in a loop: the way to take advantage
of this decoupling is as follows:
Suppose that the distribution is described by
```julia
struct DiscreteDistribution{V <: AbstractVector}
probabilities::V
end
```
and that we *always* want to build an a alias table, regardless of the number of values needed (we learn how to customize this below). The methods
```julia
Random.gentype(::Type{<:DiscreteDistribution}) = Int

function Random.Sampler(::AbstractRng, distribution::DiscreteDistribution, ::Repetition)
SamplerSimple(disribution, make_alias_table(probabilities))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make_alias_table(distribution.probabilities)

end
```
should be defined to return a sampler with pre-computed data, then
```julia
rng = MersenneTwister()
sp = Random.Sampler(rng, 1:20) # or Random.Sampler(MersenneTwister,1:20)
for x in X
n = rand(rng, sp) # similar to n = rand(rng, 1:20)
# use n
function rand(rng::AbstractRNG, sampler::SamplerSimple{<:DiscreteDistribution})
draw_number(rng, sampler.data)
end
```
will be used to draw the values.

#### Custom sampler types

This mechanism is of course used by the default implementation of random array generation (like in `rand(1:20, 10)`).
In order to implement this decoupling for a custom type, a helper type can be used.
Going back to our `Die` example: `rand(::Die)` uses random generation from a range, so
there is an opportunity for this optimization:
In order to implement this decoupling for a custom type, a custom helper type can be used. Going back to our `Die` example: `rand(::Die)` uses random generation from a range, so there is an opportunity for this optimization:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should change "In order to implement this decoupling for a custom type", as you already showed how to do the decoupling for a custom type... maybe add "in cases where SamplerSimple is not suitable? (but the problem then is that we explain SamplerSimple was suitable in our Die example!)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to clarify this, I hope the improved wording is clear.


```julia
import Random: Sampler, rand
Expand All @@ -194,10 +231,9 @@ Sampler(RNG::Type{<:AbstractRNG}, die::Die, r::Random.Repetition) =
rand(rng::AbstractRNG, sp::SamplerDie) = rand(rng, sp.sp)
```

It's now possible to get a sampler with `sp = Sampler(rng, die)`, and use `sp` instead of `die` in any `rand` call involving `rng`.
In the simplistic example above, `die` doesn't need to be stored in `SamplerDie` but this is often the case in practice.
It's now possible to get a sampler with `sp = Sampler(rng, die)`, and use `sp` instead of `die` in any `rand` call involving `rng`. In the simplistic example above, `die` doesn't need to be stored in `SamplerDie` but this is often the case in practice.

This pattern is so frequent that a helper type named `Random.SamplerSimple` is available,
This pattern is so frequent that a helper type used above, named `Random.SamplerSimple`, is available,
saving us the definition of `SamplerDie`: we could have implemented our decoupling with:

```julia
Expand Down Expand Up @@ -228,8 +264,7 @@ Sampler(RNG::Type{<:AbstractRNG}, die::Die, ::Val{1}) = SamplerDie1(...)
Sampler(RNG::Type{<:AbstractRNG}, die::Die, ::Val{Inf}) = SamplerDieMany(...)
```

Of course, `rand` must also be defined on those types (i.e. `rand(::AbstractRNG, ::SamplerDie1)`
and `rand(::AbstractRNG, ::SamplerDieMany)`).
Of course, `rand` must also be defined on those types (i.e. `rand(::AbstractRNG, ::SamplerDie1)` and `rand(::AbstractRNG, ::SamplerDieMany)`). Alternatively, `SamplerTrivial` and `SamplerSimple` can be used if custom types are not necessary.

Note: `Sampler(rng, x)` is simply a shorthand for `Sampler(rng, x, Val(Inf))`, and
`Random.Repetition` is an alias for `Union{Val{1}, Val{Inf}}`.
Expand Down
42 changes: 40 additions & 2 deletions stdlib/Random/src/Random.jl
Original file line number Diff line number Diff line change
Expand Up @@ -137,6 +137,23 @@ const Repetition = Union{Val{1},Val{Inf}}
# Sampler(::AbstractRNG, X, ::Val{Inf}) = Sampler(X)
# Sampler(::AbstractRNG, ::Type{X}, ::Val{Inf}) where {X} = Sampler(X)

"""
Sampler(rng, x, repetition = Val(Inf))

Return a sampler object that can be used to generate random values from `rng` for `x`.

When `s = Sampler(rng, x, repetition)`, `rand(rng, s)` will be used to draw random values,
and should be defined accordingly.

`repetition` can be `Val(1)` or `Val(Inf)`, and should be used as a suggestion for deciding
the amount of precomputation, if applicable.

[`Random.SamplerType`](@ref) and [`Random.SamplerTrivial`](@ref) are default fallbacks for
*types* and *values*, respectively. [`Random.SamplerSimple`](@ref) can be used to store
pre-computed values without defining extra types for only this purpose.

See also [`Random.gentype`](@ref), which should be defined for `typeof(x)`.
"""
Sampler(rng::AbstractRNG, x, r::Repetition=Val(Inf)) = Sampler(typeof(rng), x, r)
Sampler(rng::AbstractRNG, ::Type{X}, r::Repetition=Val(Inf)) where {X} = Sampler(typeof(rng), X, r)

Expand All @@ -149,18 +166,30 @@ Sampler(::Type{RNG}, ::Type{X}) where {RNG<:AbstractRNG,X} = Sampler(RNG, X, Val

#### pre-defined useful Sampler types

# default fall-back for types
"""
SamplerType{T}()

A sampler for types, containing no other information. The default fallback for `Sampler`
when called with types. `Random.gentype` is not used.
"""
struct SamplerType{T} <: Sampler{T} end

Sampler(::Type{<:AbstractRNG}, ::Type{T}, ::Repetition) where {T} = SamplerType{T}()

Base.getindex(::SamplerType{T}) where {T} = T

# default fall-back for values
struct SamplerTrivial{T,E} <: Sampler{E}
self::T
end

"""
SamplerTrivial(x)

Create a sampler that just wraps the given value `x`, with the type information from
`Random.gentype(x)` (which should be defined).

The recommended use case is sampling from values without precomputed data.
"""
SamplerTrivial(x::T) where {T} = SamplerTrivial{T,gentype(T)}(x)

Sampler(::Type{<:AbstractRNG}, x, ::Repetition) = SamplerTrivial(x)
Expand All @@ -173,6 +202,15 @@ struct SamplerSimple{T,S,E} <: Sampler{E}
data::S
end

"""
SamplerSimple(x, data)

Create a sampler that wraps the given value `x` and the `data`, with the type information
from `Random.gentype(x)` (which should be defined). This is the default fall-back for
values.

The recommended use case is sampling from values with precomputed data.
"""
SamplerSimple(x::T, data::S) where {T,S} = SamplerSimple{T,S,gentype(T)}(x, data)

Base.getindex(sp::SamplerSimple) = sp.self
Expand Down