JuliaLang · fredrikekre · May 9, 2019 · Apr 21, 2019 · May 5, 2019 · May 8, 2019
diff --git a/stdlib/Random/docs/src/index.md b/stdlib/Random/docs/src/index.md
@@ -80,18 +80,48 @@ in order to support usual types of generated values.
 
 ### Generating random values of custom types
 
-There are two categories: generating values from a type (e.g. `rand(Int)`), or from a collection (e.g. `rand(1:3)`).
-The simple cases are explained first, and more advanced usage is presented later.
-We assume here that the choice of algorithm is independent of the RNG, so we use `AbstractRNG` in our signatures.
+Generating random values for some distributions may involve various trade-offs. *Pre-computed* values, such as an [alias table](https://en.wikipedia.org/wiki/Alias_method) for discrete distributions, or [“squeezing” functions](https://en.wikipedia.org/wiki/Rejection_sampling) for univariate distributions, can speed up sampling considerably. How much information should be pre-computed can depend on the number of values we plan to draw from a distribution. Also, some random number generators can have certain properties that various algorithms may want to exploit.
+
+The `Random` module defines a customizable framework for obtaining random values that can address these issues. Each invocation of `rand` generates a *sampler* which can be customized with the above trade-offs in mind, by adding methods to `Sampler`, which in turn can dispatch on the random number generator, the object that characterizes the distribution, and a suggestion for the number of repetitions. Currently, for the latter, `Val{1}` (for a single sample) and `Val{Inf}` (for an arbitrary number) are used, with `Random.Repetition` an alias for both.
+
+The object returned by `Sampler` is then used to generate the random values, by a method of `rand` defined for this purpose. Samplers can be arbitrary values, but for most applications the following predefined samplers may be sufficient:
+
+1. `SamplerType{T}()` can be used for implementing samplers that draw from `T` (e.g. `rand(Int)`),
+
+2. `SamplerTrivial(self)` is a simple wrapper for `self`, which can be accessed with `[]`. This is the recommended sampler when no pre-computed information is needed (e.g. `rand(1:3)`)
+
+3. `SamplerSimple(self, data)` also contains the additional `data` field, which can be used to store arbitrary pre-computed values.
+
+In addition, [`Random.gentype`](@ref) should be defined so that element types can be computed for containers.
+
+We provide examples for each of these. We assume here that the choice of algorithm is independent of the RNG, so we use `AbstractRNG` in our signatures.
+
+```@docs
+Random.Sampler
+Random.SamplerType
+Random.SamplerTrivial
+Random.SamplerSimple
+Random.gentype
+```
+
+Decoupling pre-computation from actually generating the values is part of the API, and is also available to the user. As an example, assume that `rand(rng, 1:20)` has to be called repeatedly in a loop: the way to take advantage of this decoupling is as follows:
+
+```julia
+rng = MersenneTwister()
+sp = Random.Sampler(rng, 1:20) # or Random.Sampler(MersenneTwister,1:20)
+for x in X
+    n = rand(rng, sp) # similar to n = rand(rng, 1:20)
+    # use n
+end
+```
+
+This is the mechanism that is also used in the standard library, e.g. by the default implementation of random array generation (like in `rand(1:20, 10)`).
 
 #### Generating values from a type
 
-Given a type `T`, it's currently assumed that if `rand(T)` is defined, an object of type `T` will be produced.
-In order to define random generation of values of type `T`, the following method can be defined:
-`rand(rng::AbstractRNG, ::Random.SamplerType{T})` (this should return what `rand(rng, T)` is expected to return).
+Given a type `T`, it's currently assumed that if `rand(T)` is defined, an object of type `T` will be produced. `SamplerType` is the default sampler for types. In order to define random generation of values of type `T`, the `rand(rng::AbstractRNG, ::Random.SamplerType{T})` method should be defined, and should return values what `rand(rng, T)` is expected to return. `Random.gentype` does not need to be defined explicitly for this case, as the fallback method works.
 
-Let's take the following example: we implement a `Die` type, with a variable number `n` of sides, numbered from `1` to `n`.
-We want `rand(Die)` to produce a die with a random number of up to 20 sides (and at least 4):
+Let's take the following example: we implement a `Die` type, with a variable number `n` of sides, numbered from `1` to `n`. We want `rand(Die)` to produce a `Die` with a random number of up to 20 sides (and at least 4):
 
 ```jldoctest Die
 struct Die
@@ -126,12 +156,11 @@ julia> a = Vector{Die}(undef, 3); rand!(a)
  Die(8)
 ```
 
-#### Generating values from a collection
+#### A simple sampler without pre-computed data
+
+Here we define a sampler for a collection. If no specialized algorithm or pre-computed data is required, it may be simplest to just return a random element, which can be implemented with a `SamplerTrivial` sampler, which is in fact the *default* fallback for values.
 
-Given a collection type `S`, it's currently assumed that if `rand(::S)` is defined, an object of type `eltype(S)` will be produced.
-In order to define random generation out of objects of type `S`, the following method can be defined:
-`rand(rng::AbstractRNG, sp::Random.SamplerTrivial{S})`. Here, `sp` simply wraps an object of type `S`, which can be accessed via `sp[]`.
-Continuing the `Die` example, we want now to define `rand(d::Die)` to produce an `Int` corresponding to one of `d`'s sides:
+In order to define random generation out of objects of type `S`, the following method should be defined: `rand(rng::AbstractRNG, sp::Random.SamplerTrivial{S})`. Here, `sp` simply wraps an object of type `S`, which can be accessed via `sp[]`. Continuing the `Die` example, we want now to define `rand(d::Die)` to produce an `Int` corresponding to one of `d`'s sides:
 
 ```jldoctest Die; setup = :(Random.seed!(1))
 julia> Random.rand(rng::AbstractRNG, d::Random.SamplerTrivial{Die}) = rand(rng, 1:d[].nsides);
@@ -146,38 +175,46 @@ julia> rand(Die(4), 3)
  2
 ```
 
-In the last example, a `Vector{Any}` is produced; the reason is that `eltype(Die) == Any`. The remedy is to define
-`Base.eltype(::Type{Die}) = Int`.
+Given a collection type `S`, it's currently assumed that if `rand(::S)` is defined, an object of type `eltype(S)` will be produced. In the last example, a `Vector{Any}` is produced; the reason is that `eltype(Die) == Any`. The remedy is to define `Base.eltype(::Type{Die}) = Int`.
 
+A `SamplerTrivial` does not have to wrap the original object. For example, in `Random`, `AbstractFloat` types are special-cased, because by default random values are not produced in the whole type domain, but rather in `[0,1)`.
 
-#### Generating values for an `AbstractFloat` type
-
-`AbstractFloat` types are special-cased, because by default random values are not produced in the whole type domain, but rather
-in `[0,1)`. The following method should be implemented for `T <: AbstractFloat`:
-`Random.rand(::AbstractRNG, ::Random.SamplerTrivial{Random.CloseOpen01{T}})`
+Consequently, a method
+```julia
+Sampler(::Type{RNG}, ::Type{T}, n::Repetition) where {RNG<:AbstractRNG,T<:AbstractFloat} =
+    Sampler(RNG, CloseOpen01(T), n)
+```
+is defined to return `SamplerTrivial` with a `Random.CloseOpen01{T}}` type defined for this purpose, which has an appropriate `rand` method defined for it.
 
+#### An optimized sampler with pre-computed data
 
-#### Optimizing generation with cached computation between calls
+Consider a discrete distribution, where numbers `1:n` are drawn with given probabilities that some to one. When many values are needed from this distribution, the fastest method if using an [alias table](https://en.wikipedia.org/wiki/Alias_method). We don't provide the algorithm for building such a table here, but suppose it is available in `make_alias_table(probabilities)` instead, and `draw_number(rng, alias_table)` can be used to draw a random number from it.
 
-When repeatedly generating random values (with the same `rand` parameters), it happens for some types
-that the result of a computation is used for each call. In this case, the computation can be decoupled
-from actually generating the values. This is the case for example with the default implementation for
-`AbstractArray`. Assume that `rand(rng, 1:20)` has to be called repeatedly in a loop: the way to take advantage
-of this decoupling is as follows:
+Suppose that the distribution is described by
+```julia
+struct DiscreteDistribution{V <: AbstractVector}
+    probabilities::V
+end
+```
+and that we *always* want to build an a alias table, regardless of the number of values needed (we learn how to customize this below). The methods
+```julia
+Random.gentype(::Type{<:DiscreteDistribution}) = Int
 
+function Random.Sampler(::AbstractRng, distribution::DiscreteDistribution, ::Repetition)
+    SamplerSimple(disribution, make_alias_table(probabilities))
+end
+```
+should be defined to return a sampler with pre-computed data, then
 ```julia
-rng = MersenneTwister()
-sp = Random.Sampler(rng, 1:20) # or Random.Sampler(MersenneTwister,1:20)
-for x in X
-    n = rand(rng, sp) # similar to n = rand(rng, 1:20)
-    # use n
+function rand(rng::AbstractRNG, sampler::SamplerSimple{<:DiscreteDistribution})
+    draw_number(rng, sampler.data)
 end
 ```
+will be used to draw the values.
+
+#### Custom sampler types
 
-This mechanism is of course used by the default implementation of random array generation (like in `rand(1:20, 10)`).
-In order to implement this decoupling for a custom type, a helper type can be used.
-Going back to our `Die` example: `rand(::Die)` uses random generation from a range, so
-there is an opportunity for this optimization:
+In order to implement this decoupling for a custom type, a custom helper type can be used. Going back to our `Die` example: `rand(::Die)` uses random generation from a range, so there is an opportunity for this optimization:
 
 ```julia
 import Random: Sampler, rand
@@ -194,10 +231,9 @@ Sampler(RNG::Type{<:AbstractRNG}, die::Die, r::Random.Repetition) =
 rand(rng::AbstractRNG, sp::SamplerDie) = rand(rng, sp.sp)
 ```
 
-It's now possible to get a sampler with `sp = Sampler(rng, die)`, and use `sp` instead of `die` in any `rand` call involving `rng`.
-In the simplistic example above, `die` doesn't need to be stored in `SamplerDie` but this is often the case in practice.
+It's now possible to get a sampler with `sp = Sampler(rng, die)`, and use `sp` instead of `die` in any `rand` call involving `rng`. In the simplistic example above, `die` doesn't need to be stored in `SamplerDie` but this is often the case in practice.
 
-This pattern is so frequent that a helper type named `Random.SamplerSimple` is available,
+This pattern is so frequent that a helper type used above, named `Random.SamplerSimple`, is available,
 saving us the definition of `SamplerDie`: we could have implemented our decoupling with:
 
 ```julia
@@ -228,8 +264,7 @@ Sampler(RNG::Type{<:AbstractRNG}, die::Die, ::Val{1}) = SamplerDie1(...)
 Sampler(RNG::Type{<:AbstractRNG}, die::Die, ::Val{Inf}) = SamplerDieMany(...)
 ```
 
-Of course, `rand` must also be defined on those types (i.e. `rand(::AbstractRNG, ::SamplerDie1)`
-and `rand(::AbstractRNG, ::SamplerDieMany)`).
+Of course, `rand` must also be defined on those types (i.e. `rand(::AbstractRNG, ::SamplerDie1)` and `rand(::AbstractRNG, ::SamplerDieMany)`). Alternatively, `SamplerTrivial` and `SamplerSimple` can be used if custom types are not necessary.
 
 Note: `Sampler(rng, x)` is simply a shorthand for `Sampler(rng, x, Val(Inf))`, and
 `Random.Repetition` is an alias for `Union{Val{1}, Val{Inf}}`.

diff --git a/stdlib/Random/src/Random.jl b/stdlib/Random/src/Random.jl
@@ -137,6 +137,23 @@ const Repetition = Union{Val{1},Val{Inf}}
 # Sampler(::AbstractRNG, X, ::Val{Inf}) = Sampler(X)
 # Sampler(::AbstractRNG, ::Type{X}, ::Val{Inf}) where {X} = Sampler(X)
 
+"""
+    Sampler(rng, x, repetition = Val(Inf))
+
+Return a sampler object that can be used to generate random values from `rng` for `x`.
+
+When `s = Sampler(rng, x, repetition)`, `rand(rng, s)` will be used to draw random values,
+and should be defined accordingly.
+
+`repetition` can be `Val(1)` or `Val(Inf)`, and should be used as a suggestion for deciding
+the amount of precomputation, if applicable.
+
+[`Random.SamplerType`](@ref) and [`Random.SamplerTrivial`](@ref) are default fallbacks for
+*types* and *values*, respectively. [`Random.SamplerSimple`](@ref) can be used to store
+pre-computed values without defining extra types for only this purpose.
+
+See also [`Random.gentype`](@ref), which should be defined for `typeof(x)`.
+"""
 Sampler(rng::AbstractRNG, x, r::Repetition=Val(Inf)) = Sampler(typeof(rng), x, r)
 Sampler(rng::AbstractRNG, ::Type{X}, r::Repetition=Val(Inf)) where {X} = Sampler(typeof(rng), X, r)
 
@@ -149,18 +166,30 @@ Sampler(::Type{RNG}, ::Type{X}) where {RNG<:AbstractRNG,X} = Sampler(RNG, X, Val
 
 #### pre-defined useful Sampler types
 
-# default fall-back for types
+"""
+    SamplerType{T}()
+
+A sampler for types, containing no other information. The default fallback for `Sampler`
+when called with types. `Random.gentype` is not used.
+"""
 struct SamplerType{T} <: Sampler{T} end
 
 Sampler(::Type{<:AbstractRNG}, ::Type{T}, ::Repetition) where {T} = SamplerType{T}()
 
 Base.getindex(::SamplerType{T}) where {T} = T
 
-# default fall-back for values
 struct SamplerTrivial{T,E} <: Sampler{E}
     self::T
 end
 
+"""
+    SamplerTrivial(x)
+
+Create a sampler that just wraps the given value `x`, with the type information from
+`Random.gentype(x)` (which should be defined).
+
+The recommended use case is sampling from values without precomputed data.
+"""
 SamplerTrivial(x::T) where {T} = SamplerTrivial{T,gentype(T)}(x)
 
 Sampler(::Type{<:AbstractRNG}, x, ::Repetition) = SamplerTrivial(x)
@@ -173,6 +202,15 @@ struct SamplerSimple{T,S,E} <: Sampler{E}
     data::S
 end
 
+"""
+    SamplerSimple(x, data)
+
+Create a sampler that wraps the given value `x` and the `data`, with the type information
+from `Random.gentype(x)` (which should be defined). This is the default fall-back for
+values.
+
+The recommended use case is sampling from values with precomputed data.
+"""
 SamplerSimple(x::T, data::S) where {T,S} = SamplerSimple{T,S,gentype(T)}(x, data)
 
 Base.getindex(sp::SamplerSimple) = sp.self