Commit ca3100f

bashtage (Kevin Sheppard)
authored and committed
Merge pull request #245 from bashtage/qa-5
Qa 5
2 parents 0f005ad + 6225711 commit ca3100f

17 files changed, with 1,131 additions and 233 deletions

doc/source/change-log.rst

Lines changed: 3 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -15,11 +15,12 @@ Change Log
1515
maintained until after NumPy 1.21 (or 2 releases after NumPy 1.19) for users who
1616
cannot update NumPy.
1717

18-
Since v1.19.0
19-
=============
18+
v1.19.1
19+
=======
2020
- Added :class:`randomgen.romu.Romu` which is among the fastest available bit generators.
2121
- Added :func:`~randomgen.sfc.SFC64.weyl_increments` to simplify generating increments for
2222
use in parallel applications of :class:`~randomgen.sfc.SFC64`.
23+
- Completed :ref:`quality-assurance` of all bit generators to at least 4TB.
2324

2425
v1.19.0
2526
=======

doc/source/testing.rst

Lines changed: 91 additions & 8 deletions
@@ -1,10 +1,15 @@
1+
.. _quality-assurance:
2+
13
=================
24
Quality Assurance
35
=================
46

5-
A values below are the maximum output size where a bit generator or sequence of bit generators
6-
has passed PractRand_. A -- indicates that configuration is not relevant. Failures are marked
7-
with FAIL. Most bit generators were only tested in their default configuration.
7+
Core Testing
8+
------------
9+
10+
The values below are the maximum output size at which a bit generator or sequence of
11+
bit generators has passed PractRand_. A -- indicates that configuration is not relevant.
12+
Failures are marked with FAIL. Most bit generators were only tested in their default configuration.
813
Non-default configurations are indicated by listing the keyword arguments to the bit generator.
914
Two sets of tests were performed. The first tested all configurations on 128GB of data using
1015
PractRand's extended set of tests and additional bit folding. The second set of tests used
@@ -16,11 +21,11 @@ initialized with the same 256-bits of entropy taken from random.org.
1621
.. include:: test-results.txt
1722

1823
Notes
19-
-----
24+
~~~~~
2025
¹ Failures at or before 128GB were generated by tests that used the expanded
2126
set of tests and extra bit folds (``-te 1`` and ``-tf 2``). Failures at sample
2227
sizes above 128GB were produced using the default configuration
23-
(``-te 0`` and ``-tf 0``).
28+
(``-te 0`` and ``-tf 1``).
2429

2530
² PCG64DXSM and PCG64(variant=dxsm) are identical and so the latter is not separately reported.
2631

@@ -32,10 +37,8 @@ is required.
3237

3338
⁵ Identical output to the version included in NumPy 1.19.
3439

35-
.. _PractRand: http://pracrand.sourceforge.net/
36-
3740
Example Configuration
38-
---------------------
41+
~~~~~~~~~~~~~~~~~~~~~
3942
All configurations are constructed using the same template. The code below tests a
4043
configuration using 8,196 streams of :class:`~randomgen.aes.AESCounter`. The other
4144
configurations simply make changes to either ``JUMPED`` or ``STREAMS``.
@@ -66,3 +69,83 @@ configurations simply make changes to either ``JUMPED`` or ``STREAMS``.
6669
for child in SEED_SEQ.spawn(STREAMS):
6770
bit_gens.append(rg.AESCounter(child, **BIT_GENERATOR_KWARGS))
6871
output = 64
72+
73+
Additional Experiments
74+
----------------------
75+
The best practice for using any of the bit generators is to initialize
76+
a single :class:`~numpy.random.SeedSequence` with a reasonably random seed,
77+
and then to use this seed sequence to initialize all bit generators.
78+
Some additional experiments were used to check that the quality of output
79+
streams is not excessively sensitive to use that deviates from this best practice.
80+
81+
Correlated Seeds
82+
~~~~~~~~~~~~~~~~
83+
While the recommended practice is to use a :class:`~numpy.random.SeedSequence`,
84+
it is natural to worry about bad seeds. A common family of bad seeds consists of
85+
those with a single non-zero bit: 1, 2, 4, 8, 16, and so on.
86+
By default, bit generators use a :class:`~numpy.random.SeedSequence` to transform
87+
seed values into an initial state for the bit generator.
88+
:class:`~numpy.random.SeedSequence` is itself a random number generator that always
89+
escapes low-entropy states -- that is, those with many 0s or 1s -- immediately.
90+
All bit generators were tested with 8 streams using seeds of the form :math:`2^i` for
91+
i in 0, 1, ..., 7. Only three bit generators failed this experiment: :class:`~randomgen.dsfmt.DSFMT`,
92+
:class:`~randomgen.mt19937.MT19937`, and :class:`~randomgen.sfmt.SFMT`. These are all
93+
members of the Mersenne Twister family which commonly fail ``BRank`` tests.
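
The correlated-seed setup described above can be approximated with a short sketch. It is illustrative only: it uses NumPy's ``PCG64`` in place of the randomgen bit generators, and the ``interleave`` helper is a hypothetical stand-in for however the test harness combined the 8 streams before handing them to PractRand.

```python
import numpy as np
from numpy.random import PCG64, SeedSequence

# Seeds with a single non-zero bit: 1, 2, 4, ..., 128
seeds = [2**i for i in range(8)]

# SeedSequence hashes each seed into an initial state, so the
# states differ in many bits even though the seeds differ in one.
states = [SeedSequence(seed).generate_state(4) for seed in seeds]

# One bit generator per seed (PCG64 is an assumption made for this sketch)
bit_gens = [PCG64(SeedSequence(seed)) for seed in seeds]


def interleave(bit_gens, n):
    """Hypothetical helper: draw n raw values from each bit generator
    and interleave them as g0[0], g1[0], ..., g7[0], g0[1], ..."""
    draws = np.stack([bg.random_raw(n) for bg in bit_gens], axis=1)
    return draws.ravel()


stream = interleave(bit_gens, 4)
```

A stream built this way is what a statistical test suite would consume. The sketch does not reproduce the PractRand runs themselves.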
94+
95+
Sequential Seeds
96+
~~~~~~~~~~~~~~~~
97+
The recommended practice for constructing multiple :class:`~numpy.random.Generator` instances
98+
is to use :class:`~numpy.random.SeedSequence`'s :func:`~numpy.random.SeedSequence.spawn`
99+
method.
100+
101+
::
102+
103+
from numpy.random import default_rng, Generator, SeedSequence
104+
from randomgen import Romu
105+
106+
NUM_STREAMS = 2**15
107+
seed_seq = SeedSequence(5897100938578919857511)
108+
# To use the default bit generator, which is not guaranteed to be stable
109+
generators = [default_rng(child) for child in seed_seq.spawn(NUM_STREAMS)]
110+
111+
# To use a specific bit generator
112+
generators = [Generator(Romu(child)) for child in seed_seq.spawn(NUM_STREAMS)]
113+
114+
It is common to see examples that use sequential seeds and resemble:
115+
116+
::
117+
118+
generators = [default_rng(i) for i in range(NUM_STREAMS)]
119+
120+
This practice was examined with all bit generators using 8,196 streams
121+
seeded using 0, 1, 2, ..., 8,195 by intertwining the output of the
122+
generators. **None** of the generators failed these tests.
123+
124+
Zero (0) Seeding
125+
~~~~~~~~~~~~~~~~
126+
Bit generators use a :class:`~numpy.random.SeedSequence` to transform
127+
seed values into an initial state for the bit generator, and this seed
128+
sequence always escapes low-entropy states immediately.
129+
To check that zero seeding is not an issue, all bit generators were tested using 4, 32 or 8196
130+
streams using 128GB in PractRand_ with expanded tests and extra folding. The table
131+
below reports **only** the configurations that failed. These were all Mersenne Twister-class
132+
generators and so failure is attributable to the bit generator and not the seeding.
133+
All other generators passed these tests.
134+
135+
136+
+--------------+---------------+----------------+------+
137+
| Streams | 4 | 32 | 8196 |
138+
+==============+===============+================+======+
139+
| DSFMT | FAIL at 64 GB | FAIL at 64 GB | -- |
140+
+--------------+---------------+----------------+------+
141+
| MT19937 | FAIL at 64 GB | FAIL at 64 GB | -- |
142+
+--------------+---------------+----------------+------+
143+
| SFMT | FAIL at 64 GB | FAIL at 64 GB | -- |
144+
+--------------+---------------+----------------+------+
145+
146+
The non-failures at 8196 are due to the relatively short length of each sequence tested:
147+
128GB shared across 8196 streams samples only :math:`2^{37}/(2^{13}\times2^{3})=2^{21}` values
148+
from each stream, because each value is 8 bytes.
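
The per-stream sample count can be checked with a few lines of arithmetic. The variable names are illustrative, and the sketch follows the formula in using :math:`2^{13}` (8192) as the stream count, assuming 128GB means :math:`2^{37}` bytes.

```python
# 128GB expressed in bytes, taken as 128 * 2**30 = 2**37
total_bytes = 128 * 2**30

# Stream count as used in the formula (2**13 = 8192)
streams = 2**13

# Each raw value is a 64-bit integer, i.e. 8 = 2**3 bytes
bytes_per_value = 8

values_per_stream = total_bytes // (streams * bytes_per_value)
```

Here ``values_per_stream`` equals ``2**21``, roughly two million values from each stream.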
149+
150+
151+
.. _PractRand: http://pracrand.sourceforge.net/

randomgen/_seed_sequence.pyx

Lines changed: 10 additions & 1 deletion
@@ -384,13 +384,22 @@ cdef class SeedSequence(object):
384384
-------
385385
entropy_array : 1D uint32 array
386386
"""
387-
# Convert run-entropy, program-entropy, and the spawn key into uint32
387+
# Convert run-entropy and the spawn key into uint32
388388
# arrays and concatenate them.
389389

390390
# We MUST have at least some run-entropy. The others are optional.
391391
assert self.entropy is not None
392392
run_entropy = _coerce_to_uint32_array(self.entropy)
393393
spawn_entropy = _coerce_to_uint32_array(self.spawn_key)
394+
if len(spawn_entropy) > 0 and len(run_entropy) < self.pool_size:
395+
# Explicitly fill out the entropy with 0s to the pool size to avoid
396+
# conflict with spawn keys. We changed this in 1.19.0 to fix
397+
# gh-16539. In order to preserve stream-compatibility with
398+
# unspawned SeedSequences with small entropy inputs, we only do
399+
# this when a spawn_key is specified.
400+
diff = self.pool_size - len(run_entropy)
401+
run_entropy = np.concatenate(
402+
[run_entropy, np.zeros(diff, dtype=np.uint32)])
394403
entropy_array = np.concatenate([run_entropy, spawn_entropy])
395404
return entropy_array
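
The effect of the padding can be seen in a pure-NumPy sketch of the same logic. It is a simplified stand-in rather than the Cython implementation: ``assemble_entropy`` is a hypothetical name, inputs are assumed to already be sequences of 32-bit words, and the pool size is assumed to be 4, the default in NumPy's ``SeedSequence``.

```python
import numpy as np


def assemble_entropy(run_entropy, spawn_key, pool_size=4):
    """Hypothetical helper mirroring the hunk above: pad the run entropy
    with zeros to pool_size only when a spawn key is present."""
    run = np.asarray(run_entropy, dtype=np.uint32)
    spawn = np.asarray(spawn_key, dtype=np.uint32)
    if len(spawn) > 0 and len(run) < pool_size:
        pad = np.zeros(pool_size - len(run), dtype=np.uint32)
        run = np.concatenate([run, pad])
    return np.concatenate([run, spawn])


# No spawn key: the entropy is left untouched, preserving old streams
unspawned = assemble_entropy([42], ())

# Spawn key (0,): the entropy is padded first, so the key lands after
# the pool-sized block instead of in a slot that would otherwise be
# an implicit zero
spawned = assemble_entropy([42], (0,))
```

Without the explicit padding, the spawned case would produce ``[42, 0]``. Since short entropy is treated as if followed by zeros, that array would be expected to mix to the same state as ``[42]`` alone, which is the conflict the comment in the hunk refers to.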
396405

randomgen/mtrand.pyx

Lines changed: 2 additions & 2 deletions
@@ -813,7 +813,7 @@ warnings.filterwarnings("ignore", "RandomState", FutureWarning)
813813
raise ValueError("Cannot take a larger sample than "
814814
"population when replace=False")
815815
elif size < 0:
816-
raise ValueError("negative dimensions are not allowed")
816+
raise ValueError("Negative dimensions are not allowed")
817817

818818
if p is not None:
819819
if np.count_nonzero(p > 0) < size:
@@ -3703,7 +3703,7 @@ warnings.filterwarnings("ignore", "RandomState", FutureWarning)
37033703
[True, True] # random
37043704
37053705
"""
3706-
from numpy.dual import svd
3706+
from numpy.linalg import svd
37073707

37083708
# Check preconditions on arguments
37093709
mean = np.array(mean)

randomgen/tests/test_seed_sequence.py

Lines changed: 25 additions & 2 deletions
@@ -1,5 +1,5 @@
11
import numpy as np
2-
from numpy.testing import assert_array_equal
2+
from numpy.testing import assert_array_equal, assert_array_compare
33
import pytest
44

55
from randomgen._seed_sequence import SeedlessSeedSequence, SeedSequence
@@ -11,7 +11,7 @@
1111
HAS_NP_SEED_SEQUENCE = True
1212
except (ImportError, AttributeError):
1313
try:
14-
from numpy.random.bit_generator import SeedSequence as NPSeedSequence
14+
from numpy.random import SeedSequence as NPSeedSequence
1515

1616
HAS_NP_SEED_SEQUENCE = True
1717
except (ImportError, AttributeError):
@@ -205,3 +205,26 @@ def test_against_numpy_spawn():
205205
assert ss.n_children_spawned == np_ss.n_children_spawned
206206
for child, np_child in zip(ss_children, np_ss_children):
207207
assert_array_equal(child.generate_state(10), np_child.generate_state(10))
208+
209+
210+
def test_zero_padding():
211+
""" Ensure that the implicit zero-padding does not cause problems.
212+
"""
213+
# Ensure that large integers are inserted in little-endian fashion to avoid
214+
# trailing 0s.
215+
ss0 = SeedSequence(42)
216+
ss1 = SeedSequence(42 << 32)
217+
assert_array_compare(np.not_equal, ss0.generate_state(4), ss1.generate_state(4))
218+
219+
# Ensure backwards compatibility with the original 0.17 release for small
220+
# integers and no spawn key.
221+
expected42 = np.array(
222+
[3444837047, 2669555309, 2046530742, 3581440988], dtype=np.uint32
223+
)
224+
assert_array_equal(SeedSequence(42).generate_state(4), expected42)
225+
226+
# Regression test for gh-16539 to ensure that the implicit 0s don't
227+
# conflict with spawn keys.
228+
assert_array_compare(
229+
np.not_equal, SeedSequence(42, spawn_key=(0,)).generate_state(4), expected42
230+
)

tools/configuration.py

Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
1+
from collections import defaultdict
2+
3+
import jinja2
4+
5+
from randomgen import (
6+
DSFMT,
7+
EFIIX64,
8+
HC128,
9+
JSF,
10+
LXM,
11+
MT19937,
12+
PCG64,
13+
SFC64,
14+
SFMT,
15+
SPECK128,
16+
AESCounter,
17+
ChaCha,
18+
LCG128Mix,
19+
Philox,
20+
Romu,
21+
ThreeFry,
22+
Xoshiro256,
23+
Xoshiro512,
24+
)
25+
26+
ALL_BIT_GENS = [
27+
AESCounter,
28+
ChaCha,
29+
DSFMT,
30+
EFIIX64,
31+
HC128,
32+
JSF,
33+
LXM,
34+
PCG64,
35+
LCG128Mix,
36+
MT19937,
37+
Philox,
38+
SFC64,
39+
SFMT,
40+
SPECK128,
41+
ThreeFry,
42+
Xoshiro256,
43+
Xoshiro512,
44+
Romu,
45+
]
46+
JUMPABLE = [bg for bg in ALL_BIT_GENS if hasattr(bg, "jumped")]
47+
48+
SPECIALS = {
49+
ChaCha: {"rounds": [8, 20]},
50+
JSF: {"seed_size": [1, 3]},
51+
SFC64: {"k": [1, 3394385948627484371, "weyl"]},
52+
LCG128Mix: {"output": ["upper"]},
53+
PCG64: {"variant": ["dxsm", "dxsm-128", "xsl-rr"]},
54+
Romu: {"variant": ["quad", "trio"]},
55+
}
56+
OUTPUT = defaultdict(lambda: 64)
57+
OUTPUT.update({MT19937: 32, DSFMT: 32})
58+
with open("templates/configuration.jinja") as tmpl:
59+
TEMPLATE = jinja2.Template(tmpl.read())
60+
61+
DSFMT_WRAPPER = """\
62+
63+
class Wrapper32:
64+
def __init__(self, seed, **kwargs):
65+
if isinstance(seed, rg.DSFMT):
66+
self._bit_gen = seed
67+
else:
68+
self._bit_gen = rg.DSFMT(seed)
69+
70+
def random_raw(self, n=None):
71+
return self._bit_gen.random_raw(n).astype("u4")
72+
73+
def jumped(self):
74+
return Wrapper32(self._bit_gen.jumped())
75+
76+
rg.Wrapper32 = Wrapper32
77+
"""
78+
# Specials
79+
# SFC64
80+
DEFAULT_ENTOPY = (
81+
86316980830225721106033794313786972513572058861498566720023788662568817403978
82+
)
