A structured, evidence-based investigation into cryptographic implementations that use the Mersenne Twister PRNG (mt19937)
The Mersenne Twister (mt19937), published by Matsumoto & Nishimura in 1998, is one of the most widely deployed general-purpose pseudo-random number generators in software history. It offers excellent statistical properties and an extraordinarily long period (2¹⁹⁹³⁷ − 1), making it highly attractive for simulations, games, and statistical sampling.
However, mt19937 is not a cryptographically secure PRNG (CSPRNG). Because its output tempering function is invertible, the full internal state (624 × 32-bit words) can be reconstructed from 624 consecutive 32-bit outputs, after which every later output can be predicted. Common seeding interfaces accept only a 32-bit seed, so the seed itself can often be brute-forced, and the generator provides neither forward secrecy nor backtracking resistance. Its use in any security-sensitive context (key generation, IV/nonce derivation, session token creation, password salting) constitutes a serious cryptographic vulnerability.
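The state-reconstruction claim can be illustrated with a minimal sketch against CPython's `random.Random`, which is backed by mt19937. The helper names (`untemper`, `clone_mt19937`) are illustrative and not part of this project's scripts; the sketch assumes the attacker observes 624 consecutive, untruncated 32-bit outputs.

```python
import random


def _undo_right_shift_xor(y: int, shift: int) -> int:
    """Invert x -> x ^ (x >> shift) on a 32-bit word by fixed-point iteration."""
    x = y
    for _ in range(32 // shift + 1):
        x = y ^ (x >> shift)
    return x & 0xFFFFFFFF


def _undo_left_shift_xor_mask(y: int, shift: int, mask: int) -> int:
    """Invert x -> x ^ ((x << shift) & mask) on a 32-bit word."""
    x = y
    for _ in range(32 // shift + 1):
        x = y ^ ((x << shift) & mask)
    return x & 0xFFFFFFFF


def untemper(y: int) -> int:
    """Undo the four mt19937 tempering steps, in reverse order."""
    y = _undo_right_shift_xor(y, 18)
    y = _undo_left_shift_xor_mask(y, 15, 0xEFC60000)
    y = _undo_left_shift_xor_mask(y, 7, 0x9D2C5680)
    y = _undo_right_shift_xor(y, 11)
    return y


def clone_mt19937(outputs: list) -> random.Random:
    """Build a generator that continues the stream after 624 observed outputs."""
    if len(outputs) != 624:
        raise ValueError("exactly 624 consecutive 32-bit outputs are required")
    # CPython's state tuple holds 624 state words followed by the read index.
    # Index 624 forces a state regeneration on the next call.
    state = tuple(untemper(o) for o in outputs) + (624,)
    clone = random.Random()
    clone.setstate((3, state, None))
    return clone


if __name__ == "__main__":
    victim = random.Random()  # seed unknown to the observer
    observed = [victim.getrandbits(32) for _ in range(624)]
    clone = clone_mt19937(observed)
    print(clone.getrandbits(32) == victim.getrandbits(32))
```

No knowledge of the seed is needed: the observed window can start at any point in the output stream, because the recurrence depends only on the previous 624 state words.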
Despite this, mt19937 has repeatedly appeared in cryptographic and security-sensitive codebases, sometimes by oversight, sometimes due to language defaults (e.g., Python's random module or PHP's mt_rand()), and sometimes through deliberate but misinformed design decisions. Note that some weak language defaults are not mt19937 at all: Java's java.util.Random, for example, is a 48-bit linear congruential generator.
This project systematically documents every such instance, at forensic quality, for academic and security research purposes.
- Catalogue all known cryptographic libraries, frameworks, protocols, and products that have used mt19937 (or a language's default PRNG backed by mt19937) in security-sensitive contexts.
- Classify each instance by severity, exploitability, and remediation status.
- Provide reproducible evidence for every finding, anchored to primary sources (source code, CVEs, advisories, academic papers).
- Track remediation: identify which findings have been patched, when, and how.
- Produce a public-facing dataset suitable for citation in security advisories and academic publications.
| Property | mt19937 | CSPRNG Requirement |
|---|---|---|
| State reconstruction | ✗ Possible from 624 outputs | Must be computationally infeasible |
| Forward secrecy | ✗ None | Required |
| Backtracking resistance | ✗ None | Required |
| Seed entropy | ✗ Often low (timestamp, PID) | Must be ≥128 bits |
| NIST/FIPS approval | ✗ Not approved | Required for regulated contexts |
| Output bias | ✓ Negligible (statistical) | N/A |
Reference: Matsumoto, M. & Nishimura, T. (1998). Mersenne Twister: A 623-dimensionally equidistributed uniform pseudo-random number generator. ACM Transactions on Modeling and Computer Simulation, 8(1), 3–30.
Relevant CWEs:
- CWE-338 — Use of Cryptographically Weak Pseudo-Random Number Generator (PRNG)
- CWE-330 — Use of Insufficiently Random Values
- CWE-334 — Small Space of Random Values
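The "Seed entropy" row and CWE-334 can be made concrete with a small sketch. It assumes a hypothetical vulnerable pattern in which a token is drawn from an mt19937 instance seeded with a Unix timestamp; `make_token` and `recover_seed` are illustrative names, not code taken from any catalogued finding.

```python
import random
from typing import Optional


def make_token(seed: int) -> str:
    """Hypothetical vulnerable pattern: a 128-bit token from a time-seeded mt19937."""
    rng = random.Random(seed)
    return f"{rng.getrandbits(128):032x}"


def recover_seed(token: str, approx_time: int, window: int = 3600) -> Optional[int]:
    """Try every integer seed within +/- window seconds of approx_time."""
    for candidate in range(approx_time - window, approx_time + window + 1):
        if make_token(candidate) == token:
            return candidate
    return None
```

With a one-hour uncertainty the search covers only 7,201 candidate seeds, which is why the width of the token (128 bits here) says nothing about its actual unpredictability.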
mt19937-research/
│
├── AGENTS.md ← Agent operational instructions (read first)
├── README.md ← This file
├── CHANGELOG.md ← Running log of significant updates
│
├── sources/ ← Raw, unmodified evidence artifacts
│ ├── code/ ← Archived source code snapshots
│ │ └── <impl-name>_<version>/
│ ├── advisories/ ← CVE/advisory snapshots
│ ├── papers/ ← Academic papers (PDF + BibTeX)
│ └── web/ ← Archived web pages
│
├── candidates/ ← Unverified candidates awaiting analysis
│ └── <impl-name>.candidate.json
│
├── findings/ ← Verified, classified findings
│ ├── registry.jsonl ← Master NDJSON registry
│ ├── SUMMARY.md ← Human-readable dashboard
│ ├── DISPUTES.md ← Contested findings
│ │
│ ├── evidence/ ← One EvidenceRecord per finding
│ │ └── <impl-name>_<id>.evidence.json
│ │
│ ├── reports/ ← Narrative report per implementation
│ │ └── <impl-name>.md
│ │
│ └── verification/ ← Independent sign-offs
│ └── <impl-name>_<id>.verify.json
│
├── taxonomy/
│ ├── severity.md ← Severity classification guide
│ ├── cwe-mapping.md ← CWE cross-reference table
│ └── language-patterns.md ← Language-specific mt19937 fingerprints
│
├── scripts/
│ ├── search_github.py ← GitHub code search helper
│ ├── validate_record.py ← JSON schema validator
│ └── build_summary.py ← Generates SUMMARY.md from registry
│
└── meta/
  ├── agent-assignments.md
  ├── coordination-log.md
  └── open-questions.md
- Read AGENTS.md in full before contributing any findings.
- Check meta/agent-assignments.md to understand current role assignments.
- Review findings/SUMMARY.md to understand the current state of the dataset.
- Open questions are tracked in meta/open-questions.md; consider tackling one.
- Your primary instruction document is AGENTS.md. Read it completely before taking any action.
- Identify your assigned role (Discovery, Forensic, Classification, Documentation, or Verification).
- Check meta/coordination-log.md for pending tasks and recent decisions.
- All output files must conform to the schemas defined in AGENTS.md §6.
- Do not produce findings without primary source citations. Do not speculate.
Before committing any JSON record, validate it:
python scripts/validate_record.py --file candidates/my-lib.candidate.json
python scripts/validate_record.py --file findings/evidence/my-lib_EVID-0001.evidence.json

To regenerate the summary dashboard from the registry:
python scripts/build_summary.py
# Output: findings/SUMMARY.md

Systematic search across code repositories (GitHub, GitLab, Sourcegraph), vulnerability databases (NVD, OSV, MITRE CVE, exploit-db), package registries (PyPI, npm, Maven, RubyGems, Packagist, crates.io), and academic literature (Google Scholar, Semantic Scholar, USENIX, ACM DL, IEEE Xplore).
Each candidate undergoes direct source code examination. The exact file, line numbers, version, and commit hash are recorded. The context of the mt19937 usage (what it generates, how it is seeded, whether any documentation disclaims cryptographic use) is assessed against a standardized questionnaire.
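As a rough sketch of how file and line-number evidence might be collected during source examination, the following scanner matches a few well-known mt19937 fingerprints. The pattern set, the `Hit` record, and the function names are assumptions made for illustration; the project's authoritative patterns live in taxonomy/language-patterns.md, and its record schemas in AGENTS.md §6.

```python
import re
from dataclasses import dataclass
from pathlib import Path

# Illustrative fingerprints only (assumed, not the project's official list).
# The two numeric constants come from the mt19937 reference implementation:
# 0x9908b0df is the twist matrix constant, 1812433253 the seeding multiplier.
FINGERPRINTS = {
    "cpp-std-mt19937": re.compile(r"\bstd::mt19937(_64)?\b"),
    "php-mt_rand": re.compile(r"\bmt_s?rand\s*\("),
    "python-random-import": re.compile(r"^\s*(import random\b|from random import\b)"),
    "mt-matrix-a-constant": re.compile(r"0x9908b0df", re.IGNORECASE),
    "mt-init-multiplier": re.compile(r"(?<!\d)1812433253(?!\d)"),
}


@dataclass(frozen=True)
class Hit:
    """Hypothetical record of one match; not the EvidenceRecord schema."""
    path: str
    line_no: int
    fingerprint: str
    text: str


def scan_text(path: str, text: str) -> list:
    """Return one Hit per (line, fingerprint) match, with 1-based line numbers."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for name, pattern in FINGERPRINTS.items():
            if pattern.search(line):
                hits.append(Hit(path, line_no, name, line.strip()))
    return hits


def scan_tree(root: str) -> list:
    """Scan every regular file under root, e.g. an archived snapshot in sources/code/."""
    hits = []
    for p in sorted(Path(root).rglob("*")):
        if p.is_file():
            text = p.read_text(encoding="utf-8", errors="replace")
            hits.extend(scan_text(str(p), text))
    return hits
```

A match is only a lead: whether the usage is security-sensitive still has to be established by reading the surrounding code, as the questionnaire requires.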
Confirmed findings are classified using a five-tier severity scale (CRITICAL → INFORMATIONAL) and mapped to relevant CWEs and CVEs. See taxonomy/severity.md for criteria.
All CRITICAL and HIGH findings are independently re-verified by a separate agent or researcher before being added to the finalized registry.
Per-implementation narrative reports are generated in findings/reports/. The master registry (findings/registry.jsonl) serves as the canonical, machine-readable dataset.
| Severity | Description |
|---|---|
| CRITICAL | mt19937 directly used for key, IV, or nonce generation; practically exploitable |
| HIGH | mt19937 used for session tokens or password salts; predictability demonstrated |
| MEDIUM | Security-adjacent usage; exploitation requires additional prerequisites |
| LOW | Present near security functions but not directly used cryptographically |
| INFORMATIONAL | Non-cryptographic use within a security library; noted for completeness |
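The five tiers and the independent-verification rule for CRITICAL and HIGH findings could be encoded roughly as follows. This is a sketch only: it assumes each line of findings/registry.jsonl is a JSON object with a `severity` field, which is a guess at the schema rather than something AGENTS.md §6 is known to specify.

```python
import json
from collections import Counter

# Tiers in descending order, as listed in the severity table.
SEVERITY_ORDER = ["CRITICAL", "HIGH", "MEDIUM", "LOW", "INFORMATIONAL"]


def needs_independent_verification(severity: str) -> bool:
    """CRITICAL and HIGH findings require a second, independent sign-off."""
    return severity in {"CRITICAL", "HIGH"}


def tally_by_severity(ndjson_lines) -> dict:
    """Count registry records per tier; blank lines are skipped."""
    counts = Counter()
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        severity = record["severity"]  # assumed field name
        if severity not in SEVERITY_ORDER:
            raise ValueError(f"unknown severity: {severity!r}")
        counts[severity] += 1
    return {tier: counts.get(tier, 0) for tier in SEVERITY_ORDER}
```

Passing an open file handle works because iterating a text file yields its lines, for example `tally_by_severity(open("findings/registry.jsonl", encoding="utf-8"))`.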
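Several languages and runtimes use mt19937 as their default general-purpose generator. Others default to a different non-cryptographic generator; they are listed because the same mistake applies, and the Unsafe Default column notes where the generator is not mt19937. Any security-sensitive code that calls these without explicitly selecting a CSPRNG is potentially in scope:
| Language / Runtime | Unsafe Default | Safe Alternative |
|---|---|---|
| Python 2 & 3 | random module (mt19937) | secrets (Python 3.6+), os.urandom() |
| PHP | mt_rand() (mt19937); rand() is an alias of mt_rand() since PHP 7.1 | random_bytes(), random_int() (PHP 7.0+) |
| Java | java.util.Random (48-bit LCG, not mt19937) | java.security.SecureRandom |
| C / C++ | rand() (implementation-defined, usually not mt19937), std::mt19937 | /dev/urandom, getrandom(), OpenSSL |
| Ruby | Random / rand (mt19937) | SecureRandom |
| JavaScript | Math.random() (implementation-defined; current major engines use xorshift128+, not mt19937) | crypto.getRandomValues() |
| R | Default RNG (Mersenne-Twister) | sodium package |
| MATLAB | rand() (mt19937ar by default) | N/A (not a security platform) |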
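For the Python row, a minimal sketch of the safe alternative looks like this. The function names are illustrative; the `secrets` calls are standard-library APIs available from Python 3.6.

```python
import secrets


def new_session_token(num_bytes: int = 32) -> str:
    """URL-safe token drawn from the operating system's CSPRNG."""
    return secrets.token_urlsafe(num_bytes)


def new_salt(num_bytes: int = 16) -> bytes:
    """Random salt bytes from the operating system's CSPRNG."""
    return secrets.token_bytes(num_bytes)


def tokens_match(a: str, b: str) -> bool:
    """Constant-time comparison, to avoid leaking match position via timing."""
    return secrets.compare_digest(a, b)
```

The design point is that nothing here is seeded by the application, so there is no seed to guess and no generator state to reconstruct from observed tokens.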
- Argyros & Kiayias (2012), "I Forgot Your Password: Randomness Attacks Against PHP Applications" (USENIX Security 2012): demonstrated practical exploitation of mt_rand() in PHP web applications.
- Goldberg & Wagner (1996), "Randomness and the Netscape Browser": early precedent for PRNG failures in security contexts.
- NCC Group (various): multiple advisories citing improper PRNG usage.
- CWE-338: MITRE's canonical classification for this class of vulnerability.
- NIST SP 800-90A: standard for approved DRBG mechanisms; mt19937 is not listed.
- All contributions must include primary source evidence. Unsourced claims will not be accepted.
- Use the JSON schemas defined in AGENTS.md §6 exactly; do not add or remove fields without team consensus.
- Record all significant decisions in meta/coordination-log.md.
- Disputed findings go to findings/DISPUTES.md until resolved.
- Responsible disclosure: do not share unpatched CRITICAL or HIGH findings externally without project lead approval.
This dataset is released for academic and security research purposes.
If you use this dataset in a publication, please cite:
@misc{mt19937-research-2026,
title = {MT19937 in Cryptography: A Forensic Research Dataset},
year = {2026},
howpublished = {\url{https://github.com/your-org/mt19937-research}},
note = {Collaborative forensic research project}
}

All inter-agent and inter-researcher communication must be logged in meta/coordination-log.md. For urgent escalations (actively exploited unpatched vulnerabilities), flag the entry with [URGENT] and notify the project lead immediately.
Schema version: 1.0.0 | Initialized: 2026-03-28