Svetjana666/Mersenne_Twister

MT19937 in Cryptography — Forensic Research Project

A structured, evidence-based investigation into cryptographic implementations that use the Mersenne Twister PRNG (mt19937)


Overview

The Mersenne Twister (mt19937), published by Matsumoto & Nishimura in 1998, is one of the most widely deployed general-purpose pseudo-random number generators in software history. It offers excellent statistical properties and an extraordinarily long period (2¹⁹⁹³⁷ − 1), making it highly attractive for simulations, games, and statistical sampling.

However, mt19937 is not a cryptographically secure PRNG (CSPRNG). Its internal state (624 × 32-bit words) can be fully reconstructed from 624 consecutive 32-bit outputs, because the output tempering step is an invertible linear map. Once the state is known, every later output can be predicted and earlier outputs can be recomputed, so the generator provides neither forward secrecy nor backtracking resistance. Its use in any security-sensitive context (key generation, IV/nonce derivation, session token creation, password salting) constitutes a serious cryptographic vulnerability.
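The state-reconstruction claim can be checked directly against CPython's `random` module, which is backed by mt19937. The following is a minimal sketch, assuming the attacker observes 624 consecutive full 32-bit outputs (here obtained with `getrandbits(32)`); the helper names `untemper` and `clone_from_outputs` are illustrative and not part of this repository.

```python
import random


def _undo_right_shift_xor(y: int, shift: int) -> int:
    """Invert y = x ^ (x >> shift) on a 32-bit word by fixed-point iteration."""
    x = y
    for _ in range(32 // shift + 1):
        x = y ^ (x >> shift)
    return x


def _undo_left_shift_xor(y: int, shift: int, mask: int) -> int:
    """Invert y = x ^ ((x << shift) & mask) on a 32-bit word."""
    x = y
    for _ in range(32 // shift + 1):
        x = y ^ ((x << shift) & mask)
    return x


def untemper(output: int) -> int:
    """Map one 32-bit mt19937 output back to the raw state word behind it."""
    y = _undo_right_shift_xor(output, 18)
    y = _undo_left_shift_xor(y, 15, 0xEFC60000)
    y = _undo_left_shift_xor(y, 7, 0x9D2C5680)
    return _undo_right_shift_xor(y, 11)


def clone_from_outputs(outputs):
    """Build a random.Random that continues the observed output stream."""
    if len(outputs) != 624:
        raise ValueError("need exactly 624 consecutive 32-bit outputs")
    # Index 624 tells CPython the state block is used up, so the next draw
    # regenerates it from the recovered words.
    state = tuple(untemper(o) for o in outputs) + (624,)
    twin = random.Random()
    twin.setstate((3, state, None))
    return twin


if __name__ == "__main__":
    victim = random.Random(1234)
    seen = [victim.getrandbits(32) for _ in range(624)]
    twin = clone_from_outputs(seen)
    print(twin.getrandbits(32) == victim.getrandbits(32))
```

The sketch never looks at the seed: the clone is built from outputs alone, which is why reseeding with a strong seed does not make mt19937 suitable for secrets.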

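Despite this, mt19937 has repeatedly appeared in cryptographic and security-sensitive codebases, sometimes by oversight, sometimes because it is the language default (e.g., Python's random module, PHP's mt_rand(), Ruby's Random), and sometimes through deliberate but misinformed design decisions. Some other language defaults, such as Java's java.util.Random (a 48-bit linear congruential generator), are not mt19937 but are equally unsuitable for cryptographic use.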

This project aims to document such instances systematically, to a forensic standard of evidence, for academic and security research purposes.


Research Objectives

  1. Catalogue all known cryptographic libraries, frameworks, protocols, and products that have used mt19937 (or a language's default PRNG backed by mt19937) in security-sensitive contexts.
  2. Classify each instance by severity, exploitability, and remediation status.
  3. Provide reproducible evidence for every finding, anchored to primary sources (source code, CVEs, advisories, academic papers).
  4. Track remediation: identify which findings have been patched, when, and how.
  5. Produce a public-facing dataset suitable for citation in security advisories and academic publications.

Background: Why mt19937 Is Cryptographically Unsafe

| Property | mt19937 | CSPRNG Requirement |
| --- | --- | --- |
| State reconstruction | ✗ Possible from 624 outputs | Must be computationally infeasible |
| Forward secrecy | ✗ None | Required |
| Backtracking resistance | ✗ None | Required |
| Seed entropy | ✗ Often low (timestamp, PID) | Must be ≥128 bits |
| NIST/FIPS approval | ✗ Not approved | Required for regulated contexts |
| Output bias | ✓ Negligible (statistical) | N/A |
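The seed-entropy row can be illustrated with a short brute-force search. This is a sketch under stated assumptions: `make_token` is a hypothetical vulnerable generator that seeds mt19937 with a Unix timestamp, and the attacker knows the issue time to within an hour. Neither function comes from a real library.

```python
import random


def make_token(seed: int) -> str:
    """Hypothetical vulnerable design: 128-bit token from timestamp-seeded MT."""
    return f"{random.Random(seed).getrandbits(128):032x}"


def recover_seed(token: str, around: int, window: int = 3600):
    """Try every integer seed within +/- `window` seconds of `around`.

    Returns the matching seed, or None if no candidate reproduces the token.
    """
    for candidate in range(around - window, around + window + 1):
        if make_token(candidate) == token:
            return candidate
    return None


if __name__ == "__main__":
    issued_at = 1_700_000_417          # the secret the attacker wants
    token = make_token(issued_at)      # the value the attacker can see
    print(recover_seed(token, around=1_700_000_000))
```

With a one-hour window the search space is 7,201 candidates, far below the ≥128 bits of seed entropy the table calls for.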

Reference: Matsumoto, M. & Nishimura, T. (1998). Mersenne Twister: A 623-dimensionally equidistributed uniform pseudo-random number generator. ACM Transactions on Modeling and Computer Simulation, 8(1), 3–30.

Relevant CWEs:

  • CWE-338 — Use of Cryptographically Weak Pseudo-Random Number Generator (PRNG)
  • CWE-330 — Use of Insufficiently Random Values
  • CWE-334 — Small Space of Random Values

Repository Structure

mt19937-research/
│
├── AGENTS.md                        ← Agent operational instructions (read first)
├── README.md                        ← This file
├── CHANGELOG.md                     ← Running log of significant updates
│
├── sources/                         ← Raw, unmodified evidence artifacts
│   ├── code/                        ← Archived source code snapshots
│   │   └── <impl-name>_<version>/
│   ├── advisories/                  ← CVE/advisory snapshots
│   ├── papers/                      ← Academic papers (PDF + BibTeX)
│   └── web/                         ← Archived web pages
│
├── candidates/                      ← Unverified candidates awaiting analysis
│   └── <impl-name>.candidate.json
│
├── findings/                        ← Verified, classified findings
│   ├── registry.jsonl               ← Master NDJSON registry
│   ├── SUMMARY.md                   ← Human-readable dashboard
│   ├── DISPUTES.md                  ← Contested findings
│   │
│   ├── evidence/                    ← One EvidenceRecord per finding
│   │   └── <impl-name>_<id>.evidence.json
│   │
│   ├── reports/                     ← Narrative report per implementation
│   │   └── <impl-name>.md
│   │
│   └── verification/                ← Independent sign-offs
│       └── <impl-name>_<id>.verify.json
│
├── taxonomy/
│   ├── severity.md                  ← Severity classification guide
│   ├── cwe-mapping.md               ← CWE cross-reference table
│   └── language-patterns.md         ← Language-specific mt19937 fingerprints
│
├── scripts/
│   ├── search_github.py             ← GitHub code search helper
│   ├── validate_record.py           ← JSON schema validator
│   └── build_summary.py             ← Generates SUMMARY.md from registry
│
└── meta/
    ├── agent-assignments.md
    ├── coordination-log.md
    └── open-questions.md

Getting Started

For Human Researchers

  1. Read AGENTS.md in full before contributing any findings.
  2. Check meta/agent-assignments.md to understand current role assignments.
  3. Review findings/SUMMARY.md to understand the current state of the dataset.
  4. Open questions are tracked in meta/open-questions.md — consider tackling one.

For AI Research Agents

  1. Your primary instruction document is AGENTS.md. Read it completely before taking any action.
  2. Identify your assigned role (Discovery, Forensic, Classification, Documentation, or Verification).
  3. Check meta/coordination-log.md for pending tasks and recent decisions.
  4. All output files must conform to the schemas defined in AGENTS.md §6.
  5. Do not produce findings without primary source citations. Do not speculate.

Running the Validation Script

Before committing any JSON record, validate it:

python scripts/validate_record.py --file candidates/my-lib.candidate.json
python scripts/validate_record.py --file findings/evidence/my-lib_EVID-0001.evidence.json

Rebuilding the Summary Dashboard

python scripts/build_summary.py
# Output: findings/SUMMARY.md

Research Methodology

Phase 1 — Discovery

Systematic search across code repositories (GitHub, GitLab, Sourcegraph), vulnerability databases (NVD, OSV, MITRE CVE, exploit-db), package registries (PyPI, npm, Maven, RubyGems, Packagist, crates.io), and academic literature (Google Scholar, Semantic Scholar, USENIX, ACM DL, IEEE Xplore).
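A first-pass discovery scan over source text might look like the sketch below. The regular expressions are illustrative assumptions chosen for this example; they are not the project's fingerprint list (which lives in taxonomy/language-patterns.md) and they will produce false positives that Phase 2 is meant to weed out.

```python
import re

# Illustrative fingerprints only; each flags a call site worth manual review.
FINGERPRINTS = {
    "php": re.compile(r"\bmt_s?rand\s*\("),
    "cpp": re.compile(r"\bstd::mt19937(?:_64)?\b"),
    "python": re.compile(
        r"\brandom\.(?:random|randint|randrange|choice|getrandbits|seed)\s*\("
    ),
    "ruby": re.compile(r"\bRandom\.(?:new|rand)\b"),
}


def scan_text(text: str):
    """Return (line_number, language, stripped_line) for every fingerprint hit."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for language, pattern in FINGERPRINTS.items():
            if pattern.search(line):
                hits.append((lineno, language, line.strip()))
    return hits


if __name__ == "__main__":
    sample = "import random\ntoken = random.getrandbits(128)\n$id = mt_rand();\n"
    for hit in scan_text(sample):
        print(hit)
```

A hit only establishes that an mt19937-backed call exists; whether the value is security-sensitive is a question for the forensic phase.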

Phase 2 — Forensic Source Analysis

Each candidate undergoes direct source code examination. The exact file, line numbers, version, and commit hash are recorded. The context of the mt19937 usage (what it generates, how it is seeded, whether any documentation disclaims cryptographic use) is assessed against a standardized questionnaire.

Phase 3 — Classification

Confirmed findings are classified using a five-tier severity scale (CRITICAL → INFORMATIONAL) and mapped to relevant CWEs and CVEs. See taxonomy/severity.md for criteria.

Phase 4 — Independent Verification

All CRITICAL and HIGH findings are independently re-verified by a separate agent or researcher before being added to the finalized registry.

Phase 5 — Reporting

Per-implementation narrative reports are generated in findings/reports/. The master registry (findings/registry.jsonl) serves as the canonical, machine-readable dataset.
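Because the registry is NDJSON (one JSON object per line), it can be consumed with the standard library alone. The sketch below is an assumption-laden example: the `severity` field name is hypothetical, since the authoritative schema is defined in AGENTS.md §6 and is not reproduced here.

```python
import json
from collections import Counter


def load_registry(lines):
    """Parse NDJSON: one JSON object per non-empty line."""
    records = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            raise ValueError(f"registry line {lineno}: {exc}") from exc
    return records


def count_by_severity(records):
    """Tally records by their (hypothetical) 'severity' field."""
    return Counter(r.get("severity", "UNKNOWN") for r in records)
```

In practice the function would be fed an open file handle, for example `with open("findings/registry.jsonl", encoding="utf-8") as fh: records = load_registry(fh)`.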


Severity Scale (Summary)

| Severity | Description |
| --- | --- |
| CRITICAL | mt19937 directly used for key, IV, or nonce generation; practically exploitable |
| HIGH | mt19937 used for session tokens or password salts; predictability demonstrated |
| MEDIUM | Security-adjacent usage; exploitation requires additional prerequisites |
| LOW | Present near security functions but not directly used cryptographically |
| INFORMATIONAL | Non-cryptographic use within a security library; noted for completeness |
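One way to make the scale machine-checkable is an ordered enumeration, combined with the Phase 4 rule that CRITICAL and HIGH findings need independent verification. This is a sketch only: the class and function names are assumptions, and the authoritative criteria remain in taxonomy/severity.md.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Five-tier scale, ordered so that comparisons follow severity."""
    INFORMATIONAL = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def needs_independent_verification(severity: Severity) -> bool:
    """Phase 4 rule: CRITICAL and HIGH findings require a second sign-off."""
    return severity >= Severity.HIGH
```

Parsing a label from a record is then `Severity["HIGH"]`, which raises `KeyError` for labels outside the scale.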

Known Language-Default Risks

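Several languages and runtimes use mt19937 as their default random number generator; others default to a different but equally non-cryptographic algorithm. Any security-sensitive code that calls these defaults without explicitly selecting a CSPRNG is potentially in scope. The third column records whether the default is actually mt19937:

| Language / Runtime | Unsafe Default | Default Algorithm | Safe Alternative |
| --- | --- | --- | --- |
| Python 2 & 3 | random module | mt19937 | secrets (3.6+), random.SystemRandom, os.urandom() |
| PHP | mt_rand(); rand() (alias of mt_rand() since 7.1) | mt19937 | random_bytes(), random_int() (PHP ≥ 7.0) |
| Java | java.util.Random | 48-bit LCG, not mt19937 | java.security.SecureRandom |
| C / C++ | std::mt19937; rand() | mt19937 for std::mt19937; rand() is implementation-defined and typically not mt19937 | /dev/urandom, getrandom(), OpenSSL |
| Ruby | Random / rand | mt19937 | SecureRandom |
| JavaScript | Math.random() | Implementation-defined; current major engines commonly use xorshift128+ rather than mt19937 | crypto.getRandomValues() |
| R | Default RNG | mt19937 | sodium package |
| MATLAB | rand() | mt19937 | N/A (not a security platform) |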
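For the Python row, the difference between the unsafe default and the safe alternative fits in a few lines. The function names below are illustrative; `weak_token` is included for contrast only and should not be copied into real code.

```python
import random
import secrets


def weak_token(nbytes: int = 16) -> str:
    """For contrast only: mt19937-backed and reproducible from the seed."""
    return f"{random.getrandbits(nbytes * 8):0{nbytes * 2}x}"


def session_token(nbytes: int = 32) -> str:
    """Hex token drawn from the operating system's CSPRNG."""
    return secrets.token_hex(nbytes)


def numeric_pin(digits: int = 6) -> str:
    """Decimal PIN with each digit drawn uniformly from the CSPRNG."""
    return "".join(str(secrets.randbelow(10)) for _ in range(digits))
```

Code that needs the full `random` API (for example `choice` or `shuffle`) over a CSPRNG can use `random.SystemRandom()`, which ignores seeding and reads from the operating system instead.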

Prior Work & Related Resources

  • Argyros & Kiayias (2012), "I Forgot Your Password: Randomness Attacks Against PHP Applications" (USENIX Security 2012): demonstrated practical exploitation of mt_rand() in PHP web applications.
  • Goldberg & Wagner (1996), "Randomness and the Netscape Browser": early precedent for PRNG failures in security contexts.
  • NCC Group (various): multiple advisories citing improper PRNG usage.
  • CWE-338: MITRE's canonical classification for this class of vulnerability.
  • NIST SP 800-90A: standard for approved DRBG mechanisms; mt19937 is not listed.

Contribution Guidelines

  • All contributions must include primary source evidence. Unsourced claims will not be accepted.
  • Use the JSON schemas defined in AGENTS.md §6 exactly — do not add or remove fields without team consensus.
  • Record all significant decisions in meta/coordination-log.md.
  • Disputed findings go to findings/DISPUTES.md until resolved.
  • Responsible disclosure: do not share unpatched CRITICAL or HIGH findings externally without project lead approval.

License & Citation

This dataset is released for academic and security research purposes.

If you use this dataset in a publication, please cite:

@misc{mt19937-research-2026,
  title        = {MT19937 in Cryptography: A Forensic Research Dataset},
  year         = {2026},
  howpublished = {\url{https://github.com/your-org/mt19937-research}},
  note         = {Collaborative forensic research project}
}

Contact & Coordination

All inter-agent and inter-researcher communication must be logged in meta/coordination-log.md. For urgent escalations (actively exploited unpatched vulnerabilities), flag the entry with [URGENT] and notify the project lead immediately.


Schema version: 1.0.0 | Initialized: 2026-03-28