Reproducibility Contract (RNG Handling)

Note

This document is the canonical policy for randomness in process-improve. Tracks ENG-08.

Why one contract?

Reproducibility of a fitted model, a bootstrap interval, or an optimisation result is part of the package’s value proposition. A chemometrics or DoE user comparing two runs across a software upgrade has to be able to distinguish “the algorithm changed” from “the random seed changed”. Today the codebase mixes:

  • unseeded np.random.default_rng() calls inside production algorithms,

  • hard-coded literal seeds (np.random.default_rng(42), seed=0) inside production algorithms,

  • well-seeded paths that thread random_state through correctly,

  • @tool_spec(rng={...}) metadata that describes the contract but enforces nothing.

The result is that “this run is reproducible” is currently a vibe, not a guarantee. This document pins the guarantee.

The contract

  1. Every public function that touches an RNG MUST accept a ``random_state`` parameter. The accepted types are int | numpy.random.Generator | None (the parameter name follows scikit-learn’s random_state convention).

    def bootstrap(
        self,
        X: pd.DataFrame,
        *,
        n_boot: int = 1000,
        random_state: int | np.random.Generator | None = None,
    ) -> ...:
    
  2. Resolve ``random_state`` once at function entry using the helper process_improve._random.check_random_state(). Its resolution rules match sklearn’s check_random_state but it returns a modern numpy.random.Generator:

    • None -> a fresh, unseeded np.random.default_rng().

    • int -> np.random.default_rng(int).

    • Generator -> returned as-is.

    Use the resolved Generator for all draws inside the function; never call np.random.* directly.

  3. Hard-coded literal seeds in production code are forbidden. If a function currently does

    rng = np.random.default_rng(42)  # don't
    

    the 42 must move to the public signature as the default, and the function must use check_random_state to resolve it:

    def find_optimum(
        ...,
        *,
        random_state: int | np.random.Generator | None = 42,  # former literal seed, now the public default
    ):
        rng = check_random_state(random_state)
    
  4. Unseeded ``default_rng()`` is forbidden except where “fresh noise on every call” is the documented contract. The only such case currently is simulation.model.simulate’s noise term, which is part of the simulator’s documented behaviour and is gated behind an explicit “# Fresh noise: documented behaviour, not an accident” comment.

  5. Every ``@tool_spec`` that touches an RNG MUST declare its contract via the ``rng=`` metadata:

    @tool_spec(
        name="bootstrap_pca",
        ...,
        rng={
            "uses_rng": True,
            "seed_param": "random_state",
            "default_seed": 0,
        },
    )
    

    The seed_param field is the name of the kwarg the reproducibility-check harness will exercise. Deterministic tools declare {"uses_rng": False}.

  6. A self-test harness enforces (5). A test in tests/test_rng_contract.py will import every registered tool, run it twice with the declared default_seed, and assert byte-equal outputs. The same test verifies that a tool declaring {"uses_rng": False} produces byte-equal outputs without any seeding.
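
The resolution rules in (2) can be sketched as follows. This is a minimal illustration, not the code in process_improve._random: the SeedLike alias, the rejection of bool, and the TypeError for unsupported types are assumptions made for the example.

```python
import numbers
from typing import Union

import numpy as np

# Alias for the accepted types (hypothetical name); spelled with Union so the
# sketch also runs on Python versions older than 3.10.
SeedLike = Union[int, np.random.Generator, None]


def check_random_state(random_state: SeedLike = None) -> np.random.Generator:
    """Resolve ``random_state`` to a Generator (illustrative sketch only)."""
    if random_state is None:
        # Fresh, unseeded generator: different draws on every call.
        return np.random.default_rng()
    if isinstance(random_state, np.random.Generator):
        # Returned as-is so the caller's stream keeps advancing.
        return random_state
    # Assumption: bool is rejected so True/False are not read as seeds 1/0.
    if isinstance(random_state, numbers.Integral) and not isinstance(random_state, bool):
        return np.random.default_rng(int(random_state))
    raise TypeError(
        "random_state must be an int, a numpy.random.Generator or None, "
        f"got {type(random_state).__name__}"
    )


# Same integer seed, same stream: the draws are byte-equal.
a = check_random_state(123).standard_normal(3)
b = check_random_state(123).standard_normal(3)
assert a.tobytes() == b.tobytes()

# A Generator passes through untouched, so a caller can share one stream.
gen = np.random.default_rng(7)
assert check_random_state(gen) is gen
```

Returning a passed-in Generator unchanged is what lets an outer function resolve its seed once and hand the same stream to the helpers it calls.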

How to migrate an existing function

Most production paths are one mechanical edit:

Before:

def my_thing(...):
    rng = np.random.default_rng()        # or np.random.default_rng(42)
    ...

After:

def my_thing(
    ...,
    *,
    random_state: int | np.random.Generator | None = None,
):
    rng = check_random_state(random_state)
    ...

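For a function that was previously unseeded, the behaviour for callers that do not pass random_state is preserved (None falls through to an unseeded default_rng). A function that previously used a literal seed keeps its old output only if that seed becomes the default, as described next.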

For an algorithm whose documented public behaviour was “always seeded with 42”, set default_seed=42 in the @tool_spec(rng=...) metadata and accept random_state=None as a synonym for that seed; this preserves byte-equivalence for existing callers.
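
As a hedged illustration of what a migrated function looks like from the caller's side, the sketch below uses a hypothetical bootstrap_mean function. np.random.default_rng stands in for check_random_state here, since it also accepts None, an int, or an existing Generator (which it returns unaltered).

```python
from typing import Union

import numpy as np

SeedLike = Union[int, np.random.Generator, None]  # hypothetical alias


def bootstrap_mean(x, *, n_boot: int = 1000, random_state: SeedLike = None) -> np.ndarray:
    """Bootstrap distribution of the mean (hypothetical example function)."""
    # Resolve once at function entry; stand-in for check_random_state().
    rng = np.random.default_rng(random_state)
    x = np.asarray(x, dtype=float)
    # Every draw goes through the resolved generator, never np.random.* directly.
    idx = rng.integers(0, x.size, size=(n_boot, x.size))
    return x[idx].mean(axis=1)


data = [1.0, 2.0, 3.0, 4.0]

# Same integer seed on two calls: byte-equal results.
run_1 = bootstrap_mean(data, n_boot=50, random_state=0)
run_2 = bootstrap_mean(data, n_boot=50, random_state=0)
assert run_1.tobytes() == run_2.tobytes()

# A shared Generator advances between calls, so the pair of calls is
# reproducible as a whole while each call still draws new numbers.
shared = np.random.default_rng(0)
first = bootstrap_mean(data, n_boot=50, random_state=shared)
second = bootstrap_mean(data, n_boot=50, random_state=shared)
assert first.tobytes() == run_1.tobytes()
```

Callers who omit random_state get fresh draws on every call, which matches the pre-migration behaviour of an unseeded function.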

What is not covered

  • The unseeded noise in simulation.model.simulate is deliberately not reproducible; that is the simulator’s contract.

  • The hard-coded plot seeds in surfaces.py / design_quality.py (seed for plot-only jitter) are in-scope for the migration but are low priority – a plot with shifted-by-one jitter is not a correctness failure.

  • Numerical reproducibility across NumPy major versions is out of scope (NumPy’s RNG compatibility policy, NEP 19, allows the streams produced by Generator to change between releases).

Open work

Aspects of the contract that are not yet implemented:

  • The tests/test_rng_contract.py harness that exercises every @tool_spec(rng={"uses_rng": True}) against its declared default_seed will land alongside the first sweep that migrates production callsites.

  • Migration of the existing offenders is tracked in SEC-21 sub-item 9 (Resampler), the relevant SEC-33 sub-item (#282) (optimization.py:564), and ENG-08 (#290) itself for the remaining call sites.
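
Until the harness lands, its intended check can be sketched roughly as below. Everything here is hypothetical: the REGISTRY dict, the simplified tool_spec decorator, and the two toy tools are stand-ins, and the real @tool_spec and tool registry may look quite different. Real tools also take data arguments, which this sketch omits.

```python
import numpy as np

# Hypothetical stand-in for the real tool registry: name -> (function, rng metadata).
REGISTRY = {}


def tool_spec(*, name, rng):
    """Simplified stand-in decorator that only records the rng metadata."""
    def decorator(func):
        REGISTRY[name] = (func, rng)
        return func
    return decorator


@tool_spec(
    name="noisy_mean",
    rng={"uses_rng": True, "seed_param": "random_state", "default_seed": 0},
)
def noisy_mean(*, random_state=None):
    rng = np.random.default_rng(random_state)  # stand-in for check_random_state
    return rng.standard_normal(100).mean()


@tool_spec(name="fixed_sum", rng={"uses_rng": False})
def fixed_sum():
    return float(np.arange(10).sum())


def check_rng_contract(registry):
    """Run each tool twice and return the names whose outputs are not byte-equal."""
    failures = []
    for name, (func, meta) in registry.items():
        kwargs = {}
        if meta["uses_rng"]:
            # Exercise the declared seed kwarg with the declared default seed.
            kwargs[meta["seed_param"]] = meta["default_seed"]
        # Assumption: outputs are numeric, so raw bytes are a fair comparison.
        first = np.asarray(func(**kwargs)).tobytes()
        second = np.asarray(func(**kwargs)).tobytes()
        if first != second:
            failures.append(name)
    return failures


assert check_rng_contract(REGISTRY) == []
```

A tool that declares a seed parameter but ignores it internally would show up in the returned list, which is the failure the harness is meant to catch.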

Cross-references