Evaluating Design Quality#

Once you have a candidate design, the next question is: “How good is it for the model I intend to fit?” The evaluate_design() function answers this by computing a complete, model-aware set of quality metrics from the design matrix and a chosen model. A single call can fully characterise a response-surface design: its efficiency, where it predicts well or badly, how its coefficients are correlated, and what bias it carries from terms left out of the model.

Every metric is computed against the model you specify, not against the design in isolation. A design that is excellent for a main-effects model can be poor for a full quadratic model, so the model always comes first.

When to Use This Tool#

  • Comparing candidate designs - score Box-Behnken, central composite, definitive screening, OMARS, and optimal designs on the same footing before committing runs.

  • Checking a design against a reduced model - verify that an explicit formula (for example main effects plus pure quadratics) is estimable and well-conditioned.

  • Diagnosing where a design predicts poorly - the fraction-of-design-space (FDS) curve shows the spread of prediction variance across the whole region, not just at the design points.

  • Quantifying bias - the alias matrix reports how much omitted two-factor interactions would bias the fitted coefficients.

Quick Start#

from process_improve.experiments import generate_design, evaluate_design, Factor

factors = [Factor(name=n, low=-1, high=1) for n in "ABCDE"]
design = generate_design(factors, design_type="box_behnken", center_points=6)

# A reduced model: main effects plus pure quadratics (no two-factor interactions).
model = "A+B+C+D+E+I(A**2)+I(B**2)+I(C**2)+I(D**2)+I(E**2)"

metrics = evaluate_design(
    design,
    model=model,
    metric=["d_efficiency", "a_optimality", "e_optimality", "fds"],
)

To compute every available metric in one call, pass metric="all" or use the evaluate_all() convenience wrapper:

from process_improve.experiments import evaluate_all

everything = evaluate_all(design, model=model)

The Metrics#

Optimality and efficiency#

These summarise the information matrix \(X^\top X\) for the fitted model.

  • d_efficiency - \(100 \cdot \det(X^\top X)^{1/p} / N\), where \(N\) is the number of runs and \(p\) the number of model parameters; overall information content (higher is better).

  • a_optimality - \(\operatorname{trace}((X^\top X)^{-1})\), the sum of the coefficient variances in \(\sigma^2\) units, which is \(p\) times the average coefficient variance (lower is better). Reported alongside an a_efficiency score for parity with d_efficiency.

  • e_optimality - the smallest eigenvalue of \(X^\top X\), the worst-estimated direction in parameter space (higher is better).

  • vif - variance inflation factor per term; how much each coefficient variance is inflated by non-orthogonality.

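  • condition_number - conditioning of the model matrix; values close to 1 indicate near-orthogonal terms, and large values indicate near-collinear terms (lower is better).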
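The formulas above can be reproduced by hand on a small design. The following is a minimal NumPy sketch, not the library's implementation: the design (a 2^2 factorial with one centre point) and the interaction model are hypothetical examples, the VIF uses the textbook identity \(\mathrm{VIF}_j = C_{jj} \sum_i (x_{ij} - \bar{x}_j)^2\) with \(C = (X^\top X)^{-1}\), and taking the condition number of \(X\) (not of \(X^\top X\)) is an assumption.

```python
import numpy as np

# Hypothetical example design: 2^2 factorial plus one centre point, coded units.
runs = np.array([[-1, -1], [1, -1], [-1, 1], [1, 1], [0, 0]], dtype=float)
a, b = runs[:, 0], runs[:, 1]

# Model matrix for: intercept + A + B + A:B
X = np.column_stack([np.ones(len(runs)), a, b, a * b])
N, p = X.shape

XtX = X.T @ X
C = np.linalg.inv(XtX)

d_efficiency = 100 * np.linalg.det(XtX) ** (1 / p) / N
a_optimality = np.trace(C)                   # sum of coefficient variances
e_optimality = np.linalg.eigvalsh(XtX).min() # smallest eigenvalue of X'X
condition_number = np.linalg.cond(X)         # assumption: computed on X itself

# VIF per non-intercept term: C_jj times the centred sum of squares of column j.
terms = X[:, 1:]
centred_ss = ((terms - terms.mean(axis=0)) ** 2).sum(axis=0)
vif = np.diag(C)[1:] * centred_ss

print(f"D-eff = {d_efficiency:.2f}, A = {a_optimality:.3f}, E = {e_optimality:.1f}")
print("VIF:", vif)
```

Here \(X^\top X\) is diagonal, so every VIF equals 1; a design with correlated columns would show values above 1.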

Term correlation#

correlation summarises the pairwise correlation among the second-order terms (pure quadratics and two-factor interactions). Because the \(x_i^2\) columns have a non-zero mean, a naive Pearson correlation is inflated by that shared offset and depends on the coding. The metric instead residualises each second-order column against the intercept-and-main-effect block first, giving a coding-invariant measure. It returns max_abs_r, mean_abs_r, and the full matrix.
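The residualisation step can be sketched in a few lines of NumPy. This is an illustrative reconstruction under stated assumptions, not the library's code: the helper name `second_order_correlation` and the example design (a two-factor face-centred design with three centre points) are hypothetical, and the summary statistics are taken over the off-diagonal entries only.

```python
import numpy as np

def second_order_correlation(runs):
    """Correlation among A^2, B^2, A*B after removing intercept and main effects."""
    a, b = runs[:, 0], runs[:, 1]
    base = np.column_stack([np.ones(len(runs)), a, b])   # intercept + main effects
    second = np.column_stack([a**2, b**2, a * b])        # second-order terms
    # Regress every second-order column on the base block and keep the residuals.
    coef, *_ = np.linalg.lstsq(base, second, rcond=None)
    resid = second - base @ coef
    return np.corrcoef(resid, rowvar=False)

# Hypothetical design: 4 factorial + 4 axial (face-centred) + 3 centre runs.
runs = np.array(
    [[-1, -1], [1, -1], [-1, 1], [1, 1],
     [-1, 0], [1, 0], [0, -1], [0, 1],
     [0, 0], [0, 0], [0, 0]],
    dtype=float,
)

corr = second_order_correlation(runs)
off_diag = np.abs(corr[np.triu_indices_from(corr, k=1)])
max_abs_r, mean_abs_r = off_diag.max(), off_diag.mean()

# Recoding the factors (shift and scale) leaves the residualised result unchanged.
corr_recoded = second_order_correlation(10 + 5 * runs)
print(max_abs_r, np.allclose(corr, corr_recoded))
```

The recoded check is the point of the exercise: shifting or rescaling a factor only adds intercept and main-effect components to its squared column, and those are removed by the residualisation.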

Alias (bias) matrix#

alias_matrix generalises the two-level alias structure to any design and model. Given the model matrix \(X_1\) of the fitted model and the columns \(X_2\) of a set of potential extra terms (by default the two-factor interactions not already in the model), it computes

\[A = (X_1^\top X_1)^{-1} X_1^\top X_2,\]

so the expected fitted coefficients are biased as \(\mathbb{E}[b_1] = \beta_1 + A\,\beta_2\). The result reports the matrix, the worst single bias (max_abs), the maximum over the main-effect rows (max_abs_main_effect_rows), and the Frobenius norm.
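As a worked illustration of the formula, the sketch below computes \(A\) with NumPy for a hypothetical \(2^{3-1}\) half fraction (generator C = AB) fitted with a main-effects model. The variable names are illustrative and the summaries are computed directly from the matrix; how the library names or orders its rows and columns is not assumed here.

```python
import numpy as np

# Hypothetical half fraction 2^(3-1) with generator C = A*B.
a = np.array([-1.0, 1.0, -1.0, 1.0])
b = np.array([-1.0, -1.0, 1.0, 1.0])
c = a * b

X1 = np.column_stack([np.ones(4), a, b, c])   # fitted model: intercept, A, B, C
X2 = np.column_stack([a * b, a * c, b * c])   # omitted terms: AB, AC, BC

# A = (X1'X1)^{-1} X1'X2, solved without forming the inverse explicitly.
alias = np.linalg.solve(X1.T @ X1, X1.T @ X2)

max_abs = np.abs(alias).max()
max_abs_main_effect_rows = np.abs(alias[1:]).max()   # rows for A, B, C
frobenius = np.linalg.norm(alias, "fro")

print(alias)
```

Each main effect is fully aliased with one interaction (A with BC, B with AC, C with AB), so the matrix contains a single 1 in each main-effect row and the intercept row is zero.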

Prediction variance and the FDS curve#

prediction_variance is the leverage \(d(x) = x^\top (X^\top X)^{-1} x\) at the design runs, where \(x\) is a point expanded into the model terms. fds instead samples \(d(x)\) over the whole design region, giving the fraction-of-design-space distribution. From the same region sample it reports:

  • average_prediction_variance - the region average (I / V-optimality),

  • max_prediction_variance - the region maximum (G-optimality), in \(\sigma^2\) units,

  • the run-count-scaled SPV variants (each multiplied by N), and

  • a coarse 11-point quantiles summary.

i_efficiency and g_efficiency are derived from this same region machinery, so they are consistent with the fds payload.
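The region quantities listed above can be approximated with a short Monte-Carlo sketch. It assumes a hypothetical \(2^2\) factorial fitted with a first-order model, uniform sampling of the cube, and a smaller sample than the library default; the `expand` helper is illustrative and not part of the package.

```python
import numpy as np
from itertools import product

def expand(points):
    """Expand coded points into first-order model terms: intercept + A + B."""
    return np.column_stack([np.ones(len(points)), points])

# Hypothetical design: 2^2 factorial in coded units.
design = np.array(list(product([-1.0, 1.0], repeat=2)))
X = expand(design)
n_runs = len(design)
XtX_inv = np.linalg.inv(X.T @ X)

# Sample the cuboidal region and append the 2^k corners.
rng = np.random.default_rng(42)
sample = rng.uniform(-1.0, 1.0, size=(20_000, 2))
sample = np.vstack([sample, design])

# d(x) = x' (X'X)^{-1} x for every sampled point, in sigma^2 units.
F = expand(sample)
d = np.einsum("ij,jk,ik->i", F, XtX_inv, F)

average_pv = d.mean()                            # I / V criterion
max_pv = d.max()                                 # G criterion
quantiles = np.quantile(d, np.linspace(0, 1, 11))  # coarse 11-point summary
spv_average, spv_max = n_runs * average_pv, n_runs * max_pv

print(f"average = {average_pv:.3f}, max = {max_pv:.3f}")
```

For this model \(d(x) = (1 + a^2 + b^2)/4\), so the exact region average is \(5/12\) and the maximum of \(3/4\) is attained at the corners, which is why appending the vertices pins the maximum exactly.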

Power#

power reports the statistical power to detect each model term. Pass effect_size for a single power value per term, or omit it for a power curve over a range of effect sizes.

Fractional-factorial structure#

For two-level fractional factorials the tool also reports alias_structure, confounding, resolution, defining_relation, clear_effects, and minimum_aberration from the generators.
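How the defining relation, resolution, and aliases follow from the generators can be shown with plain Python. This is a generic textbook sketch, not the package's code: the \(2^{5-2}\) generators D = AB and E = AC are a hypothetical example, and `multiply` and `aliases` are illustrative helper names.

```python
from itertools import combinations

def multiply(word1, word2):
    """Multiply two effect words; squared letters cancel (A*A = I)."""
    return "".join(sorted(set(word1) ^ set(word2)))

# Hypothetical 2^(5-2) design: D = AB, E = AC.
generators = {"D": "AB", "E": "AC"}

# Each generator gives one defining word, e.g. D = AB  ->  I = ABD.
words = [multiply(factor, rhs) for factor, rhs in generators.items()]

# The full defining relation contains every product of the defining words.
defining_relation = set()
for size in range(1, len(words) + 1):
    for combo in combinations(words, size):
        product_word = ""
        for word in combo:
            product_word = multiply(product_word, word)
        defining_relation.add(product_word)

# Resolution is the length of the shortest word in the defining relation.
resolution = min(len(word) for word in defining_relation)

def aliases(effect):
    """Effects confounded with `effect`, shortest first."""
    found = (multiply(effect, word) for word in defining_relation)
    return sorted(found, key=lambda w: (len(w), w))

print(sorted(defining_relation), resolution, aliases("A"))
```

In this example the defining relation is I = ABD = ACE = BCDE, the design is resolution III, and main effect A is aliased with the two-factor interactions BD and CE.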

Region Sampling and Reproducibility#

The region-integrated metrics (i_efficiency, g_efficiency, average_prediction_variance, max_prediction_variance, and fds) are computed by Monte-Carlo sampling the design region. The sampling is fully controllable and seeded:

| Parameter | Default | Meaning |
|---|---|---|
| region | "cuboidal" | "cuboidal" samples \([-1, 1]^k\); "spherical" samples the ball of radius \(\sqrt{k}\). |
| n_samples | 100_000 | Number of random points drawn over the region. |
| include_vertices | True | Always append the \(2^k\) cube corners, where the worst-case prediction variance usually sits. |
| random_seed | 42 | Seed for the region sampler; fixing it makes the maximum reproducible. |

The region average (I) is stable across seeds, but the region maximum (G) is sensitive to the sample: the worst point usually lies on the boundary of the region but not always at a cube corner, and a random sample rarely lands exactly on it, so a denser sample tends to find higher worst-case values. To tighten and reproduce the G estimate, raise n_samples and fix random_seed:

metrics = evaluate_design(
    design, model=model, metric="fds",
    n_samples=120_000, random_seed=1,
)
metrics["fds"]["max_prediction_variance"]  # reproducible run to run

The region, sample size, vertex flag, and seed actually used are echoed back in the fds payload.
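To make the four sampling parameters concrete, here is a minimal sketch of a seeded region sampler. The function `sample_region` is hypothetical and only mirrors the parameter names in the table; in particular, drawing points uniformly inside the ball for the "spherical" region is an assumption about the sampling distribution.

```python
import numpy as np
from itertools import product

def sample_region(k, region="cuboidal", n_samples=1000,
                  include_vertices=True, random_seed=42):
    """Draw seeded random points over a k-factor design region (coded units)."""
    rng = np.random.default_rng(random_seed)
    if region == "cuboidal":
        points = rng.uniform(-1.0, 1.0, size=(n_samples, k))
    elif region == "spherical":
        # Uniform in the ball of radius sqrt(k): random direction, radius ~ U^(1/k).
        directions = rng.standard_normal((n_samples, k))
        directions /= np.linalg.norm(directions, axis=1, keepdims=True)
        radii = np.sqrt(k) * rng.uniform(size=(n_samples, 1)) ** (1.0 / k)
        points = directions * radii
    else:
        raise ValueError(f"unknown region: {region!r}")
    if include_vertices:
        # The 2^k cube corners; they lie on the sphere of radius sqrt(k) as well.
        corners = np.array(list(product([-1.0, 1.0], repeat=k)))
        points = np.vstack([points, corners])
    return points

cube = sample_region(3, region="cuboidal", n_samples=500)
ball = sample_region(3, region="spherical", n_samples=500)
same_seed = sample_region(3, n_samples=500)   # identical to `cube`
print(cube.shape, ball.shape, np.array_equal(cube, same_seed))
```

Because the generator is re-created from the seed on every call, two calls with the same arguments return identical points, which is what makes a sampled maximum repeatable.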

Tunable FDS Curve#

By default fds returns the coarse 11-point quantiles summary. For a smooth plot, set fds_resolution to the number of points you want. The payload then gains a curve sub-dict with fraction, prediction_variance, and scaled_prediction_variance (the run-count scaled SPV) arrays of that length, evaluated on evenly spaced fractions in \([0, 1]\). The arrays are monotonically non-decreasing, and their endpoints equal the minimum and maximum prediction variance.

fds = evaluate_design(
    design, model=model, metric="fds",
    fds_resolution=200, random_seed=1,
)["fds"]

curve = fds["curve"]
curve["fraction"]              # 200 evenly spaced fractions in [0, 1]
curve["prediction_variance"]   # the FDS curve, sigma^2 units
curve["scaled_prediction_variance"]  # the same, scaled by N (SPV)

The fds_resolution parameter is fully backward compatible: when it is None (the default) the output is unchanged, and the coarse quantile summary is always present, whether or not a curve is requested.
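The properties of the curve (evenly spaced fractions, non-decreasing values, endpoints at the sample minimum and maximum) are what one gets from taking quantiles of the sampled prediction variances. The sketch below illustrates that idea with a hypothetical `fds_curve` helper; it is an assumption about how such a curve can be built, not a description of the library internals.

```python
import numpy as np

def fds_curve(pred_var, n_runs, resolution=200):
    """Quantiles of sampled prediction variance on evenly spaced fractions."""
    fraction = np.linspace(0.0, 1.0, resolution)
    pv = np.quantile(pred_var, fraction)
    return {
        "fraction": fraction,
        "prediction_variance": pv,                    # sigma^2 units
        "scaled_prediction_variance": n_runs * pv,    # SPV = N * d(x)
    }

# Hypothetical sample: d(x) = (1 + a^2 + b^2) / 4, the prediction variance of a
# first-order model fitted to a 2^2 factorial, drawn over the cube [-1, 1]^2.
rng = np.random.default_rng(1)
points = rng.uniform(-1.0, 1.0, size=(5000, 2))
pred_var = (1.0 + (points**2).sum(axis=1)) / 4.0

curve = fds_curve(pred_var, n_runs=4, resolution=200)
print(curve["prediction_variance"][[0, -1]])  # sample minimum and maximum
```

Plotting `curve["fraction"]` against `curve["prediction_variance"]` gives the usual FDS plot: a flatter, lower curve means more uniform and smaller prediction variance across the region.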

See Also#

  • evaluate_design() - full API reference.

  • evaluate_all() - compute every metric in one call.

  • Experimental Strategy Recommendation - choosing which design to generate in the first place.