Evaluating Design Quality
=========================

Once you have a candidate design, the next question is: *"How good is it for
the model I intend to fit?"*  The :func:`~process_improve.experiments.evaluate_design`
function answers this by computing a complete, model-aware set of quality
metrics from the design matrix and a chosen model.  A single call can fully
characterise a response-surface design: its efficiency, where it predicts well
or badly, how its coefficients are correlated, and what bias it carries from
terms left out of the model.

Every metric is computed against the *model you specify*, not against the
design in isolation.  A design that is excellent for a main-effects model can
be poor for a full quadratic model, so the model always comes first.

When to Use This Tool
---------------------

- **Comparing candidate designs** - score Box-Behnken, central composite,
  definitive screening, OMARS, and optimal designs on the same footing before
  committing runs.
- **Checking a design against a reduced model** - verify that an explicit
  formula (for example main effects plus pure quadratics) is estimable and
  well-conditioned.
- **Diagnosing where a design predicts poorly** - the fraction-of-design-space
  (FDS) curve shows the spread of prediction variance across the whole region,
  not just at the design points.
- **Quantifying bias** - the alias matrix reports how much omitted two-factor
  interactions would bias the fitted coefficients.

Quick Start
-----------

.. code-block:: python

   from process_improve.experiments import generate_design, evaluate_design, Factor

   factors = [Factor(name=n, low=-1, high=1) for n in "ABCDE"]
   design = generate_design(factors, design_type="box_behnken", center_points=6)

   # A reduced model: main effects plus pure quadratics (no two-factor interactions).
   model = "A+B+C+D+E+I(A**2)+I(B**2)+I(C**2)+I(D**2)+I(E**2)"

   metrics = evaluate_design(
       design,
       model=model,
       metric=["d_efficiency", "a_optimality", "e_optimality", "fds"],
   )

To compute every available metric in one call, pass ``metric="all"`` or use the
:func:`~process_improve.experiments.evaluate_all` convenience wrapper:

.. code-block:: python

   from process_improve.experiments import evaluate_all

   everything = evaluate_all(design, model=model)

The Metrics
-----------

Optimality and efficiency
~~~~~~~~~~~~~~~~~~~~~~~~~~~

These summarise the information matrix :math:`X^\top X` for the fitted model.

- ``d_efficiency`` - :math:`100 \cdot \det(X^\top X)^{1/p} / N`; overall
  information content (higher is better).
- ``a_optimality`` - :math:`\operatorname{trace}((X^\top X)^{-1})`, the average
  coefficient variance (**lower** is better).  Reported alongside an
  ``a_efficiency`` score for parity with ``d_efficiency``.
- ``e_optimality`` - the smallest eigenvalue of :math:`X^\top X`, the
  worst-estimated direction in parameter space (**higher** is better).
- ``vif`` - variance inflation factor per term; how much each coefficient
  variance is inflated by non-orthogonality.
- ``condition_number`` - conditioning of the model matrix.

Term correlation
~~~~~~~~~~~~~~~~~

``correlation`` summarises the pairwise correlation among the *second-order*
terms (pure quadratics and two-factor interactions).  Because the
:math:`x_i^2` columns have a non-zero mean, a naive Pearson correlation is
inflated by that shared offset and depends on the coding.  The metric instead
residualises each second-order column against the intercept-and-main-effect
block first, giving a coding-invariant measure.  It returns ``max_abs_r``,
``mean_abs_r``, and the full ``matrix``.

Alias (bias) matrix
~~~~~~~~~~~~~~~~~~~~~

``alias_matrix`` generalises the two-level alias structure to any design and
model.  Given the fitted model :math:`X_1` and a set of potential extra terms
:math:`X_2` (by default the two-factor interactions *not* already in the
model), it computes

.. math::

   A = (X_1^\top X_1)^{-1} X_1^\top X_2,

so the expected fitted coefficients are biased as
:math:`\mathbb{E}[b_1] = \beta_1 + A\,\beta_2`.  The result reports the matrix,
the worst single bias (``max_abs``), the maximum over the main-effect rows
(``max_abs_main_effect_rows``), and the Frobenius norm.

Prediction variance and the FDS curve
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``prediction_variance`` is the leverage :math:`d(x) = x^\top (X^\top X)^{-1} x`
*at the design runs*.  ``fds`` instead samples :math:`d(x)` over the **whole
design region**, giving the fraction-of-design-space distribution.  From the
same region sample it reports:

- ``average_prediction_variance`` - the region average (I / V-optimality),
- ``max_prediction_variance`` - the region maximum (G-optimality), in
  :math:`\sigma^2` units,
- the run-count-scaled SPV variants (each multiplied by ``N``), and
- a coarse 11-point ``quantiles`` summary.

``i_efficiency`` and ``g_efficiency`` are derived from this same region
machinery, so they are consistent with the ``fds`` payload.

Power
~~~~~

``power`` reports the statistical power to detect each model term.  Pass
``effect_size`` for a single power value per term, or omit it for a power curve
over a range of effect sizes.

Fractional-factorial structure
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For two-level fractional factorials the tool also reports ``alias_structure``,
``confounding``, ``resolution``, ``defining_relation``, ``clear_effects``, and
``minimum_aberration`` from the generators.

Region Sampling and Reproducibility
-----------------------------------

The region-integrated metrics (``i_efficiency``, ``g_efficiency``,
``average_prediction_variance``, ``max_prediction_variance``, and ``fds``) are
computed by Monte-Carlo sampling the design region.  The sampling is fully
controllable and seeded:

.. list-table::
   :header-rows: 1
   :widths: 22 14 64

   * - Parameter
     - Default
     - Meaning
   * - ``region``
     - ``"cuboidal"``
     - ``"cuboidal"`` samples :math:`[-1, 1]^k`; ``"spherical"`` samples the
       ball of radius :math:`\sqrt{k}`.
   * - ``n_samples``
     - ``100_000``
     - Number of random points drawn over the region.
   * - ``include_vertices``
     - ``True``
     - Always append the :math:`2^k` cube corners, where the worst-case
       prediction variance usually sits.
   * - ``random_seed``
     - ``42``
     - Seed for the region sampler; fixing it makes the maximum reproducible.

The region **average** (I) is stable across seeds, but the region **maximum**
(G) is sensitive to the sample: the worst point is often in the interior, so a
denser sample finds higher worst-case values.  To tighten and reproduce the
G estimate, raise ``n_samples`` and fix ``random_seed``:

.. code-block:: python

   metrics = evaluate_design(
       design, model=model, metric="fds",
       n_samples=120_000, random_seed=1,
   )
   metrics["fds"]["max_prediction_variance"]  # reproducible run to run

The region, sample size, vertex flag, and seed actually used are echoed back in
the ``fds`` payload.

Tunable FDS Curve
-----------------

By default ``fds`` returns the coarse 11-point ``quantiles`` summary.  For a
smooth plot, set ``fds_resolution`` to the number of points you want.  The
payload then gains a ``curve`` sub-dict with ``fraction``,
``prediction_variance``, and ``scaled_prediction_variance`` (the run-count
scaled SPV) arrays of that length, evaluated on evenly spaced fractions in
:math:`[0, 1]`.  The arrays are monotonically non-decreasing, and their
endpoints equal the minimum and maximum prediction variance.

.. code-block:: python

   fds = evaluate_design(
       design, model=model, metric="fds",
       fds_resolution=200, random_seed=1,
   )["fds"]

   curve = fds["curve"]
   curve["fraction"]              # 200 evenly spaced fractions in [0, 1]
   curve["prediction_variance"]   # the FDS curve, sigma^2 units
   curve["scaled_prediction_variance"]  # the same, scaled by N (SPV)

Setting ``fds_resolution`` is fully backward compatible: when it is ``None``
(the default) the output is unchanged and the coarse quantile summary is still
present.

See Also
--------

- :func:`~process_improve.experiments.evaluate_design` - full API reference.
- :func:`~process_improve.experiments.evaluate_all` - compute every metric in
  one call.
- :doc:`doe_strategy` - choosing which design to generate in the first place.