Generating OMARS Designs
========================

OMARS designs (orthogonal minimally aliased response surface designs) are
three-level designs that sit between screening designs and full response
surface designs.  Every main effect is orthogonal to every other main effect
*and* to all second-order terms (the pure quadratics and the two-factor
interactions), so all aliasing is confined to the second-order block, where it
is kept *minimal*.

:func:`~process_improve.experiments.generate_omars` constructs such designs on
demand with an integer linear program (ILP).  Unlike the minimal
conference-foldover member produced by ``generate_design(design_type="omars")``
(which is saturated and leaves no error degrees of freedom), the designs from
``generate_omars`` are sized to support a full second-order analysis with
:func:`~process_improve.experiments.analyze_omars`.

.. note::

   ``generate_omars`` requires the optional ``ilp`` extra (PuLP, which bundles
   the CBC solver)::

       pip install 'process-improve[ilp]'

Quick start
-----------

.. code-block:: python

   from process_improve.experiments import Factor, generate_omars, analyze_omars

   factors = [Factor(name=n, low=-1, high=1) for n in "ABCDE"]

   # Smallest foldover OMARS design that still leaves error degrees of freedom.
   result = generate_omars(factors)
   print(result.metadata["n_runs_selected"], result.metadata["expected_error_df"])
   print(result.metadata["omars_verified"])   # True

   # The design is ready for the staged OMARS analysis.
   design = result.design[result.factor_names]
   # ... collect responses y, then:
   analysis = analyze_omars(design, y)         # analysis.success is True

You can pin an exact (odd) run size or search a window:

.. code-block:: python

   result = generate_omars(factors, n_runs=29)
   result = generate_omars(factors, n_runs_range=(27, 41))

The method
----------

Every design produced here is a **foldover** :math:`[H; -H; 0]`: a half-design
:math:`H`, its mirror image :math:`-H`, and a single centre run.  The foldover
structure makes three of the four OMARS-defining conditions hold automatically,
which is what keeps the construction tractable.

For a design coded to :math:`\{-1, 0, +1\}`:

- **Balance** is automatic: a run :math:`h` and its mirror :math:`-h` cancel,
  so every main-effect column sums to zero.
- **Main effects clear of the two-factor interactions** is automatic: the term
  :math:`x_i x_a x_b` is an *odd* function, so the contributions from :math:`h`
  and :math:`-h` cancel.
- **Main effects clear of the pure quadratics** is automatic: :math:`x_i x_j^2`
  is odd in :math:`x_i`, so those contributions cancel too.
- **Quadratics are estimable** because the centre run makes each :math:`x_i^2`
  column take the value 0 at least once (so it is not constant).

The only condition that is **not** automatic is the mutual orthogonality of the
main effects.  That condition is *linear* in the binary "include half-run"
variables :math:`s_r`: for every pair of factors :math:`i < j`,

.. math::

   \sum_r \left( x_{r,i}\, x_{r,j} \right) s_r = 0 ,

and the run count is :math:`N = 2\sum_r s_r + 1`.  The ILP therefore selects a
half-design from the :math:`(3^k - 1)/2` distinct non-mirror three-level runs
subject to only :math:`k(k-1)/2` equality constraints.  Because the
coefficients are integers, the equalities are exact (no numerical tolerance
enters the optimisation); a floating-point :func:`~process_improve.experiments.is_omars`
re-check guards every accepted design as a sanity check.

Choosing the run size and the design
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- When ``n_runs`` is given it is used directly (it must be odd and larger than
  the number of second-order parameters :math:`1 + 2k + k(k-1)/2`).
- Otherwise the solver minimises the run count within a window to return the
  smallest feasible design that still leaves error degrees of freedom.
- Several distinct designs are then enumerated at that run size by adding
  *no-good cuts* (forbidding a previously found selection), and the winner is
  chosen by ``selection_criterion``:

  - ``"dominance"`` (default) keeps the Pareto front on D-efficiency (higher is
    better) and the maximum second-order correlation (lower is better), then
    prefers the smallest, most efficient design.
  - ``"d_efficiency"`` maximises the D-efficiency of the full second-order model.
  - ``"min_second_order_correlation"`` minimises the largest second-order
    correlation.

- Optionally, ``satisfice`` sets *acceptability thresholds* that are applied
  **before** the dominance/criterion step: a design is kept only if it clears
  every threshold.  Supported keys are ``"d_efficiency"`` (a minimum) and
  ``"max_second_order_correlation"`` (a maximum), e.g.
  ``satisfice={"d_efficiency": 5.0, "max_second_order_correlation": 0.7}``.
  Together these implement the **satisficing-and-dominance** multicriteria
  selection of Nunez Ares and Goos (2020): first discard designs that fail the
  minimum bars (satisficing), then drop dominated designs and choose from the
  Pareto front (dominance).  This deliberately avoids collapsing several
  criteria into a single weighted score, which would hide the trade-offs.  A
  ``ValueError`` is raised if no enumerated design meets the thresholds.

The returned :class:`~process_improve.experiments.DesignResult` records the
provenance and a search report under ``metadata`` (``family``, ``sparsity``,
``expected_error_df``, ``d_efficiency``, ``max_second_order_correlation`` and an
``omars_search`` report with the ILP iteration count and solver time).

Performance: iterations and timing by factor count
--------------------------------------------------

The table below reports, for the default settings (the automatic smallest-size
search with ``max_candidates=6``), the size of the candidate half-pool, the run
size of the smallest design found, the resulting error degrees of freedom, the
number of ILP solves performed (the *iteration count*: one minimise-size probe
plus the no-good-cut re-solves), and the cumulative CBC solver time.  Times were
measured single-threaded on an ``x86_64`` machine with CPython 3.11 and CBC (the
solver bundled with PuLP); they are indicative and will vary by machine.

.. list-table::
   :header-rows: 1
   :widths: 8 14 8 10 12 12

   * - Factors :math:`k`
     - Half-pool size
     - Runs :math:`N`
     - Error df
     - ILP solves
     - Solver time (s)
   * - 3
     - 13
     - 13
     - 3
     - 6
     - 0.03
   * - 4
     - 40
     - 19
     - 4
     - 6
     - 0.05
   * - 5
     - 121
     - 27
     - 6
     - 6
     - 0.14
   * - 6
     - 364
     - 35
     - 7
     - 6
     - 5.6
   * - 7
     - 1093
     - 45
     - 9
     - 6
     - 39

The iteration count is fixed by ``max_candidates`` (each iteration is a full ILP
solve); the cost per iteration grows with the half-pool size :math:`(3^k - 1)/2`
and the number of orthogonality constraints :math:`k(k-1)/2`.  Three to six
factors solve in well under a second; seven factors take tens of seconds.  Beyond
seven factors the half-pool grows quickly, so a longer ``solver_options["time_limit"]``
(default 60 s) or a tighter ``n_runs`` is advisable.

Limitations
-----------

- **Foldover family only.**  ``generate_omars`` builds the (dominant) foldover
  OMARS family.  The rarer non-foldover members from the enumerated catalogue
  are a documented future extension.
- **Odd run counts.**  A foldover design has :math:`2h + 1` runs, so ``n_runs``
  must be odd.
- **Full second-order conditioning.**  The smallest OMARS designs are highly
  aliased in the second-order block by construction, so the D-efficiency of the
  *full* quadratic model is low; this is expected, and is exactly why
  :func:`~process_improve.experiments.analyze_omars` resolves the second-order
  terms in stages rather than fitting them all at once.  Request a larger
  ``n_runs`` for a better-conditioned design.

References
----------

- Nunez Ares, J. and Goos, P. (2020).  "Enumeration and multicriteria selection
  of orthogonal minimally aliased response surface designs."  *Technometrics*,
  62(1):21-36.
- Nunez Ares, J. and Goos, P. (2019).  "An integer linear programming approach
  to find trend-robust run orders of experimental designs."  *Journal of Quality
  Technology*.