Generating OMARS Designs#
OMARS designs (orthogonal minimally aliased response surface designs) are three-level designs that sit between screening designs and full response surface designs. Every main effect is orthogonal to every other main effect and to all second-order terms (the pure quadratics and the two-factor interactions), so all aliasing is confined to the second-order block, where it is kept minimal.
generate_omars() constructs such designs on
demand with an integer linear program (ILP). Unlike the minimal
conference-foldover member produced by generate_design(design_type="omars")
(which is saturated and leaves no error degrees of freedom), the designs from
generate_omars are sized to support a full second-order analysis with
analyze_omars().
Note
generate_omars requires the optional ilp extra (PuLP, which bundles
the CBC solver):
pip install 'process-improve[ilp]'
Quick start#
from process_improve.experiments import Factor, generate_omars, analyze_omars
factors = [Factor(name=n, low=-1, high=1) for n in "ABCDE"]
# Smallest foldover OMARS design that still leaves error degrees of freedom.
result = generate_omars(factors)
print(result.metadata["n_runs_selected"], result.metadata["expected_error_df"])
print(result.metadata["omars_verified"]) # True
# The design is ready for the staged OMARS analysis.
design = result.design[result.factor_names]
# ... collect responses y, then:
analysis = analyze_omars(design, y) # analysis.success is True
You can pin an exact (odd) run size or search a window:
result = generate_omars(factors, n_runs=29)
result = generate_omars(factors, n_runs_range=(27, 41))
The method#
Every design produced here is a foldover \([H; -H; 0]\): a half-design \(H\), its mirror image \(-H\), and a single centre run. The foldover structure makes three of the four OMARS-defining conditions hold automatically, which is what keeps the construction tractable.
For a design coded to \(\{-1, 0, +1\}\):
Balance is automatic: a run \(h\) and its mirror \(-h\) cancel, so every main-effect column sums to zero.
Main effects clear of the two-factor interactions is automatic: the term \(x_i x_a x_b\) is an odd function, so the contributions from \(h\) and \(-h\) cancel.
Main effects clear of the pure quadratics is automatic: \(x_i x_j^2\) is odd in \(x_i\), so those contributions cancel too.
Quadratics are estimable because the centre run makes each \(x_i^2\) column take the value 0 at least once (so it is not constant).
The only condition that is not automatic is the mutual orthogonality of the main effects. That condition is linear in the binary “include half-run” variables \(s_r\): for every pair of factors \(i < j\),
and the run count is \(N = 2\sum_r s_r + 1\). The ILP therefore selects a
half-design from the \((3^k - 1)/2\) distinct non-mirror three-level runs
subject to only \(k(k-1)/2\) equality constraints. Because the
coefficients are integers, the equalities are exact (no numerical tolerance
enters the optimisation); a floating-point is_omars()
re-check guards every accepted design as a sanity check.
Choosing the run size and the design#
When
n_runsis given it is used directly (it must be odd and larger than the number of second-order parameters \(1 + 2k + k(k-1)/2\)).Otherwise the solver minimises the run count within a window to return the smallest feasible design that still leaves error degrees of freedom.
Several distinct designs are then enumerated at that run size by adding no-good cuts (forbidding a previously found selection), and the winner is chosen by
selection_criterion:"dominance"(default) keeps the Pareto front on D-efficiency (higher is better) and the maximum second-order correlation (lower is better), then prefers the smallest, most efficient design."d_efficiency"maximises the D-efficiency of the full second-order model."min_second_order_correlation"minimises the largest second-order correlation.
Optionally,
satisficesets acceptability thresholds that are applied before the dominance/criterion step: a design is kept only if it clears every threshold. Supported keys are"d_efficiency"(a minimum) and"max_second_order_correlation"(a maximum), e.g.satisfice={"d_efficiency": 5.0, "max_second_order_correlation": 0.7}. Together these implement the satisficing-and-dominance multicriteria selection of Nunez Ares and Goos (2020): first discard designs that fail the minimum bars (satisficing), then drop dominated designs and choose from the Pareto front (dominance). This deliberately avoids collapsing several criteria into a single weighted score, which would hide the trade-offs. AValueErroris raised if no enumerated design meets the thresholds.
The returned DesignResult records the
provenance and a search report under metadata (family, sparsity,
expected_error_df, d_efficiency, max_second_order_correlation and an
omars_search report with the ILP iteration count and solver time).
Performance: iterations and timing by factor count#
The table below reports, for the default settings (the automatic smallest-size
search with max_candidates=6), the size of the candidate half-pool, the run
size of the smallest design found, the resulting error degrees of freedom, the
number of ILP solves performed (the iteration count: one minimise-size probe
plus the no-good-cut re-solves), and the cumulative CBC solver time. Times were
measured single-threaded on an x86_64 machine with CPython 3.11 and CBC (the
solver bundled with PuLP); they are indicative and will vary by machine.
Factors \(k\) |
Half-pool size |
Runs \(N\) |
Error df |
ILP solves |
Solver time (s) |
|---|---|---|---|---|---|
3 |
13 |
13 |
3 |
6 |
0.03 |
4 |
40 |
19 |
4 |
6 |
0.05 |
5 |
121 |
27 |
6 |
6 |
0.14 |
6 |
364 |
35 |
7 |
6 |
5.6 |
7 |
1093 |
45 |
9 |
6 |
39 |
The iteration count is fixed by max_candidates (each iteration is a full ILP
solve); the cost per iteration grows with the half-pool size \((3^k - 1)/2\)
and the number of orthogonality constraints \(k(k-1)/2\). Three to six
factors solve in well under a second; seven factors take tens of seconds. Beyond
seven factors the half-pool grows quickly, so a longer solver_options["time_limit"]
(default 60 s) or a tighter n_runs is advisable.
Limitations#
Foldover family only.
generate_omarsbuilds the (dominant) foldover OMARS family. The rarer non-foldover members from the enumerated catalogue are a documented future extension.Odd run counts. A foldover design has \(2h + 1\) runs, so
n_runsmust be odd.Full second-order conditioning. The smallest OMARS designs are highly aliased in the second-order block by construction, so the D-efficiency of the full quadratic model is low; this is expected, and is exactly why
analyze_omars()resolves the second-order terms in stages rather than fitting them all at once. Request a largern_runsfor a better-conditioned design.
References#
Nunez Ares, J. and Goos, P. (2020). “Enumeration and multicriteria selection of orthogonal minimally aliased response surface designs.” Technometrics, 62(1):21-36.
Nunez Ares, J. and Goos, P. (2019). “An integer linear programming approach to find trend-robust run orders of experimental designs.” Journal of Quality Technology.