Course wrap-up: vocabulary, API map, and what comes next#
Source worksheet: yint.org/w12 - the final week’s concept review.
Modules 1 to 7 spent ~50 pages of narrative and code on the fundamentals of designed experiments. This wrap-up module gathers every term and habit the course covers in one place, so you can test your understanding either by restating each term in plain language or by jumping back to the module where it first appeared.
It also maps each habit to the part of process_improve that implements it, so when a colleague asks “how do I do X again?”, the answer is at most two clicks away.
Tip
Two ways to use this page:
- As a checklist. Scan the 44 concepts below. For each one, can you explain it to a colleague in two sentences? If not, jump back to the module that owns it.
- As an API map. When you start a new study, the right entry point into `process_improve` is usually one of a handful of functions; the table at the end of this page lists them.
The 44 concepts, organized by module#
Every concept from the week-12 worksheet, in order of where it first shows up in this series.
Phase A - Foundations (Modules 1 and 2)#
| Concept | One-line definition | First in |
|---|---|---|
| factor | Something we deliberately change. | |
| categorical factor | A factor whose levels are discrete (e.g., R / W). | |
| numeric factor | A factor on a continuous scale. | |
| outcome / response | A measured quantity that depends on the factors. | |
| objective | What we are trying to do (maximize, minimize, target). | |
| levels | The specific values a factor takes in the design. | |
| main effect | Average response change moving a factor from low to high. | |
| coded value | Factor on the −1 to +1 scale. | |
| real-world value | Factor in its physical units. | |
| average effect | Mean response across one factor level. | |
| model prediction | What the fitted equation says at a given point. | |
| extrapolate | Predict outside the design region (use with care). | |
| interactions | Effect of one factor depending on the level of another. | |
| one-factor-at-a-time | Sequential single-factor tweaks; misses interactions. | |
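To make the Phase A vocabulary concrete, here is a minimal sketch (plain NumPy with made-up data, not `process_improve`) that computes main effects from a 2^2 design and converts between coded and real-world values:

```python
import numpy as np

# Hypothetical 2^2 design in coded units (x1, x2) with measured responses y.
X = np.array([[-1, -1], [+1, -1], [-1, +1], [+1, +1]])
y = np.array([52.0, 74.0, 60.0, 86.0])

# Main effect: mean response at the high level minus mean at the low level.
def main_effect(X, y, col):
    return y[X[:, col] == +1].mean() - y[X[:, col] == -1].mean()

print(main_effect(X, y, 0))  # x1: (74 + 86)/2 - (52 + 60)/2 = 24.0
print(main_effect(X, y, 1))  # x2: (60 + 86)/2 - (52 + 74)/2 = 10.0

# Coded <-> real-world conversion, for a hypothetical factor from 160 to 180 degC.
lo, hi = 160.0, 180.0
center, half = (hi + lo) / 2, (hi - lo) / 2
real = center + half * X[:, 0]    # coded -> real
coded = (real - center) / half    # real -> coded (recovers X[:, 0])
```

Predicting with such a model only within the [−1, +1] box is interpolation; stepping outside it is extrapolation, which the table above flags as "use with care".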
Phase B - Full factorial designs (Modules 3 and 4)#
| Concept | One-line definition | First in |
|---|---|---|
| full factorial | All combinations of all factor levels. | |
| cube plot | 3-D visualization of a 2^3 design. | |
| contour plot | 2-D map of the response surface. | |
| center point | A run at the middle of every factor range. | |
| replicates | Repeated runs at the same condition. | |
| standard error | Estimated noise level of a coefficient. | |
| interaction plots | Lines per level of one factor against another. | |
| Pareto plot | Bar chart of effect magnitudes, biggest first. | |
| little / no effect | A factor whose coefficient is dwarfed by noise. | |
| noise level | Run-to-run variation under “identical” conditions. | |
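The "full factorial" idea is small enough to sketch without the library: enumerate every combination of every factor's levels. The factors below are hypothetical, chosen only to show a categorical and two coded numeric factors together:

```python
from itertools import product

# Hypothetical factor levels: two coded numeric factors plus one categorical.
levels = {
    "x1": [-1, +1],
    "x2": [-1, +1],
    "solvent": ["R", "W"],
}

# Full factorial: the Cartesian product of all level lists.
runs = [dict(zip(levels, combo)) for combo in product(*levels.values())]
print(len(runs))  # 2 * 2 * 2 = 8 runs

# A center point sits at the middle of every *numeric* factor range:
center = {"x1": 0, "x2": 0}
```

Replicates are simply this run list repeated; the spread among replicate results at one condition is your estimate of the noise level.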
Phase C - Doing less, learning more (Modules 5 and 6)#
| Concept | One-line definition | First in |
|---|---|---|
| half-fraction | Run half the corners of a full factorial. | |
| generators | The equation(s) that build extra factors from existing columns. | |
| aliases / confounding | Two effects whose contributions cannot be separated. | |
| defining relation | The product of all generators, written with the identity `I`. | |
| words | Each term in the defining relation (e.g., `ABCD`). | |
| resolution | Length of the shortest word in the defining relation. | |
| screening experiments | Sift many factors with a small design. | |
| trade-off table | Standard chart for picking a fractional design (runs vs. resolution). | |
| covariate | A variable you can measure but not control. | |
| disturbance | A variable you can neither measure nor control. | |
| nuisance variable | A controllable factor you do not scientifically care about. | |
| controlled variable | A factor you can set (factor / nuisance). | |
| blocking | Account for nuisance factors by structuring the design. | |
| baseline | A reference run at a known operating point. | |
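The fraction vocabulary is easy to demonstrate by hand. The sketch below (plain Python, not the library) builds the defining relation for a hypothetical 2^(4−1) half-fraction with generator D = ABC, then reads off its resolution and alias structure:

```python
# Multiplying words: any letter appearing twice cancels (A*A = I), which is
# exactly the symmetric difference of the two letter sets.
def mul(w1, w2):
    return frozenset(w1) ^ frozenset(w2)

I = frozenset()                # the identity word
word = frozenset("ABCD")       # generator D = ABC  =>  I = ABCD
defining_relation = {I, word}

# Resolution: length of the shortest nontrivial word in the defining relation.
resolution = min(len(w) for w in defining_relation if w)
print(resolution)  # 4 -> a resolution IV design

# Alias of any effect: multiply it by each nontrivial word.
def aliases(effect):
    return sorted("".join(sorted(mul(effect, w))) for w in defining_relation if w)

print(aliases("A"))   # ['BCD']: main effects alias three-factor interactions
print(aliases("AB"))  # ['CD']:  two-factor interactions alias each other
```

The same bookkeeping, applied to designs with several generators, is what the trade-off table summarizes for you.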
Phase D - Optimization (Module 7)#
| Concept | One-line definition | First in |
|---|---|---|
| sequential experiments | A string of small designs, each one informed by the last. | |
| optimization | Find the factor settings that best meet the objective. | |
| response surface | The mapping from factors to response. | |
| steepest ascent | Move along the response gradient. | |
| augmented model | Adding new runs (e.g., axial points) to an existing design. | |
| nonlinearity | Curvature in the response surface that linear models miss. | |
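Steepest ascent in particular reduces to a few lines: from a first-order fit, the gradient is just the vector of linear coefficients. A sketch in plain NumPy (not the library), reusing the linear terms of the Module 7 surface (8.07 for x1, 3.75 for x2):

```python
import numpy as np

# Linear coefficients from the Module 7 surface fit.
b = np.array([8.07, 3.75])

# Scale the gradient so the largest component is one coded unit per step.
direction = b / np.abs(b).max()

# Candidate runs along the path, stepping out from the current center (0, 0).
for step in (0.5, 1.0, 1.5, 2.0):
    x = step * direction
    print(f"step {step:>4}: x1 = {x[0]:+.2f}, x2 = {x[1]:+.2f}")
```

Each of those candidate runs is one experiment; you keep stepping until the response stops improving, then drop a new (often augmented) design around the best point.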
Those are the 44 concepts from the week-12 worksheet. If you can rattle them all off, you are done.
The process_improve API, mapped to each step#
| Step | Function / class | Module |
|---|---|---|
| Define factors with real-world ranges | | |
| Collect columns into an experiment | | |
| Build a full factorial | | |
| Build a fractional / response-surface design | | |
| Fit a linear model | | |
| Run a full analysis pipeline | | |
| Predict at new points | | |
| Evaluate a design before running it | | |
| Augment a design (axial points, replicates) | | |
| Multi-response optimization (desirability) | `optimize_responses` | |
Plot helpers all live under `process_improve.experiments.visualization` and are reached either through the dispatch function `visualize_doe(plot_type=...)` or the explicit constructors:

- `visualize_doe(plot_type="square_plot" | "cube_plot" | "contour" | "surface_3d")` for design and surface plots.
- `visualize_doe(plot_type="pareto" | "half_normal" | "daniel")` for effect-magnitude plots.
- `visualize_doe(plot_type="residuals_vs_fitted" | "normal_probability" | "residuals_vs_order" | "box_cox")` for residual diagnostics.
- `visualize_doe(plot_type="interaction" | "steepest_ascent_path")` for interaction and optimization-trace plots.
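For orientation, the dispatch idiom behind a `plot_type=...` keyword looks roughly like the sketch below; the names and internals here are illustrative, not `process_improve`'s actual code:

```python
# Hypothetical explicit constructors (stand-ins for real plotting functions).
def pareto_plot(model):
    return f"pareto({model})"

def contour_plot(model):
    return f"contour({model})"

# The dispatch table maps each plot_type string to its constructor.
_PLOTS = {"pareto": pareto_plot, "contour": contour_plot}

def visualize(plot_type, *args, **kwargs):
    try:
        return _PLOTS[plot_type](*args, **kwargs)
    except KeyError:
        raise ValueError(
            f"unknown plot_type {plot_type!r}; choose from {sorted(_PLOTS)}"
        ) from None

print(visualize("pareto", "model_A"))  # pareto(model_A)
```

The practical upshot: you only need to remember one entry point and the `plot_type` strings listed above.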
See the Designed Experiments reference page for the full module contents.
A quick multi-response example#
The library can pick a compromise operating point that balances several competing responses using desirability. Below, the same quadratic surface fitted in Module 7 is paired with a second response (call it “cost”) whose objective is to be minimized. The desirability search returns the factor settings that maximize the geometric mean of the two desirabilities.
[1]:
from process_improve.experiments import optimize_responses

# Re-fit the Module 7 surface and pair it with a synthetic "cost".
# We give the optimizer the coefficient lists directly.
yield_coefs = [
    {"term": "Intercept", "coefficient": 69.77},
    {"term": "x1", "coefficient": 8.07},
    {"term": "x2", "coefficient": 3.75},
    {"term": "I(x1 ** 2)", "coefficient": -3.03},
    {"term": "I(x2 ** 2)", "coefficient": -1.80},
    {"term": "x1:x2", "coefficient": -2.11},
]
cost_coefs = [
    {"term": "Intercept", "coefficient": 50.0},
    {"term": "x1", "coefficient": 5.0},  # cost climbs with x1
    {"term": "x2", "coefficient": 3.0},  # and with x2
]
result = optimize_responses(
    fitted_models=[
        {"response_name": "yield_pct", "coefficients": yield_coefs,
         "factor_names": ["x1", "x2"]},
        {"response_name": "cost_eur", "coefficients": cost_coefs,
         "factor_names": ["x1", "x2"]},
    ],
    goals=[
        {"response": "yield_pct", "goal": "maximize", "low": 50, "high": 80},
        {"response": "cost_eur", "goal": "minimize", "low": 40, "high": 70},
    ],
    method="desirability",
)
desir = result["desirability"]
print("Best compromise factor settings (coded units):")
for k, v in desir["optimal_coded"].items():
    print(f"  {k} = {v:+.3f}")
print()
print("Predicted responses at the compromise point:")
for k, v in desir["predicted_responses"].items():
    print(f"  {k} = {v:.2f}")
print()
print("Individual desirabilities:")
for k, v in desir["individual_desirability"].items():
    print(f"  {k} = {v:.3f}")
print(f"Composite desirability: {desir['composite_desirability']:.3f}")
Best compromise factor settings (coded units):
  x1 = +0.418
  x2 = -0.194

Predicted responses at the compromise point:
  yield_pct = 71.99
  cost_eur = 51.51

Individual desirabilities:
  yield_pct = 0.733
  cost_eur = 0.616
Composite desirability: 0.672
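The composite number is nothing more exotic than the geometric mean of the individual desirabilities. A hand check in plain Python, using the bounds from the `goals` list and the predicted responses printed above, reproduces all three figures:

```python
from math import sqrt

# Linear desirability: 0 at the worst bound, 1 at the best bound, clipped to [0, 1].
def d_max(y, low, high):   # goal: maximize the response
    return min(max((y - low) / (high - low), 0.0), 1.0)

def d_min(y, low, high):   # goal: minimize the response
    return min(max((high - y) / (high - low), 0.0), 1.0)

d_yield = d_max(71.99, low=50, high=80)   # (71.99 - 50) / 30 ~= 0.733
d_cost = d_min(51.51, low=40, high=70)    # (70 - 51.51) / 30 ~= 0.616
composite = sqrt(d_yield * d_cost)        # geometric mean of the two
print(f"{d_yield:.3f} {d_cost:.3f} {composite:.3f}")  # 0.733 0.616 0.672
```

(The library may use smoother desirability shapes and importance weights; this linear, equal-weight version is the simplest case and happens to match the run above.)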
Guidance
Desirability is a compromise, not a Pareto optimum. If two responses pull in different directions (here, increasing yield also increases cost), the optimizer settles somewhere in the middle, weighted by the importance of each goal. Always plot the desirability overlay (`visualize_doe(plot_type="overlay")`) to see what you are giving up.
Confirmation and sensitivity#
The most under-rated steps at the end of an optimization:

1. Confirm. Run one experiment at the predicted optimum. If the measured response matches the prediction within noise, the model is trustworthy at that point.
2. Sensitivity check. Run two or three experiments slightly away from the optimum (one in each factor direction). None of them should beat the optimum; if one does, the search continues.
3. Document. Save the entire script and the design history; reproducibility is the difference between “we found the optimum” and “we have a story we tell ourselves about the optimum”.
These three together are how you decide the work is done.
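The confirmation step can be written down in a few lines. All the numbers here except the predicted optimum are hypothetical (the standard error and the measured run are made up for illustration):

```python
# Model prediction at the optimum found above (coded x1 = +0.418, x2 = -0.194).
predicted = 71.99
se_pred = 1.2        # hypothetical standard error of prediction
measured = 70.6      # hypothetical confirmation run at the optimum

# Accept the model at this point if the run falls within ~2 standard errors.
agrees = abs(measured - predicted) <= 2 * se_pred
print("model confirmed at the optimum" if agrees
      else "discrepancy: investigate before trusting the model")
```

The same comparison, repeated at the sensitivity-check points, tells you whether the optimum is genuinely a peak or just the best corner you have looked at so far.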
What to read next#
- The companion textbook, Process Improvement using Data, Chapter 5, for the formal derivations and a more rigorous treatment of resolution and confounding.
- The `process_improve.experiments` API reference for the full set of helpers (optimization strategies, design augmentation, knowledge-base advice).
- The case studies under `docs/user_guide/case_studies` for larger real-world worked analyses, including the oil-company factorial.
That is the end of Applied DoE. Good luck with your next study.