Course wrap-up: vocabulary, API map, and what comes next#

Source worksheet: yint.org/w12 - the final week’s concept review.

Modules 1 to 7 spent roughly 50 pages of narrative and code on the fundamentals of designed experiments. This wrap-up module gathers every term and habit the course covers in one place, so you can test your own understanding by restating each term in plain language, or by jumping back to the module where it first appeared.

It also maps each habit to the part of process_improve that implements it, so when a colleague asks “how do I do X again?”, the answer is at most two clicks away.

Tip

Two ways to use this page:

  • As a checklist. Scan the 44 concepts below. For each one, can you explain it to a colleague in two sentences? If not, jump back to the module that owns it.

  • As an API map. When you start a new study, the right entry point into process_improve is usually one of a handful of functions; the table at the end of this page lists them.

The 44 concepts, organized by module#

Every concept from the week-12 worksheet, in order of where it first shows up in this series.

Phase A - Foundations (Modules 1 and 2)#

| Concept | One-line definition | First in |
| --- | --- | --- |
| factor | Something we deliberately change. | Module 1 |
| categorical factor | A factor whose levels are discrete (e.g., R / W). | Module 1 |
| numeric factor | A factor on a continuous scale. | Module 1 |
| outcome / response | A measured quantity that depends on the factors. | Module 1 |
| objective | What we are trying to do (maximize, minimize, target). | Module 1 |
| levels | The specific values a factor takes in the design. | Module 1 |
| main effect | Average response change moving a factor from low to high. | Module 1 |
| coded value | Factor on the [-1, +1] scale. | Module 2 |
| real-world value | Factor in its physical units. | Module 2 |
| average effect | Mean response across one factor level. | Module 1 |
| model prediction | What the fitted equation says at a given point. | Module 2 |
| extrapolate | Predict outside the design region (use with care). | Module 2 |
| interactions | Effect of one factor depending on the level of another. | Module 2 |
| one-factor-at-a-time | Sequential single-factor tweaks. Misses interactions. | Module 2 |
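The coded-value habit is worth internalizing early. As a refresher, here is a minimal plain-Python sketch of the mapping between coded [-1, +1] units and real-world units; the helper names to_coded and to_real are illustrative, not part of process_improve:

```python
def to_coded(real, low, high):
    """Map a real-world value onto the coded [-1, +1] scale."""
    center = (high + low) / 2
    half_range = (high - low) / 2
    return (real - center) / half_range

def to_real(coded, low, high):
    """Map a coded value back to physical units."""
    center = (high + low) / 2
    half_range = (high - low) / 2
    return center + coded * half_range

# A temperature factor run between 60 and 80 degC:
print(to_coded(80, 60, 80))   # +1.0 (high level)
print(to_coded(70, 60, 80))   #  0.0 (center point)
print(to_real(-1, 60, 80))    # 60.0 (low level)
```

Coded units are what make effect sizes comparable across factors with very different physical ranges.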

Phase B - Full factorial designs (Modules 3 and 4)#

| Concept | One-line definition | First in |
| --- | --- | --- |
| full factorial | All combinations of all factor levels. | Module 3 |
| cube plot | 3-D visualization of a 2^3 design. | Module 3 |
| contour plot | 2-D map of the response surface. | Module 3 |
| center point | A run at the middle of every factor range. | Module 3 |
| replicates | Repeated runs at the same condition. | Module 3 |
| standard error | Estimated noise level of a coefficient. | Module 3 |
| interaction plots | Lines per level of one factor against another. | Module 3 |
| Pareto plot | Bar chart of effect magnitudes, biggest first. | Module 3 |
| little / no effect | A factor whose coefficient is dwarfed by noise. | Module 3 |
| noise level | Run-to-run variation under “identical” conditions. | Module 3 |
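The main-effect definition above ("average response change moving a factor from low to high") is easy to compute by hand from a full factorial. A small sketch with a 2^2 design and made-up responses, using no library code:

```python
runs = [  # (A, B, response) in coded units
    (-1, -1, 52.0),
    (+1, -1, 60.0),
    (-1, +1, 55.0),
    (+1, +1, 65.0),
]

def main_effect(runs, factor_index):
    """Average response at the high level minus average at the low level."""
    high = [y for *x, y in runs if x[factor_index] == +1]
    low  = [y for *x, y in runs if x[factor_index] == -1]
    return sum(high) / len(high) - sum(low) / len(low)

print(main_effect(runs, 0))  # A: (60 + 65)/2 - (52 + 55)/2 = 9.0
print(main_effect(runs, 1))  # B: (55 + 65)/2 - (52 + 60)/2 = 4.0
```

Because the design is balanced, each main effect averages over both levels of the other factor, which is exactly what one-factor-at-a-time experimentation fails to do.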

Phase C - Doing less, learning more (Modules 5 and 6)#

| Concept | One-line definition | First in |
| --- | --- | --- |
| half-fraction | Run half the corners of a full factorial. | Module 5 |
| generators | The equation(s) that build extra factors from existing columns. | Module 5 |
| aliases / confounding | Two effects whose contributions cannot be separated. | Module 5 |
| defining relation | The product of all generators with I. | Module 5 |
| words | Each term in the defining relation (e.g., ABCD). | Module 5 |
| resolution | Length of the shortest word in the defining relation. | Module 5 |
| screening experiments | Sift many factors with a small design. | Module 5 |
| trade-off table | Standard chart for picking k and p. | Module 6 |
| covariate | A variable you can measure but not control. | Module 5 |
| disturbance | A variable you can neither measure nor control. | Module 5 |
| nuisance variable | A controllable factor you do not scientifically care about. | Module 5 |
| controlled variable | A factor you can set (factor / nuisance). | Module 5 |
| blocking | Account for nuisance factors by structuring the design. | Module 6 |
| baseline | A reference run at a known operating point. | Module 6 |
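Generators, words, aliases, and resolution all fall out of one piece of letter algebra: multiplying two effects is a symmetric difference of their letters, since any letter times itself is I. A minimal sketch for a 2^(4-1) half-fraction with generator D = ABC (so the defining relation is I = ABCD); this is plain Python for illustration, not the process_improve implementation:

```python
def multiply(word1, word2):
    """Multiply two effect 'words': repeated letters cancel (A*A = I)."""
    return "".join(sorted(set(word1) ^ set(word2)))

defining_word = "ABCD"   # generator D = ABC  =>  I = ABCD

# Alias of each main effect: multiply it by the defining word.
for factor in "ABCD":
    print(factor, "is aliased with", multiply(factor, defining_word))
# A with BCD, B with ACD, C with ABD, D with ABC

# Resolution = length of the shortest word in the defining relation.
# One word of length 4 here, so this is a resolution IV design.
print("resolution:", len(defining_word))
```

With more than one generator the defining relation contains every product of the generators, and the same multiply step enumerates the full alias structure.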

Phase D - Optimization (Modules 6 and 7)#

| Concept | One-line definition | First in |
| --- | --- | --- |
| sequential experiments | A string of small designs, each one informed by the last. | Module 6 |
| optimization | Find the factor settings that best meet the objective. | Module 6 |
| response surface | The mapping from factors to response. | Module 7 |
| steepest ascent | Move along the response gradient. | Module 7 |
| augmented model | Adding new runs (e.g., axial points) to an existing design. | Module 7 |
| nonlinearity | Curvature in the response surface that linear models miss. | Module 7 |

Those are the 44 concepts from week 12. If you can rattle off all of them, you are done.
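One of those concepts, steepest ascent, reduces to a few lines for a first-order model: the direction of fastest improvement in coded units is proportional to the vector of linear coefficients. The sketch below reuses the x1 and x2 linear terms from the Module 7 yield surface; scaling the step so the largest factor moves one coded unit per step is one common convention, not the library's:

```python
coefs = {"x1": 8.07, "x2": 3.75}   # linear terms of a fitted first-order model

# Scale the gradient so the most influential factor moves by
# exactly 1.0 coded unit per step.
biggest = max(abs(v) for v in coefs.values())
step = {k: v / biggest for k, v in coefs.items()}

base = {"x1": 0.0, "x2": 0.0}      # start at the center of the design
for i in range(1, 4):
    point = {k: base[k] + i * step[k] for k in base}
    print(f"step {i}: " + ", ".join(f"{k}={v:+.2f}" for k, v in point.items()))
```

Each printed point is a candidate experiment; you run them in order and stop climbing when the measured response stops improving, then drop a new design there.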

The process_improve API, mapped to each step#

| Step | Function / class | Module |
| --- | --- | --- |
| Define factors with real-world ranges | c(), Column | process_improve.experiments |
| Collect columns into an experiment | gather() | process_improve.experiments |
| Build a full factorial | full_factorial() | process_improve.experiments |
| Build a fractional / response-surface design | generate_design() | process_improve.experiments |
| Fit a linear model | lm() | process_improve.experiments.models |
| Run a full analysis pipeline | analyze_experiment() | process_improve.experiments.analysis |
| Predict at new points | predict() | process_improve.experiments.models |
| Evaluate a design before running it | evaluate_design() | process_improve.experiments.evaluate |
| Augment a design (axial points, replicates) | augment_design() | process_improve.experiments.augment |
| Multi-response optimization (desirability) | optimize_responses() | process_improve.experiments.optimization |

Plot helpers all live under process_improve.experiments.visualization and are reached through the dispatch function visualize_doe(plot_type=...):

  • visualize_doe(plot_type="square_plot" | "cube_plot" | "contour" | "surface_3d") for design and surface plots.

  • visualize_doe(plot_type="pareto" | "half_normal" | "daniel") for effect-magnitude plots.

  • visualize_doe(plot_type="residuals_vs_fitted" | "normal_probability" | "residuals_vs_order" | "box_cox") for residual diagnostics.

  • visualize_doe(plot_type="interaction" | "steepest_ascent_path") for interaction and optimization-trace plots.

See the Designed Experiments reference page for the full module contents.

A quick multi-response example#

The library can pick a compromise operating point that balances several competing responses using desirability. Below, the same quadratic surface fitted in Module 7 is paired with a second response (call it “cost”) whose objective is to be minimized. The desirability search returns the factor settings that maximize the geometric mean of the two desirabilities.

[1]:
from process_improve.experiments import optimize_responses

# Re-fit the Module 7 surface and pair it with a synthetic "cost".
# We give the optimizer the coefficient lists directly.

yield_coefs = [
    {"term": "Intercept",    "coefficient": 69.77},
    {"term": "x1",           "coefficient":  8.07},
    {"term": "x2",           "coefficient":  3.75},
    {"term": "I(x1 ** 2)",   "coefficient": -3.03},
    {"term": "I(x2 ** 2)",   "coefficient": -1.80},
    {"term": "x1:x2",        "coefficient": -2.11},
]
cost_coefs = [
    {"term": "Intercept", "coefficient": 50.0},
    {"term": "x1",        "coefficient":  5.0},   # cost climbs with x1
    {"term": "x2",        "coefficient":  3.0},   # and with x2
]

result = optimize_responses(
    fitted_models=[
        {"response_name": "yield_pct", "coefficients": yield_coefs,
         "factor_names": ["x1", "x2"]},
        {"response_name": "cost_eur",  "coefficients": cost_coefs,
         "factor_names": ["x1", "x2"]},
    ],
    goals=[
        {"response": "yield_pct", "goal": "maximize", "low": 50, "high": 80},
        {"response": "cost_eur",  "goal": "minimize", "low": 40, "high": 70},
    ],
    method="desirability",
)
desir = result["desirability"]
print("Best compromise factor settings (coded units):")
for k, v in desir["optimal_coded"].items():
    print(f"  {k} = {v:+.3f}")
print()
print("Predicted responses at the compromise point:")
for k, v in desir["predicted_responses"].items():
    print(f"  {k} = {v:.2f}")
print()
print("Individual desirabilities:")
for k, v in desir["individual_desirability"].items():
    print(f"  {k} = {v:.3f}")
print(f"Composite desirability: {desir['composite_desirability']:.3f}")
Best compromise factor settings (coded units):
  x1 = +0.418
  x2 = -0.194

Predicted responses at the compromise point:
  yield_pct = 71.99
  cost_eur = 51.51

Individual desirabilities:
  yield_pct = 0.733
  cost_eur = 0.616
Composite desirability: 0.672

Guidance

Desirability is a compromise, not a Pareto optimum. If two responses point in different directions (here, increasing yield also increases cost) the optimizer settles somewhere in the middle, weighted by the importance of each goal. Always plot the desirability overlay (visualize_doe(plot_type="overlay")) to see what you are giving up.
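To see why the composite lands where it does, here is a minimal sketch of the underlying arithmetic: one-sided linear desirability functions and their geometric mean. The numbers mirror the output above, but this is illustrative code, not the process_improve implementation:

```python
def d_maximize(y, low, high):
    """0 below `low`, 1 above `high`, linear in between."""
    return min(1.0, max(0.0, (y - low) / (high - low)))

def d_minimize(y, low, high):
    """1 below `low`, 0 above `high`, linear in between."""
    return min(1.0, max(0.0, (high - y) / (high - low)))

d_yield = d_maximize(71.99, low=50, high=80)   # individual desirability of yield
d_cost  = d_minimize(51.51, low=40, high=70)   # individual desirability of cost
composite = (d_yield * d_cost) ** 0.5          # geometric mean of the two

print(round(d_yield, 3), round(d_cost, 3), round(composite, 3))
# 0.733 0.616 0.672 -- matching the example output
```

The geometric mean is deliberate: if any single response is fully undesirable (d = 0), the composite is 0 no matter how good the others are.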

Confirmation and sensitivity#

The most under-rated steps at the end of an optimization:

  1. Confirm. Run one experiment at the predicted optimum. If the measured response matches the prediction within noise, the model is trustworthy at that point.

  2. Sensitivity check. Run two or three experiments slightly away from the optimum (in each factor direction). None of them should beat the optimum; if one does, the search continues.

  3. Document. Save the entire script and the design history; reproducibility is the difference between “we found the optimum” and “we have a story we tell ourselves about the optimum”.

These three together are how you decide the work is done.
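The confirmation step can even be automated. A sketch of a pass/fail check using a plus-or-minus two-standard-error band around the prediction; the function name, the choice of k = 2, and the numbers are all illustrative assumptions:

```python
def confirms(measured, predicted, standard_error, k=2.0):
    """True if the confirmation run lands within +/- k standard errors
    of the model's prediction."""
    return abs(measured - predicted) <= k * standard_error

# Suppose the model predicted 71.99 at the optimum and replicate runs
# gave a standard error of about 1.2:
print(confirms(73.1, 71.99, 1.2))   # True: within the noise band
print(confirms(66.0, 71.99, 1.2))   # False: the model is off at this point
```

A False here is not a failure of the project, only of the model at that point: it sends you back to step 2 with useful information about where the surface bends.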