Course wrap-up: vocabulary, API map, and what comes next#

Source worksheet: yint.org/w12 - the final week’s concept review.

Modules 1 to 7 spent roughly 50 pages of narrative and code on the fundamentals of designed experiments. This wrap-up module gathers every term and habit the course covers in one place, so you can test your own understanding by restating each term in plain language, or by jumping back to the module where it first appeared.

It also maps each habit to the part of process_improve that implements it, so when a colleague asks “how do I do X again?”, the answer is at most two clicks away.

Tip

Two ways to use this page:

  • As a checklist. Scan the 44 concepts below. For each one, can you explain it to a colleague in two sentences? If not, jump back to the module that owns it.

  • As an API map. When you start a new study, the right entry point into process_improve is usually one of a handful of functions; the table at the end of this page lists them.

The 44 concepts, organized by module#

Every concept from the week-12 worksheet, in order of where it first shows up in this series.

Phase A - Foundations (Modules 1 and 2)#

| Concept | One-line definition | First in |
| --- | --- | --- |
| factor | Something we deliberately change. | Module 1 |
| categorical factor | A factor whose levels are discrete (e.g., R / W). | Module 1 |
| numeric factor | A factor on a continuous scale. | Module 1 |
| outcome / response | A measured quantity that depends on the factors. | Module 1 |
| objective | What we are trying to do (maximize, minimize, target). | Module 1 |
| levels | The specific values a factor takes in the design. | Module 1 |
| main effect | Average response change moving a factor from low to high. | Module 1 |
| coded value | Factor on the [-1, +1] scale. | Module 2 |
| real-world value | Factor in its physical units. | Module 2 |
| average effect | Mean response across one factor level. | Module 1 |
| model prediction | What the fitted equation says at a given point. | Module 2 |
| extrapolate | Predict outside the design region (use with care). | Module 2 |
| interactions | Effect of one factor depending on the level of another. | Module 2 |
| one-factor-at-a-time | Sequential single-factor tweaks. Misses interactions. | Module 2 |
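The coded-value habit is worth internalizing early. As a refresher, here is a minimal plain-Python sketch of the mapping between coded [-1, +1] units and real-world units; the helper names to_coded and to_real are illustrative, not part of process_improve:

```python
def to_coded(real, low, high):
    """Map a real-world value onto the coded [-1, +1] scale."""
    center = (high + low) / 2
    half_range = (high - low) / 2
    return (real - center) / half_range

def to_real(coded, low, high):
    """Map a coded value back to physical units."""
    center = (high + low) / 2
    half_range = (high - low) / 2
    return center + coded * half_range

# A temperature factor run between 60 and 80 degC:
print(to_coded(80, 60, 80))   # +1.0 (high level)
print(to_coded(70, 60, 80))   #  0.0 (center point)
print(to_real(-1, 60, 80))    # 60.0 (low level)
```

Coded units are what make effect sizes comparable across factors with very different physical ranges.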

Phase B - Full factorial designs (Modules 3 and 4)#

| Concept | One-line definition | First in |
| --- | --- | --- |
| full factorial | All combinations of all factor levels. | Module 3 |
| cube plot | 3-D visualization of a 2^3 design. | Module 3 |
| contour plot | 2-D map of the response surface. | Module 3 |
| center point | A run at the middle of every factor range. | Module 3 |
| replicates | Repeated runs at the same condition. | Module 3 |
| standard error | Estimated noise level of a coefficient. | Module 3 |
| interaction plots | Lines per level of one factor against another. | Module 3 |
| Pareto plot | Bar chart of effect magnitudes, biggest first. | Module 3 |
| little / no effect | A factor whose coefficient is dwarfed by noise. | Module 3 |
| noise level | Run-to-run variation under “identical” conditions. | Module 3 |
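The main-effect definition above ("average response change moving a factor from low to high") is easy to compute by hand from a full factorial. A small sketch with a 2^2 design and made-up responses, using no library code:

```python
runs = [  # (A, B, response) in coded units
    (-1, -1, 52.0),
    (+1, -1, 60.0),
    (-1, +1, 55.0),
    (+1, +1, 65.0),
]

def main_effect(runs, factor_index):
    """Average response at the high level minus average at the low level."""
    high = [y for *x, y in runs if x[factor_index] == +1]
    low  = [y for *x, y in runs if x[factor_index] == -1]
    return sum(high) / len(high) - sum(low) / len(low)

print(main_effect(runs, 0))  # A: (60 + 65)/2 - (52 + 55)/2 = 9.0
print(main_effect(runs, 1))  # B: (55 + 65)/2 - (52 + 60)/2 = 4.0
```

Because the design is balanced, each main effect averages over both levels of the other factor, which is exactly what one-factor-at-a-time experimentation fails to do.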

Phase C - Doing less, learning more (Modules 5 and 6)#

| Concept | One-line definition | First in |
| --- | --- | --- |
| half-fraction | Run half the corners of a full factorial. | Module 5 |
| generators | The equation(s) that build extra factors from existing columns. | Module 5 |
| aliases / confounding | Two effects whose contributions cannot be separated. | Module 5 |
| defining relation | The product of all generators with I. | Module 5 |
| words | Each term in the defining relation (e.g., ABCD). | Module 5 |
| resolution | Length of the shortest word in the defining relation. | Module 5 |
| screening experiments | Sift many factors with a small design. | Module 5 |
| trade-off table | Standard chart for picking k and p. | Module 6 |
| covariate | A variable you can measure but not control. | Module 5 |
| disturbance | A variable you can neither measure nor control. | Module 5 |
| nuisance variable | A controllable factor you do not scientifically care about. | Module 5 |
| controlled variable | A factor you can set (factor / nuisance). | Module 5 |
| blocking | Account for nuisance factors by structuring the design. | Module 6 |
| baseline | A reference run at a known operating point. | Module 6 |
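Generators, words, aliases, and resolution all fall out of one piece of letter algebra: multiplying two effects is a symmetric difference of their letters, since any letter times itself is I. A minimal sketch for a 2^(4-1) half-fraction with generator D = ABC (so the defining relation is I = ABCD); this is plain Python for illustration, not the process_improve implementation:

```python
def multiply(word1, word2):
    """Multiply two effect 'words': repeated letters cancel (A*A = I)."""
    return "".join(sorted(set(word1) ^ set(word2)))

defining_word = "ABCD"   # generator D = ABC  =>  I = ABCD

# Alias of each main effect: multiply it by the defining word.
for factor in "ABCD":
    print(factor, "is aliased with", multiply(factor, defining_word))
# A with BCD, B with ACD, C with ABD, D with ABC

# Resolution = length of the shortest word in the defining relation.
# One word of length 4 here, so this is a resolution IV design.
print("resolution:", len(defining_word))
```

With more than one generator the defining relation contains every product of the generators, and the same multiply step enumerates the full alias structure.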

Phase D - Optimization (Modules 6 and 7)#

| Concept | One-line definition | First in |
| --- | --- | --- |
| sequential experiments | A string of small designs, each one informed by the last. | Module 6 |
| optimization | Find the factor settings that best meet the objective. | Module 6 |
| response surface | The mapping from factors to response. | Module 7 |
| steepest ascent | Move along the response gradient. | Module 7 |
| augmented model | Adding new runs (e.g., axial points) to an existing design. | Module 7 |
| nonlinearity | Curvature in the response surface that linear models miss. | Module 7 |

Those are the 44 concepts from week 12. If you can rattle off all of them, you are done.
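One of those concepts, steepest ascent, reduces to a few lines for a first-order model: the direction of fastest improvement in coded units is proportional to the vector of linear coefficients. The sketch below reuses the x1 and x2 linear terms from the Module 7 yield surface; scaling the step so the largest factor moves one coded unit per step is one common convention, not the library's:

```python
coefs = {"x1": 8.07, "x2": 3.75}   # linear terms of a fitted first-order model

# Scale the gradient so the most influential factor moves by
# exactly 1.0 coded unit per step.
biggest = max(abs(v) for v in coefs.values())
step = {k: v / biggest for k, v in coefs.items()}

base = {"x1": 0.0, "x2": 0.0}      # start at the center of the design
for i in range(1, 4):
    point = {k: base[k] + i * step[k] for k in base}
    print(f"step {i}: " + ", ".join(f"{k}={v:+.2f}" for k, v in point.items()))
```

Each printed point is a candidate experiment; you run them in order and stop climbing when the measured response stops improving, then drop a new design there.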

The process_improve API, mapped to each step#

| Step | Function / class | Module |
| --- | --- | --- |
| Define factors with real-world ranges | c(), Column | process_improve.experiments |
| Collect columns into an experiment | gather() | process_improve.experiments |
| Build a full factorial | full_factorial() | process_improve.experiments |
| Build a fractional / response-surface design | generate_design() | process_improve.experiments |
| Fit a linear model | lm() | process_improve.experiments.models |
| Run a full analysis pipeline | analyze_experiment() | process_improve.experiments.analysis |
| Predict at new points | predict() | process_improve.experiments.models |
| Evaluate a design before running it | evaluate_design() | process_improve.experiments.evaluate |
| Augment a design (axial points, replicates) | augment_design() | process_improve.experiments.augment |
| Multi-response optimization (desirability) | optimize_responses() | process_improve.experiments.optimization |

Plot helpers all live under process_improve.experiments.visualization and are reached through the dispatch function visualize_doe(plot_type=...):

  • visualize_doe(plot_type="square_plot" | "cube_plot" | "contour" | "surface_3d") for design and surface plots.

  • visualize_doe(plot_type="pareto" | "half_normal" | "daniel") for effect-magnitude plots.

  • visualize_doe(plot_type="residuals_vs_fitted" | "normal_probability" | "residuals_vs_order" | "box_cox") for residual diagnostics.

  • visualize_doe(plot_type="interaction" | "steepest_ascent_path") for interaction and optimization-trace plots.

See the Designed Experiments reference page for the full module contents.

A quick multi-response example#

The library can pick a compromise operating point that balances several competing responses using desirability. Below, the same quadratic surface fitted in Module 7 is paired with a second response (call it “cost”) whose objective is to be minimized. The desirability search returns the factor settings that maximize the geometric mean of the two desirabilities.

[1]:
from process_improve.experiments import optimize_responses

# Re-fit the Module 7 surface and pair it with a synthetic "cost".
# We give the optimizer the coefficient lists directly.

yield_coefs = [
    {"term": "Intercept",    "coefficient": 69.77},
    {"term": "x1",           "coefficient":  8.07},
    {"term": "x2",           "coefficient":  3.75},
    {"term": "I(x1 ** 2)",   "coefficient": -3.03},
    {"term": "I(x2 ** 2)",   "coefficient": -1.80},
    {"term": "x1:x2",        "coefficient": -2.11},
]
cost_coefs = [
    {"term": "Intercept", "coefficient": 50.0},
    {"term": "x1",        "coefficient":  5.0},   # cost climbs with x1
    {"term": "x2",        "coefficient":  3.0},   # and with x2
]

result = optimize_responses(
    fitted_models=[
        {"response_name": "yield_pct", "coefficients": yield_coefs,
         "factor_names": ["x1", "x2"]},
        {"response_name": "cost_eur",  "coefficients": cost_coefs,
         "factor_names": ["x1", "x2"]},
    ],
    goals=[
        {"response": "yield_pct", "goal": "maximize", "low": 50, "high": 80},
        {"response": "cost_eur",  "goal": "minimize", "low": 40, "high": 70},
    ],
    method="desirability",
)
desir = result["desirability"]
print("Best compromise factor settings (coded units):")
for k, v in desir["optimal_coded"].items():
    print(f"  {k} = {v:+.3f}")
print()
print("Predicted responses at the compromise point:")
for k, v in desir["predicted_responses"].items():
    print(f"  {k} = {v:.2f}")
print()
print("Individual desirabilities:")
for k, v in desir["individual_desirability"].items():
    print(f"  {k} = {v:.3f}")
print(f"Composite desirability: {desir['composite_desirability']:.3f}")
Best compromise factor settings (coded units):
  x1 = +0.418
  x2 = -0.194

Predicted responses at the compromise point:
  yield_pct = 71.99
  cost_eur = 51.51

Individual desirabilities:
  yield_pct = 0.733
  cost_eur = 0.616
Composite desirability: 0.672

Guidance

Desirability is a compromise, not a Pareto optimum. If two responses point in different directions (here, increasing yield also increases cost) the optimizer settles somewhere in the middle, weighted by the importance of each goal. Always plot the desirability overlay (visualize_doe(plot_type="overlay")) to see what you are giving up.
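To see why the composite lands where it does, here is a minimal sketch of the underlying arithmetic: one-sided linear desirability functions and their geometric mean. The numbers mirror the output above, but this is illustrative code, not the process_improve implementation:

```python
def d_maximize(y, low, high):
    """0 below `low`, 1 above `high`, linear in between."""
    return min(1.0, max(0.0, (y - low) / (high - low)))

def d_minimize(y, low, high):
    """1 below `low`, 0 above `high`, linear in between."""
    return min(1.0, max(0.0, (high - y) / (high - low)))

d_yield = d_maximize(71.99, low=50, high=80)   # individual desirability of yield
d_cost  = d_minimize(51.51, low=40, high=70)   # individual desirability of cost
composite = (d_yield * d_cost) ** 0.5          # geometric mean of the two

print(round(d_yield, 3), round(d_cost, 3), round(composite, 3))
# 0.733 0.616 0.672 -- matching the example output
```

The geometric mean is deliberate: if any single response is fully undesirable (d = 0), the composite is 0 no matter how good the others are.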

Confirmation and sensitivity#

The most under-rated steps at the end of an optimization:

  1. Confirm. Run one experiment at the predicted optimum. If the measured response matches the prediction within noise, the model is trustworthy at that point.

  2. Sensitivity check. Run two or three experiments slightly away from the optimum (in each factor direction). None of them should beat the optimum; if one does, the search continues.

  3. Document. Save the entire script and the design history; reproducibility is the difference between “we found the optimum” and “we have a story we tell ourselves about the optimum”.

These three together are how you decide the work is done.
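The confirmation step can even be automated. A sketch of a pass/fail check using a plus-or-minus two-standard-error band around the prediction; the function name, the choice of k = 2, and the numbers are all illustrative assumptions:

```python
def confirms(measured, predicted, standard_error, k=2.0):
    """True if the confirmation run lands within +/- k standard errors
    of the model's prediction."""
    return abs(measured - predicted) <= k * standard_error

# Suppose the model predicted 71.99 at the optimum and replicate runs
# gave a standard error of about 1.2:
print(confirms(73.1, 71.99, 1.2))   # True: within the noise band
print(confirms(66.0, 71.99, 1.2))   # False: the model is off at this point
```

A False here is not a failure of the project, only of the model at that point: it sends you back to step 2 with useful information about where the surface bends.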