Why experiment? The 2x2 mindset
Source worksheet: yint.org/w1 - week 1 of a twelve-week applied DoE course for industry practitioners.
Most teams already run experiments. Few of them run designed experiments. The promise of Design of Experiments is that a small, structured set of runs answers several questions at once, with quantified uncertainty, and points to where the next experiment should be.
Module 1 starts at the foundations: how to name what you are varying, how to read a 2x2 (four runs, two factors) directly from the data, and how those four numbers already tell you the main effect of each factor, whether they interact, and where to run a fifth experiment to do better than any of the four you already have.
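All three questions below come from the week-1 worksheet: a vocabulary warm-up (Question 1) followed by two worked numerical examples (Questions 2 and 3). Attempt each question, and each “Check yourself” prompt, before reading the solution; that is where the learning sticks.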
Tip
In an industrial project the hardest part of DoE is almost never the mathematics. It is naming the right factors, choosing realistic levels, and agreeing on a single response you will optimize. The vocabulary in question 1 is what keeps the kickoff meeting productive.
Question 1 - the vocabulary of an experiment
A food company wants to find which inputs drive mouth feel, the subjective score given by a tasting panel. For each line below, fill in one of:
factor, objective, response, numerical, categorical.
1. The product is prepared at low or high pressure. This is an additional ____.
2. We measure the property P on the product. P could also be a ____.
3. We want to maximize the average mouth-feel score. This is the ____.
4. The product was prepared by adding ingredient F, or not. F is a ____ factor.
5. The product was prepared with either 20 mg/L or 30 mg/L of ingredient G. G is a ____ factor.
Check yourself
Try labeling all five before reading the solution. The trap is item 4: F is a factor, and its type is categorical. Both labels are correct, but they answer different questions: “factor” names its role in the experiment, while “categorical” names the kind of values it can take.
Solution
1. factor - varying pressure is an extra independent variable we control.
2. response - P is something we measure on the product. The mouth-feel score is also a response; an experiment can have more than one.
3. objective - “maximize” is the optimization goal, separate from the choice of factors and responses.
4. categorical factor - ingredient F is either present or absent; there is no in-between value.
5. numerical factor - 20 and 30 mg/L sit on a continuous scale. You can later run a center point at 25 mg/L; you cannot run a center point for F.
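A minimal sketch of why a center point exists for G but not for F, assuming the usual coding where the low level maps to -1 and the high level to +1 (the helper `to_coded` and the dictionary `F_coding` are hypothetical illustrations, not part of the course library):

```python
def to_coded(value, lo, hi):
    """Map a real-unit level onto the coded scale where lo -> -1 and hi -> +1."""
    center = (lo + hi) / 2
    half_range = (hi - lo) / 2
    return (value - center) / half_range

# Numerical factor G, run at 20 and 30 mg/L.
print(to_coded(20, lo=20, hi=30))  # -1.0
print(to_coded(30, lo=20, hi=30))  # 1.0
print(to_coded(25, lo=20, hi=30))  # 0.0: a center point you can actually run

# Categorical factor F: only two labels exist, so only -1 and +1 are reachable.
F_coding = {"absent": -1, "present": +1}
print(sorted(F_coding.values()))  # [-1, 1]: no label sits at 0
```

The same arithmetic would accept 25 for G because 25 mg/L is a real recipe; there is no “half-added” ingredient F to plug in.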
Guidance
Two pieces of vocabulary that confuse new practitioners:
A factor is what you change. A response is what you measure. A run sets each factor to a specific level; a response comes out of the run as data.
“Categorical vs numerical” describes the factor type, not the experiment. A real study mixes both freely. In Module 5 we will fit a linear model to a mixed-factor design.
Question 2 - a two-factor yogurt study
Friends making yogurt at home run a 2x2 experiment to find the tastiest batch. Two factors, each at two levels:
A = fat content of the starter yogurt [0% or 2%]
B = fermentation time [10 hours or 16 hours]
They rate each of the four batches on a 1-to-10 taste scale:
| A (fat %) | B (hours) | Rating |
|---|---|---|
| 0 | 10 | 5 |
| 2 | 10 | 8 |
| 0 | 16 | 6 |
| 2 | 16 | 9 |
The worksheet asks four things:
Add contour lines to the cube plot.
Calculate the average effect of the fat content of the starter yogurt.
Calculate the average effect of fermentation time.
Where should the next batch be run to push the taste score to 10?
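The code cells below draw the square plot and compute the effects and the next-run prediction; the written solution after them interprets the results, including the contour lines.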
```python
import plotly.graph_objects as go
import plotly.io as pio
from process_improve.experiments import c, gather, lm, main_effects_plot, predict
from process_improve.experiments.visualization import visualize_doe

pio.renderers.default = "notebook_connected"

# Build the design in coded units: -1 is the low level, +1 the high level.
A = c(-1, +1, -1, +1, lo=0, hi=2, name="A", units="% fat")
B = c(-1, -1, +1, +1, lo=10, hi=16, name="B", units="hours")
y = c(5, 8, 6, 9, name="y")
yogurt = gather(A=A, B=B, y=y)
yogurt
```
| | A | B | y |
|---|---|---|---|
| 1 | -1.0 | -1.0 | 5.0 |
| 2 | 1.0 | -1.0 | 8.0 |
| 3 | -1.0 | 1.0 | 6.0 |
| 4 | 1.0 | 1.0 | 9.0 |
```python
# The square_plot is the 2-factor analogue of a cube plot: response
# values at the four corners of the design.
square = visualize_doe(
    plot_type="square_plot",
    design_data=yogurt.to_dict(orient="records"),
    response_column="y",
    factors_to_plot=["A", "B"],
    factor_labels={"A": "Fat content [%]", "B": "Time [hrs]"},
    backend="plotly",
)
fig = go.Figure(square["plotly"])
fig.update_layout(width=520, height=440, title="Yogurt 2x2: corner taste scores")
fig
```
```python
# Fit a 2x2 model: intercept, two main effects, and the A:B interaction.
model = lm("y ~ A + B + A:B", yogurt)
params = model.get_parameters(drop_intercept=False)
print(params.to_string())
```

```
Intercept 7.000000e+00
A 1.500000e+00
B 5.000000e-01
A:B -4.440892e-16
```
```python
# A main effect in coded units is twice the corresponding coefficient,
# because the model is centered and the factor moves from -1 to +1
# (a span of 2 units).
effect_A = 2 * params["A"]
effect_B = 2 * params["B"]
interaction_AB = 2 * params["A:B"]
print(f"Main effect of fat content (A) : {effect_A:+.2f} taste points")
print(f"Main effect of fermentation (B): {effect_B:+.2f} taste points")
print(f"Interaction A:B : {interaction_AB:+.2f}")
```

```
Main effect of fat content (A) : +3.00 taste points
Main effect of fermentation (B): +1.00 taste points
Interaction A:B : -0.00
```
```python
# Main-effects plot: average response at each level of each factor.
main_effects_plot(model, factor_labels={"A": "Fat content", "B": "Time"})
```
```python
# The model predicts higher ratings if we push both factors above
# their +1 levels. Aiming for a rating of 10, we extrapolate to
# A=+1.5 (fat = 2.5%), B=+1.5 (time = 17.5 hrs).
predicted = float(predict(model, A=1.5, B=1.5).iloc[0])
print(f"Predicted rating at A=+1.5 (2.5% fat), B=+1.5 (17.5 hrs): {predicted:.2f}")
```

```
Predicted rating at A=+1.5 (2.5% fat), B=+1.5 (17.5 hrs): 10.00
```
Solution
Contour lines on the cube plot. Connect points of equal response. With ratings 5 (lo, lo), 8 (hi, lo), 6 (lo, hi), and 9 (hi, hi), the contours are parallel straight lines running from top-left to bottom-right. They are steep, close to vertical, because fat content (horizontal axis) has three times the effect of fermentation time, and the rating increases toward the right, peaking at the upper-right corner. The fact that they are parallel is the visual signature of no interaction.
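To see why the contours are parallel, solve the fitted plane for A along the bottom (B = -1) and top (B = +1) edges of the square. A minimal plain-Python sketch, hard-coding the coefficients printed by the `lm()` fit (the helper `a_on_contour` is hypothetical):

```python
# Coded-unit coefficients of the yogurt model, as printed by the lm() fit
# (the A:B term is about 1e-16, so it is set to exactly 0 here).
b0, bA, bB, bAB = 7.0, 1.5, 0.5, 0.0

def a_on_contour(level, B):
    """Coded A at which the fitted surface equals `level`, for a given coded B.

    Solves level = b0 + bA*A + bB*B + bAB*A*B for A.
    """
    return (level - b0 - bB * B) / (bA + bAB * B)

for level in (6, 7, 8):
    bottom = a_on_contour(level, B=-1)  # crossing of the bottom edge (10 hours)
    top = a_on_contour(level, B=+1)     # crossing of the top edge (16 hours)
    print(f"rating {level}: A = {bottom:+.2f} at B=-1, A = {top:+.2f} at B=+1, "
          f"shift = {top - bottom:+.2f}")
```

Each contour moves the same 0.67 coded units to the left between the bottom and top edges, so the lines are parallel with slope dB/dA = 2 / (-2/3) = -3 in coded units. Giving `bAB` a nonzero value makes that shift depend on the rating level, and the lines fan out.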
Average effect of fat content (A). Compare the mean of the two high-fat corners (8 and 9) with the mean of the two low-fat corners (5 and 6):
(8 + 9)/2 - (5 + 6)/2 = 8.5 - 5.5 = +3.0 rating points. The code above reproduces this as `2 * params["A"]`.
Average effect of fermentation time (B). Same logic, comparing the long-fermentation corners (6 and 9) with the short ones (5 and 8):
(6 + 9)/2 - (5 + 8)/2 = 7.5 - 6.5 = +1.0 rating point. Fermentation time matters, but only a third as much as fat.
Where to run the next batch. The A:B interaction is essentially zero, so the fitted response surface is a tilted plane, and extending both factors beyond +1 should keep the rating climbing. The prediction at A=+1.5 (2.5% fat), B=+1.5 (17.5 hrs) is exactly 10.0, so that is your next run. This point already lies outside the tested square, so treat 10.0 as an extrapolation to confirm, not a guarantee: the further you push, the more you rely on the plane continuing, and the 1-to-10 scale cannot go past 10 anyway.
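The corner-average arithmetic needs no regression library. A minimal plain-Python sketch (the names `corners`, `effect`, and `predict_rating` are illustrative, not process_improve functions) recomputes both main effects, the A:B interaction, and the prediction at A = B = +1.5:

```python
# Yogurt corners in coded units, (A, B, rating), from the table above.
corners = [(-1, -1, 5), (+1, -1, 8), (-1, +1, 6), (+1, +1, 9)]

def effect(sign):
    """Mean rating where sign(A, B) is +1, minus mean rating where it is -1."""
    high = [y for a, b, y in corners if sign(a, b) > 0]
    low = [y for a, b, y in corners if sign(a, b) < 0]
    return sum(high) / len(high) - sum(low) / len(low)

effect_A = effect(lambda a, b: a)
effect_B = effect(lambda a, b: b)
effect_AB = effect(lambda a, b: a * b)  # corners where A and B agree vs. disagree
print(effect_A, effect_B, effect_AB)  # 3.0 1.0 0.0

# Each coded coefficient is half its effect; the intercept is the grand mean.
b0 = sum(y for _, _, y in corners) / len(corners)

def predict_rating(a, b):
    return b0 + effect_A / 2 * a + effect_B / 2 * b + effect_AB / 2 * a * b

print(predict_rating(1.5, 1.5))  # 10.0
```

The interaction uses the same recipe as a main effect: it compares the corners where A and B share a sign with the corners where they differ. Halving each effect gives the coded coefficient, which is why the code cell above doubles the `lm()` coefficients to report effects.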
Check yourself
If the A:B interaction effect had been large (say, +2 instead of 0), the contour lines would no longer be parallel. Sketch what non-parallel contour lines mean for a recipe: the effect of one factor changes depending on the level of the other. Recipes are full of these. Module 3 spends real time on them.
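One way to check your sketch: take a hypothetical version of the yogurt model with an interaction effect of +2, i.e. a coded coefficient of +1 (these numbers are invented for illustration; the real data show no interaction), and compute the effect of fat at each fermentation time:

```python
# Hypothetical yogurt model with an A:B interaction *effect* of +2, i.e. a
# coded coefficient bAB = +1 (effects are twice the coded coefficients).
b0, bA, bB, bAB = 7.0, 1.5, 0.5, 1.0

def predicted(a, b):
    """Fitted rating at coded levels (a, b) of fat content and fermentation time."""
    return b0 + bA * a + bB * b + bAB * a * b

def effect_of_A_at(b):
    """Change in predicted rating when A goes from -1 to +1 at a fixed coded B."""
    return predicted(+1, b) - predicted(-1, b)

print(effect_of_A_at(-1))  # short fermentation
print(effect_of_A_at(+1))  # long fermentation
```

With the interaction in place, fat is worth 1 point for short fermentation but 5 points for long fermentation. Their average, 3 points, is still the main effect, but it no longer describes either recipe on its own.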
Question 3 - the bioreactor
A researcher running a bioreactor wants to maximize the conversion of raw material to product. Unconverted raw material is wasted profit. Each experiment in the table below is the average of two duplicate runs.
| T (°C) | S (g/L) | Conversion [%] |
|---|---|---|
| 35 | 1.75 | 60 |
| 41 | 1.25 | 64 |
| 41 | 1.75 | 69 |
| 35 | 1.25 | 53 |
The worksheet asks:
a. Draw a large cube plot of the system (use the full space; put T on the vertical axis).
b. Add contour lines.
c. Estimate the conversion at the center point S = 1.50 g/L, T = 38 °C.
d. Calculate the average effect of temperature.
e. Calculate the average effect of substrate concentration.
f. Where would you run the next experiment to improve conversion?
```python
# Coded units: S is substrate concentration, T is temperature. Real-unit
# ranges and units are attached for clarity. Each run matches one row of
# the table above, so (S, T) and y always describe the same corner.
S = c(-1, +1, -1, +1, lo=1.25, hi=1.75, name="S", units="g/L")
T = c(-1, +1, +1, -1, lo=35, hi=41, name="T", units="degC")
y = c(53, 69, 64, 60, name="y")
bio = gather(S=S, T=T, y=y)
bio
```

| | S | T | y |
|---|---|---|---|
| 1 | -1.0 | -1.0 | 53.0 |
| 2 | 1.0 | 1.0 | 69.0 |
| 3 | -1.0 | 1.0 | 64.0 |
| 4 | 1.0 | -1.0 | 60.0 |

```python
# Listing T second in factors_to_plot puts temperature on the vertical axis.
square = visualize_doe(
    plot_type="square_plot",
    design_data=bio.to_dict(orient="records"),
    response_column="y",
    factors_to_plot=["S", "T"],
    factor_labels={"S": "Substrate [g/L]", "T": "Temperature [degC]"},
    backend="plotly",
)
fig = go.Figure(square["plotly"])
fig.update_layout(width=520, height=440, title="Bioreactor 2x2: conversion at each corner")
fig
```

```python
model = lm("y ~ S + T + S:T", bio)
params = model.get_parameters(drop_intercept=False)
print(params.to_string())

# Effects are twice the coded coefficients, as in the yogurt example.
effect_S = 2 * params["S"]
effect_T = 2 * params["T"]
interaction_ST = 2 * params["S:T"]
print(f"\nMain effect of substrate concentration (S): {effect_S:+.2f} percentage points")
print(f"Main effect of temperature (T) : {effect_T:+.2f} percentage points")
print(f"Interaction S:T : {interaction_ST:+.2f}")

# The center point is S = T = 0 in coded units.
center = float(predict(model, S=0, T=0).iloc[0])
print(f"\nPredicted conversion at the center point (S=1.50 g/L, T=38 degC): {center:.2f}%")
```

```
Intercept 61.5
S 3.0
T 5.0
S:T -0.5

Main effect of substrate concentration (S): +6.00 percentage points
Main effect of temperature (T) : +10.00 percentage points
Interaction S:T : -1.00

Predicted conversion at the center point (S=1.50 g/L, T=38 degC): 61.50%
```

```python
# Next experiment: both effects are positive, with temperature the
# larger driver. Move roughly along the gradient, e.g. half a step beyond
# +1 in T and a quarter step beyond +1 in S. In real units:
#   T_next = 41 + 0.5 * (41 - 38) = 42.5 degC
#   S_next = 1.75 + 0.25 * (1.75 - 1.50) = 1.81 g/L
next_S, next_T = 1.25, 1.5
predicted = float(predict(model, S=next_S, T=next_T).iloc[0])
print(f"Predicted conversion at S=+{next_S} ({1.5 + next_S*0.25:.2f} g/L), "
      f"T=+{next_T} ({38 + next_T*3:.1f} degC): {predicted:.2f}%")
```

```
Predicted conversion at S=+1.25 (1.81 g/L), T=+1.5 (42.5 degC): 71.81%
```
Solution
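a, b. Square plot with contours. With temperature on the vertical axis, the bottom edge (35 °C) holds 53 (lean, 1.25 g/L) and 60 (rich, 1.75 g/L), and the top edge (41 °C) holds 64 (lean) and 69 (rich). Conversion rises from the lower-left (cool, lean) corner toward the upper-right (hot, rich) corner, so increasing either factor helps. The contour lines cut across that direction, running from upper-left to lower-right, and in coded units they lie closer to horizontal than vertical because temperature, on the vertical axis, has the larger effect. They are almost parallel; the slight fanning comes from the small S:T interaction.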
c. Center-point estimate. For a 2x2 design with no curvature term, the center prediction is the mean of the four corners: (53 + 64 + 69 + 60) / 4 = 61.5 %. The model agrees because the center is S = T = 0 in coded units, where every term except the intercept vanishes, and in a balanced 2x2 the intercept is exactly the mean of the four corners.
d. Average effect of temperature. Compare the two hot corners
(64, 69) with the two cool corners (53, 60):
(64 + 69)/2 - (53 + 60)/2 = 66.5 - 56.5 = +10 percentage points.
e. Average effect of substrate. Compare the two rich corners
(60, 69) with the two lean corners (53, 64):
(60 + 69)/2 - (53 + 64)/2 = 64.5 - 58.5 = +6 percentage points.
f. Where next. Temperature has the bigger pull, so move further along T and a little further along S. A reasonable next run is T = 42.5 °C, S = 1.81 g/L; the fitted model predicts a conversion above 71 %, better than the best corner so far (69 %). In practice, check the organism's temperature limits before committing.
Guidance
The “next run” calculation above is steepest ascent by hand. Module 7 builds out the formal procedure (with step sizing, confirmation runs, and what to do when the surface starts curving). For now, the takeaway is that every 2x2 already tells you a direction; you do not need to wait for a “real” optimization design to move.
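A minimal sketch of the textbook steepest-ascent recipe applied to the bioreactor table, assuming a purely linear model around the center point (the names below are illustrative; Module 7's procedure adds step sizing and confirmation runs on top of this):

```python
# Bioreactor runs from the table: (T in degC, S in g/L, conversion in %).
runs = [(35, 1.75, 60), (41, 1.25, 64), (41, 1.75, 69), (35, 1.25, 53)]
T_LO, T_HI = 35, 41
S_LO, S_HI = 1.25, 1.75

def coded(x, lo, hi):
    """Real-unit level -> coded level (-1 at lo, +1 at hi)."""
    return (x - (lo + hi) / 2) / ((hi - lo) / 2)

# Coded coefficients = mean of (response * coded level) for an orthogonal 2x2.
b_T = sum(y * coded(t, T_LO, T_HI) for t, s, y in runs) / len(runs)
b_S = sum(y * coded(s, S_LO, S_HI) for t, s, y in runs) / len(runs)

# Steepest ascent from the center: 1 coded unit in the factor with the largest
# |coefficient|, the other in proportion to its coefficient. The S:T term is
# zero at the center, so only the main-effect coefficients set the direction.
b_max = max(abs(b_T), abs(b_S))
step_T, step_S = b_T / b_max, b_S / b_max

for k in range(1, 4):
    t_next = (T_LO + T_HI) / 2 + k * step_T * (T_HI - T_LO) / 2
    s_next = (S_LO + S_HI) / 2 + k * step_S * (S_HI - S_LO) / 2
    print(f"step {k}: T = {t_next:.1f} degC, S = {s_next:.2f} g/L")
```

Step 1 lands on the top edge of the tested region (41 °C, 1.65 g/L); later steps extrapolate. The worksheet's suggested run at 42.5 °C and 1.81 g/L is a hand-picked point in the same general direction, and a step such as 47 °C is exactly where the organism's temperature limits, not the model, should decide.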
Wrap-up
Four runs of a 2x2 design gave you:
A clean vocabulary for the experiment (factors, levels, responses, objective).
Two main effects per study, calculated directly from the corner averages.
A check on whether the factors interact (yogurt: no; bioreactor: small effect worth tracking).
An estimate of the center-point response (for the bioreactor, its conversion) without running it, and a direction for the next experiment.
Next: Module 2 fits the same models with coded units and a small amount of linear algebra, which generalizes the manual calculation into something that scales beyond two factors and copes with replicates, noise, and confidence intervals.
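As a preview of that linear-algebra view (Module 2's own notation and tooling may differ), here is a NumPy sketch that builds the model matrix for y ~ A + B + A:B for the yogurt design and solves it by least squares:

```python
import numpy as np

# Yogurt design in coded units, one row per run (same order as the table).
A = np.array([-1, +1, -1, +1])
B = np.array([-1, -1, +1, +1])
y = np.array([5.0, 8.0, 6.0, 9.0])

# Model matrix for y ~ A + B + A:B: a column of ones, then A, B, and A*B.
X = np.column_stack([np.ones(4), A, B, A * B])

# Least-squares fit; returns intercept, b_A, b_B, b_AB.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)

# The columns are orthogonal, so X.T @ y / n gives the same coefficients.
print(X.T @ y / len(y))
```

With four runs and four parameters the model is saturated, so the fit passes exactly through every corner. Extra runs, such as replicates, are what leave the degrees of freedom needed to estimate noise and build confidence intervals.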