Full factorials: replicates, center points, and three factors#

Source worksheets: yint.org/w3 and yint.org/w4 - weeks 3 and 4 of the applied DoE short course.

Modules 1 and 2 stopped at the four corners of a 2x2. In a real project you do three things on top of that, often before the experiment even finishes:

  1. Replicate corners so you can read the noise level, get a real confidence interval, and detect when a saturated model is hiding the residual error behind an automatic R^2 = 1.

  2. Run a center point at the middle of the design to check whether the response surface is genuinely a plane, or whether it is starting to curve.

  3. Add a third factor. The cube plot is back; the same toolkit scales without any new mathematics.

Module 3 works through three case studies that hit each of these in turn: a chemical side-product 2x2 with a center point (w3), an online shop 2x2 with replicates (w4), and a 3-factor stability study where one factor turns out to barely matter (w4).

Tip

Two habits to build in this module:

  • Every experiment gets a center point if you can afford it. One extra run earns you a degree of freedom for the regression and tells you whether the linear model is the right model.

  • Replicates give a direct estimate of the noise level. Without replicates a saturated design returns R^2 = 1 by construction, and any confidence interval relies on assumptions you cannot check from the data alone.
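To see why a saturated fit pins R^2 at 1 by construction, here is a minimal numpy sketch outside the course library, reusing the four side-product corner values from section A below: four observations, four parameters, zero residuals.

```python
import numpy as np

# Design matrix for the four corners of a 2x2 in coded units:
# columns are intercept, x_A, x_B, x_A*x_B.
X = np.array([
    [1, -1, -1, +1],
    [1, +1, -1, -1],
    [1, -1, +1, -1],
    [1, +1, +1, +1],
], dtype=float)
y = np.array([89.0, 268.0, 179.0, 448.0])  # section A corner data

b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b
r2 = 1 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())
print(r2)  # 1.0 (to machine precision): four parameters, four points
```

Without replicates or center points, this perfect fit says nothing about the noise.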

A. Side-product factorial with a center point (w3)#

A factorial experiment is run to minimize an unwanted side product in a chemical process. Two factors are varied:

  • A = additive volume [20 or 30 mL]

  • B = baffles in the reactor? [No or Yes]

Run    A (mL)    B (baffles)    Side product (g)
1      20        No             89
2      30        No             268
3      20        Yes            179
4      30        Yes            448

The worksheet asks for the prediction equation, an interpretation of each coefficient, a comment on the interaction, and where to run the next experiment to minimize the side product.

[1]:
import plotly.graph_objects as go
import plotly.io as pio

from process_improve.experiments import c, gather, lm, main_effects_plot, predict
from process_improve.experiments.visualization import visualize_doe

pio.renderers.default = "notebook_connected"

# Standard order: A, B coded.  Categorical B uses No=-1, Yes=+1.

A = c(-1, +1, -1, +1, lo=20, hi=30, name="A", units="mL")
B = c(-1, -1, +1, +1, lo=0,  hi=1,  name="B", units="0=No, 1=Yes")
y = c(89, 268, 179, 448, name="y")
side = gather(A=A, B=B, y=y)
side
[1]:
A B y
1 -1.0 -1.0 89.0
2 1.0 -1.0 268.0
3 -1.0 1.0 179.0
4 1.0 1.0 448.0
[2]:
model = lm("y ~ A + B + A:B", side)
params = model.get_parameters(drop_intercept=False)
print(params.to_string())
print()
print("Prediction equation:")
print(f"  y = {params['Intercept']:.1f} "
      f"{params['A']:+.1f} * x_A "
      f"{params['B']:+.1f} * x_B "
      f"{params['A:B']:+.1f} * x_A * x_B")
Intercept    246.0
A            112.0
B             67.5
A:B           22.5

Prediction equation:
  y = 246.0 +112.0 * x_A +67.5 * x_B +22.5 * x_A * x_B
[3]:
sq = visualize_doe(
    plot_type="square_plot",
    design_data=side.to_dict(orient="records"),
    response_column="y",
    factors_to_plot=["A", "B"],
    factor_labels={"A": "Additive [mL]", "B": "Baffles"},
    backend="plotly",
)
fig = go.Figure(sq["plotly"])
fig.update_layout(width=520, height=440, title="Side product at the four corners")
fig

Solution

The fitted model in coded units is y = 246 + 112 * x_A + 67.5 * x_B + 22.5 * x_A * x_B.

  • Coefficient on A (= +112 g). Moving the additive from 20 to 30 mL increases the side product on average by 2 * 112 = 224 g. That is a large response shift, and in the wrong direction for the objective of minimizing the side product.

  • Coefficient on B (= +67.5 g). Adding baffles makes things worse too, by 2 * 67.5 = 135 g on average. Surprising for a “should help mixing” intuition, but the data are the data.

  • Interaction A:B (= +22.5 g). Modest in absolute terms but same sign as both main effects: baffles make the additive effect worse.

Where to run next. To minimize the side product, move down in coded units on both factors: lower additive volume (below 20 mL) and no baffles. A reasonable next run is at A = 15 mL (x_A = -2), B = No (x_B = -1) - predicted y = 246 - 224 - 67.5 + 45 = -0.5 g. A negative predicted mass is physically impossible, which is itself a warning that the prediction is an extrapolation; in practice you stop when the model says “we cannot go lower” or when the predictions stop being believable.
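The coded coefficients can be cross-checked by hand with plain contrast sums, independent of the course library (the divisor 4 is the number of runs):

```python
# Corner responses in standard order: (A-,B-), (A+,B-), (A-,B+), (A+,B+).
y = [89, 268, 179, 448]
A = [-1, +1, -1, +1]
B = [-1, -1, +1, +1]

b0 = sum(y) / 4                                       # intercept: grand mean
bA = sum(a * v for a, v in zip(A, y)) / 4             # half the A effect
bB = sum(b * v for b, v in zip(B, y)) / 4             # half the B effect
bAB = sum(a * b * v for a, b, v in zip(A, B, y)) / 4  # interaction
print(b0, bA, bB, bAB)  # 246.0 112.0 67.5 22.5
```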

Adding two center-point runs#

Two extra experiments are run at the center of the additive range (25 mL), once with each level of baffles:

Run    A (mL)    B (baffles)    Side product (g)
5      25        No             186
6      25        Yes            300

These do not change which terms are in the model, but they let us check whether the linear-and-interaction model is actually the right model. If the center-point responses match the model’s predictions at x_A = 0, the surface is linear in A. If they sit well off the plane, the system has curvature and a quadratic term is needed.

[4]:
A2 = c(-1, +1, -1, +1,  0,  0, lo=20, hi=30, name="A", units="mL")
B2 = c(-1, -1, +1, +1, -1, +1, lo=0,  hi=1,  name="B", units="0=No, 1=Yes")
y2 = c(89, 268, 179, 448, 186, 300, name="y")
side2 = gather(A=A2, B=B2, y=y2)

model2 = lm("y ~ A + B + A:B", side2)
params2 = model2.get_parameters(drop_intercept=False)
print("Coefficients with the center points included:")
print(params2.to_string())
print()
print("Predicted vs observed at the center points:")
for label, point, observed in [
    ("(x_A=0, x_B=-1)", dict(A=0, B=-1), 186),
    ("(x_A=0, x_B=+1)", dict(A=0, B=+1), 300),
]:
    pred = float(predict(model2, **point).iloc[0])
    print(f"  {label}: predicted {pred:6.1f}, observed {observed} - diff {observed - pred:+.1f}")
Coefficients with the center points included:
Intercept    245.0
A            112.0
B             64.0
A:B           22.5

Predicted vs observed at the center points:
  (x_A=0, x_B=-1): predicted  181.0, observed 186 - diff +5.0
  (x_A=0, x_B=+1): predicted  309.0, observed 300 - diff -9.0

Solution

The coefficients barely move: intercept 246 -> 245, b_A = +112 (unchanged), b_B shifts a touch from +67.5 to +64, interaction stays at +22.5. The center-point responses are within a handful of grams of the linear prediction.

What this means. Inside the design window, the surface is essentially a plane - there is no evidence of curvature. The factorial alone is doing a good enough job. If the center point had landed far from the prediction (say 100 g off), you would have strong evidence of curvature and the next move would be axial points for a quadratic model (Module 7).

Guidance

Center points are cheap insurance. In a 2x2, one center point buys you a curvature check; two or three center points also give you an honest noise estimate (and therefore a real confidence interval). Module 6 returns to this when the budget gets tight.
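The curvature check itself is one subtraction: compare the average corner response with the average center-point response. A sketch with the numbers above:

```python
corners = [89, 268, 179, 448]   # the four factorial corners
centers = [186, 300]            # the two runs at A = 25 mL

curvature = sum(centers) / len(centers) - sum(corners) / len(corners)
print(curvature)  # -3.0: the center sits only ~3 g off the fitted plane
```

A difference of a few grams, against a noise level of similar size, is no evidence of curvature; a difference of tens of grams would be.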

B. Online shop 2x2 with replicates (w4)#

A small online business runs an 8-week experiment every Tuesday (same day of week, to remove the day-of-week effect). Two factors:

  • S = free-shipping threshold [€30 or €50]

  • P = profile required before checkout? [No or Yes]

Each corner is replicated: two Tuesdays per combination, eight Tuesdays total.

S (euro)    P (profile)    Daily sales (euro)
30          No             348 and 356
50          No             359 and 363
30          Yes            327 and 296
50          Yes            243 and 257

Now R^2 is no longer exactly 1 - the replicates give the regression something to disagree with - and you can read the noise level off the residuals.

[5]:
import numpy as np

S = c(-1, -1, -1, -1, +1, +1, +1, +1, lo=30, hi=50, name="S", units="euro")
P = c(-1, -1, +1, +1, -1, -1, +1, +1, lo=0,  hi=1,  name="P", units="0=No, 1=Yes")
sales = c(348, 356, 327, 296, 359, 363, 243, 257, name="sales")
shop = gather(S=S, P=P, sales=sales)

m = lm("sales ~ S + P + S:P", shop)
params = m.get_parameters(drop_intercept=False)
print(params.to_string())
print()

# Estimate the noise level from the regression residuals.

fitted = shop.apply(lambda r: float(predict(m, S=r["S"], P=r["P"]).iloc[0]), axis=1)
residuals = shop["sales"] - fitted
print(f"Standard error of the residuals: {np.std(residuals, ddof=4):.1f} euro")
print(f"Range across replicates at each corner: "
      f"S- P- = {356-348} | S+ P- = {363-359} | "
      f"S- P+ = {327-296} | S+ P+ = {257-243}")
Intercept    318.625
S            -13.125
P            -37.875
S:P          -17.625

Standard error of the residuals: 12.4 euro
Range across replicates at each corner: S- P- = 8 | S+ P- = 4 | S- P+ = 31 | S+ P+ = 14
[6]:
fig = main_effects_plot(m, factor_labels={"S": "Free-ship threshold", "P": "Profile required"})
fig.update_layout(width=620, height=380, title="Online shop main effects")
fig

Solution

The fitted model is sales = 318.6 - 13.1 * x_S - 37.9 * x_P - 17.6 * x_S * x_P.

  • Coefficient on S (= -13.1 euro). Raising the free-shipping threshold from €30 to €50 loses about 2 * 13.1 = €26 per day. Customers like the lower threshold. Not surprising at all.

  • Coefficient on P (= -37.9 euro). Requiring a profile costs about 2 * 37.9 = €76 per day. This is the biggest single lever in the experiment. Some customers abandon the cart when forced to create an account.

  • Interaction S:P (= -17.6 euro). The negative interaction means the profile requirement hurts daily sales more strongly when the shipping threshold is also at the unfavorable level.

The replicates’ ranges - 8, 4, 31, 14 - suggest the noise is uneven across corners, with the (S=30, P=Yes) corner the most variable. With only two replicates per corner, though, a spread like this is entirely consistent with chance, which is why the noise estimate pools all four pairs rather than trusting any single corner.

Practical advice. Free shipping over €30 (the low setting), no profile required. Predicted daily sales at that corner: y = 318.6 + 13.1 + 37.9 - 17.6 = 352 euro.
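As a cross-check on the 12.4 euro figure, the noise estimate can be rebuilt from the replicate pairs alone: each pair contributes (difference)^2 / 2 on one degree of freedom, and the four pairs pool to the same residual standard error. A sketch:

```python
# Replicate pairs at the four corners: (S-,P-), (S+,P-), (S-,P+), (S+,P+).
pairs = [(348, 356), (359, 363), (327, 296), (243, 257)]

ss = sum((a - b) ** 2 / 2 for a, b in pairs)  # pooled sum of squares, 4 df
s = (ss / len(pairs)) ** 0.5
print(round(s, 1))  # 12.4 euro, same as the regression residual estimate
```

The two routes agree because the saturated model's residuals are exactly the within-pair deviations.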

Q6: a mistake on the last experiment#

The colleague running the last Tuesday set the shipping threshold to €60 by accident (the intended level was €50), and that day's sales came in at €220 rather than the €257 in the original dataset. Two things change: the factor level now sits at x_S = +2 instead of +1, and the response value is different.

Rule of thumb from the source material: record what really happened, not what was supposed to happen. The model accommodates the odd level naturally.

[7]:
S2 = c(-1, -1, -1, -1, +1, +1, +1, +2, lo=30, hi=50, name="S", units="euro")
P2 = c(-1, -1, +1, +1, -1, -1, +1, +1, lo=0,  hi=1,  name="P", units="0=No, 1=Yes")
sales2 = c(348, 356, 327, 296, 359, 363, 243, 220, name="sales")
shop2 = gather(S=S2, P=P2, sales=sales2)
m2 = lm("sales ~ S + P + S:P", shop2)
print("Coefficients (with the mistake recorded honestly):")
print(m2.get_parameters(drop_intercept=False).to_string())
Coefficients (with the mistake recorded honestly):
Intercept    317.916667
S            -13.416667
P            -38.583333
S:P          -17.916667

Solution

The coefficients shift a little: b_S goes from -13.1 to -13.4; b_P from -37.9 to -38.6; b_S:P from -17.6 to -17.9. The qualitative conclusions are unchanged - both factors are bad for sales, and the interaction is bad for sales - but the model now correctly reflects the asymmetric coverage of the design.

What we learn from this scenario. Honest recording is the easiest way to keep predictions valid. The library handles unbalanced coded values without complaint; trying to “fix” the model by pretending the run was at +1 would make every future prediction slightly biased.
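Because the run at x_S = +2 breaks the orthogonality of the design, the coefficients no longer reduce to simple contrast averages; they come from the full least-squares solve. A numpy sketch (outside the course library) reproducing the numbers above:

```python
import numpy as np

S = np.array([-1, -1, -1, -1, +1, +1, +1, +2], dtype=float)  # last run at +2
P = np.array([-1, -1, +1, +1, -1, -1, +1, +1], dtype=float)
y = np.array([348, 356, 327, 296, 359, 363, 243, 220], dtype=float)

X = np.column_stack([np.ones(8), S, P, S * P])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(b, 3))  # [317.917 -13.417 -38.583 -17.917]
```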

C. A three-factor stability study (w4)#

A development team is trying to push product stability above 50 days with three factors:

  • A = enzyme strength [20% or 30%]

  • B = feed concentration [75% or 85%]

  • C = mixer type [R or W] (categorical)

The eight runs, in the same standard order as the code below:

A (%)    B (%)    C (mixer)    Stability (days)
20       75       R            40
30       75       R            27
20       85       R            35
30       85       R            21
20       75       W            41
30       75       W            27
20       85       W            31
30       85       W            20

The worksheet asks which factors matter, what to do with the one that does not, and where to run the next experiments to reach 50 days.

[8]:
A = c(-1, +1, -1, +1, -1, +1, -1, +1, lo=20, hi=30, name="A", units="%")
B = c(-1, -1, +1, +1, -1, -1, +1, +1, lo=75, hi=85, name="B", units="%")
C = c(-1, -1, -1, -1, +1, +1, +1, +1, lo=0,  hi=1,  name="C", units="0=R, 1=W")
stab = c(40, 27, 35, 21, 41, 27, 31, 20, name="stab")
study = gather(A=A, B=B, C=C, stab=stab)

m_full = lm("stab ~ A * B * C", study)
print("Full model (all main effects and interactions):")
print(m_full.get_parameters(drop_intercept=False).to_string())
Full model (all main effects and interactions):
Intercept    30.25
A            -6.50
B            -3.50
A:B           0.25
C            -0.50
A:C           0.25
B:C          -0.75
A:B:C         0.50
[9]:
# A three-factor cube plot puts the eight observations on the vertices.

cube = visualize_doe(
    plot_type="cube_plot",
    design_data=study.to_dict(orient="records"),
    response_column="stab",
    factors_to_plot=["A", "B", "C"],
    factor_labels={"A": "Enzyme [%]", "B": "Feed [%]", "C": "Mixer"},
    backend="plotly",
)
fig = go.Figure(cube["plotly"])
fig.update_layout(width=560, height=480, title="Three-factor stability cube")
fig
[10]:
# Pareto-style bar chart of the effect magnitudes.

pareto = visualize_doe(
    plot_type="pareto",
    analysis_results={
        "effects": {
            term: 2 * coef
            for term, coef in m_full.get_parameters(drop_intercept=True).items()
        },
    },
    backend="plotly",
)
fig = go.Figure(pareto["plotly"])
fig.update_layout(width=640, height=380, title="Pareto plot of effects on stability")
fig
[11]:
# Drop the C main effect (smallest in magnitude) and refit.

m_reduced = lm("stab ~ A * B", study)
print("Reduced model (C dropped):")
print(m_reduced.get_parameters(drop_intercept=False).to_string())
Reduced model (C dropped):
Intercept    30.25
A            -6.50
B            -3.50
A:B           0.25

Solution

Q8 - Which factor matters most. In the Pareto plot the coefficient magnitudes are A = 6.5 > B = 3.5 >> C = 0.5. Mixer type (C) has almost no effect on stability across the range tested.

Q9 - Contour-plot check. Plot stability against any two of A/B/C: the response changes fastest along the A-axis, more slowly along B, and hardly at all along C. The visual confirms the regression coefficients.

Q10 - Does “low effect” mean “unimportant”? No. It means stability is insensitive to mixer type over the range tested. That is useful information: pick whichever mixer is cheaper, safer, or more available - it will not hurt the response.

Q11 - Rebuild without C. The intercept, b_A, b_B and b_A:B are unchanged (the design is orthogonal, so each estimate is independent of which other terms are in the model). What you gain is degrees of freedom: the full model is saturated (eight runs, eight parameters, no residual error), while the reduced model leaves four degrees of freedom to estimate the noise, so the coefficients finally get standard errors and confidence intervals.
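The "coefficients unchanged" claim rests on orthogonality: every pair of coded columns in the 2^3 design has a zero dot product, so each term is estimated from its own contrast no matter what else is fitted. A quick check:

```python
# Coded columns of the 2^3 design in standard order.
A = [-1, +1, -1, +1, -1, +1, -1, +1]
B = [-1, -1, +1, +1, -1, -1, +1, +1]
C = [-1, -1, -1, -1, +1, +1, +1, +1]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

print(dot(A, B), dot(A, C), dot(B, C))  # 0 0 0 -> columns are orthogonal
```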

[12]:
# Three suggested next experiments to reach 50 days.

candidates = [
    ("A=-2 (15%), B=-2 (70%), C=either", dict(A=-2, B=-2)),
    ("A=-3 (10%), B=-1 (75%), C=either", dict(A=-3, B=-1)),
    ("A=-2.5 (12.5%), B=-1.5 (72.5%), C=either", dict(A=-2.5, B=-1.5)),
]
for label, coded in candidates:
    pred = float(predict(m_reduced, **coded).iloc[0])
    print(f"{label:50s} -> predicted stability {pred:5.1f} days")
A=-2 (15%), B=-2 (70%), C=either                   -> predicted stability  51.2 days
A=-3 (10%), B=-1 (75%), C=either                   -> predicted stability  54.0 days
A=-2.5 (12.5%), B=-1.5 (72.5%), C=either           -> predicted stability  52.7 days

Solution

Q12 - Suggested next experiments. Both main effects are negative, so to increase stability we move down on both coded factors. Three candidates (with the C level set wherever is cheapest, since C does not matter):

  • A = -2, B = -2: enzyme 15%, feed 70%; predicted 51 days.

  • A = -3, B = -1: enzyme 10%, feed 75%; predicted 54 days.

  • A = -2.5, B = -1.5: enzyme 12.5%, feed 72.5%; predicted 53 days.

All three extrapolate outside the observed corners. Run one as a check first; if it lands close to the prediction, the model is still trustworthy and you can push further. If it lands well above or below, you have learned where the linear model stops being a good description of the surface.
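Translating the coded candidates back to real settings is the usual linear map x_real = center + half_range * x_coded; a small sketch (the helper name coded_to_real is ours, not the library's):

```python
def coded_to_real(x_coded, lo, hi):
    """Map a coded level to real units, given the -1/+1 factor range."""
    center = (lo + hi) / 2
    half_range = (hi - lo) / 2
    return center + half_range * x_coded

print(coded_to_real(-2, 20, 30))    # 15.0 -> enzyme strength 15%
print(coded_to_real(-3, 20, 30))    # 10.0 -> enzyme strength 10%
print(coded_to_real(-1.5, 75, 85))  # 72.5 -> feed concentration 72.5%
```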

Wrap-up#

Three case studies in one module, each adding one trick to the toolbox:

  • Center points confirm whether the design plane is straight.

  • Replicates give a real noise estimate (and a real R^2).

  • Three factors fit on a cube plot; the prediction equation grows by one main-effect term and three interaction terms, but the workflow is identical to a 2x2.

Next: Module 4 scales the same toolkit to four factors in a bioreactor study (16 runs, full factorial) and walks through how to read a Pareto / half-normal plot and prune a model down to the terms that actually matter.