Factorial DOE: oil-company experiment#

Industrial designed experiment from an oil company. Four blend components are varied across 19 runs to improve volumetric heat capacity. Three of the components (A, B, D) are continuous and were varied in coded units; the fourth (C) is a binary factor: present or absent. The aim is to identify which factors and interactions actually drive the response so that future experimentation can focus there.

Data. oil-company-doe.csv from openmv.net. 19 rows, five columns: A, B, C, D, y.

What we do. Run the package’s analyze_experiment with an interactions model, read the coefficient table and ANOVA, build a Pareto plot of standardised effects, and check the residual diagnostics.

Adapted from the Design and analysis of experiments chapter of the Process Improvement using Data book (CC BY-SA 4.0).

[1]:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

from process_improve.experiments.analysis import analyze_experiment

Load and inspect#

[2]:

url = "https://openmv.net/file/oil-company-doe.csv"
doe = pd.read_csv(url, index_col=0)
if doe.shape[1] < 5:
    doe = pd.read_csv(url)
doe.columns = [c.strip() for c in doe.columns]
print(f"Shape: {doe.shape}")
print(f"Columns: {list(doe.columns)}")
doe.head()

Shape: (19, 5)
Columns: ['A', 'B', 'C', 'D', 'y']

[2]:

	A	B	C	D	y
0	4.38	1.60	No	2.18	28.03
1	6.20	2.26	Yes	2.18	37.25
2	6.20	2.26	Yes	2.18	37.22
3	4.38	2.26	Yes	3.08	37.06
4	6.20	1.60	Yes	3.08	39.23

[3]:

doe.describe(include="all")

[3]:

	A	B	C	D	y
count	19.000000	19.000000	19	19.000000	19.000000
unique	NaN	NaN	2	NaN	NaN
top	NaN	NaN	No	NaN	NaN
freq	NaN	NaN	11	NaN	NaN
mean	5.562105	1.970526	NaN	2.685263	33.347895
std	0.795792	0.311582	NaN	0.424884	4.037329
min	4.380000	1.600000	NaN	2.180000	28.030000
25%	4.680000	1.600000	NaN	2.180000	29.565000
50%	6.200000	2.040000	NaN	2.780000	31.710000
75%	6.200000	2.260000	NaN	3.080000	37.140000
max	6.200000	2.260000	NaN	3.080000	39.230000

Fit the interactions model#

analyze_experiment accepts a design DataFrame and the name of the response column, builds a formula, fits it via statsmodels, and returns a dictionary keyed by the requested analysis types. With model="interactions" we ask for main effects plus all two-factor interactions; that is 4 + 6 = 10 terms, which fits comfortably in 19 runs.

[4]:

result = analyze_experiment(
    doe,
    response_column="y",
    model="interactions",
    analysis_type=[
        "coefficients",
        "anova",
        "effects",
        "significance",
        "residual_diagnostics",
    ],
)
ms = result["model_summary"]
print(f"formula     : {ms['formula']}")
print(f"R2          : {ms['r_squared']:.3f}")
print(f"R2 (adj)    : {ms['r_squared_adj']:.3f}")
print(f"R2 (pred)   : {ms['r_squared_pred']:.3f}")
print(f"obs / df_resid: {ms['n_obs']} / {ms['df_residual']}")

formula     : y ~ (A + B + C + D) ** 2
R2          : 0.991
R2 (adj)    : 0.979
R2 (pred)   : 0.963
obs / df_resid: 19 / 8

Coefficient table#

Sort by absolute t-value to put the most impactful terms at the top. Coefficients with a small p-value are the ones that the data picks out as important; coefficients whose confidence interval straddles zero are not.

[5]:

coef_df = pd.DataFrame(result["coefficients"]).set_index("term")
coef_df = coef_df.reindex(coef_df["t_value"].abs().sort_values(ascending=False).index)
coef_df.round(3)

[5]:

	coefficient	std_error	t_value	p_value	ci_low	ci_high
term
Intercept	61.982	12.449	4.979	0.001	33.274	90.690
B	-116.802	25.774	-4.532	0.002	-176.238	-57.367
A:B	18.047	4.362	4.137	0.003	7.987	28.107
A:D	-10.810	2.930	-3.689	0.006	-17.567	-4.052
D	63.135	19.425	3.250	0.012	18.342	107.929
A	-4.552	2.150	-2.117	0.067	-9.509	0.405
C[T.Yes]	9.418	4.552	2.069	0.072	-1.079	19.914
C[T.Yes]:D	1.207	0.719	1.679	0.132	-0.451	2.866
B:D	3.437	2.236	1.537	0.163	-1.720	8.594
B:C[T.Yes]	-1.741	1.151	-1.512	0.169	-4.395	0.913
A:C[T.Yes]	-0.326	0.424	-0.769	0.464	-1.303	0.651

ANOVA#

[6]:

anova = pd.DataFrame(result["anova_table"]).set_index("source")
anova.round(3)

[6]:

	df	sum_sq	mean_sq	F	p_value
source
C	1	217.794	None	651.088	0.000
A	1	16.460	None	49.206	0.000
A:C	1	0.198	None	0.591	0.464
B	1	2.502	None	7.479	0.026
B:C	1	0.765	None	2.287	0.169
D	1	9.319	None	27.858	0.001
C:D	1	0.943	None	2.818	0.132
A:B	1	5.725	None	17.114	0.003
A:D	1	4.552	None	13.608	0.006
B:D	1	0.790	None	2.362	0.163
Residual	8	2.676	None	NaN	NaN

Pareto plot of standardised effects#

Dividing each effect estimate by its standard error gives a unitless comparison of how strong each term is relative to the noise. Plotting the absolute values as a horizontal bar chart from largest to smallest is the classical Pareto plot.

[7]:

effects = pd.Series(result["effects"], dtype=float)
errors = pd.Series(result["effect_std_errors"], dtype=float).reindex(effects.index)
standardised = (effects / errors).abs().sort_values()

_fig, ax = plt.subplots(figsize=(7, 0.35 * max(4, len(standardised)) + 1.0))
ax.barh(standardised.index, standardised.values, color="#1f77b4")
ax.axvline(2, color="r", linestyle="--", linewidth=1, label="|t| = 2 (rough significance)")
ax.set_xlabel("|effect / standard error|")
ax.set_title("Pareto of standardised effects")
ax.legend(loc="best")
plt.tight_layout()
plt.show()

../../../_images/user_guide_case_studies_design-analysis-experiments_factorial-oil-company-doe_12_0.png

Residual diagnostics#

[8]:

diag = result["residual_diagnostics"]
fitted = np.asarray(diag["fitted_values"], dtype=float)
resid = np.asarray(diag["residuals"], dtype=float)

_fig, axes = plt.subplots(1, 2, figsize=(9, 3.2))
axes[0].scatter(fitted, resid, s=30, color="#1f77b4")
axes[0].axhline(0, color="k", linewidth=0.7)
axes[0].set_xlabel("Fitted")
axes[0].set_ylabel("Residual")
axes[0].set_title("Residuals vs fitted")

axes[1].hist(resid, bins=8, color="#1f77b4", edgecolor="k")
axes[1].set_xlabel("Residual")
axes[1].set_title("Residual distribution")
plt.tight_layout()
plt.show()

print(f"Shapiro-Wilk p-value : {diag['shapiro_wilk']['p_value']:.3f}")
print(f"Durbin-Watson        : {diag['durbin_watson']:.2f}")
print(f"Breusch-Pagan p-value: {diag['breusch_pagan']['p_value']:.3f}")

../../../_images/user_guide_case_studies_design-analysis-experiments_factorial-oil-company-doe_14_0.png

Shapiro-Wilk p-value : 0.000
Durbin-Watson        : 1.52
Breusch-Pagan p-value: 0.050

What to try next#

Drop the terms below the |t| = 2 line and re-fit. The model R squared will go down a touch but the adjusted R squared usually goes up because the noise terms penalise the unadjusted statistic.
Pass analysis_type="lack_of_fit" to ask whether the residual variation is consistent with pure replication error, which tells you whether a curvature term is missing.
For follow-up experimentation, the :func:process_improve.experiments.strategy.recommend_strategy function will propose where to run the next batch of experiments given the coefficients you have already estimated.