{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Full factorials: replicates, center points, and three factors\n", "\n", "**Source worksheets:** [yint.org/w3](https://yint.org/w3) and [yint.org/w4](https://yint.org/w4) - weeks 3 and 4 of the applied DoE short course.\n", "\n", "Modules 1 and 2 stopped at the four corners of a 2x2. In a real\n", "project you do three things on top of that, often before the experiment\n", "even finishes:\n", "\n", "1. **Replicate** corners so you can read the noise level, get a real\n", " confidence interval, and detect when a saturated model is hiding\n", " the residual error behind an automatic ``R^2 = 1``.\n", "2. Run a **center point** at the middle of the design to check whether\n", " the response surface is genuinely a plane, or whether it is starting\n", " to curve.\n", "3. Add a **third factor**. The cube plot is back; the same toolkit\n", " scales without any new mathematics.\n", "\n", "Module 3 works through three case studies that hit each of these in\n", "turn: a chemical side-product 2x2 with a center point (w3), an online\n", "shop 2x2 with replicates (w4), and a 3-factor stability study where\n", "one factor turns out to barely matter (w4)." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ ".. tip::\n", "\n", " Two habits to build in this module:\n", "\n", " - Every experiment gets a *center point* if you can afford it.\n", " One extra run earns you a degree of freedom for the regression\n", " and tells you whether the linear model is the right model.\n", " - Replicates give a direct estimate of the noise level. Without\n", " replicates a saturated design returns ``R^2 = 1`` by construction,\n", " and any confidence interval relies on assumptions you cannot check\n", " from the data alone." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## A. 
Side-product factorial with a center point (w3)\n", "\n", "A factorial experiment is run to **minimize** an unwanted side product\n", "in a chemical process. Two factors are varied:\n", "\n", "- **A = additive volume** [20 or 30 mL]\n", "- **B = baffles in the reactor?** [No or Yes]\n", "\n", "| Run | A (mL) | B (baffles) | Side product (g) |\n", "|----:|------:|------------:|-----------------:|\n", "| 1 | 20 | No | 89 |\n", "| 2 | 30 | No | 268 |\n", "| 3 | 20 | Yes | 179 |\n", "| 4 | 30 | Yes | 448 |\n", "\n", "The worksheet asks for the prediction equation, an interpretation of\n", "each coefficient, a comment on the interaction, and where to run the\n", "next experiment to *minimize* the side product." ] }, { "cell_type": "code", "metadata": {}, "execution_count": null, "outputs": [], "source": [ "import plotly.graph_objects as go\n", "import plotly.io as pio\n", "\n", "from process_improve.experiments import c, gather, lm, main_effects_plot, predict\n", "from process_improve.experiments.visualization import visualize_doe\n", "\n", "pio.renderers.default = \"notebook_connected\"\n", "\n", "# Standard order: A, B coded. 
Categorical B uses No=-1, Yes=+1.\n", "\n", "A = c(-1, +1, -1, +1, lo=20, hi=30, name=\"A\", units=\"mL\")\n", "B = c(-1, -1, +1, +1, lo=0, hi=1, name=\"B\", units=\"0=No, 1=Yes\")\n", "y = c(89, 268, 179, 448, name=\"y\")\n", "side = gather(A=A, B=B, y=y)\n", "side" ] }, { "cell_type": "code", "metadata": {}, "execution_count": null, "outputs": [], "source": [ "model = lm(\"y ~ A + B + A:B\", side)\n", "params = model.get_parameters(drop_intercept=False)\n", "print(params.to_string())\n", "print()\n", "print(\"Prediction equation:\")\n", "print(f\" y = {params['Intercept']:.0f} \"\n", " f\"+ {params['A']:+.0f} * x_A \"\n", " f\"+ {params['B']:+.0f} * x_B \"\n", " f\"+ {params['A:B']:+.0f} * x_A * x_B\")" ] }, { "cell_type": "code", "metadata": {}, "execution_count": null, "outputs": [], "source": [ "sq = visualize_doe(\n", " plot_type=\"square_plot\",\n", " design_data=side.to_dict(orient=\"records\"),\n", " response_column=\"y\",\n", " factors_to_plot=[\"A\", \"B\"],\n", " factor_labels={\"A\": \"Additive [mL]\", \"B\": \"Baffles\"},\n", " backend=\"plotly\",\n", ")\n", "fig = go.Figure(sq[\"plotly\"])\n", "fig.update_layout(width=520, height=440, title=\"Side product at the four corners\")\n", "fig" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ ".. admonition:: Solution\n", "\n", " The fitted model in coded units is\n", " ``y = 246 + 112 * x_A + 67.5 * x_B + 22.5 * x_A * x_B``.\n", "\n", " - **Coefficient on A (= +112 g).** Moving the additive from 20 to\n", " 30 mL increases the side product on average by ``2 * 112 = 224 g``.\n", " That is a large response shift, and in the wrong direction for\n", " the objective of minimizing the side product.\n", " - **Coefficient on B (= +67.5 g).** Adding baffles makes things\n", " worse too, by ``2 * 67.5 = 135 g`` on average. 
Surprising for a\n", " \"should help mixing\" intuition, but the data are the data.\n", " - **Interaction A:B (= +22.5 g).** Modest in absolute terms but\n", " same sign as both main effects: baffles make the additive effect\n", " *worse*.\n", "\n", " **Where to run next.** To minimize the side product, move *down*\n", " in coded units on both factors: lower additive volume (below 20 mL)\n", " and no baffles. A reasonable next run is at A = 15 mL (``x_A = -2``),\n", " B = No (``x_B = -1``) - predicted ``y = 246 - 224 - 67.5 + 45 = -0.5 g``,\n", " i.e. close to zero. In practice you stop when the model says\n", " \"we cannot go lower\" or when the prediction is suspicious because\n", " you are extrapolating." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Adding two center-point runs\n", "\n", "Two extra experiments are run at the *center* of the additive range\n", "(25 mL), once with each level of baffles:\n", "\n", "| Run | A (mL) | B (baffles) | Side product (g) |\n", "|----:|------:|------------:|-----------------:|\n", "| 5 | 25 | No | 186 |\n", "| 6 | 25 | Yes | 300 |\n", "\n", "These do not change which terms are in the model, but they let us check\n", "whether the linear-and-interaction model is actually the right model.\n", "If the center-point responses match the model's predictions at\n", "``x_A = 0``, the surface is linear in A. If they sit well off the\n", "plane, the system has *curvature* and a quadratic term is needed." 
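] }, { "cell_type": "markdown", "metadata": {}, "source": [
"A quick cross-check of these coefficients, useful whenever an automatic\n",
"``R^2 = 1`` makes you suspicious: the sketch below redoes the fit with\n",
"plain ``numpy`` least squares on the coded design matrix, *not* the\n",
"``process_improve`` API. Because the four columns are orthogonal, the\n",
"solve reproduces the intercept and the three coefficients exactly." ] },
{ "cell_type": "code", "metadata": {}, "execution_count": null, "outputs": [], "source": [
"import numpy as np\n",
"\n",
"# Coded 2x2 design in standard order: intercept, x_A, x_B, x_A*x_B.\n",
"X_check = np.array([[1, -1, -1, +1],\n",
"                    [1, +1, -1, -1],\n",
"                    [1, -1, +1, -1],\n",
"                    [1, +1, +1, +1]])\n",
"y_check = np.array([89, 268, 179, 448])\n",
"\n",
"# Orthogonal columns: the solution is exact column-average arithmetic,\n",
"# b_j = (column_j . y) / 4.\n",
"b_check = np.linalg.lstsq(X_check, y_check, rcond=None)[0]\n",
"print(b_check)  # intercept 246, b_A 112, b_B 67.5, b_AB 22.5"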
] }, { "cell_type": "code", "metadata": {}, "execution_count": null, "outputs": [], "source": [ "A2 = c(-1, +1, -1, +1, 0, 0, lo=20, hi=30, name=\"A\", units=\"mL\")\n", "B2 = c(-1, -1, +1, +1, -1, +1, lo=0, hi=1, name=\"B\", units=\"0=No, 1=Yes\")\n", "y2 = c(89, 268, 179, 448, 186, 300, name=\"y\")\n", "side2 = gather(A=A2, B=B2, y=y2)\n", "\n", "model2 = lm(\"y ~ A + B + A:B\", side2)\n", "params2 = model2.get_parameters(drop_intercept=False)\n", "print(\"Coefficients with the center points included:\")\n", "print(params2.to_string())\n", "print()\n", "print(\"Predicted vs observed at the center points:\")\n", "for label, point, observed in [\n", " (\"(x_A=0, x_B=-1)\", dict(A=0, B=-1), 186),\n", " (\"(x_A=0, x_B=+1)\", dict(A=0, B=+1), 300),\n", "]:\n", " pred = float(predict(model2, **point).iloc[0])\n", " print(f\" {label}: predicted {pred:6.1f}, observed {observed} - diff {observed - pred:+.1f}\")" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ ".. admonition:: Solution\n", "\n", " The coefficients barely move: intercept ``246 -> 245``,\n", " ``b_A = +112`` (unchanged), ``b_B`` shifts a touch from ``+67.5``\n", " to ``+64``, interaction stays at ``+22.5``. The center-point\n", " responses are within a handful of grams of the linear prediction.\n", "\n", " **What this means.** Inside the design window, the surface is\n", " essentially a plane - there is no evidence of curvature. The\n", " factorial alone is doing a good enough job. If the center point\n", " had landed far from the prediction (say 100 g off), you would have\n", " strong evidence of curvature and the next move would be axial points\n", " for a quadratic model (Module 7).\n", "\n", ".. admonition:: Guidance\n", "\n", " Center points are cheap insurance. In a 2x2, **one** center point\n", " buys you a curvature check; **two or three** center points also\n", " give you an honest noise estimate (and therefore a real confidence\n", " interval). 
Module 6 returns to this when the budget gets tight." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## B. Online shop 2x2 with replicates (w4)\n", "\n", "A small online business runs an 8-week experiment every Tuesday (same\n", "day of week, to remove the day-of-week effect). Two factors:\n", "\n", "- **S = free-shipping threshold** [\u20ac30 or \u20ac50]\n", "- **P = profile required before checkout?** [No or Yes]\n", "\n", "Each corner is *replicated*: two Tuesdays per combination, eight\n", "Tuesdays total.\n", "\n", "| S (euro) | P (profile) | Daily sales (euro) |\n", "|---------:|------------:|-------------------:|\n", "| 30 | No | 348 and 356 |\n", "| 50 | No | 359 and 363 |\n", "| 30 | Yes | 327 and 296 |\n", "| 50 | Yes | 243 and 257 |\n", "\n", "Now ``R^2`` is no longer exactly 1 - the replicates give the regression\n", "something to disagree with - and you can read the *noise level* off\n", "the residuals." ] }, { "cell_type": "code", "metadata": {}, "execution_count": null, "outputs": [], "source": [ "import numpy as np\n", "\n", "S = c(-1, -1, -1, -1, +1, +1, +1, +1, lo=30, hi=50, name=\"S\", units=\"euro\")\n", "P = c(-1, -1, +1, +1, -1, -1, +1, +1, lo=0, hi=1, name=\"P\", units=\"0=No, 1=Yes\")\n", "sales = c(348, 356, 327, 296, 359, 363, 243, 257, name=\"sales\")\n", "shop = gather(S=S, P=P, sales=sales)\n", "\n", "m = lm(\"sales ~ S + P + S:P\", shop)\n", "params = m.get_parameters(drop_intercept=False)\n", "print(params.to_string())\n", "print()\n", "\n", "# Estimate the noise level from the regression residuals.\n", "\n", "fitted = shop.apply(lambda r: float(predict(m, S=r[\"S\"], P=r[\"P\"]).iloc[0]), axis=1)\n", "residuals = shop[\"sales\"] - fitted\n", "print(f\"Standard error of the residuals: {np.std(residuals, ddof=4):.1f} euro\")\n", "print(f\"Range across replicates at each corner: \"\n", " f\"S- P- = {356-348} | S+ P- = {363-359} | \"\n", " f\"S- P+ = {327-296} | S+ P+ = {257-243}\")" ] }, { "cell_type": "code", "metadata": 
{}, "execution_count": null, "outputs": [], "source": [ "fig = main_effects_plot(m, factor_labels={\"S\": \"Free-ship threshold\", \"P\": \"Profile required\"})\n", "fig.update_layout(width=620, height=380, title=\"Online shop main effects\")\n", "fig" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ ".. admonition:: Solution\n", "\n", " The fitted model is\n", " ``sales = 318.6 - 13.1 * x_S - 37.9 * x_P - 17.6 * x_S * x_P``.\n", "\n", " - **Coefficient on S (= -13.1 euro).** Raising the free-shipping\n", " threshold from \u20ac30 to \u20ac50 *loses* about ``2 * 13.1 = \u20ac26`` per day.\n", " Customers like the lower threshold. Not surprising at all.\n", " - **Coefficient on P (= -37.9 euro).** Requiring a profile costs\n", " about ``2 * 37.9 = \u20ac76`` per day. This is the biggest single\n", " lever in the experiment. Some customers abandon the cart when\n", " forced to create an account.\n", " - **Interaction S:P (= -17.6 euro).** The negative interaction\n", " means the profile requirement hurts daily sales more strongly\n", " when the shipping threshold is also at the unfavorable level.\n", "\n", " The replicates' ranges - 8, 4, 31, 14 - show the noise is uneven\n", " across corners. The (S=30, P=Yes) corner is much more volatile,\n", " probably because that corner depresses overall sales and the\n", " absolute swings get more dramatic.\n", "\n", " **Practical advice.** Free shipping over \u20ac30 (the low setting),\n", " no profile required. Predicted daily sales at that corner:\n", " ``y = 318.6 + 13.1 + 37.9 - 17.6 = 352`` euro." 
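] }, { "cell_type": "markdown", "metadata": {}, "source": [
"The residual standard error printed earlier can be reproduced straight\n",
"from the replicate pairs: the four-parameter model passes exactly through\n",
"the four corner means, so its residuals are the within-pair deviations,\n",
"and the regression noise estimate equals the pooled within-pair standard\n",
"deviation. A minimal sketch in plain Python, no library calls:" ] },
{ "cell_type": "code", "metadata": {}, "execution_count": null, "outputs": [], "source": [
"pairs = [(348, 356), (359, 363), (327, 296), (243, 257)]\n",
"\n",
"# Each replicate pair contributes one degree of freedom with variance\n",
"# estimate d^2 / 2; pooling the four pairs gives 4 degrees of freedom.\n",
"pooled_var = sum((a - b) ** 2 for a, b in pairs) / 2 / 4\n",
"print(round(pooled_var ** 0.5, 1))  # 12.4 euro, matching np.std(residuals, ddof=4)"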
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Q6: a mistake on the last experiment\n", "\n", "The colleague running the last Tuesday set the shipping threshold to\n", "\u20ac60 by accident (intended: \u20ac50) and recorded \u20ac220 (instead of \u20ac257).\n", "Two changes: the factor level is now at ``x_S = +2`` rather than\n", "``+1``, and the response is different.\n", "\n", "Rule of thumb from the source material: **record what really happened,\n", "not what was supposed to happen.** The model accommodates the odd\n", "level naturally." ] }, { "cell_type": "code", "metadata": {}, "execution_count": null, "outputs": [], "source": [ "S2 = c(-1, -1, -1, -1, +1, +1, +1, +2, lo=30, hi=50, name=\"S\", units=\"euro\")\n", "P2 = c(-1, -1, +1, +1, -1, -1, +1, +1, lo=0, hi=1, name=\"P\", units=\"0=No, 1=Yes\")\n", "sales2 = c(348, 356, 327, 296, 359, 363, 243, 220, name=\"sales\")\n", "shop2 = gather(S=S2, P=P2, sales=sales2)\n", "m2 = lm(\"sales ~ S + P + S:P\", shop2)\n", "print(\"Coefficients (with the mistake recorded honestly):\")\n", "print(m2.get_parameters(drop_intercept=False).to_string())" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ ".. admonition:: Solution\n", "\n", " The coefficients shift a little: ``b_S`` goes from ``-13.1`` to\n", " ``-13.4``; ``b_P`` from ``-37.9`` to ``-38.6``; ``b_S:P`` from\n", " ``-17.6`` to ``-17.9``. The qualitative conclusions are unchanged\n", " - both factors are bad for sales, and the interaction is bad for\n", " sales - but the model now correctly reflects the asymmetric\n", " coverage of the design.\n", "\n", " **What we learn from this scenario.** Honest recording is the\n", " easiest way to keep predictions valid. The library handles\n", " unbalanced coded values without complaint; trying to \"fix\" the\n", " model by pretending the run was at ``+1`` would make every future\n", " prediction slightly biased." 
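] }, { "cell_type": "markdown", "metadata": {}, "source": [
"Converting the off-plan setting into coded units is plain centering and\n",
"scaling. The helper below (a hypothetical name, not part of\n",
"``process_improve``) makes the arithmetic behind ``x_S = +2`` explicit:" ] },
{ "cell_type": "code", "metadata": {}, "execution_count": null, "outputs": [], "source": [
"def to_coded(value, lo, hi):\n",
"    # Map a real setting onto coded units: lo -> -1, hi -> +1.\n",
"    center = (lo + hi) / 2.0\n",
"    half_range = (hi - lo) / 2.0\n",
"    return (value - center) / half_range\n",
"\n",
"print(to_coded(60, lo=30, hi=50))  # 2.0: the accidental euro-60 Tuesday\n",
"print(to_coded(50, lo=30, hi=50))  # 1.0: the intended high setting"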
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## C. A three-factor stability study (w4)\n", "\n", "A development team is trying to push product stability above 50 days\n", "with three factors:\n", "\n", "- **A = enzyme strength** [20% or 30%]\n", "- **B = feed concentration** [75% or 85%]\n", "- **C = mixer type** [R or W] (categorical)\n", "\n", "| A | B | C | Stability (days) |\n", "|----:|----:|----:|-----------------:|\n", "| - | - | - | 40 |\n", "| + | - | - | 27 |\n", "| - | + | - | 35 |\n", "| + | + | - | 21 |\n", "| - | - | + | 41 |\n", "| + | - | + | 27 |\n", "| - | + | + | 31 |\n", "| + | + | + | 20 |\n", "\n", "The worksheet asks which factors matter, what to do with the one that\n", "does not, and where to run the next experiments to reach 50 days." ] }, { "cell_type": "code", "metadata": {}, "execution_count": null, "outputs": [], "source": [ "A = c(-1, +1, -1, +1, -1, +1, -1, +1, lo=20, hi=30, name=\"A\", units=\"%\")\n", "B = c(-1, -1, +1, +1, -1, -1, +1, +1, lo=75, hi=85, name=\"B\", units=\"%\")\n", "C = c(-1, -1, -1, -1, +1, +1, +1, +1, lo=0, hi=1, name=\"C\", units=\"0=R, 1=W\")\n", "stab = c(40, 27, 35, 21, 41, 27, 31, 20, name=\"stab\")\n", "study = gather(A=A, B=B, C=C, stab=stab)\n", "\n", "m_full = lm(\"stab ~ A * B * C\", study)\n", "print(\"Full model (all main effects and interactions):\")\n", "print(m_full.get_parameters(drop_intercept=False).to_string())" ] }, { "cell_type": "code", "metadata": {}, "execution_count": null, "outputs": [], "source": [ "# A three-factor cube plot puts the eight observations on the vertices.\n", "\n", "cube = visualize_doe(\n", " plot_type=\"cube_plot\",\n", " design_data=study.to_dict(orient=\"records\"),\n", " response_column=\"stab\",\n", " factors_to_plot=[\"A\", \"B\", \"C\"],\n", " factor_labels={\"A\": \"Enzyme [%]\", \"B\": \"Feed [%]\", \"C\": \"Mixer\"},\n", " backend=\"plotly\",\n", ")\n", "fig = go.Figure(cube[\"plotly\"])\n", "fig.update_layout(width=560, height=480, 
title=\"Three-factor stability cube\")\n", "fig" ] }, { "cell_type": "code", "metadata": {}, "execution_count": null, "outputs": [], "source": [ "# Pareto-style bar chart of the effect magnitudes.\n", "\n", "pareto = visualize_doe(\n", " plot_type=\"pareto\",\n", " analysis_results={\n", " \"effects\": {\n", " term: 2 * coef\n", " for term, coef in m_full.get_parameters(drop_intercept=True).items()\n", " },\n", " },\n", " backend=\"plotly\",\n", ")\n", "fig = go.Figure(pareto[\"plotly\"])\n", "fig.update_layout(width=640, height=380, title=\"Pareto plot of effects on stability\")\n", "fig" ] }, { "cell_type": "code", "metadata": {}, "execution_count": null, "outputs": [], "source": [ "# Drop the C main effect (smallest in magnitude) and refit.\n", "\n", "m_reduced = lm(\"stab ~ A * B\", study)\n", "print(\"Reduced model (C dropped):\")\n", "print(m_reduced.get_parameters(drop_intercept=False).to_string())" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ ".. admonition:: Solution\n", "\n", " **Q8 - Which factor matters most.** In the Pareto plot the\n", " coefficient magnitudes are ``A = 6.5 > B = 3.5 >> C = 0.5``.\n", " Mixer type (C) has almost no effect on stability across the range\n", " tested.\n", "\n", " **Q9 - Contour-plot check.** Plot stability against any two of\n", " A/B/C: contours along the A-axis are steeper than along the B-axis,\n", " which is steeper than along the C-axis. The visual confirms the\n", " regression.\n", "\n", " **Q10 - Does \"low effect\" mean \"unimportant\"?** No. It means\n", " stability is *insensitive* to mixer type over the range tested.\n", " That is useful information: pick whichever mixer is cheaper, safer,\n", " or more available - it will not hurt the response.\n", "\n", " **Q11 - Rebuild without C.** The intercept, ``b_A``, ``b_B`` and\n", " ``b_A:B`` are unchanged (full vs reduced model is identical for\n", " those terms because the design is orthogonal). 
The full\n", " model, however, is saturated: eight runs fit eight parameters, leaving\n", " zero residual degrees of freedom and therefore no standard errors at\n", " all. Dropping the four C-related terms frees four degrees of freedom,\n", " which is what lets the reduced model estimate the noise and attach\n", " real confidence intervals to its coefficients." ] }, { "cell_type": "code", "metadata": {}, "execution_count": null, "outputs": [], "source": [ "# Three suggested next experiments to reach 50 days.\n", "\n", "candidates = [\n", " (\"A=-2 (15%), B=-2 (70%), C=0\", dict(A=-2, B=-2)),\n", " (\"A=-3 (10%), B=-1 (75%), C=0\", dict(A=-3, B=-1)),\n", " (\"A=-2.5 (12.5%), B=-1.5 (72.5%), C=0\", dict(A=-2.5, B=-1.5)),\n", "]\n", "for label, coded in candidates:\n", " pred = float(predict(m_reduced, **coded).iloc[0])\n", " print(f\"{label:50s} -> predicted stability {pred:5.1f} days\")" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ ".. admonition:: Solution\n", "\n", " **Q12 - Suggested next experiments.** Both main effects are\n", " negative, so to *increase* stability we move *down* on both\n", " coded factors. Three candidates (with the C level set wherever\n", " is cheapest, since C does not matter):\n", "\n", " - ``A = -2, B = -2``: enzyme 15%, feed 70%; predicted 51 days.\n", " - ``A = -3, B = -1``: enzyme 10%, feed 75%; predicted 54 days.\n", " - ``A = -2.5, B = -1.5``: enzyme 12.5%, feed 72.5%; predicted 53 days.\n", "\n", " All three extrapolate outside the observed corners. Run one as a\n", " check first; if it lands close to the prediction, the model is\n", " still trustworthy and you can push further. If it lands well\n", " above or below, you have learned where the linear model stops\n", " being a good description of the surface."
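] }, { "cell_type": "markdown", "metadata": {}, "source": [
"The candidate predictions can be verified by hand. The coefficients below\n",
"are computed from the eight corner values (grand mean 30.25; coefficients\n",
"-6.5, -3.5 and +0.25); because the design is orthogonal they agree with\n",
"the reduced ``stab ~ A * B`` fit, and the printed values match the\n",
"51 / 54 / 53 day predictions quoted above." ] },
{ "cell_type": "code", "metadata": {}, "execution_count": null, "outputs": [], "source": [
"b0, bA, bB, bAB = 30.25, -6.5, -3.5, 0.25\n",
"\n",
"def predict_stability(x_a, x_b):\n",
"    # Reduced prediction equation in coded units; C is omitted.\n",
"    return b0 + bA * x_a + bB * x_b + bAB * x_a * x_b\n",
"\n",
"for x_a, x_b in [(-2, -2), (-3, -1), (-2.5, -1.5)]:\n",
"    print(x_a, x_b, predict_stability(x_a, x_b))\n",
"# -> 51.25, 54.0 and 52.6875 days"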
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Wrap-up\n", "\n", "Three case studies in one module, each adding one trick to the\n", "toolbox:\n", "\n", "- **Center points** confirm whether the design plane is straight.\n", "- **Replicates** give a real noise estimate (and a real ``R^2``).\n", "- **Three factors** fit on a cube plot; the prediction equation grows\n", " by one main-effect term and three interaction terms, but the workflow\n", " is identical to a 2x2.\n", "\n", "**Next:** Module 4 scales the same toolkit to **four factors** in a\n", "bioreactor study (16 runs, full factorial) and walks through how to\n", "read a Pareto / half-normal plot and *prune* a model down to the\n", "terms that actually matter." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 4 }