{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Trade-off table, blocking, and the first move toward optimization\n", "\n", "**Source worksheets:** [yint.org/w8](https://yint.org/w8) and [yint.org/w9](https://yint.org/w9) - weeks 8 and 9 of the applied DoE short course.\n", "\n", "Module 5 ended with the *defining relation* and the *resolution* of a\n", "fractional design. This module zooms out to the **trade-off table**,\n", "the standard reference for picking ``k`` factors and ``p`` generators\n", "when budget is tight, then works through a small case study that\n", "combines a fractional factorial with **blocking** on the experimenter,\n", "and finishes by introducing the **sequential experimentation** mindset\n", "that drives Modules 7 and 8." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ ".. tip::\n", "\n", " The trade-off table is the most frequently consulted quick reference\n", " in fractional-factorial design.\n", " The full table is at `yint.org/tradeoff <https://yint.org/tradeoff>`__." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Q1-Q8 - reading the trade-off table\n", "\n", "A few quick lookups:\n", "\n", "- **Five factors, sixteen runs**: $k = 5$, $p = 1$. One generator;\n", " conventionally $E = ABCD$. Defining relation $I = ABCDE$.\n", " Resolution V.\n", "- **Six factors, twenty runs**: 20 is not a power of 2. The nearest\n", " workable designs are 16 or 32 runs. 
At $k = 6$, $p = 2$\n", " (so $2^{6-2} = 16$) the standard generators are $E = ABC$,\n", " $F = BCD$; the defining relation has $2^{2} - 1 = 3$ words.\n", "\n", "The general pattern:\n", "\n", "> Each generator halves the number of runs.\n", "> $p$ generators $\\rightarrow$ $2^{k-p}$ runs $\\rightarrow$\n", "> $2^{p} - 1$ words in the defining relation $\\rightarrow$\n", "> every effect aliased with $2^{p} - 1$ others.\n", "\n", "In Module 5 we saw $p = 1$ (half-fraction, one word $ABCD$).\n", "With $p = 2$ (quarter-fraction) the defining relation has *three*\n", "words: the two generators and their product." ] }, { "cell_type": "code", "metadata": {}, "execution_count": null, "outputs": [], "source": [ "# 6 factors, 16 runs (k=6, p=2), generators E=ABC and F=BCD\n", "\n", "generators = [\"ABCE\", \"BCDF\"]\n", "\n", "# Multiply the two words: repeated letters square to I and cancel, so the\n", "# product keeps only the letters that appear in exactly one word:\n", "# ABCE * BCDF = (BC)^2 ADEF = ADEF.\n", "\n", "product = \"\".join(sorted(set(generators[0]) ^ set(generators[1])))\n", "\n", "print(f\"Generators (as words): {generators}\")\n", "print(f\"Defining relation: I = {' = '.join(generators)} = {product} (= ABCE * BCDF, after cancellation)\")" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ ".. admonition:: Solution\n", "\n", " - **Q3.** ``E = ABCD`` in the 2^(5-1) design, so the alias of E is\n", "   the full four-factor interaction ABCD (all but invisible in a\n", "   real chemistry system).\n", " - **Q4.** ``k = 6``, ``p = 2``, generators ``E = ABC`` and\n", "   ``F = BCD`` give a resolution-IV quarter-fraction in 16 runs.\n", " - **Q7-Q8.** With ``p = 2`` generators the defining relation has\n", "   ``2^p - 1 = 3`` words: each generator plus their product." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Q9-Q13 - half-fraction in four factors, plus blocking on the experimenter\n", "\n", "The set-up: four factors **A**, **B**, **C**, **D** to study; only 8\n", "runs in the budget; the work must be split between you and a colleague\n", "because the schedule does not allow one person to do all 8.\n", "\n", "A natural design is a **half-fraction** with generator ``D = ABC``\n", "(resolution IV). Splitting the work between two experimenters is a\n", "**blocking** problem: the experimenter is a **nuisance factor** - we\n", "do not care whether the value is \"you\" or \"colleague\", but if we do\n", "not control for it, any difference in technique gets blamed on the real\n", "factors. Block on it by treating \"Person\" as an extra column ``E``\n", "whose generator is some interaction we are willing to lose.\n", "\n", "The standard choice is ``E = AB`` (block confounded with the AB\n", "two-factor interaction). Adding the word ``ABE`` to ``ABCD`` (plus\n", "their product ``CDE``) drops the design to resolution III overall:\n", "every main effect is now confounded with at least one *two-factor*\n", "interaction.\n", "\n", "Worksheet response values (Q13):\n", "``y = [120, 76, 106, 90, 72, 74, 90, 55]`` in standard order." 
] }, { "cell_type": "code", "metadata": {}, "execution_count": null, "outputs": [], "source": [ "from process_improve.experiments import c, gather, lm\n", "\n", "A = c(-1, +1, -1, +1, -1, +1, -1, +1, name=\"A\")\n", "B = c(-1, -1, +1, +1, -1, -1, +1, +1, name=\"B\")\n", "C = c(-1, -1, -1, -1, +1, +1, +1, +1, name=\"C\")\n", "\n", "# Generator D = A*B*C\n", "\n", "D = c(-1, +1, +1, -1, +1, -1, -1, +1, name=\"D\")\n", "\n", "# Block factor E = \"Person\", with E = A*B\n", "\n", "E = c(+1, -1, -1, +1, +1, -1, -1, +1, name=\"E\")\n", "y = c(120, 76, 106, 90, 72, 74, 90, 55, name=\"y\")\n", "design = gather(A=A, B=B, C=C, D=D, E=E, y=y)\n", "design" ] }, { "cell_type": "code", "metadata": {}, "execution_count": null, "outputs": [], "source": [ "# Fit main effects plus one two-factor interaction (A:C, aliased with\n", "# B:D) that is not aliased with any main effect.\n", "\n", "m = lm(\"y ~ A + B + C + D + E + A:C\", design)\n", "print(m.get_parameters(drop_intercept=False).to_string())" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ ".. admonition:: Solution\n", "\n", " With both generators ``D = ABC`` and ``E = AB`` the defining\n", " relation is\n", "\n", " ::\n", "\n", "    I = ABCD = ABE = ABCD * ABE = CDE.\n", "\n", " Words: ``ABCD``, ``ABE``, ``CDE``. Shortest word has length 3\n", " (``ABE``, ``CDE``), so the design is **Resolution III**.\n", "\n", " The coefficients:\n", "\n", " - ``A = -11.6``, ``C = -12.6``, ``D = -8.1`` - three large, all\n", "   negative.\n", " - ``B`` and ``E`` are small (about -0.1 and -1.1): a blocking\n", "   column ``E`` close to zero is exactly what you want from a\n", "   nuisance factor.\n", " - ``A:C`` is modest (about +3.4) but the only two-factor\n", "   interaction we kept in the model.\n", "\n", " **Confounds to be aware of**:\n", "\n", " - **A** is aliased with ``BE`` (multiply ABE by A) and with ``BCD``\n", "   (multiply ABCD by A). 
The -11.6 coefficient could in principle\n", "   be ``b_A + b_BE + b_BCD``.\n", " - **E** (\"Person\") is aliased with ``AB`` and with ``CD``. E\n", "   coming out small tells you that the sum of the operator effect\n", "   and those two two-factor interactions is small; most plausibly\n", "   each of them is negligible.\n", "\n", ".. admonition:: Guidance\n", "\n", " Blocking is the practical answer to \"the experimenter or batch\n", " probably affects the response.\" You **always** know which block a\n", " run belongs to (which operator ran it, which day, which batch of\n", " reagent), so you can always model it. Picking which interaction\n", " to confound with the block is a deliberate choice: lose the one\n", " you are least likely to believe in." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Sequential experimentation - the bridge to optimization (w9)\n", "\n", "A response surface study is not one big design; it is a *sequence* of\n", "small designs, each pointing the next one in the right direction. The\n", "loop:\n", "\n", "1. **Predict** the outcome of the next experiment with the current model.\n", "2. **Run** the experiment.\n", "3. **Compare** the prediction with the measurement.\n", "\n", "   - If they agree, the model is still useful. Decide:\n", "\n", "     - extend the model with the new point and keep climbing, or\n", "     - stop, if the model places you at the optimum.\n", "\n", "   - If they disagree, the model has broken down. Switch to a\n", "     higher-order model (add a center point, then axial points for\n", "     a quadratic) or shift the design region.\n", "\n", "4. **Plan** where the next experiment will be.\n", "5. **Repeat**.\n", "\n", "The mantra: *the model is useful, but wrong* - useful because it gives\n", "you a defensible direction, wrong because the true surface curves and\n", "the linear approximation will break eventually. Knowing when it\n", "breaks is the entire skill." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ ".. 
admonition:: Check yourself\n", "\n", " Before reading the solution: when is one-factor-at-a-time (OFAT)\n", " a *good* idea (think of a 1-D problem with a single factor), and\n", " when is it a *bad* idea?" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ ".. admonition:: Solution\n", "\n", " **OFAT is fine** when the system genuinely has one input - for\n", " example, finding the temperature that maximizes crystal size in a\n", " single-variable bioreactor process. Sequential OFAT is exactly\n", " what response surface optimization does in 1-D.\n", "\n", " **OFAT is bad** the moment the system has two or more inputs\n", " that *interact*. Tweaking factor A while holding B fixed and\n", " then tweaking B while holding A at its OFAT optimum only ever\n", " moves along the axes, so it never probes the diagonal of the\n", " response surface, which is where interactions live. Module 1\n", " already showed this on the yogurt example: the four corners of\n", " a 2x2 told you what no OFAT walk ever would." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## A worked 1-D optimization\n", "\n", "To make the response-surface loop concrete, we will optimize a 1-D\n", "process by hand. Pick a single factor (think: temperature, mixer\n", "speed, or any other physical knob), take small steps, fit a linear\n", "model, swap to a quadratic when the surface starts to curve, predict\n", "the peak, and confirm.\n", "\n", "The \"true\" system below is hidden from the optimizer; only the\n", "function call ``observe(t)`` is allowed - it returns the measured\n", "response at temperature ``t`` plus measurement noise (standard\n", "deviation 1)." 
] }, { "cell_type": "code", "metadata": {}, "execution_count": null, "outputs": [], "source": [ "import numpy as np\n", "import plotly.graph_objects as go\n", "import plotly.io as pio\n", "\n", "pio.renderers.default = \"notebook_connected\"\n", "\n", "rng = np.random.default_rng(seed=42)\n", "\n", "\n", "def observe(temperature_C: float) -> float:\n", "    \"\"\"Return a noisy measurement: true peak at 60 degC, sigma = 1.\"\"\"\n", "    true = -0.05 * (temperature_C - 60) ** 2 + 75.0\n", "    return float(true + rng.normal(scale=1.0))\n", "\n", "\n", "# First three experiments: anchor a linear trend.\n", "\n", "xs = [40.0, 50.0, 60.0]\n", "ys = [observe(t) for t in xs]\n", "for t, y in zip(xs, ys, strict=True):\n", "    print(f\" t = {t:5.1f} degC -> measured y = {y:.2f}\")" ] }, { "cell_type": "code", "metadata": {}, "execution_count": null, "outputs": [], "source": [ "# Fit a simple linear model on the first three points - is the surface\n", "# still going up?\n", "\n", "slope, intercept = np.polyfit(xs, ys, deg=1)\n", "print(f\"Linear fit (3 points): y = {intercept:.2f} + {slope:.3f} * t\")\n", "if slope > 0:\n", "    print(\"Slope is positive -> step further uphill.\")\n", "else:\n", "    print(\"Slope is not positive -> stop and rethink the direction.\")\n", "\n", "# Step uphill by +10 degC.\n", "\n", "next_t = max(xs) + 10\n", "xs.append(next_t)\n", "ys.append(observe(next_t))\n", "print(f\"\\nNext run at t = {next_t} degC -> y = {ys[-1]:.2f}\")" ] }, { "cell_type": "code", "metadata": {}, "execution_count": null, "outputs": [], "source": [ "# Add one more step uphill (70 degC was already run above); with 5\n", "# points, switch to a quadratic fit so curvature can show up.\n", "\n", "next_t = max(xs) + 10\n", "xs.append(next_t)\n", "ys.append(observe(next_t))\n", "\n", "coefs = np.polyfit(xs, ys, deg=2)\n", "poly = np.poly1d(coefs)\n", "predicted_peak = -coefs[1] / (2 * coefs[0])\n", "print(f\"Quadratic fit: y = {coefs[0]:.4f}*t^2 + {coefs[1]:.3f}*t + {coefs[2]:.2f}\")\n", "print(f\"Predicted peak temperature: {predicted_peak:.1f} degC\")\n", "print(f\"Predicted response at peak: 
{poly(predicted_peak):.2f}\")" ] }, { "cell_type": "code", "metadata": {}, "execution_count": null, "outputs": [], "source": [ "# Confirm the predicted optimum with a fresh run, then plot the journey.\n", "\n", "xs.append(predicted_peak)\n", "ys.append(observe(predicted_peak))\n", "print(f\"Confirmation run at t = {predicted_peak:.1f} degC -> y = {ys[-1]:.2f}\")\n", "\n", "grid = np.linspace(35, 85, 200)\n", "fig = go.Figure()\n", "fig.add_trace(go.Scatter(x=xs, y=ys, mode=\"markers+text\",\n", "                         text=[f\"{i+1}\" for i in range(len(xs))],\n", "                         textposition=\"top center\",\n", "                         name=\"Observations\", marker={\"size\": 9}))\n", "fig.add_trace(go.Scatter(x=grid, y=poly(grid), mode=\"lines\",\n", "                         name=\"Quadratic fit\",\n", "                         line={\"dash\": \"dash\"}))\n", "fig.update_layout(width=720, height=420,\n", "                  title=\"1-D sequential optimization\",\n", "                  xaxis_title=\"Temperature [degC]\",\n", "                  yaxis_title=\"Response y\")\n", "fig" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ ".. admonition:: Solution\n", "\n", " The sequence above is exactly the steepest-ascent strategy:\n", "\n", " 1. Anchor with a small linear study (3 points).\n", " 2. As long as the slope is positive, step further uphill.\n", " 3. The moment the surface starts curving, switch to a quadratic\n", "    model so the peak can be predicted.\n", " 4. **Confirm** the predicted optimum with one more run. If it\n", "    matches, you are at the optimum (within noise). If it does\n", "    not, the quadratic was still wrong and the search continues.\n", "\n", " For the simulated system above the true peak is at 60 degC.\n", " The quadratic fit on the first five points predicts a peak within\n", " a degree or two of that, depending on the noise.\n", "\n", ".. admonition:: Guidance\n", "\n", " The general loop (\"predict -> run -> compare -> decide\") is the\n", " single most important habit in DoE-driven optimization. Module 7\n", " generalizes it to two factors. 
The same loop runs in 4 or 5\n", " dimensions; the only thing that changes is the visualization." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Wrap-up\n", "\n", "Three transferable points from this module:\n", "\n", "- **The trade-off table** sets the budget conversation. Pick the\n", " resolution you can afford, name the generators, write the defining\n", " relation, and *then* run the experiment.\n", "- **Blocking** is how you keep nuisance factors honest. Confound\n", " them with the highest-order interaction you are willing to lose.\n", "- **Optimization is sequential.** No single design lands you on\n", " the peak; a string of small designs does.\n", "\n", "**Next:** Module 7 takes the same loop into **two factors** with the\n", "classical response-surface trio of factorial + center + axial points." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 4 }