{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Why experiment? The 2x2 mindset\n", "\n", "**Source worksheet:** [yint.org/w1](https://yint.org/w1) - week 1 of a twelve-week applied DoE course for industry practitioners.\n", "\n", "Most teams already run experiments. Few of them run *designed*\n", "experiments. The promise of Design of Experiments is that a small,\n", "structured set of runs answers several questions at once, with\n", "quantified uncertainty, and points to where the next experiment should\n", "be.\n", "\n", "Module 1 starts at the foundations: how to name what you are varying,\n", "how to read a 2x2 (four runs, two factors) directly from the data,\n", "and how those four numbers already tell you the **main effect** of each\n", "factor, whether they **interact**, and where to run a fifth experiment\n", "to do better than any of the four you already have.\n", "\n", "The three questions below come from the week-1 worksheet: a vocabulary\n", "drill followed by two worked 2x2 examples. Where a *Check yourself*\n", "prompt appears, attempt it before reading the solution; that is where\n", "the learning sticks." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ ".. tip::\n", "\n", " In an industrial project the hardest part of DoE is almost never the\n", " mathematics. It is naming the right factors, choosing realistic\n", " levels, and agreeing on a single response you will optimize. The\n", " vocabulary in question 1 is what keeps the kickoff meeting\n", " productive." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 1 - the vocabulary of an experiment\n", "\n", "A food company wants to find which inputs drive **mouth feel**, the\n", "subjective score given by a tasting panel. For each line below, fill in\n", "one of:\n", "\n", "> **factor**, **objective**, **response**, **numerical**, **categorical**.\n", "\n", "1. The product is prepared at low or high pressure. *This is an additional ____.*\n", "2. 
We measure the property *P* on the product. *P could also be a ____.*\n", "3. We want to maximize the average mouth-feel score. *This is the ____.*\n", "4. The product was prepared by adding ingredient *F*, or not. *F is a ____ factor.*\n", "5. The product was prepared with either 20 mg/L or 30 mg/L of ingredient *G*. *G is a ____ factor.*" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ ".. admonition:: Check yourself\n", "\n", " Try labeling all five before reading the solution. The trap is\n", " item 4: it is a *factor*, and the *type of factor is categorical*.\n", " Both labels are correct, but at different levels of the question." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ ".. admonition:: Solution\n", "\n", " 1. **factor** - varying pressure is an extra independent variable we control.\n", " 2. **response** - *P* is something we measure on the product. The\n", " mouth-feel score is also a response; an experiment can have more\n", " than one.\n", " 3. **objective** - \"maximize\" is the optimization goal, separate from\n", " the choice of factors and responses.\n", " 4. **categorical** factor - ingredient *F* is either present or\n", " absent; there is no in-between value.\n", " 5. **numerical** factor - 20 and 30 mg/L sit on a continuous scale.\n", " You can later run a center point at 25 mg/L; you cannot run a\n", " center point for *F*." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ ".. admonition:: Guidance\n", "\n", " Two pieces of vocabulary that confuse new practitioners:\n", "\n", " - A *factor* is what you change. A *response* is what you measure.\n", " A run sets each factor to a specific *level*; a response comes\n", " out of the run as data.\n", " - \"Categorical vs numerical\" describes the *factor type*, not the\n", " experiment. A real study mixes both freely. 
In Module 5 we will\n", " fit a linear model to a mixed-factor design." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 2 - a two-factor yogurt study\n", "\n", "Friends making yogurt at home run a 2x2 experiment to find the tastiest\n", "batch. Two factors, each at two levels:\n", "\n", "- **A = fat content of the starter yogurt** [0% or 2%]\n", "- **B = fermentation time** [10 hours or 16 hours]\n", "\n", "They rate each of the four batches on a 1-to-10 taste scale:\n", "\n", "| A (fat %) | B (hours) | Rating |\n", "|----------:|----------:|-------:|\n", "| 0 | 10 | 5 |\n", "| 2 | 10 | 8 |\n", "| 0 | 16 | 6 |\n", "| 2 | 16 | 9 |\n", "\n", "The worksheet asks four things:\n", "\n", "1. Add contour lines to the cube plot.\n", "2. Calculate the average effect of the fat content of the starter yogurt.\n", "3. Calculate the average effect of fermentation time.\n", "4. Where should the next batch be run to push the taste score to 10?\n", "\n", "We will answer all four with code." 
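, "\n", "\n", "A short derivation shows why a main effect can be read either from corner averages or as twice a fitted coefficient. It assumes the four runs are coded at $\\pm 1$, as in the code below, and form a full 2x2, so the design columns are orthogonal. Since $\\sum_i x_{A,i}^2 = 4$, least squares gives $b_A = \\tfrac{1}{4}\\sum_i x_{A,i}\\, y_i$, and therefore\n", "\n", "$$\n", "\\text{effect}_A = \\bar{y}_{A=+1} - \\bar{y}_{A=-1} = \\tfrac{1}{2}\\sum_i x_{A,i}\\, y_i = 2\\, b_A .\n", "$$\n", "\n", "The same identity holds for $B$ and for the $A \\times B$ column." 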
] }, { "cell_type": "code", "metadata": {}, "execution_count": null, "outputs": [], "source": [ "import plotly.graph_objects as go\n", "import plotly.io as pio\n", "\n", "from process_improve.experiments import c, gather, lm, main_effects_plot, predict\n", "from process_improve.experiments.visualization import visualize_doe\n", "\n", "pio.renderers.default = \"notebook_connected\"\n", "\n", "# Build the design in coded units: -1 is the low level, +1 the high level.\n", "\n", "A = c(-1, +1, -1, +1, lo=0, hi=2, name=\"A\", units=\"% fat\")\n", "B = c(-1, -1, +1, +1, lo=10, hi=16, name=\"B\", units=\"hours\")\n", "y = c(5, 8, 6, 9, name=\"y\")\n", "yogurt = gather(A=A, B=B, y=y)\n", "yogurt" ] }, { "cell_type": "code", "metadata": {}, "execution_count": null, "outputs": [], "source": [ "# The square_plot is the 2-factor analogue of a cube plot: response\n", "# values at the four corners of the design.\n", "\n", "square = visualize_doe(\n", " plot_type=\"square_plot\",\n", " design_data=yogurt.to_dict(orient=\"records\"),\n", " response_column=\"y\",\n", " factors_to_plot=[\"A\", \"B\"],\n", " factor_labels={\"A\": \"Fat content [%]\", \"B\": \"Time [hrs]\"},\n", " backend=\"plotly\",\n", ")\n", "fig = go.Figure(square[\"plotly\"])\n", "fig.update_layout(width=520, height=440, title=\"Yogurt 2x2: corner taste scores\")\n", "fig" ] }, { "cell_type": "code", "metadata": {}, "execution_count": null, "outputs": [], "source": [ "# Fit a 2x2 model: intercept, two main effects, and the A:B interaction.\n", "\n", "model = lm(\"y ~ A + B + A:B\", yogurt)\n", "params = model.get_parameters(drop_intercept=False)\n", "print(params.to_string())" ] }, { "cell_type": "code", "metadata": {}, "execution_count": null, "outputs": [], "source": [ "# A main effect in coded units is twice the corresponding coefficient,\n", "# because the model is centered and the factor moves from -1 to +1\n", "# (a span of 2 units).\n", "\n", "effect_A = 2 * params[\"A\"]\n", "effect_B = 2 * 
params[\"B\"]\n", "interaction_AB = 2 * params[\"A:B\"]\n", "\n", "print(f\"Main effect of fat content (A) : {effect_A:+.2f} taste points\")\n", "print(f\"Main effect of fermentation (B): {effect_B:+.2f} taste points\")\n", "print(f\"Interaction A:B : {interaction_AB:+.2f}\")" ] }, { "cell_type": "code", "metadata": {}, "execution_count": null, "outputs": [], "source": [ "# Main-effects plot: average response at each level of each factor.\n", "\n", "main_effects_plot(model, factor_labels={\"A\": \"Fat content\", \"B\": \"Time\"})" ] }, { "cell_type": "code", "metadata": {}, "execution_count": null, "outputs": [], "source": [ "# The model predicts higher ratings if we push both factors above\n", "# their +1 levels. Aiming for a rating of 10 we extrapolate to\n", "# A=+1.5 (fat = 2.5%), B=+1.5 (time = 17.5 hrs).\n", "\n", "predicted = float(predict(model, A=1.5, B=1.5).iloc[0])\n", "print(f\"Predicted rating at A=+1.5 (2.5% fat), B=+1.5 (17.5 hrs): {predicted:.2f}\")" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ ".. admonition:: Solution\n", "\n", " 1. **Contour lines on the cube plot.** Connect points of equal\n", " response. With ratings 5 (lo, lo), 8 (hi, lo), 6 (lo, hi),\n", " 9 (hi, hi), the contours are exactly parallel straight lines\n", " running from top-left to bottom-right. The rating climbs toward\n", " the right (more fat) three times as fast as toward the top (longer\n", " fermentation), so the lines are steep. Parallel contours are the\n", " visual signature of *no interaction*; here A:B is exactly zero.\n", "\n", " 2. **Average effect of fat content (A).** Compare the mean of the\n", " two high-fat corners (8 and 9) with the mean of the two low-fat\n", " corners (5 and 6): ``(8 + 9)/2 - (5 + 6)/2 = 8.5 - 5.5 = +3.0``\n", " rating points. The code above reproduces this as ``2 * b_A``,\n", " twice the fitted coefficient of A.\n", "\n", " 3. 
**Average effect of fermentation time (B).** Same logic:\n", " ``(6 + 9)/2 - (5 + 8)/2 = 7.5 - 6.5 = +1.0`` rating point.\n", " Fermentation time matters, but only a third as much as fat.\n", "\n", " 4. **Where to run the next batch.** The interaction A:B is\n", " exactly zero, so the response surface is a tilted plane.\n", " Extending both factors above +1 should keep climbing. The\n", " prediction at A=+1.5 (2.5% fat), B=+1.5 (17.5 hrs) is exactly\n", " ``10.0``, so that is your next run. It already lies outside the\n", " tested region, so treat it as an extrapolation to confirm: the\n", " rating scale tops out at 10, so the real response must level off\n", " somewhere, even though the fitted plane never does." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ ".. admonition:: Check yourself\n", "\n", " If the interaction A:B had been large (say, +2 instead of 0) the\n", " contour lines would not be parallel any more. Sketch what\n", " \"non-parallel contour lines\" mean for a recipe: one factor's effect\n", " changes depending on the level of the other. Recipes are full of\n", " these. Module 3 spends real time on them." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 3 - the bioreactor\n", "\n", "A researcher running a bioreactor wants to maximize the **conversion**\n", "of raw material to product. Unconverted raw material is wasted profit.\n", "Each experiment in the table below is the average of two duplicated\n", "runs.\n", "\n", "| T (\u00b0C) | S (g/L) | Conversion [%] |\n", "|-------:|--------:|---------------:|\n", "| 35 | 1.75 | 60 |\n", "| 41 | 1.25 | 64 |\n", "| 41 | 1.75 | 69 |\n", "| 35 | 1.25 | 53 |\n", "\n", "The worksheet asks:\n", "\n", "- a. Draw a large cube plot of the system (use the full space; put\n", "  *T* on the vertical axis).\n", "- b. Add contour lines.\n", "- c. Estimate the conversion at the center point *S* = 1.50 g/L,\n", "  *T* = 38 \u00b0C.\n", "- d. 
Calculate the average effect of temperature.\n", "- e. Calculate the average effect of substrate concentration.\n", "- f. Where would you run the next experiment to improve conversion?" ] }, { "cell_type": "code", "metadata": {}, "execution_count": null, "outputs": [], "source": [ "# Coded units: S is substrate concentration, T is temperature. Real-unit\n", "# ranges and units are attached for clarity. Listing T as the second\n", "# factor to plot (next cell) puts temperature on the vertical axis.\n", "# Each coded (S, T) corner carries the conversion from the table above.\n", "\n", "S = c(-1, +1, +1, -1, lo=1.25, hi=1.75, name=\"S\", units=\"g/L\")\n", "T = c(-1, +1, -1, +1, lo=35, hi=41, name=\"T\", units=\"degC\")\n", "y = c(53, 69, 60, 64, name=\"y\")\n", "bio = gather(S=S, T=T, y=y)\n", "bio" ] }, { "cell_type": "code", "metadata": {}, "execution_count": null, "outputs": [], "source": [ "square = visualize_doe(\n", " plot_type=\"square_plot\",\n", " design_data=bio.to_dict(orient=\"records\"),\n", " response_column=\"y\",\n", " factors_to_plot=[\"S\", \"T\"],\n", " factor_labels={\"S\": \"Substrate [g/L]\", \"T\": \"Temperature [degC]\"},\n", " backend=\"plotly\",\n", ")\n", "fig = go.Figure(square[\"plotly\"])\n", "fig.update_layout(width=520, height=440, title=\"Bioreactor 2x2: conversion at each corner\")\n", "fig" ] }, { "cell_type": "code", "metadata": {}, "execution_count": null, "outputs": [], "source": [ "model = lm(\"y ~ S + T + S:T\", bio)\n", "params = model.get_parameters(drop_intercept=False)\n", "print(params.to_string())\n", "\n", "effect_S = 2 * params[\"S\"]\n", "effect_T = 2 * params[\"T\"]\n", "interaction_ST = 2 * params[\"S:T\"]\n", "\n", "print(f\"\\nMain effect of substrate concentration (S): {effect_S:+.2f} percentage points\")\n", "print(f\"Main effect of temperature (T) : {effect_T:+.2f} percentage points\")\n", "print(f\"Interaction S:T : {interaction_ST:+.2f}\")\n", "\n", "center = float(predict(model, S=0, T=0).iloc[0])\n", "print(f\"\\nPredicted conversion at the center point (S=1.50 g/L, T=38 degC): {center:.2f}%\")" ] }, { "cell_type": "code", "metadata": {}, 
"execution_count": null, "outputs": [], "source": [ "# Next experiment: both effects are positive, with temperature the\n", "# larger driver. Move roughly along the gradient, taking a bigger step\n", "# in T than in S: half a step beyond +1 in T and a quarter step beyond\n", "# +1 in S. In real units:\n", "# T_next = 41 + 0.5 * (41 - 38) = 42.5 degC\n", "# S_next = 1.75 + 0.25 * (1.75 - 1.50) = 1.81 g/L\n", "\n", "next_S, next_T = 1.25, 1.5\n", "predicted = float(predict(model, S=next_S, T=next_T).iloc[0])\n", "print(f\"Predicted conversion at S=+{next_S} ({1.5 + next_S*0.25:.2f} g/L), \"\n", " f\"T=+{next_T} ({38 + next_T*3:.1f} degC): {predicted:.2f}%\")" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ ".. admonition:: Solution\n", "\n", " a, b. **Square plot with contours.** Plot the four corners (53, 60\n", " bottom; 64, 69 top) with temperature on the vertical axis.\n", " Conversion rises toward the upper-right corner (hot, rich), so\n", " increasing either factor helps. The contour lines run across that\n", " direction, from upper-left to lower-right, and sit closer to\n", " horizontal than vertical because temperature has the larger effect.\n", " They are *almost* parallel; the small twist is the S:T interaction.\n", "\n", " c. **Center-point estimate.** For a 2x2 design with no curvature term\n", " the center prediction is the mean of the four corners:\n", " ``(53 + 64 + 69 + 60) / 4 = 61.5 %``. The model agrees: the center\n", " is (0, 0) in coded units, where every term except the intercept\n", " vanishes, and the intercept of this four-run model equals the\n", " corner mean.\n", "\n", " d. **Average effect of temperature.** Compare the two hot corners\n", " (64, 69) with the two cool corners (53, 60):\n", " ``(64 + 69)/2 - (53 + 60)/2 = 66.5 - 56.5 = +10`` percentage points.\n", "\n", " e. **Average effect of substrate.** Compare the two rich corners\n", " (60, 69) with the two lean corners (53, 64):\n", " ``(60 + 69)/2 - (53 + 64)/2 = 64.5 - 58.5 = +6`` percentage points.\n", "\n", " f. **Where next.** Temperature has the bigger pull, so move further\n", " along *T* and a little further along *S*. A reasonable next run is\n", " T = 42.5 degC, S = 1.81 g/L - the model predicts about 72 %. 
In\n", " practice, check temperature limits of the organism before\n", " committing." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ ".. admonition:: Guidance\n", "\n", " The \"next run\" calculation above is a rough, by-hand version of\n", " **steepest ascent**. Module 7 builds out the formal procedure (with\n", " step sizing, confirmation runs, and what to do when the surface\n", " starts curving). For now, the takeaway is that every 2x2 already\n", " tells you a direction; you do not need to wait for a \"real\"\n", " optimization design to move." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Wrap-up\n", "\n", "Four runs of a 2x2 design gave you:\n", "\n", "- A clean vocabulary for the experiment (factors, levels, responses,\n", " objective).\n", "- Two **main effects** per study, calculated directly from the corner\n", " averages.\n", "- A check on whether the factors **interact** (yogurt: no; bioreactor:\n", " small effect worth tracking).\n", "- An estimate of the **center-point** conversion without running it,\n", " and a direction for the **next experiment**.\n", "\n", "**Next:** Module 2 fits the same models with coded units and a small\n", "amount of linear algebra, which generalizes the manual calculation\n", "into something that scales beyond two factors and copes with replicates,\n", "noise, and confidence intervals." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 4 }