{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Doing less: fractional factorials\n",
    "\n",
    "**Source worksheets:** [yint.org/w6](https://yint.org/w6) and [yint.org/w7](https://yint.org/w7) - weeks 6 and 7 of the applied DoE short course.\n",
    "\n",
    "A 2^5 factorial costs 32 runs.  A 2^6 costs 64.  Real budgets rarely\n",
    "stretch that far - and most of the high-order interactions you would\n",
    "spend the runs on are noise anyway.  *Fractional factorials* let you\n",
    "buy back the budget by trading **resolution**: you replace a\n",
    "high-order interaction you do not believe in (say, the four-factor\n",
    "ABCD interaction) with a new factor (say, E).  Half the runs, almost\n",
    "all the answers.\n",
    "\n",
    "The cost is **aliasing** - some effects can no longer be separated\n",
    "from each other.  This module shows the trade in action with two\n",
    "worked examples, then introduces the vocabulary that DesignExpert,\n",
    "Minitab, and the DoE literature all use: *generators*, *defining\n",
    "relation*, *words*, *resolution*."
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    ".. tip::\n",
    "\n",
    "   The central trade-off is \"I will not estimate ``ABCD`` separately\n",
    "   from the new factor ``E``\"; you gain a 32 -> 16 drop in runs\n",
    "   (or 16 -> 8 when ``D`` replaces ``ABC`` in a four-factor study).\n",
    "   When the high-order interaction really is negligible, as it\n",
    "   usually is, the trade is almost free."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Q1 - counting runs in full and half fractions\n",
    "\n",
    "A full factorial in five factors costs $2^5 = 32$ runs and fits 32\n",
    "coefficients: 1 intercept, 5 main effects, 10 two-factor interactions,\n",
    "10 three-factor interactions, 5 four-factor interactions, and 1\n",
    "five-factor interaction.  A **half fraction** uses one generator\n",
    "($p = 1$), so the design has $2^{5-1} = 16$ runs."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "execution_count": null,
   "outputs": [],
   "source": [
    "from math import comb\n",
    "\n",
    "k = 5\n",
    "print(f\"Full factorial 2^{k} = {2 ** k} runs and {2 ** k} coefficients\")\n",
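    "for j in range(k + 1):\n",
    "    # j = 0 is the intercept and j = 1 the main effects; higher j are interactions.\n",
    "    label = {0: \"intercept\", 1: \"main effects\"}.get(j, f\"{j}-factor interactions\")\n",
    "    print(f\"  {label}: {comb(k, j)}\")\n",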
    "print()\n",
    "print(f\"Half-fraction 2^({k}-1) = {2 ** (k - 1)} runs\")\n",
    "print(f\"Quarter-fraction 2^({k}-2) = {2 ** (k - 2)} runs\")"
   ]
  },
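  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A sketch of where the 16 runs go, assuming the generator $E = ABCD$\n",
    "from the introduction, so that the defining relation is $I = ABCDE$.\n",
    "Multiplying any effect by $ABCDE$ gives its alias, which pairs the 32\n",
    "full-factorial coefficients into 16 estimable groups:\n",
    "\n",
    "$$\n",
    "\\begin{aligned}\n",
    "1 \\text{ group} &: \\; I + ABCDE \\\\\n",
    "5 \\text{ groups} &: \\; \\text{main effect} + \\text{four-factor interaction, e.g. } A + BCDE \\\\\n",
    "10 \\text{ groups} &: \\; \\text{two-factor} + \\text{three-factor interaction, e.g. } AB + CDE \\\\\n",
    "1 + 5 + 10 &= 16 = 2^{5-1}\n",
    "\\end{aligned}\n",
    "$$\n",
    "\n",
    "If three-factor and higher interactions are negligible, the 16 runs\n",
    "are enough to estimate the intercept, all 5 main effects, and all 10\n",
    "two-factor interactions."
   ]
  },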
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Q2-Q4 - the stability system, both halves\n",
    "\n",
    "We return to the three-factor stability study from Module 3.  The\n",
    "full $2^3$ design has 8 runs; a half-fraction has 4 runs.  With\n",
    "only 4 runs and 4 parameters to estimate (the intercept and 3 main\n",
    "effects), the design is *saturated* and we get the alias relationship\n",
    "$C = \\pm A \\cdot B$.\n",
    "\n",
    "The two halves are obtained by choosing the rows where $C = +A \\cdot B$\n",
    "and where $C = -A \\cdot B$.  They come from the same 8-run table:\n",
    "\n",
    "**Full table** for the $2^3$ design, columns $(A, B, C, y)$:\n",
    "\n",
    "```\n",
    "(-,-,-,40) (+,-,-,27) (-,+,-,35) (+,+,-,21)\n",
    "(-,-,+,41) (+,-,+,27) (-,+,+,31) (+,+,+,20)\n",
    "```\n",
    "\n",
    "**Half-fraction** with $C = +A \\cdot B$ (rows where $C = A \\cdot B$):\n",
    "\n",
    "```\n",
    "(-,-,+,41) (+,-,-,27) (-,+,-,35) (+,+,+,20)\n",
    "```\n",
    "\n",
    "**Half-fraction** with $C = -A \\cdot B$ (rows where $C = -A \\cdot B$):\n",
    "\n",
    "```\n",
    "(-,-,-,40) (+,-,+,27) (-,+,+,31) (+,+,-,21)\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "execution_count": null,
   "outputs": [],
   "source": [
    "from process_improve.experiments import c, gather, lm\n",
    "\n",
    "# Half-fraction with C = +A*B\n",
    "\n",
    "A1 = c(-1, +1, -1, +1, name=\"A\")\n",
    "B1 = c(-1, -1, +1, +1, name=\"B\")\n",
    "C1 = c(+1, -1, -1, +1, name=\"C\")\n",
    "y1 = c(41, 27, 35, 20, name=\"y\")\n",
    "half_pos = gather(A=A1, B=B1, C=C1, y=y1)\n",
    "m_pos = lm(\"y ~ A + B + C\", half_pos)\n",
    "print(\"Half-fraction with C = +A*B:\")\n",
    "print(m_pos.get_parameters(drop_intercept=False).to_string())"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "execution_count": null,
   "outputs": [],
   "source": [
    "# Half-fraction with C = -A*B\n",
    "\n",
    "A2 = c(-1, +1, -1, +1, name=\"A\")\n",
    "B2 = c(-1, -1, +1, +1, name=\"B\")\n",
    "C2 = c(-1, +1, +1, -1, name=\"C\")\n",
    "y2 = c(40, 27, 31, 21, name=\"y\")\n",
    "half_neg = gather(A=A2, B=B2, C=C2, y=y2)\n",
    "m_neg = lm(\"y ~ A + B + C\", half_neg)\n",
    "print(\"Half-fraction with C = -A*B:\")\n",
    "print(m_neg.get_parameters(drop_intercept=False).to_string())"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "execution_count": null,
   "outputs": [],
   "source": [
    "# Full 2^3 model (Module 3) gave these coefficients:\n",
    "#   Intercept = 30.25, A = -6.5, B = -3.5, C = -0.5\n",
    "# The two half-fractions average back to the full coefficients.\n",
    "\n",
    "import pandas as pd\n",
    "\n",
    "full = pd.Series({\"Intercept\": 30.25, \"A\": -6.5, \"B\": -3.5, \"C\": -0.5})\n",
    "pos = m_pos.get_parameters(drop_intercept=False)\n",
    "neg = m_neg.get_parameters(drop_intercept=False)\n",
    "avg = (pos + neg) / 2\n",
    "out = pd.DataFrame({\"Half C=+AB\": pos, \"Half C=-AB\": neg, \"Average of halves\": avg, \"Full 2^3\": full})\n",
    "print(out.to_string())"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    ".. admonition:: Solution\n",
    "\n",
    "   What you read off the halves:\n",
    "\n",
    "   - Half ``C = +A*B``: ``A = -7.25``, ``B = -3.25``, ``C = -0.25``.\n",
    "   - Half ``C = -A*B``: ``A = -5.75``, ``B = -3.75``, ``C = -0.75``.\n",
    "\n",
    "   The averages, ``A = -6.5``, ``B = -3.5``, ``C = -0.5``, match the\n",
    "   full ``2^3`` model exactly.  This is the beautiful property of\n",
    "   *complementary* half-fractions: each is biased by the\n",
    "   confounded interaction, but the bias has opposite sign so averaging\n",
    "   cancels it.\n",
    "\n",
    "   **The aliasing pattern.**  In the ``C = +A*B`` half:\n",
    "   ``b_A_hat = b_A + b_BC``, ``b_B_hat = b_B + b_AC``,\n",
    "   ``b_C_hat = b_C + b_AB``.  Plugging in the full-model values\n",
    "   ``b_BC = -0.75``, ``b_AC = +0.25``, ``b_AB = +0.25`` reproduces the\n",
    "   half's coefficients exactly.  In the ``C = -A*B`` half the alias\n",
    "   terms enter with a minus sign instead."
   ]
  },
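  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The alias pattern in the solution follows from the generator by a\n",
    "short calculation.  A minimal derivation, using only the rule that a\n",
    "column multiplied by itself gives the identity ($A \\cdot A = I$); the\n",
    "hatted symbols are notation introduced here for the half-fraction\n",
    "estimates:\n",
    "\n",
    "$$\n",
    "\\begin{aligned}\n",
    "C = AB \\;&\\Rightarrow\\; I = ABC \\\\\n",
    "A \\cdot ABC = BC \\;&\\Rightarrow\\; \\hat{b}_A = b_A + b_{BC} = -6.5 + (-0.75) = -7.25 \\\\\n",
    "B \\cdot ABC = AC \\;&\\Rightarrow\\; \\hat{b}_B = b_B + b_{AC} = -3.5 + 0.25 = -3.25 \\\\\n",
    "C \\cdot ABC = AB \\;&\\Rightarrow\\; \\hat{b}_C = b_C + b_{AB} = -0.5 + 0.25 = -0.25\n",
    "\\end{aligned}\n",
    "$$\n",
    "\n",
    "For the complementary half, $C = -AB$ gives $I = -ABC$, so each alias\n",
    "enters with a minus sign: $\\hat{b}_A = -6.5 - (-0.75) = -5.75$,\n",
    "$\\hat{b}_B = -3.5 - 0.25 = -3.75$, and $\\hat{b}_C = -0.5 - 0.25 = -0.75$."
   ]
  },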
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Q5-Q8 - half-fraction of the bioreactor (D = ABC)\n",
    "\n",
    "Back to the 16-run bioreactor from Module 4, but now imagine the\n",
    "budget only stretched to **8** runs.  Using the generator ``D = A*B*C``\n",
    "keeps the design balanced and gives a *resolution-IV* fraction: main\n",
    "effects are clear of two-factor interactions, but two-factor\n",
    "interactions are aliased *with each other*.\n",
    "\n",
    "The defining relation is ``I = ABCD``, so the alias pairs are\n",
    "``AB = CD``, ``AC = BD``, ``AD = BC``, and every main effect is\n",
    "aliased with the three-factor interaction obtained by multiplying\n",
    "through by ABCD."
   ]
  },
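  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The alias list can be written out by multiplying each effect by the\n",
    "word $ABCD$ and cancelling any squared letter.  A short derivation\n",
    "under that rule alone:\n",
    "\n",
    "$$\n",
    "\\begin{aligned}\n",
    "A \\cdot ABCD &= BCD, & B \\cdot ABCD &= ACD, & C \\cdot ABCD &= ABD, & D \\cdot ABCD &= ABC \\\\\n",
    "AB \\cdot ABCD &= CD, & AC \\cdot ABCD &= BD, & AD \\cdot ABCD &= BC\n",
    "\\end{aligned}\n",
    "$$\n",
    "\n",
    "That accounts for all 8 independent columns of the design: the\n",
    "intercept (aliased with $ABCD$), 4 main effects, and 3 two-factor\n",
    "groups.  A model with those 8 terms fits the 8 runs with no degrees of\n",
    "freedom left over."
   ]
  },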
  {
   "cell_type": "code",
   "metadata": {},
   "execution_count": null,
   "outputs": [],
   "source": [
    "# Pick the 8 rows from the standard-order 16 where D == A*B*C.\n",
    "\n",
    "A = c(-1, +1, +1, -1, +1, -1, -1, +1, name=\"A\")\n",
    "B = c(-1, +1, -1, +1, -1, +1, -1, +1, name=\"B\")\n",
    "C = c(-1, -1, +1, +1, -1, -1, +1, +1, name=\"C\")\n",
    "D = c(-1, -1, -1, -1, +1, +1, +1, +1, name=\"D\")\n",
    "y = c(60, 61, 61, 94, 63, 70, 44, 77, name=\"y\")\n",
    "half_bio = gather(A=A, B=B, C=C, D=D, y=y)\n",
    "half_bio"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "execution_count": null,
   "outputs": [],
   "source": [
    "# Fit main effects and the three independent two-factor groups.\n",
    "# (AB and CD share a column; the design only resolves the sum of their\n",
    "# coefficients.  Same for AC = BD and AD = BC.)\n",
    "\n",
    "m_half = lm(\"y ~ A + B + C + D + A:B + A:C + A:D\", half_bio)\n",
    "print(m_half.get_parameters(drop_intercept=False).to_string())"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    ".. admonition:: Solution\n",
    "\n",
    "   The half-fraction's coefficients line up with the full-model\n",
    "   coefficients from Module 4 plus their aliased partners:\n",
    "\n",
    "   ::\n",
    "\n",
    "       half b_B    = +9.25   ~= full b_B  (+9.0)   + b_ACD (~+0.25)\n",
    "       half b_C    = +2.75   ~= full b_C  (+4.0)   + b_ABD (~-1.25)\n",
    "       half b_D    = -2.75   ~= full b_D  (-3.875) + b_ABC (~+1.125)\n",
    "       half b_AB   = -5.75   ~= full b_AB (-0.5)   + b_CD  (-5.25)\n",
    "       half b_AC   = +0.75   ~= full b_AC (-0.5)   + b_BD  (+1.25)\n",
    "       half b_AD   = +7.25   ~= full b_AD (+0.875) + b_BC  (+6.375)\n",
    "\n",
    "   **Same qualitative conclusions as the full design**: B dominates,\n",
    "   D hurts, and the aliased groups (AB=CD) and (AD=BC) are both large.\n",
    "   The half-fraction cannot tell you which member of each pair is\n",
    "   responsible - you would resolve that with a *fold-over*: run the\n",
    "   other half and combine the data.\n",
    "\n",
    ".. admonition:: Guidance\n",
    "\n",
    "   Half-fractions are a cheap way to **screen** many factors:\n",
    "   spend the first 8 runs on a half, see which main effects and\n",
    "   two-factor groups light up, then decide whether to spend the\n",
    "   second 8 to resolve the aliases or to move on to a new study\n",
    "   focused on the surviving factors."
   ]
  },
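  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A sketch of the fold-over arithmetic for one alias pair, taking the\n",
    "Module 4 values quoted in the solution at face value.  The symbols\n",
    "$\\ell_{+}$ and $\\ell_{-}$ are notation introduced here for the\n",
    "AB-column coefficient in the $D = +ABC$ and $D = -ABC$ halves.  The\n",
    "second half was not run in this module, so its value is a prediction,\n",
    "not data:\n",
    "\n",
    "$$\n",
    "\\begin{aligned}\n",
    "\\ell_{+} &= b_{AB} + b_{CD} \\approx -0.5 + (-5.25) = -5.75 \\\\\n",
    "\\ell_{-} &= b_{AB} - b_{CD} \\approx -0.5 - (-5.25) = +4.75 \\\\\n",
    "b_{AB} &= \\tfrac{1}{2}(\\ell_{+} + \\ell_{-}), \\qquad b_{CD} = \\tfrac{1}{2}(\\ell_{+} - \\ell_{-})\n",
    "\\end{aligned}\n",
    "$$\n",
    "\n",
    "The sum of the two halves isolates $b_{AB}$ and the difference\n",
    "isolates $b_{CD}$, which is how the second 8 runs separate the pair."
   ]
  },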
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Vocabulary you will meet (w7)\n",
    "\n",
    "These terms appear in Minitab, DesignExpert, JMP, and the DoE\n",
    "literature.  None of them are unique to ``process_improve``, but you\n",
    "will hit them every time you read a screening study.\n",
    "\n",
    "| Term | Plain English |\n",
    "|---|---|\n",
    "| **Factor** | Something we deliberately change. Measured *and* controlled. |\n",
    "| **Disturbance** | A real-world influence we cannot control and usually cannot measure. |\n",
    "| **Covariate** | A real-world influence we can *measure* but not *control*. Worth recording so we can model it. |\n",
    "| **Nuisance factor** | A controlled factor we do not care about scientifically (operator, batch, day). Handle with **blocking**. |\n",
    "| **Generator** | An equation like ``D = ABC`` that defines how an extra factor is built from the columns of a smaller factorial. |\n",
    "| **Defining relation** | Every generator rewritten in the form ``I = ...``, together with all products of those words, e.g. ``I = ABCD`` for the single generator ``D = ABC``. Tells you every alias. |\n",
    "| **Word** | Each ``ABCD``-style group in the defining relation. ``I = ABCD`` is a one-word relation. |\n",
    "| **Resolution** | Length of the shortest word in the defining relation. Res III = main effects aliased with two-factor interactions; Res IV = main effects clear of two-factor interactions, which are aliased with each other; Res V = two-factor interactions also clear of each other. |"
   ]
  },
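  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the last three rows of the table concrete, here is a worked\n",
    "quarter fraction.  The generators $D = AB$ and $E = AC$ are an\n",
    "illustrative choice made here, not taken from the worksheet:\n",
    "\n",
    "$$\n",
    "\\begin{aligned}\n",
    "D = AB \\;&\\Rightarrow\\; I = ABD \\\\\n",
    "E = AC \\;&\\Rightarrow\\; I = ACE \\\\\n",
    "ABD \\cdot ACE = A^2 BCDE \\;&\\Rightarrow\\; I = BCDE \\\\\n",
    "I &= ABD = ACE = BCDE\n",
    "\\end{aligned}\n",
    "$$\n",
    "\n",
    "The defining relation has three words of lengths 3, 3, and 4, so this\n",
    "$2^{5-2}$ design (8 runs) is resolution III.  For example,\n",
    "$A \\cdot ABD = BD$, so the main effect of A is aliased with the BD\n",
    "interaction."
   ]
  },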
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    ".. admonition:: Check yourself\n",
    "\n",
    "   - Q7.2 - A variable that cannot be measured or controlled is a\n",
    "     **disturbance**.\n",
    "   - Q7.3 - A variable measured but not controlled is a **covariate**.\n",
    "   - Q7.4 - Yes, you can have something controlled but not measured -\n",
    "     a held-constant condition.  In practice you measure it anyway,\n",
    "     because constants drift.\n",
    "   - Q7.5 - Refusing to randomize a hard-to-change factor means you\n",
    "     are confounding it with time and any drift in the equipment.\n",
    "     If the experimenter is the same person and the equipment is the\n",
    "     same, you are still confounding with operator fatigue, batch of\n",
    "     reagent, ambient temperature, and the order itself."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### A worked example - the CalApp screening study\n",
    "\n",
    "The source worksheet (w7 Q6) ends with a small case study, *CalApp*,\n",
    "to make the vocabulary concrete.  A team is screening drivers of\n",
    "60-day app retention.  Three of the inputs are deliberately\n",
    "manipulated and become the factors of the design:\n",
    "\n",
    "- **A** = promotional offer (yes / no),\n",
    "- **B** = marketing message (variant 1 / variant 2),\n",
    "- **C** = in-app purchase price (low / high).\n",
    "\n",
    "Six other variables describe each user or device but are *not* set\n",
    "by the experimenter:\n",
    "\n",
    "- **E** = the user's age,\n",
    "- **N** = the user's gender,\n",
    "- **S** = the user's connection type (cellular or wifi),\n",
    "- **R** = the device's free memory (RAM),\n",
    "- **F** = which advertising network served the install (G or H),\n",
    "- **D** = whether the device is Apple or Android.\n",
    "\n",
    "For each of those six, decide: is it a **factor**, a **covariate**,\n",
    "a **disturbance**, or a **nuisance** variable?  The solution below\n",
    "walks through them."
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    ".. admonition:: Solution\n",
    "\n",
    "   For the CalApp screening example (Q7.6):\n",
    "\n",
    "   - **E** (user age): **covariate** - measured, not controlled.\n",
    "   - **N** (gender): **covariate** - measured, not controlled.\n",
    "   - **S** (cell vs wifi): **covariate** that could also be a **nuisance**\n",
    "     factor if it correlates with engagement.\n",
    "   - **R** (free RAM): **covariate** - measured, not controlled.\n",
    "   - **F** (ad network G vs H): could be a **factor** (you choose it)\n",
    "     or a **nuisance** factor depending on whether it is part of the\n",
    "     study's question.\n",
    "   - **D** (Apple vs Android): could be a **factor**, a **nuisance**\n",
    "     factor (block on it), or a **covariate** depending on the\n",
    "     hypothesis."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Wrap-up\n",
    "\n",
    "Two transferable habits:\n",
    "\n",
    "- **Fractional design first**, full design only if needed.  A\n",
    "  half-fraction usually answers 80% of the question for 50% of the budget.\n",
    "- **Read the defining relation before fitting**.  Knowing the alias\n",
    "  pattern upfront tells you which conclusions are *robust* and which\n",
    "  need a fold-over to resolve.\n",
    "\n",
    "**Next:** Module 6 returns to the trade-off table from w8 and starts\n",
    "the move into **optimization** with a 1-D response surface study,\n",
    "introducing path-of-steepest-ascent thinking that Module 7 then\n",
    "generalizes to two dimensions."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}