{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Control charts for rubber colour\n",
    "\n",
    "A short series of 100 colour readings on rubber product. Each reading is a single value that should sit close to a target, with run-to-run variation small relative to the spread that signals a process upset. The job of a control chart is to draw a target line and limits, then flag readings that fall outside.\n",
    "\n",
    "**Data.** `rubber-colour.csv` from [openmv.net](https://openmv.net/info/rubber-colour). One column, no missing values, no ordering metadata; the values are assumed to be in sample order.\n",
    "\n",
    "**What we do.** Build three charts:\n",
    "\n",
    "1. A classical Shewhart chart using the standard mean and standard deviation, which mirrors what the R `qcc` package produces with `type=\"xbar.one\"`.\n",
    "2. A robust Shewhart chart using the median and a MAD-based scale estimate, which is less sensitive to the same outliers it is trying to flag.\n",
    "3. A Holt-Winters chart that blends recent history with the long-run target, useful when the series drifts.\n",
    "\n",
    "**Adapted from** the *Process monitoring* chapter of the [Process Improvement using Data](https://learnche.org/pid) book (CC BY-SA 4.0)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "from process_improve.monitoring.control_charts import ControlChart"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load and look"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data = pd.read_csv(\"https://openmv.net/file/rubber-colour.csv\")\n",
    "y = data[\"Colour\"].astype(float)\n",
    "print(f\"{len(y)} readings, mean={y.mean():.2f}, sd={y.std(ddof=1):.2f}, range=[{y.min()}, {y.max()}]\")\n",
    "y.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Classical Shewhart chart\n",
    "\n",
    "`ControlChart(variant=\"xbar.no.subgroup\", style=\"regular\")` plots each observation individually against limits computed from the sample mean and sample standard deviation. The R `qcc` package's `type=\"xbar.one\"` produces the same target and a very similar standard deviation estimate."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "cc_regular = ControlChart(variant=\"xbar.no.subgroup\", style=\"regular\")\n",
    "cc_regular.calculate_limits(y)\n",
    "print(f\"target = {cc_regular.target:.2f}, s = {cc_regular.s:.2f}\")\n",
    "print(f\"flagged indices: {list(cc_regular.idx_outside_3S)}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def plot_chart(y: pd.Series, target: float, s: float, flagged: list[int], title: str) -> None:\n",
    "    upper = target + 3 * s\n",
    "    lower = target - 3 * s\n",
    "    _fig, ax = plt.subplots(figsize=(9, 3.2))\n",
    "    ax.plot(y.values, marker=\"o\", linestyle=\"-\", color=\"#1f77b4\", markersize=3, linewidth=0.7)\n",
    "    ax.axhline(target, color=\"k\", linewidth=1)\n",
    "    ax.axhline(upper, color=\"r\", linewidth=1, linestyle=\"--\")\n",
    "    ax.axhline(lower, color=\"r\", linewidth=1, linestyle=\"--\")\n",
    "    if flagged:\n",
    "        ax.scatter(flagged, y.values[flagged], color=\"red\", zorder=5, s=40)\n",
    "    ax.set_xlabel(\"Sample\")\n",
    "    ax.set_ylabel(\"Colour\")\n",
    "    ax.set_title(title)\n",
    "    plt.tight_layout()\n",
    "    plt.show()\n",
    "\n",
    "\n",
    "plot_chart(y, cc_regular.target, cc_regular.s, list(cc_regular.idx_outside_3S), \"Shewhart chart (regular)\")"
   ]
  },
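  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The arithmetic behind the limits is short enough to write out by hand. The cell below is a minimal sketch on synthetic data, not the `ControlChart` implementation; `demo_readings`, `sd_overall`, and `sd_moving_range` are names made up for illustration. It computes 3-sigma limits twice: once from the sample standard deviation, and once from the average moving range divided by $d_2 = 1.128$, a common scale estimate for individuals charts. The two estimates should be close but not identical, which is one reason different tools can report slightly different limits for the same data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Synthetic stand-in for a colour series (illustrative values, not the real data).\n",
    "rng = np.random.default_rng(0)\n",
    "demo_readings = rng.normal(loc=240.0, scale=10.0, size=100)\n",
    "\n",
    "demo_target = demo_readings.mean()\n",
    "\n",
    "# Estimate 1: overall sample standard deviation.\n",
    "sd_overall = demo_readings.std(ddof=1)\n",
    "\n",
    "# Estimate 2: average moving range of consecutive readings, divided by d2 = 1.128.\n",
    "moving_range = np.abs(np.diff(demo_readings))\n",
    "sd_moving_range = moving_range.mean() / 1.128\n",
    "\n",
    "for label, sd in [(\"sample sd\", sd_overall), (\"moving range\", sd_moving_range)]:\n",
    "    lower, upper = demo_target - 3 * sd, demo_target + 3 * sd\n",
    "    # Positions of readings that fall outside the 3-sigma limits.\n",
    "    outside = np.flatnonzero((demo_readings < lower) | (demo_readings > upper))\n",
    "    print(f\"{label:>12}: s={sd:.2f}, limits=[{lower:.1f}, {upper:.1f}], flagged={outside.tolist()}\")"
   ]
  },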
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Robust Shewhart chart\n",
    "\n",
    "Replacing the mean with the median and the standard deviation with a MAD-based scale estimate prevents extreme observations from inflating the limits and hiding themselves. On a series with even a single outlier the robust chart usually flags more points than the classical chart, which is the desired behaviour: *flag, then investigate*."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "cc_robust = ControlChart(variant=\"xbar.no.subgroup\", style=\"robust\")\n",
    "cc_robust.calculate_limits(y)\n",
    "print(f\"target = {cc_robust.target:.2f}, s = {cc_robust.s:.2f}\")\n",
    "print(f\"flagged indices: {list(cc_robust.idx_outside_3S)}\")\n",
    "plot_chart(y, cc_robust.target, cc_robust.s, list(cc_robust.idx_outside_3S), \"Shewhart chart (robust)\")"
   ]
  },
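  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see why a robust scale estimate matters, the sketch below plants one gross outlier in a synthetic series and compares the classical and robust estimates before and after. It assumes the usual normal-consistency factor of 1.4826 for the MAD; whether `ControlChart(style=\"robust\")` uses exactly this scaling is not confirmed here, and `mad_scale` is a hypothetical helper. The standard deviation should jump while the MAD-based scale barely moves, so robust limits stay tight around the bulk of the data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "\n",
    "def mad_scale(values):\n",
    "    \"\"\"Median absolute deviation, scaled to estimate sigma for normal data.\"\"\"\n",
    "    med = np.median(values)\n",
    "    return 1.4826 * np.median(np.abs(values - med))\n",
    "\n",
    "\n",
    "# Synthetic series (illustrative values), with and without one gross outlier.\n",
    "rng = np.random.default_rng(1)\n",
    "clean = rng.normal(loc=240.0, scale=10.0, size=100)\n",
    "contaminated = clean.copy()\n",
    "contaminated[50] = 500.0\n",
    "\n",
    "for name, series in [(\"clean\", clean), (\"contaminated\", contaminated)]:\n",
    "    print(\n",
    "        f\"{name:>12}: mean={series.mean():.1f}, sd={series.std(ddof=1):.1f}, \"\n",
    "        f\"median={np.median(series):.1f}, MAD scale={mad_scale(series):.1f}\"\n",
    "    )"
   ]
  },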
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Holt-Winters chart\n",
    "\n",
    "The default `ControlChart()` is a Holt-Winters chart. It blends two smoothing constants, `lambda_1` and `lambda_2`, with `lambda_1=lambda_2=0.5` by default. This makes the chart respond to genuine process shifts while staying stable under random scatter."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "cc_hw = ControlChart()\n",
    "cc_hw.calculate_limits(y)\n",
    "print(f\"target = {cc_hw.target:.2f}, s = {cc_hw.s:.2f}\")\n",
    "y_star = cc_hw.df[\"y_star\"].astype(float).values\n",
    "\n",
    "_fig, ax = plt.subplots(figsize=(9, 3.2))\n",
    "ax.plot(y.values, marker=\"o\", markersize=3, linewidth=0.7, color=\"#1f77b4\", label=\"observed\")\n",
    "if len(y_star) and np.isfinite(y_star).any():\n",
    "    ax.plot(y_star, color=\"orange\", linewidth=1.2, label=\"smoothed (y*)\")\n",
    "ax.axhline(cc_hw.target, color=\"k\", linewidth=1, label=f\"target {cc_hw.target:.1f}\")\n",
    "ax.axhline(cc_hw.target + 3 * cc_hw.s, color=\"r\", linewidth=1, linestyle=\"--\")\n",
    "ax.axhline(cc_hw.target - 3 * cc_hw.s, color=\"r\", linewidth=1, linestyle=\"--\")\n",
    "ax.set_xlabel(\"Sample\")\n",
    "ax.set_ylabel(\"Colour\")\n",
    "ax.set_title(\"Holt-Winters chart\")\n",
    "ax.legend(loc=\"best\", fontsize=8)\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
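  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One common way to smooth with two constants is Holt's linear (double exponential) method, which tracks a level and a trend. The sketch below implements that generic recursion on synthetic drifting data. It is offered to build intuition only and is not claimed to be the exact recursion behind `ControlChart()`; the function `holt_linear` and its arguments `alpha` and `beta` are hypothetical names."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "\n",
    "def holt_linear(values, alpha=0.5, beta=0.5):\n",
    "    \"\"\"One-step-ahead forecasts from Holt's linear (level + trend) smoothing.\"\"\"\n",
    "    values = np.asarray(values, dtype=float)\n",
    "    forecasts = np.full(values.shape, np.nan)\n",
    "    level, trend = values[0], 0.0  # simple initialisation: first value, no trend\n",
    "    for t in range(1, len(values)):\n",
    "        forecasts[t] = level + trend  # prediction made before seeing values[t]\n",
    "        new_level = alpha * values[t] + (1 - alpha) * (level + trend)\n",
    "        trend = beta * (new_level - level) + (1 - beta) * trend\n",
    "        level = new_level\n",
    "    return forecasts\n",
    "\n",
    "\n",
    "# Synthetic series with a steady drift plus noise (illustrative values).\n",
    "rng = np.random.default_rng(2)\n",
    "drifting = 240.0 + 0.3 * np.arange(100) + rng.normal(scale=5.0, size=100)\n",
    "forecasts = holt_linear(drifting)\n",
    "\n",
    "# The first forecast is undefined, so skip it when summarising the errors.\n",
    "errors = drifting[1:] - forecasts[1:]\n",
    "print(f\"one-step forecast error: mean={errors.mean():.2f}, sd={errors.std(ddof=1):.2f}\")"
   ]
  },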
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## What to try next\n",
    "\n",
    "- Pass an explicit `target` and `s` to `calculate_limits()` to use known design values rather than estimates from the same data being charted. This is the right choice once you have a reference period of stable operation.\n",
    "- Investigate the flagged readings: do they cluster in time? Are they before or after a maintenance event? The chart only flags; the investigation belongs to you.\n",
    "- For multivariate process data, fit a PCA model and chart SPE and Hotelling's T squared instead. See the [PCA on tablet spectra](../latent-variable-modelling/pca-spectral-data.ipynb) case study."
   ]
  }
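  ,
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The first two suggestions can be sketched without the library. Below, limits are estimated from an assumed stable reference period (the first 50 synthetic readings) and then applied unchanged to later readings; the flagged positions are then grouped into runs of consecutive samples to show whether they cluster in time. The split point, the injected upset, and the helper `group_consecutive` are illustrative assumptions, not part of `process_improve`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "\n",
    "def group_consecutive(positions):\n",
    "    \"\"\"Group sorted integer positions into runs of consecutive values.\"\"\"\n",
    "    runs = []\n",
    "    for pos in positions:\n",
    "        if runs and pos == runs[-1][-1] + 1:\n",
    "            runs[-1].append(pos)\n",
    "        else:\n",
    "            runs.append([pos])\n",
    "    return runs\n",
    "\n",
    "\n",
    "rng = np.random.default_rng(3)\n",
    "reference = rng.normal(loc=240.0, scale=10.0, size=50)  # assumed stable period\n",
    "monitoring = rng.normal(loc=240.0, scale=10.0, size=50)  # later readings to judge\n",
    "monitoring[30:35] += 80.0  # injected upset lasting five samples\n",
    "\n",
    "# Limits come from the reference period only, never from the data being judged.\n",
    "ref_target = reference.mean()\n",
    "ref_s = reference.std(ddof=1)\n",
    "\n",
    "outside = np.flatnonzero(np.abs(monitoring - ref_target) > 3 * ref_s)\n",
    "print(f\"reference target={ref_target:.1f}, s={ref_s:.1f}\")\n",
    "print(f\"flagged runs in the monitoring period: {group_consecutive(outside.tolist())}\")"
   ]
  }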
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}