{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Control charts for rubber colour\n", "\n", "A short series of 100 colour readings on rubber product. Each reading is a single value that should sit close to a target, with run-to-run variation small relative to the spread that signals a process upset. The job of a control chart is to draw a target line and limits, then flag readings that fall outside.\n", "\n", "**Data.** `rubber-colour.csv` from [openmv.net](https://openmv.net/info/rubber-colour). One column, no missing values, no ordering metadata; the values are assumed to be in sample order.\n", "\n", "**What we do.** Build three charts:\n", "\n", "1. A classical Shewhart chart using the standard mean and standard deviation, which mirrors what the R `qcc` package produces with `type=\"xbar.one\"`.\n", "2. A robust Shewhart chart using the median and a MAD-based scale estimate, which is less sensitive to the same outliers it is trying to flag.\n", "3. A Holt-Winters chart that blends recent history with the long-run target, useful when the series drifts.\n", "\n", "**Adapted from** the *Process monitoring* chapter of the [Process Improvement using Data](https://learnche.org/pid) book (CC BY-SA 4.0)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import pandas as pd\n", "\n", "from process_improve.monitoring.control_charts import ControlChart" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load and look" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data = pd.read_csv(\"https://openmv.net/file/rubber-colour.csv\")\n", "y = data[\"Colour\"].astype(float)\n", "print(f\"{len(y)} readings, mean={y.mean():.2f}, sd={y.std(ddof=1):.2f}, range=[{y.min()}, {y.max()}]\")\n", "y.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Classical Shewhart chart\n", "\n", "`ControlChart(variant=\"xbar.no.subgroup\", style=\"regular\")` plots each observation individually against limits computed from the sample mean and sample standard deviation. The R `qcc` package's `type=\"xbar.one\"` produces the same target; its standard deviation estimate is usually close but not identical, because `qcc` by default estimates it from the average moving range rather than from the sample standard deviation."
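, "\n", "\n", "Written out as a sketch, and assuming the usual $n-1$ definition of the sample standard deviation (the library's internal estimator may differ in detail), the target and three-sigma limits for readings $x_1, \\dots, x_n$ are:\n", "\n", "$$\\bar{x} = \\frac{1}{n}\\sum_{i=1}^{n} x_i, \\qquad s = \\sqrt{\\frac{1}{n-1}\\sum_{i=1}^{n}\\left(x_i - \\bar{x}\\right)^2}, \\qquad \\text{UCL} = \\bar{x} + 3s, \\qquad \\text{LCL} = \\bar{x} - 3s$$"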
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "cc_regular = ControlChart(variant=\"xbar.no.subgroup\", style=\"regular\")\n", "cc_regular.calculate_limits(y)\n", "print(f\"target = {cc_regular.target:.2f}, s = {cc_regular.s:.2f}\")\n", "print(f\"flagged indices: {list(cc_regular.idx_outside_3S)}\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def plot_chart(y: pd.Series, target: float, s: float, flagged: list[int], title: str) -> None:\n", "    \"\"\"Plot the readings against the target and 3-sigma limits, marking flagged samples.\"\"\"\n", "    upper = target + 3 * s\n", "    lower = target - 3 * s\n", "    _fig, ax = plt.subplots(figsize=(9, 3.2))\n", "    ax.plot(y.values, marker=\"o\", linestyle=\"-\", color=\"#1f77b4\", markersize=3, linewidth=0.7)\n", "    # Target (solid black) and upper/lower control limits (dashed red)\n", "    ax.axhline(target, color=\"k\", linewidth=1)\n", "    ax.axhline(upper, color=\"r\", linewidth=1, linestyle=\"--\")\n", "    ax.axhline(lower, color=\"r\", linewidth=1, linestyle=\"--\")\n", "    # Highlight flagged samples; `flagged` is used as positional indices into y\n", "    if flagged:\n", "        ax.scatter(flagged, y.values[flagged], color=\"red\", zorder=5, s=40)\n", "    ax.set_xlabel(\"Sample\")\n", "    ax.set_ylabel(\"Colour\")\n", "    ax.set_title(title)\n", "    plt.tight_layout()\n", "    plt.show()\n", "\n", "\n", "plot_chart(y, cc_regular.target, cc_regular.s, list(cc_regular.idx_outside_3S), \"Shewhart chart (regular)\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Robust Shewhart chart\n", "\n", "Replacing the mean with the median and the standard deviation with a MAD-based scale estimate prevents extreme observations from inflating the limits and hiding themselves. On a series with even a single outlier, the robust chart usually flags more points than the classical chart, which is the desired behaviour: *flag, then investigate*."
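, "\n", "\n", "As a sketch of what a MAD-based scale estimate typically looks like, assuming the common normal-consistency factor of 1.4826 (an assumption here; the library's exact constant is not stated above):\n", "\n", "$$\\tilde{x} = \\operatorname{median}_i\\,(x_i), \\qquad \\text{MAD} = \\operatorname{median}_i\\,\\lvert x_i - \\tilde{x} \\rvert, \\qquad s_{\\text{robust}} = 1.4826 \\cdot \\text{MAD}$$\n", "\n", "The limits are then $\\tilde{x} \\pm 3\\,s_{\\text{robust}}$. Because the median and MAD barely move when a few readings are extreme, the limits stay tight enough for those readings to fall outside them."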
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "cc_robust = ControlChart(variant=\"xbar.no.subgroup\", style=\"robust\")\n", "cc_robust.calculate_limits(y)\n", "print(f\"target = {cc_robust.target:.2f}, s = {cc_robust.s:.2f}\")\n", "print(f\"flagged indices: {list(cc_robust.idx_outside_3S)}\")\n", "plot_chart(y, cc_robust.target, cc_robust.s, list(cc_robust.idx_outside_3S), \"Shewhart chart (robust)\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Holt-Winters chart\n", "\n", "The default `ControlChart()` is a Holt-Winters chart. It uses two smoothing constants, `lambda_1` and `lambda_2`, with `lambda_1=lambda_2=0.5` by default. This makes the chart respond to genuine process shifts while staying stable under random scatter." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "cc_hw = ControlChart()\n", "cc_hw.calculate_limits(y)\n", "print(f\"target = {cc_hw.target:.2f}, s = {cc_hw.s:.2f}\")\n", "# Smoothed series stored by the chart in its `df` attribute\n", "y_star = cc_hw.df[\"y_star\"].astype(float).values\n", "\n", "_fig, ax = plt.subplots(figsize=(9, 3.2))\n", "ax.plot(y.values, marker=\"o\", markersize=3, linewidth=0.7, color=\"#1f77b4\", label=\"observed\")\n", "# Only draw the smoothed line if it contains at least one finite value\n", "if len(y_star) and np.isfinite(y_star).any():\n", "    ax.plot(y_star, color=\"orange\", linewidth=1.2, label=\"smoothed (y*)\")\n", "ax.axhline(cc_hw.target, color=\"k\", linewidth=1, label=f\"target {cc_hw.target:.1f}\")\n", "ax.axhline(cc_hw.target + 3 * cc_hw.s, color=\"r\", linewidth=1, linestyle=\"--\")\n", "ax.axhline(cc_hw.target - 3 * cc_hw.s, color=\"r\", linewidth=1, linestyle=\"--\")\n", "ax.set_xlabel(\"Sample\")\n", "ax.set_ylabel(\"Colour\")\n", "ax.set_title(\"Holt-Winters chart\")\n", "ax.legend(loc=\"best\", fontsize=8)\n", "plt.tight_layout()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## What to try next\n", "\n", "- Pass an explicit `target` and `s` to `calculate_limits()` to use known design values rather than estimates from the same data being charted. This is the right choice once you have a reference period of stable operation.\n", "- Investigate the flagged readings: do they cluster in time? Are they before or after a maintenance event? The chart only flags; the investigation belongs to you.\n", "- For multivariate process data, fit a PCA model and chart SPE and Hotelling's $T^2$ instead. See the [PCA on tablet spectra](../latent-variable-modelling/pca-spectral-data.ipynb) case study." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 5 }