Process Monitoring

The ControlChart class provides robust control charts that balance CUSUM and Shewhart properties.

process_improve.monitoring.control_charts.rho(x, k=2.52)

Bi-weight rho function.

The fixed constant k=2.52 is taken from p. 289 of the paper https://onlinelibrary.wiley.com/doi/abs/10.1002/for.1125

Return type:

float
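As a rough illustration of what a bi-weight rho function looks like, the sketch below uses the standard Tukey bi-weight form: it grows smoothly near zero and is capped at a constant once |x| exceeds k. The name `biweight_rho` and the scaling argument `c` are hypothetical and are not taken from the library; the exact scaling used by `rho` may differ.

```python
def biweight_rho(x: float, k: float = 2.52, c: float = 1.0) -> float:
    """Tukey bi-weight rho (textbook form; `c` is an assumed scaling constant)."""
    if abs(x) <= k:
        # Smooth, bounded growth inside the window [-k, k].
        return c * (1.0 - (1.0 - (x / k) ** 2) ** 3)
    # Beyond the window every residual gets the same, capped penalty.
    return c


small = biweight_rho(0.5)       # small residual: small penalty
large = biweight_rho(10.0)      # outlier: capped at c
extreme = biweight_rho(1000.0)  # a more extreme outlier is penalised no further
```

Because the penalty is capped, a single extreme value cannot dominate a scale estimate built on this function.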

process_improve.monitoring.control_charts.psi(x, k=2.0)

Pre-clean based on the Huber ψ-function.

This can be interpreted as replacing unexpectedly high or low values with a more likely value. From p. 288 of the paper https://onlinelibrary.wiley.com/doi/abs/10.1002/for.1125

Return type:

float
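The pre-cleaning idea can be sketched with the textbook Huber ψ-function, which passes small values through unchanged and clips large ones to ±k. The name `huber_psi` is hypothetical, and this is the standard form rather than a copy of the library's implementation.

```python
def huber_psi(x: float, k: float = 2.0) -> float:
    """Textbook Huber psi: identity inside (-k, k), clipped to +/-k outside."""
    if abs(x) < k:
        return x
    return k if x > 0 else -k


typical = huber_psi(1.2)        # within the window: returned unchanged
high_outlier = huber_psi(7.5)   # clipped down to +k
low_outlier = huber_psi(-7.5)   # clipped up to -k
```

Applied to standardised residuals, this replaces an unexpectedly high or low value with the nearest value still considered plausible.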

class process_improve.monitoring.control_charts.ControlChart(style='robust', variant='HW')

Bases: object

Create control chart instance objects.

__init__(style='robust', variant='HW')

Create/initialize a control chart.

Args:

style (str, optional): Which style of control chart to calculate. Defaults to “robust”. The other choice is ‘regular’ (i.e. non-robust) calculations; the user should then ensure that no outliers are present in the data.

variant (str, optional): Many variants of control charts are available. The variant string is compared case-insensitively (it is normalised via .strip().lower() on assignment), so 'HW', 'hw', and 'Hw' are all equivalent.

The default is a Holt-Winters (‘hw’) chart, with automatic determination of control chart parameters. This chart is a blend of infinite history (CUSUM) charts, and an instantaneous (no history taken into account) Shewhart chart. The exact blend is specified by parameters ld_1 (lambda 1) and ld_2 (lambda 2).

Other variants are:

‘xbar.no.subgroup’ [Shewhart chart, with no subgroups]. In other words, each observation is independently plotted on the control chart.

‘cusum’ (CUmulative SUM) chart, which uses all the history of the chart.
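To give a feel for the blend between long-memory (CUSUM-like) and instantaneous (Shewhart-like) behaviour described for the Holt-Winters variant, the sketch below uses simple one-parameter exponential smoothing. This is a simplification and not the library's Holt-Winters implementation: the function `exp_smooth` and its single parameter `lam` are hypothetical stand-ins for the role played by ld_1 and ld_2.

```python
def exp_smooth(y: list[float], lam: float) -> list[float]:
    """Simple exponential smoothing; `lam` sets how much history is remembered."""
    level = y[0]
    smoothed = [level]
    for value in y[1:]:
        # lam = 1 keeps only the newest observation; a small lam keeps long history.
        level = lam * value + (1 - lam) * level
        smoothed.append(level)
    return smoothed


y = [10.0, 10.0, 10.0, 14.0, 10.0]
shewhart_like = exp_smooth(y, lam=1.0)  # no memory: reproduces the raw observations
long_memory = exp_smooth(y, lam=0.1)    # heavy memory: the spike is strongly damped
```

With `lam=1.0` each plotted point depends only on the current observation, while a small `lam` lets past observations keep influencing the chart.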

Return type:

None
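The case-insensitive handling of variant follows directly from the .strip().lower() normalisation mentioned above. A minimal sketch in plain Python, where `normalise_variant` is a hypothetical helper and not a function of the library:

```python
def normalise_variant(variant: str) -> str:
    """Mimic the described normalisation: trim whitespace, then lower-case."""
    return variant.strip().lower()


labels = ["HW", "hw", " Hw "]
normalised = {normalise_variant(label) for label in labels}  # collapses to one value
```

All three spellings map to the same string, so they would select the same chart variant.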

calculate_limits(y, target=None, s=None, **kwargs)

Find the control chart target and limits for a given vector y.

Works for both the Holt-Winters (‘hw’) and ‘xbar.no.subgroup’ variants.

For the Holt-Winters variant, when there are fewer than

min(20, max(10, np.ceil(0.10 * N)))

measurements (where N is the length of the input vector), the target and standard deviation are estimated directly from the data and any provided target / s are ignored for that small-sample case. Otherwise, if target and s are numeric, those values are used; if not, they are estimated.

Return type:

None
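The small-sample threshold for the Holt-Winters variant can be evaluated for a few input lengths to see how it behaves. The sketch below reproduces only the expression min(20, max(10, ceil(0.10 * N))); the helper name `small_sample_threshold` is hypothetical, and `math.ceil` is used in place of `np.ceil` so that the example needs only the standard library.

```python
import math


def small_sample_threshold(n: int) -> int:
    """Evaluate min(20, max(10, ceil(0.10 * n))) for an input of length n."""
    return int(min(20, max(10, math.ceil(0.10 * n))))


short_series = small_sample_threshold(50)    # 10% of 50 is 5, raised to the floor of 10
medium_series = small_sample_threshold(125)  # 10% of 125 is 12.5, rounded up to 13
long_series = small_sample_threshold(500)    # 10% of 500 is 50, capped at 20
```

The threshold is therefore always between 10 and 20, and scales with the input length only for N roughly between 100 and 200.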

process_improve.monitoring.metrics.calculate_cpk(df, which_column, specifications=(nan, nan), trim_percentile=2.5)

Calculate the process capability, Cpk, near either the lower or the upper limit (which of the two applies is determined automatically).

Process capability, nearer the lower limit = (avg - lower_spec) / (3 x std deviation)

Process capability, nearer the upper limit = (upper_spec - avg) / (3 x std deviation)
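The two one-sided formulas can be sketched with the classical mean and standard deviation. The helper `cpk_sides` is hypothetical, and the use of the sample standard deviation (n - 1 in the denominator) is an assumption; the library may use a different or robust estimator.

```python
import statistics


def cpk_sides(values: list[float], lower_spec: float, upper_spec: float) -> tuple[float, float]:
    """Return (capability near the lower limit, capability near the upper limit)."""
    avg = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation (assumption)
    cpk_lower = (avg - lower_spec) / (3 * std)
    cpk_upper = (upper_spec - avg) / (3 * std)
    return cpk_lower, cpk_upper


data = [9.0, 10.0, 11.0, 10.0, 10.0]
cpk_lower, cpk_upper = cpk_sides(data, lower_spec=7.0, upper_spec=12.0)
cpk = min(cpk_lower, cpk_upper)  # the limiting side is the smaller of the two
```

Here the average sits closer to the upper specification, so the upper side is the limiting one.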

Parameters:
  • df (pd.DataFrame) – Raw data, at least one column is numeric.

  • which_column (str) – Name of the column of data to use for the Cpk calculation.

  • specifications (tuple of (lower, upper), optional) –

    A 2-tuple (lower_spec, upper_spec) of the lower and upper specification limits. Each element may be:

    • a numeric value, when the specification is constant over time;

    • a string, interpreted as a column name in df whose values give the per-row specification (use this when the specification changes over time);

    • None, in which case the corresponding spec is estimated from the data using trim_percentile (a percentile-based robust limit).

    Default is (np.nan, np.nan), which treats both specs as numeric NaN and yields NaN for the corresponding side of the Cpk calculation.

  • trim_percentile (float, optional) – Controls two things. (1) When a specification limit is missing, trim_percentile is used as a percentile on the data (in percent) to estimate that limit: the lower spec is set to np.nanpercentile(data, trim_percentile) and the upper spec to np.nanpercentile(data, 100 - trim_percentile). Default 2.5 therefore yields the 2.5th and 97.5th percentiles. (2) When trim_percentile > 0 the centre/spread used in the Cpk formula switch from mean/std to robust alternatives (median and Sn); when 0 the classical mean/std are used.

Returns:

A bunch with the following fields:

  • cpk: the Cpk value (the limiting, i.e. smaller, of the two sides).

  • center: the center (mean or median) of the limiting side.

  • spread: the spread (standard deviation or Sn) of the limiting side.

  • rsd: the relative standard deviation of the limiting side, as a percentage, (spread / center) * 100.

Return type:

sklearn.utils.Bunch
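The percentile-based estimate of a missing specification limit, and the rsd field, can be sketched directly with NumPy. This is an illustration only: the variable names are hypothetical, and the classical mean and sample standard deviation are used for the rsd line purely as an assumption (the robust median and Sn alternatives are not reproduced here).

```python
import numpy as np

data = np.array([1.0, 2.0, np.nan, 3.0, 4.0, 5.0])
trim_percentile = 2.5

# Missing spec limits estimated from the data; NaN entries are ignored.
lower_spec = np.nanpercentile(data, trim_percentile)
upper_spec = np.nanpercentile(data, 100 - trim_percentile)

# Relative standard deviation as a percentage: (spread / center) * 100.
center = np.nanmean(data)
spread = np.nanstd(data, ddof=1)
rsd = spread / center * 100
```

With the default trim_percentile of 2.5, the estimated limits bracket the central 95% of the non-missing data.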