Bivariate Analysis

Backwards-compatible re-exporter for process_improve.bivariate.

The implementation now lives in process_improve.bivariate._elbow_peak (ENG-23 / #305): the renamed file makes filename-ranked tooling (Jump-to-File, fuzzy search, codecov reports) less ambiguous about which methods.py is being shown.

Every public name remains importable as before:

from process_improve.bivariate.methods import find_elbow_point, find_line_intersection
process_improve.bivariate.methods.find_elbow_point(x, y, max_iter=41)

Find the elbow point when plotting numeric entries in x vs numeric values in list y.

Return the index into the vectors x and y (which must have the same length) at which the elbow point occurs. Returns -1 if every value in x or y is missing.

The samples are first sorted by x (the independent variable). A robust linear fit is made to the first 5 samples on the left and another to the last 5 samples on the right, and the intersection of the two fitted lines is computed. The window size is then grown over max_iter (default 41) evenly spaced steps, via numpy.linspace, up to roughly half the data, accumulating one intersection point per step.

The elbow is taken as the data point whose (x, y) location is closest to the median of the accumulated intersection points; the median location is where the intersections should stabilise.

The method will probably not work well when there are only a few data points. In that case, try fitting a spline to the raw data and repeating the search on the interpolated data.
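The procedure described above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: the names fit_line and sketch_elbow are hypothetical, ordinary least squares stands in for the robust fit, the evenly spaced window sizes are computed by hand rather than with numpy.linspace, and missing values are not handled (so the -1 return is not reproduced). It assumes at least 10 samples.

```python
from statistics import median


def fit_line(xs, ys):
    """Ordinary least-squares line; returns (intercept, slope).

    Assumption: stands in for the robust fit used by the library.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((a - mean_x) ** 2 for a in xs)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def sketch_elbow(x, y, start=5, max_iter=41):
    """Return the index into the original x and y of the estimated elbow."""
    # Sort by x, remembering where each sample came from.
    order = sorted(range(len(x)), key=lambda i: x[i])
    xs = [x[i] for i in order]
    ys = [y[i] for i in order]

    # Window sizes grow in evenly spaced steps up to about half the data.
    half = len(xs) // 2
    sizes = [round(start + k * (half - start) / (max_iter - 1))
             for k in range(max_iter)]

    hits = []
    for w in sizes:
        b1, m1 = fit_line(xs[:w], ys[:w])    # left-hand window
        b2, m2 = fit_line(xs[-w:], ys[-w:])  # right-hand window
        if m1 == m2:
            continue  # parallel fits: no intersection for this step
        xi = (b2 - b1) / (m1 - m2)
        hits.append((xi, m1 * xi + b1))

    # Median location of the accumulated intersection points.
    mx = median(p[0] for p in hits)
    my = median(p[1] for p in hits)

    # Data point closest to that median location.
    best = min(range(len(xs)),
               key=lambda i: (xs[i] - mx) ** 2 + (ys[i] - my) ** 2)
    return order[best]
```

For a flat segment followed by a rising one, for example x = 0..19 with y = max(0, x - 10), every step's two fits meet near the kink, so the sketch returns the index of the sample at x = 10.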

Parameters:
Return type:

int | float

process_improve.bivariate.methods.find_line_intersection(m1, b1, m2, b2)

Find the intersection point of two lines.

From Stack Overflow: stackoverflow.com/questions/20677795/how-do-i-compute-the-intersection-point-of-two-lines

Returns a tuple: (x, y) where the two lines intersect, given slopes m1 and m2, and intercepts b1 and b2.
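The calculation follows from setting m1*x + b1 equal to m2*x + b2 and solving for x. A minimal sketch is shown here; the name intersect_lines is hypothetical, and raising an error for parallel lines is an assumption of the sketch, since the behaviour of the library function in that case is not documented on this page.

```python
def intersect_lines(m1, b1, m2, b2):
    """Return (x, y) where y = m1*x + b1 meets y = m2*x + b2."""
    if m1 == m2:
        # Assumption: equal slopes are treated as an error in this sketch.
        raise ValueError("parallel lines have no unique intersection")
    x = (b2 - b1) / (m1 - m2)
    return x, m1 * x + b1
```

For example, the lines y = x and y = -x + 2 give x = (2 - 0) / (1 - (-1)) = 1 and y = 1.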

Parameters:
Return type:

tuple

process_improve.bivariate.methods.fit_robust_lm(x, y)

Fits a robust linear model, with an intercept, between NumPy vectors x and y. Returns a length-2 array [intercept, slope] (the params attribute of the statsmodels.RLM fit result); no additional checks on data consistency are performed.

See also: regression.repeated_median_slope
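To show why a robust line fit is less sensitive to outliers, the sketch below implements Siegel's repeated median estimator in plain Python. It is a different technique from the M-estimation performed by statsmodels.RLM, and whether regression.repeated_median_slope computes its result this way is an assumption; the name repeated_median_line is hypothetical. The return order (intercept, slope) mirrors the order documented above.

```python
from statistics import median


def repeated_median_line(x, y):
    """Siegel repeated-median line; returns (intercept, slope)."""
    n = len(x)
    per_point = []
    for i in range(n):
        # Slopes from sample i to every other sample with a different x.
        slopes = [(y[j] - y[i]) / (x[j] - x[i])
                  for j in range(n)
                  if j != i and x[j] != x[i]]
        if slopes:
            per_point.append(median(slopes))
    # Median of the per-sample medians gives the slope estimate.
    slope = median(per_point)
    # The intercept is the median residual offset at that slope.
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return intercept, slope
```

With x = [0, 1, 2, 3, 4, 5] and y = [1, 3, 5, 7, 9, 100], where the last point is an outlier from y = 2x + 1, the estimate stays at intercept 1 and slope 2.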

Parameters:
Return type:

ndarray