Multivariate Analysis#
Latent variable methods summarize high-dimensional, correlated data into a small number of underlying variables that capture the dominant structure. A latent variable is an unobservable quantity - your overall health, for example - that manifests through measurable indicators (blood pressure, cholesterol, heart rate). In process data, a single underlying phenomenon such as a feed quality change can shift dozens of correlated sensors simultaneously. Latent variable models recover those phenomena from the measurements.
When to Use These Methods#
Latent variable methods are the right tool when your data has one or more of these characteristics (common in process industries):
Many correlated variables - traditional regression struggles with collinearity, but latent variable methods thrive on it.
More variables than observations (K > N) - ordinary least squares cannot be computed, but PCA/PLS handle this naturally.
Missing values - sensors fail, lab samples are skipped. The algorithms in this package handle incomplete data natively.
Low signal-to-noise ratio - by separating systematic variation from noise, latent variable models act as multivariate filters.
Need for visualization - score plots and loading plots reveal structure that is invisible in univariate views.
Five common applications drive their use:
Process understanding - confirm existing knowledge or discover unexpected variable relationships through score and loading plots.
Troubleshooting - after a problem occurs, screen variables to isolate the most relevant ones using contribution plots.
Optimization - move along favorable directions in the latent variable space to improve yield, quality, or throughput.
Predictive modeling - build inferential sensors that predict hard-to-measure quality variables from readily available process data.
Process monitoring - extend univariate control charts (Shewhart, CUSUM) to the multivariate case with SPE and Hotelling’s T² charts.
Available Methods#
This package provides three multivariate methods, each suited to different data structures and goals:
Method |
Type |
Use when … |
|---|---|---|
Unsupervised |
You have a single data matrix X and want to explore, monitor, or reduce dimensionality. |
|
Supervised |
You have predictor variables X and response variables Y and want to build a predictive or explanatory model. |
|
Multi-block |
Your data is naturally organized in T-shaped blocks (materials, formulations, conditions, quality) as in batch processes. |
|
Multi-block |
You have several X-blocks (e.g. one block per processing zone, plant unit, or sensor group) and want a consensus model that respects the block structure rather than dumping every variable into a single big-X. |