Multivariate Analysis

Latent variable methods summarize high-dimensional, correlated data into a small number of underlying variables that capture the dominant structure. A latent variable is an unobservable quantity - your overall health, for example - that manifests through measurable indicators (blood pressure, cholesterol, heart rate). In process data, a single underlying phenomenon such as a feed quality change can shift dozens of correlated sensors simultaneously. Latent variable models recover those phenomena from the measurements.
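To make the idea concrete, here is a minimal sketch using plain NumPy rather than this package's API. It simulates one hidden variable that drives twelve sensors, then checks how much of the variation the first principal component recovers. The sensor count, noise level, and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_sensors = 200, 12

# Hypothetical latent variable (e.g. feed quality) that is never measured directly
t_true = rng.normal(size=n_obs)
# Each sensor responds to the latent variable with its own sensitivity
p_true = rng.uniform(0.5, 1.5, size=n_sensors)
X = np.outer(t_true, p_true) + 0.1 * rng.normal(size=(n_obs, n_sensors))

# Mean-center and scale each column to unit variance, then decompose with the SVD
Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
U, s, Vt = np.linalg.svd(Xs, full_matrices=False)

# Fraction of total variance captured by each component
explained = s**2 / np.sum(s**2)
# Scores of the first component: the recovered latent variable (up to sign and scale)
t1 = U[:, 0] * s[0]
corr = abs(np.corrcoef(t1, t_true)[0, 1])

print(f"variance explained by component 1: {explained[0]:.3f}")
print(f"|correlation| between scores and the hidden variable: {corr:.3f}")
```

In this simulated case a single component summarizes nearly all of the twelve sensors, which is the behavior the paragraph above describes.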

When to Use These Methods

Latent variable methods are the right tool when your data has one or more of these characteristics (common in process industries):

Many correlated variables - traditional regression struggles with collinearity, but latent variable methods thrive on it.
More variables than observations (K > N) - ordinary least squares has no unique solution because X'X is singular, but PCA and PLS handle this case naturally.
Missing values - sensors fail, lab samples are skipped. The algorithms in this package handle incomplete data natively.
Low signal-to-noise ratio - by separating systematic variation from noise, latent variable models act as multivariate filters.
Need for visualization - score plots and loading plots reveal structure that is invisible in univariate views.
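The collinearity and K > N points can be illustrated with a short NumPy sketch (not this package's API). The dimensions and the two-factor structure are assumptions chosen for the example. It shows that X'X is rank deficient, so the least-squares normal equations cannot be inverted, while a decomposition into a few components still works.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 10, 50  # fewer observations than variables

# Assumed structure: two latent factors generate all 50 correlated columns
T = rng.normal(size=(N, 2))
P = rng.normal(size=(2, K))
X = T @ P + 0.05 * rng.normal(size=(N, K))
Xc = X - X.mean(axis=0)

# X'X is K x K, but its rank equals rank(Xc), which cannot exceed N - 1
# after centering. It is therefore singular and has no inverse.
rank = np.linalg.matrix_rank(Xc)
print(f"rank of centered X: {rank} (X'X is {K} x {K})")

# PCA via the SVD needs no matrix inverse, so K > N is not an obstacle
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = s**2 / np.sum(s**2)
two_pc = explained[:2].sum()
print(f"variance explained by 2 components: {two_pc:.3f}")
```

The two components capture almost all of the variation here because the 50 columns were generated from only two underlying factors.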

Five common applications drive their use:

Process understanding - confirm existing knowledge or discover unexpected variable relationships through score and loading plots.
Troubleshooting - after a problem occurs, screen variables to isolate the most relevant ones using contribution plots.
Optimization - move along favorable directions in the latent variable space to improve yield, quality, or throughput.
Predictive modeling - build inferential sensors that predict hard-to-measure quality variables from readily available process data.
Process monitoring - extend univariate control charts (Shewhart, CUSUM) to the multivariate case with SPE and Hotelling’s T² charts.
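As an illustration of the monitoring application, the sketch below computes Hotelling's T² and SPE from a PCA model built with NumPy. It is a simplified stand-in, not this package's implementation. The control limits are taken as the empirical 99th percentile of the training data, which is an assumption made to keep the example dependency-free; limits are normally derived from statistical distributions. The simulated fault and the helper name `monitor` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
N, K, A = 300, 8, 2  # observations, variables, retained components

# Simulated normal-operation data driven by A latent factors
T = rng.normal(size=(N, A))
P = rng.normal(size=(A, K))
X = T @ P + 0.1 * rng.normal(size=(N, K))

# Build the PCA model on scaled training data
mean, std = X.mean(axis=0), X.std(axis=0, ddof=1)
Xs = (X - mean) / std
U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
loadings = Vt[:A].T                # K x A
score_var = s[:A] ** 2 / (N - 1)   # variance of each score column

def monitor(x_new):
    """Return (T2, SPE) for one observation or for an array of observations."""
    xs = (np.atleast_2d(x_new) - mean) / std
    t = xs @ loadings                        # project onto the model plane
    t2 = np.sum(t**2 / score_var, axis=1)    # distance within the plane
    resid = xs - t @ loadings.T              # part the model cannot explain
    spe = np.sum(resid**2, axis=1)           # squared distance off the plane
    return t2, spe

t2_train, spe_train = monitor(X)
t2_limit = np.percentile(t2_train, 99)   # empirical limit (assumption)
spe_limit = np.percentile(spe_train, 99)

# Hypothetical fault: one sensor reads high while the others stay unchanged,
# which breaks the usual correlation structure
x_fault = X[0].copy()
x_fault[3] += 6 * std[3]
t2_f, spe_f = monitor(x_fault)
print(f"SPE = {spe_f[0]:.2f}, limit = {spe_limit:.2f}")
```

SPE responds when an observation breaks the correlation structure of the model, while T² responds to unusually large movements within that structure, so the two charts are read together.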

Available Methods

This package provides four groups of multivariate methods, each suited to different data structures and goals:

Method	Type	Use when …
PCA	Unsupervised	You have a single data matrix X and want to explore, monitor, or reduce dimensionality.
PLS	Supervised	You have predictor variables X and response variables Y and want to build a predictive or explanatory model.
TPLS	Multi-block	Your data is naturally organized in T-shaped blocks (materials, formulations, conditions, quality) as in batch processes.
MBPCA / MBPLS	Multi-block	You have several X-blocks (e.g. one block per processing zone, plant unit, or sensor group) and want a consensus model that respects the block structure rather than concatenating every variable into a single large X matrix.
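To show what the supervised case in the table looks like in practice, the following is a minimal NIPALS-style PLS sketch for a single response, written in NumPy. It does not use this package's API: the function names `pls1_fit` and `pls1_predict` are hypothetical, the data are simulated, and scaling and missing-data handling are omitted for brevity.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model; returns (x_mean, y_mean, coefficients)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr                 # direction in X most covariant with y
        w /= np.linalg.norm(w)
        t = Xr @ w                    # scores for this component
        tt = t @ t
        p = Xr.T @ t / tt             # X loadings
        qa = yr @ t / tt              # y loading
        Xr = Xr - np.outer(t, p)      # deflate X
        yr = yr - qa * t              # deflate y
        W.append(w)
        P.append(p)
        q.append(qa)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)   # regression vector in X space
    return x_mean, y_mean, coef

def pls1_predict(model, X):
    x_mean, y_mean, coef = model
    return (X - x_mean) @ coef + y_mean

# Simulated inferential-sensor setting: 100 process variables, 40 samples
rng = np.random.default_rng(3)
N, K = 40, 100
T = rng.normal(size=(N, 2))
X = T @ rng.normal(size=(2, K)) + 0.1 * rng.normal(size=(N, K))
y = T @ np.array([1.0, -0.5]) + 0.05 * rng.normal(size=N)

# Fit on the first 30 samples, evaluate on the remaining 10
model = pls1_fit(X[:30], y[:30], n_components=2)
y_hat = pls1_predict(model, X[30:])
ss_res = np.sum((y[30:] - y_hat) ** 2)
ss_tot = np.sum((y[30:] - y[30:].mean()) ** 2)
r2 = 1 - ss_res / ss_tot
print(f"R2 on held-out samples: {r2:.3f}")
```

The example has more variables than observations, so it also illustrates that PLS produces a usable predictive model in a setting where ordinary least squares has no unique solution.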