Projection to Latent Structures (PLS)
======================================

PLS finds directions in **X** that are maximally correlated with **Y**. It is
the method of choice when you have predictor variables and response variables
that you want to relate to each other.

Mathematical Background
-----------------------

PLS simultaneously decomposes both the X and Y matrices:

.. math::

    \mathbf{X} = \mathbf{T}\mathbf{P}^T + \mathbf{E}_X

    \mathbf{Y} = \mathbf{U}\mathbf{Q}^T + \mathbf{E}_Y

where:

- **T** (N × A) - X-scores (projections of the observations in X-space).
- **U** (N × A) - Y-scores (projections in Y-space).
- **P** (K × A) - X-loadings.
- **Q** (M × A) - Y-loadings.
- **E_X**, **E_Y** - residual matrices.

The algorithm pursues three objectives simultaneously: explain the variance
in X, explain the variance in Y, and maximize the *covariance* between the
two sets of scores. This is what distinguishes PLS from simply applying PCA
to X and then regressing on Y (which is PCR - Principal Components
Regression).

Why PLS Over Alternatives
-------------------------

**vs Multiple Linear Regression (MLR):**

- MLR requires more observations than variables (N > K) and fails with
  collinear X variables. PLS handles both situations naturally.
- PLS provides built-in noise reduction: by projecting onto a few latent
  variables it filters out measurement noise.
- PLS provides consistency checks through SPE and T², which MLR lacks.
- PLS handles missing values in X natively.

**vs Principal Components Regression (PCR):**

- PCR first applies PCA to X alone, then regresses Y on the resulting scores.
  The PCA step may choose directions that explain X variance but are
  *irrelevant* to Y.
- PLS uses both X and Y simultaneously, so it finds directions that are both
  well represented in X and predictive of Y. PLS therefore typically needs
  fewer components than PCR for the same predictive quality.

**Multiple Y columns:** PLS builds a single model for multiple correlated
response variables, using the correlation between the Y columns to improve
the predictions for each one.

Interpreting Scores
-------------------

PLS produces two sets of scores:

- **T** (``model.scores_``) - the X-scores, always available, even for new
  observations where Y is unknown. These are the primary scores for
  interpretation and monitoring.
- **U** (``model.y_scores_``) - the Y-scores, available only when Y data
  exists. Useful for examining the inner-model relationship.

A critical difference from PCA: *PCA scores explain only X variance; PLS
scores are calculated to also explain Y*. This means the PLS components may
not capture the largest variation in X - they capture the variation most
relevant to predicting Y. Score interpretation otherwise follows PCA: look
for clusters, outliers, and time trends in the T-scores.

Interpreting Loadings and Weights
---------------------------------

PLS has several related vectors that describe variable importance; a short
example follows this list:

- **X-weights** (``model.x_weights_``) - the raw weight vectors **w** used
  during the iterative algorithm. Each **w** is found on deflated data.
- **X-loadings** (``model.x_loadings_``) - the regression coefficients **p**
  relating X to T.
- **Direct weights** (``model.direct_weights_``) - also called **r** or
  **w\*** in the literature. These show the effect of each *original*
  (undeflated) variable on the scores. Prefer these for interpretation: they
  account for all prior components and give a clearer picture of each
  variable's total contribution.
- **Y-loadings** (``model.y_loadings_``) - how each Y variable contributes to
  the latent structure.
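To make these attributes concrete, here is a minimal sketch that fits a model
on toy data and prints the quantities listed above. The import path and the
``PLS(n_components=...)`` constructor with a ``fit(X, Y)`` method are
assumptions for illustration only; the attribute names are the ones
documented on this page.

.. code-block:: python

    import numpy as np
    import pandas as pd

    # Hypothetical import: the module path and class name are assumptions.
    from my_pls_library import PLS

    # Toy process data: 50 observations, 4 process variables (X),
    # 2 quality variables (Y) constructed to depend on X plus noise.
    rng = np.random.default_rng(42)
    X = pd.DataFrame(rng.normal(size=(50, 4)),
                     columns=["temp", "pressure", "flow", "pH"])
    Y = pd.DataFrame({
        "yield":  0.8 * X["temp"] - 0.3 * X["flow"]
                  + rng.normal(scale=0.1, size=50),
        "purity": 0.5 * X["pressure"] + rng.normal(scale=0.1, size=50),
    })

    model = PLS(n_components=2)   # assumed constructor signature
    model.fit(X, Y)               # assumed fit(X, Y) signature

    # X-scores (T): available for any observation, even without Y.
    print(model.scores_)

    # Direct weights (W*, a.k.a. R): preferred for interpretation because
    # each column refers to the original, undeflated variables.
    print(model.direct_weights_)

    # Y-loadings (Q): how each response loads on the latent variables.
    print(model.y_loadings_)

The direct weights printed last are the natural choice for a weight plot:
unlike ``model.x_weights_``, they are expressed in terms of the original
variables, so columns can be compared across components.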
A powerful visualization technique is to **overlay the X and Y loadings** on
the same plot. Since the X and Y variables originate from the same physical
system (just artificially separated into cause and effect), their joint
loading plot reveals the interconnections between process conditions and
quality outcomes.

SPE, T², Contributions, and Outlier Detection
----------------------------------------------

These diagnostics work identically to PCA (see :doc:`pca`):

- ``model.spe_`` and ``model.spe_limit()`` - conformity to the X-space
  correlation structure.
- ``model.hotellings_t2_`` and ``model.hotellings_t2_limit()`` - extremity
  within the model.
- ``model.score_contributions()`` - decompose scores back to the original
  variables.
- ``model.detect_outliers()`` - combined statistical + robust ESD detection.

Predictions
-----------

After fitting, ``model.predict(X_new)`` returns a ``Bunch`` with:

- ``predictions_`` - the predicted Y values.
- ``scores_`` - the X-scores for the new observations.
- ``spe_`` - SPE values for the new observations.
- ``hotellings_t2_`` - T² values for the new observations.

The underlying regression relationship is captured in
``model.beta_coefficients_``, which maps directly from (preprocessed) X to
predicted Y. A worked prediction sketch appears at the end of this page.

Model Selection
---------------

Choosing the number of components is even more critical for PLS than for
PCA, because PLS can overfit more aggressively: it will find directions that
correlate X with Y in the training data even if those correlations are
spurious. Cross-validation is essential; the same PRESS / Wold's criterion
approach described in :doc:`cross_validation` applies.

A practical check: if the training R² is much higher than the test-set R²
(a gap above 0.15–0.20), overfitting is likely and you should reduce the
number of components. The worked example below includes this check.

Missing Data and Troubleshooting
--------------------------------

See the :doc:`pca` page - the same algorithms (TSR, NIPALS, SCP), the same
settings, and the same troubleshooting advice apply to PLS models.
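Worked Example
--------------

The sketch below ties together the prediction interface and the
model-selection check, continuing from the toy ``X`` and ``Y`` frames built
in *Interpreting Loadings and Weights*. The ``PLS`` class and its ``fit``
signature remain assumptions; the ``predict`` return fields and the limit
methods are the ones documented above, and the comparisons assume they
return a 1-D array and a scalar respectively.

.. code-block:: python

    import numpy as np

    # ``X``, ``Y``, and the (assumed) ``PLS`` class are defined in the
    # earlier sketch. Hold out the last 10 observations for testing.
    X_train, X_test = X.iloc[:40], X.iloc[40:]
    Y_train, Y_test = Y.iloc[:40], Y.iloc[40:]

    model = PLS(n_components=2)   # assumed constructor signature
    model.fit(X_train, Y_train)   # assumed fit(X, Y) signature

    # Predict on held-out data; the returned Bunch carries diagnostics too.
    result = model.predict(X_test)
    Y_hat = np.asarray(result.predictions_)

    # Flag observations that break the X-space correlation structure (SPE)
    # or are extreme within the model (T²).
    bad_spe = result.spe_ > model.spe_limit()
    bad_t2 = result.hotellings_t2_ > model.hotellings_t2_limit()
    print(f"{bad_spe.sum()} high-SPE and {bad_t2.sum()} high-T² test rows")

    def r2(y_true, y_pred):
        # Aggregate R² over all Y columns, computed manually with numpy.
        y_true = np.asarray(y_true, dtype=float)
        ss_res = np.sum((y_true - y_pred) ** 2)
        ss_tot = np.sum((y_true - y_true.mean(axis=0)) ** 2)
        return 1.0 - ss_res / ss_tot

    r2_train = r2(Y_train, np.asarray(model.predict(X_train).predictions_))
    r2_test = r2(Y_test, Y_hat)

    # The overfitting check from "Model Selection": a large train/test gap
    # suggests the model is chasing spurious X-Y correlations.
    if r2_train - r2_test > 0.15:
        print("R² gap above 0.15: consider fewer components.")

The 0.15 threshold mirrors the practical check in *Model Selection*; treat
it as a heuristic starting point, not a hard rule.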