Model Evaluation and Visualization
==================================

After fitting a PCA or PLS model, the next questions are practical: how many
components should the model keep, how well does it predict, and which
variables drive it? This page is a worked tour of the evaluation and plotting
tools, built around a PLS model that relates a set of process measurements
(``X``) to quality outcomes (``Y``). Each example assumes ``X`` and ``Y`` are
already scaled, for example with ``MCUVScaler``.

Choosing the Number of Components
---------------------------------

Calibration fit always improves as components are added, so it cannot tell
you when to stop. ``PLS.select_n_components`` cross-validates the model and
reports the root-mean-square error of cross-validation (RMSECV) together with
the validated explained variance:

.. code-block:: python

   from process_improve.multivariate import PLS

   result = PLS.select_n_components(X, Y, max_components=8, cv=5)
   print(result.n_components)        # recommended component count
   print(result.rmsecv["total"])     # RMSECV per component count

See :doc:`cross_validation` for the full description, including
``PLS.cross_validate`` for beta-coefficient error bars.
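
To make the RMSECV definition concrete, the sketch below computes it by hand
with NumPy. It is an illustration under stated assumptions, not the library's
implementation: it swaps in principal component regression for PLS to keep
the code short, and the helper name ``rmsecv_by_components``, the fold
assignment, and the synthetic data are all hypothetical.

.. code-block:: python

   import numpy as np

   def rmsecv_by_components(X, y, max_components, n_folds=5, seed=0):
       """RMSECV for 1..max_components, using PCR as a stand-in for PLS."""
       rng = np.random.default_rng(seed)
       n = X.shape[0]
       folds = np.array_split(rng.permutation(n), n_folds)
       press = np.zeros(max_components)  # summed squared out-of-fold errors
       for test_idx in folds:
           train = np.ones(n, dtype=bool)
           train[test_idx] = False
           # Centre with training statistics only; held-out rows stay unseen.
           x_mean, y_mean = X[train].mean(axis=0), y[train].mean()
           X_train, y_train = X[train] - x_mean, y[train] - y_mean
           X_test = X[test_idx] - x_mean
           _, _, Vt = np.linalg.svd(X_train, full_matrices=False)
           for a in range(1, max_components + 1):
               P = Vt[:a].T  # loadings of the first `a` components
               coef = np.linalg.lstsq(X_train @ P, y_train, rcond=None)[0]
               y_hat = (X_test @ P) @ coef + y_mean
               press[a - 1] += np.sum((y[test_idx] - y_hat) ** 2)
       return np.sqrt(press / n)

   # Synthetic data with three latent factors (made up for illustration).
   rng = np.random.default_rng(1)
   latent = rng.normal(size=(60, 3))
   X_demo = latent @ rng.normal(size=(3, 8)) + 0.1 * rng.normal(size=(60, 8))
   y_demo = latent @ np.array([1.0, -0.5, 0.25]) + 0.1 * rng.normal(size=60)

   rmsecv = rmsecv_by_components(X_demo, y_demo, max_components=6)
   best = int(np.argmin(rmsecv)) + 1  # component count with the lowest RMSECV

Unlike the calibration error, the values in ``rmsecv`` are computed only from
rows the model did not see during fitting, so they can rise again once extra
components start fitting noise.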

Explained Variance
------------------

Once the model is fitted, ``explained_variance_plot`` shows how much variance
each component captures, both per component and cumulatively:

.. code-block:: python

   model = PLS(n_components=result.n_components).fit(X, Y)
   model.explained_variance_plot()

For PCA the bars refer to variance in the X-block; for PLS they refer to the
Y-block. The same method is available on a fitted PCA model.
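
For the PCA case, the quantities behind the bars can be reproduced by hand.
The following is a minimal sketch with NumPy's SVD on a synthetic, autoscaled
matrix; ``X_demo`` is made up for illustration, and the library may compute
these values differently (for example with NIPALS when data are missing).

.. code-block:: python

   import numpy as np

   # Synthetic rank-4 data in 6 columns, then mean-centred and unit-variance scaled.
   rng = np.random.default_rng(0)
   X_demo = rng.normal(size=(50, 4)) @ rng.normal(size=(4, 6))
   X_demo = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0, ddof=1)

   # Squared singular values are the sums of squares captured per component.
   singular_values = np.linalg.svd(X_demo, compute_uv=False)
   per_component = singular_values**2 / np.sum(singular_values**2)
   cumulative = np.cumsum(per_component)

Because the synthetic data have only four underlying factors, ``cumulative``
reaches 1 after four components and the last two bars are essentially zero.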

Correlation Loadings
--------------------

``correlation_loadings_plot`` places each variable by its correlation with
two components' scores. A variable's squared distance from the origin is the
fraction of its variance explained by those two components, so every variable
lies on or inside the unit circle. Concentric ellipses mark variance-explained
thresholds:

.. code-block:: python

   model.correlation_loadings_plot(pc_horiz=1, pc_vert=2)

For PLS the X- and Y-variables are overlaid, which reveals how process
variables relate to quality outcomes. The ellipse thresholds are
configurable. The 50% and 100% ellipses are the convention: the outer
ellipse is the unit circle, and a variable that falls outside the inner
ellipse has more than half of its variance explained by the two plotted
components. Any other fractions work as well:

.. code-block:: python

   model.correlation_loadings_plot(variance_ellipses=(0.75, 0.95))
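
The geometry of the plot can be checked numerically. The sketch below is an
illustration only: it uses a plain PCA computed with NumPy's SVD on synthetic
data (``X_demo`` is made up) rather than the library's fitted model, and it
verifies that each variable's squared distance from the origin equals the
fraction of its variance explained by the two components.

.. code-block:: python

   import numpy as np

   rng = np.random.default_rng(0)
   X_demo = rng.normal(size=(40, 2)) @ rng.normal(size=(2, 5))
   X_demo = X_demo + 0.3 * rng.normal(size=(40, 5))
   X_demo = X_demo - X_demo.mean(axis=0)

   # PCA scores and loadings for the first two components.
   U, s, Vt = np.linalg.svd(X_demo, full_matrices=False)
   scores = U[:, :2] * s[:2]

   # Correlation of every variable (rows) with each score vector (columns).
   full_corr = np.corrcoef(X_demo.T, scores.T)
   corr_loadings = full_corr[:5, 5:]
   radius_squared = np.sum(corr_loadings**2, axis=1)

   # Fraction of each variable's variance reproduced by the two components.
   residual = X_demo - scores @ Vt[:2]
   explained = 1.0 - np.sum(residual**2, axis=0) / np.sum(X_demo**2, axis=0)

   # Radii of the 50% and 100% ellipses: the square roots of the fractions.
   radii = np.sqrt([0.5, 1.0])

Here ``radius_squared`` and ``explained`` agree to numerical precision, and a
threshold of 50% explained variance corresponds to a radius of about 0.71.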

Observed versus Predicted
-------------------------

``predictions_vs_observed_plot`` draws a parity plot of the calibration
predictions against the observed Y, with a ``y = x`` reference line and an
RMSE annotation:

.. code-block:: python

   model.predictions_vs_observed_plot(y_observed=Y, variable="quality")

Points close to the reference line indicate accurate predictions; systematic
departures from it point to model bias.
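
The numbers behind a parity plot are simple to compute directly. The sketch
below uses made-up observed and predicted values; it assumes the RMSE divides
by the number of observations, which may differ from the convention used in
the plot's annotation.

.. code-block:: python

   import numpy as np

   # Illustrative values only, not results from a real model.
   y_obs = np.array([10.2, 11.5, 9.8, 12.1, 10.9])
   y_pred = np.array([10.0, 11.9, 9.5, 12.4, 11.2])

   residuals = y_pred - y_obs
   rmse = np.sqrt(np.mean(residuals**2))  # typical size of a prediction error
   bias = residuals.mean()                # nonzero mean signals a systematic offset

   # Slope and intercept of predicted versus observed; the ideal is 1 and 0.
   slope, intercept = np.polyfit(y_obs, y_pred, deg=1)

A mean residual far from zero, or a fitted slope far from 1, is the numerical
counterpart of points drifting away from the ``y = x`` line.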

Regression Coefficients
-----------------------

``coefficient_plot`` shows the PLS regression coefficients as a bar chart,
one bar per X-variable, for a chosen Y-variable:

.. code-block:: python

   model.coefficient_plot(variable="quality")

Bars with a large magnitude, positive or negative, mark the X-variables that
most strongly drive the prediction; because ``X`` is scaled, the bar heights
are directly comparable. To see how *reliable* each coefficient is, pair this
plot with the cross-validated error bars from ``PLS.cross_validate`` (see
:doc:`cross_validation`).

Comparing Two Data Blocks
-------------------------

The RV coefficient and its modified form RV2 measure how much structure two
matrices share when both are measured on the same observations (the same
rows). They are a multivariate generalization of a squared correlation:

.. code-block:: python

   from process_improve.multivariate import rv_coefficient, rv2_coefficient

   rv_coefficient(X, Y)     # in [0, 1]; 1 = same configuration up to rotation and scale
   rv2_coefficient(X, Y)    # modified RV, unbiased for high-dimensional data

Use ``rv2_coefficient`` when the blocks have many more variables than
observations: the ordinary RV coefficient is biased upwards in that regime
and tends towards 1 even for unrelated blocks.
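
For reference, both coefficients can be written in a few lines of NumPy. The
sketch below follows the commonly cited definitions (the ordinary RV from the
row-by-row cross-product matrices, the modified RV with their diagonals
removed) and assumes column-centred blocks. The helper ``rv_sketch`` is
hypothetical; the library's functions may differ in centring or scaling.

.. code-block:: python

   import numpy as np

   def rv_sketch(X, Y, modified=False):
       """RV (or modified RV) between two blocks that share their rows."""
       Xc = X - X.mean(axis=0)
       Yc = Y - Y.mean(axis=0)
       Sx = Xc @ Xc.T  # n x n configuration matrices
       Sy = Yc @ Yc.T
       if modified:
           # Modified RV: discard the diagonals, which dominate when there
           # are many variables and inflate the ordinary coefficient.
           Sx = Sx - np.diag(np.diag(Sx))
           Sy = Sy - np.diag(np.diag(Sy))
       # For symmetric matrices, trace(Sx @ Sy) equals the elementwise sum.
       return np.sum(Sx * Sy) / np.sqrt(np.sum(Sx * Sx) * np.sum(Sy * Sy))

   rng = np.random.default_rng(0)
   A = rng.normal(size=(20, 200))  # two unrelated wide blocks
   B = rng.normal(size=(20, 200))
   rv_unrelated = rv_sketch(A, B)
   rv2_unrelated = rv_sketch(A, B, modified=True)

   # A rotated and rescaled copy of A has the same configuration as A.
   Q, _ = np.linalg.qr(rng.normal(size=(200, 200)))
   rv_rotated = rv_sketch(A, 3.0 * A @ Q)

In this example the rotated copy gives an RV of 1, while the two unrelated
wide blocks still give a high ordinary RV and a lower modified RV.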