T-shaped Partial Least Squares (TPLS)
TPLS is a multi-block method designed for T-shaped data structures that arise naturally in batch processes, formulation studies, and similar settings where information is organized in interconnected blocks.
When to Use TPLS
Use TPLS when your data has a natural multi-block structure that standard PLS cannot represent. Typical applications include:
- Pharmaceutical manufacturing: formulation recipes, raw material properties, process conditions, and tablet quality form distinct blocks.
- Chemical reaction optimization: catalyst properties, feed compositions, operating conditions, and product quality.
- Food processing: ingredient properties, recipes, process settings, and sensory or nutritional outcomes.
- Biotechnology: media composition, strain properties, fermentation trajectories, and yield/quality metrics.
If your data fits naturally into a single X matrix and a single Y matrix, standard PLS is simpler and should be preferred.
Data Structure
TPLS operates on four interconnected data blocks:
| Block | Name | Description |
|---|---|---|
| D | Properties | Intrinsic properties of raw materials or design factors. Rows = materials, columns = properties. |
| F | Formulations | How materials are combined in each batch or experiment. Rows = batches, columns = materials. Column names must match D's row index. |
| Z | Conditions | Process conditions or final-state measurements for each batch. Rows = batches, columns = condition variables. |
| Y | Quality | Response or quality variables for each batch. Rows = batches, columns = quality variables. |
The “T-shape” comes from the way D and F link: D describes the materials (rows) while F describes how those same materials (columns) are used in each batch (rows). F, Z, and Y must all have the same number of rows (batches).
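To make the T-shaped link concrete, here is a small sketch using plain pandas only (not the TPLS API). All material names, batch names, and values are made up for illustration. Because F's column labels match D's row labels, the product `F @ D` is well defined; pandas aligns F's columns with D's index by label before multiplying. When F holds mass fractions, each row of the result is a fraction-weighted average of the material properties for that batch. This only shows how the blocks line up; it is not a claim about what TPLS computes internally.

```python
import pandas as pd

# Toy D block (made-up values): 3 raw materials (rows) x 2 properties (columns).
D = pd.DataFrame(
    {"density": [1.2, 0.8, 1.0], "particle_size": [50.0, 120.0, 80.0]},
    index=["MatA", "MatB", "MatC"],
)

# Toy F block: 4 batches (rows) x 3 materials (columns).
# Column names match D's row index; here each row holds mass fractions.
F = pd.DataFrame(
    [[0.5, 0.5, 0.0], [0.2, 0.3, 0.5], [1.0, 0.0, 0.0], [0.0, 0.4, 0.6]],
    columns=["MatA", "MatB", "MatC"],
    index=["B1", "B2", "B3", "B4"],
)

# Toy Z and Y blocks: they share F's batch rows.
Z = pd.DataFrame({"temperature": [60, 65, 70, 62]}, index=F.index)
Y = pd.DataFrame({"hardness": [5.1, 6.0, 4.8, 6.3]}, index=F.index)

# The shared material labels join the two arms of the "T".
assert list(F.columns) == list(D.index)
# F, Z and Y must have the same number of rows (batches).
assert len(F) == len(Z) == len(Y)

# F @ D aligns F's columns with D's index, giving a batches x properties table.
batch_props = F @ D
print(batch_props)
```

Since alignment is by label, reordering the rows of D would not change `batch_props`; a label present in one block but missing from the other raises an error instead.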
Basic Usage
Data blocks are organized using `DataFrameDict`, a dictionary of DataFrames that can optionally be grouped:
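```python
from process_improve.multivariate.methods import TPLS, DataFrameDict

# properties_a, formulas_a, etc. are pandas DataFrames you have already loaded.
data = DataFrameDict(
    {
        "D": {"Group_A": properties_a, "Group_B": properties_b},  # materials x properties
        "F": {"Group_A": formulas_a, "Group_B": formulas_b},  # batches x materials
        "Z": {"Conditions": process_conditions},  # batches x condition variables
        "Y": {"Quality": quality_responses},  # batches x quality variables
    }
)

# Pass the D block to the constructor, then fit on all four blocks.
model = TPLS(n_components=3, d_matrix=data["D"])
model.fit(data)
```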
Key requirements:
- Column names in each F group must match the row index of the corresponding D group; this is how TPLS knows which material properties correspond to which formulation amounts.
- All F, Z, and Y DataFrames must have the same number of rows.
- The `d_matrix` parameter passed to the constructor should be the D block (material properties).
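The first two requirements can be checked before fitting. The sketch below uses a hypothetical helper, `check_tpls_blocks`, written with plain pandas; it is not part of process_improve. It assumes the blocks are held in a plain nested dict shaped like the `data` dictionary in the Basic Usage example, and the toy DataFrames are made up for illustration. It compares labels as sets, so it does not check column order.

```python
import pandas as pd


def check_tpls_blocks(blocks: dict[str, dict[str, pd.DataFrame]]) -> list[str]:
    """Return a list of problems found; an empty list means the checks passed.

    Hypothetical helper, not part of process_improve.
    """
    problems = []
    # Each F group's column labels must match the D group's row labels.
    for group, f_df in blocks["F"].items():
        d_df = blocks["D"].get(group)
        if d_df is None:
            problems.append(f"F group {group!r} has no matching D group")
        elif set(f_df.columns) != set(d_df.index):
            problems.append(f"F[{group!r}] columns do not match D[{group!r}] index")
    # Every F, Z and Y DataFrame must have the same number of rows (batches).
    row_counts = {
        f"{block}[{group!r}]": len(df)
        for block in ("F", "Z", "Y")
        for group, df in blocks[block].items()
    }
    if len(set(row_counts.values())) > 1:
        problems.append(f"row counts differ: {row_counts}")
    return problems


# Toy data (made up): two material groups, three batches.
props_a = pd.DataFrame({"density": [1.2, 0.8]}, index=["A_mat1", "A_mat2"])
props_b = pd.DataFrame({"density": [1.0]}, index=["B_mat1"])
form_a = pd.DataFrame([[0.3, 0.2], [0.5, 0.0], [0.1, 0.4]], columns=["A_mat1", "A_mat2"])
form_b = pd.DataFrame([[0.5], [0.5], [0.5]], columns=["B_mat1"])
cond = pd.DataFrame({"temperature": [60, 65, 70]})
qual = pd.DataFrame({"hardness": [5.1, 6.0, 4.8]})

blocks = {
    "D": {"Group_A": props_a, "Group_B": props_b},
    "F": {"Group_A": form_a, "Group_B": form_b},
    "Z": {"Conditions": cond},
    "Y": {"Quality": qual},
}
print(check_tpls_blocks(blocks))  # → []
```

Returning a list of messages rather than raising on the first failure lets you see every mismatch at once, for example both a label mismatch and a row-count mismatch in the same call.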