Designed Experiments#

Various factorial designs.

process_improve.experiments.designs_factorial.full_factorial(nfactors, names=None)[source]#

Create a full factorial (2^k) design with nfactors [integer] factors.

An optional list of names may be provided; the entries must be strings. If not provided, names are generated automatically.

Parameters:
  • nfactors (int)

  • names (list | None)

Return type:

list
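
The underlying construction can be sketched with the standard library. `full_factorial_sketch` below is a hypothetical stand-in for illustration, not the package's implementation:

```python
from itertools import product

def full_factorial_sketch(nfactors, names=None):
    # Default names A, B, C, ... when none are provided
    names = names or [chr(65 + i) for i in range(nfactors)]
    # One run per combination of coded low (-1) and high (+1) levels: 2^k runs
    return [dict(zip(names, levels)) for levels in product([-1, 1], repeat=nfactors)]

design = full_factorial_sketch(2)  # 4 runs: (-1,-1), (-1,1), (1,-1), (1,1)
```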

process_improve.experiments.models.forg(x, prec=3)[source]#

Taken from the Statsmodels code in iolib/summary.py and adjusted.

Parameters:
  • x (Any)

  • prec (int)

Return type:

str
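
The intent, significant-figure formatting for summary tables, can be illustrated with a simplified stand-in (`forg_sketch` is hypothetical; the Statsmodels original uses fixed-width `%g` formats):

```python
def forg_sketch(x, prec=3):
    # Round to `prec` significant figures; "g" formatting switches to
    # scientific notation for very large or very small magnitudes
    return f"{x:.{prec}g}"

small = forg_sketch(0.012345)  # '0.0123'
large = forg_sketch(12345.6)   # '1.23e+04'
```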

class process_improve.experiments.models.Model(OLS_instance, model_spec, aliasing=None, name=None)[source]#

Bases: OLS

Just a thin wrapper around the OLS class from Statsmodels.

Parameters:
  • OLS_instance (Any)

  • model_spec (str)

  • aliasing (dict | None)

  • name (str | None)

summary(alpha=0.05, print_to_screen=True)[source]#

Side effect: prints to the screen.

Parameters:
  • alpha (float)

  • print_to_screen (bool)

Return type:

Any

get_parameters(drop_intercept=True)[source]#

Get the parameter values and return them in a pandas DataFrame.

Parameters:

drop_intercept (bool)

Return type:

DataFrame

get_factor_names(level=1)[source]#

Get the factors in a model that correspond to a certain level.

1 : pure factors
2 : 2-factor interactions and quadratic terms
3 : 3-factor interactions and cubic terms
4 : etc.

Parameters:

level (int)

Return type:

list[str]
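
For terms written in patsy style ("A", "A:B", "A:B:C"), the level is the number of ":"-joined factors. A minimal sketch (hypothetical helper, ignoring quadratic terms such as I(A**2)):

```python
def term_level(term):
    # "A" -> 1, "A:B" -> 2, "A:B:C" -> 3
    return term.count(":") + 1

terms = ["A", "B", "A:B", "A:B:C"]
level2 = [t for t in terms if term_level(t) == 2]  # ['A:B']
```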

get_response_name()[source]#

Get the name of the response variable from the model specification.

Return type:

str

get_title()[source]#

Get the model’s title, if it has one. Always returns a string.

Return type:

str

get_aliases(aliasing_up_to_level=2, drop_intercept=True, websafe=False)[source]#

Return a list of strings representing the aliases of the fitted effects.

aliasing_up_to_level: the level of interaction up to which aliases are shown.

drop_intercept: default is True, but sometimes it is interesting to know which effects are aliased with the intercept.

websafe: default is False; if True, prints the first term in the aliasing in bold, since that is the nominally estimated effect.

Parameters:
  • aliasing_up_to_level (int)

  • drop_intercept (bool | None)

  • websafe (bool | None)

Return type:

list
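
The arithmetic behind aliasing can be sketched for a regular two-level fractional factorial: multiplying an effect by a word of the defining relation and cancelling squared letters gives its alias. `alias_of` is a hypothetical helper, not this method's implementation:

```python
def alias_of(effect, word):
    # Symmetric difference of factor letters: any letter appearing
    # twice cancels, since x * x = I in a two-level design
    letters = set(effect) ^ set(word)
    return "".join(sorted(letters)) or "I"

# 2^(3-1) design with defining relation I = ABC:
alias_a = alias_of("A", "ABC")    # 'BC' -> the main effect A is aliased with BC
alias_ab = alias_of("AB", "ABC")  # 'C'
```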

process_improve.experiments.models.predict(model, **kwargs)[source]#

Make predictions from the model.

Parameters:
  • model (Model)

  • kwargs (Any)

Return type:

Any

process_improve.experiments.models.lm(model_spec, data, name=None, alias_threshold=0.995)[source]#

Create a linear model.

Parameters:
  • model_spec (str)

  • data (DataFrame)

  • name (str | None)

  • alias_threshold (float)

Return type:

Model
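
lm() delegates the fit to Statsmodels OLS. The equivalent least-squares computation for a model such as "y ~ A + B + A:B" on a coded 2^2 design can be sketched with NumPy (illustrative data, not package code):

```python
import numpy as np

A = np.array([-1, 1, -1, 1])
B = np.array([-1, -1, 1, 1])
y = np.array([28.0, 36.0, 18.0, 31.0])

# Model matrix for "y ~ A + B + A:B": intercept, A, B, interaction
X = np.column_stack([np.ones(4), A, B, A * B])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
# coef = [28.25, 5.25, -3.75, 1.25] on this orthogonal design
```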

process_improve.experiments.models.summary(model, show=True, aliasing_up_to_level=3)[source]#

Print a summary of the model to the screen.

If there is any aliasing, a summary of those aliases is appended, up to the interaction level given by aliasing_up_to_level (an integer).

Parameters:
  • model (Model)

  • show (bool | None)

  • aliasing_up_to_level (int)

Return type:

Any

Design evaluation: quality metrics for experimental designs.

Provides evaluate_design(), which computes properties and quality metrics of an existing design matrix. Supported metrics include efficiency values (D/I/G), prediction variance, VIF, condition number, power analysis, alias structure, confounding, resolution, defining relation, clear effects, minimum aberration, and degrees of freedom.

Example

>>> from process_improve.experiments import evaluate_design, generate_design, Factor
>>> factors = [Factor(name="A", low=0, high=10), Factor(name="B", low=0, high=10)]
>>> result = generate_design(factors, design_type="full_factorial", center_points=0)
>>> metrics = evaluate_design(result, model="interactions", metric=["d_efficiency", "vif"])

process_improve.experiments.evaluate.evaluate_design(design_matrix, model=None, metric='d_efficiency', effect_size=None, alpha=0.05, sigma=None)[source]#

Compute quality metrics for an experimental design.

Parameters:
  • design_matrix (DataFrame or DesignResult) – The design to evaluate. If a DesignResult is passed, the coded design matrix and any generator / defining-relation metadata are extracted automatically.

  • model (str or None) – Model type: "main_effects", "interactions", "quadratic", or an explicit patsy formula. None defaults to "interactions".

  • metric (str or list[str]) – One or more metric names to compute. Valid names: "alias_structure", "confounding", "resolution", "defining_relation", "power", "d_efficiency", "i_efficiency", "g_efficiency", "prediction_variance", "degrees_of_freedom", "vif", "condition_number", "clear_effects", "minimum_aberration".

  • effect_size (float or None) – Expected effect size for power calculation. When None, a power curve over a range of effect sizes is returned instead.

  • alpha (float) – Significance level for power calculation (default 0.05).

  • sigma (float or None) – Estimated noise standard deviation. Defaults to 1.0 when needed but not provided.

Returns:

Results keyed by metric name. The structure of each value depends on the metric; see the individual metric documentation.

Return type:

dict[str, Any]

Examples

>>> from process_improve.experiments import evaluate_design, generate_design, Factor
>>> factors = [Factor(name="A", low=0, high=10), Factor(name="B", low=0, high=10)]
>>> result = generate_design(factors, design_type="full_factorial", center_points=0)
>>> metrics = evaluate_design(result, model="main_effects", metric="d_efficiency")
>>> metrics["d_efficiency"]
100.0
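
The 100.0 above is what one common definition of D-efficiency, 100 * det(X'X/n)**(1/p), gives for an orthogonal design. A NumPy sketch assuming that definition (the package may scale differently):

```python
import numpy as np

# Coded 2^2 full factorial, main-effects model matrix
A = np.array([-1, 1, -1, 1])
B = np.array([-1, -1, 1, 1])
X = np.column_stack([np.ones(4), A, B])

n, p = X.shape
d_eff = 100 * np.linalg.det(X.T @ X / n) ** (1 / p)  # 100.0 here
```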

Experiment analysis: fit models, ANOVA, diagnostics, residuals.

Provides analyze_experiment(), the main analytical workhorse for designed experiments (Tool 3 in the DOE tool architecture).

Uses statsmodels and scipy for the heavy lifting, with thin custom code for lack-of-fit, curvature test, Lenth’s method, pred-R², adequate precision, and confirmation run testing.

process_improve.experiments.analysis.build_formula(response, factors, model=None)[source]#

Build a patsy/statsmodels formula string.

Parameters:
  • response (str) – Name of the response column.

  • factors (list[str]) – Factor column names.

  • model (str or None) – "main_effects", "interactions", "quadratic", or an explicit formula string. None defaults to "interactions".

Returns:

A formula like "Y ~ A + B + A:B".

Return type:

str
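
A hypothetical sketch of the expansion (the real function may differ, for example by emitting patsy shorthand such as "A*B"):

```python
from itertools import combinations

def build_formula_sketch(response, factors, model="interactions"):
    terms = list(factors)  # main effects are always present
    if model in ("interactions", "quadratic"):
        terms += [f"{a}:{b}" for a, b in combinations(factors, 2)]
    if model == "quadratic":
        terms += [f"I({f}**2)" for f in factors]
    return f"{response} ~ " + " + ".join(terms)

formula = build_formula_sketch("Y", ["A", "B"])  # 'Y ~ A + B + A:B'
```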

class process_improve.experiments.analysis.AnalysisResult(ols_result=None, formula='', results=<factory>)[source]#

Bases: object

Container returned by analyze_experiment().

Holds the fitted OLS result and all requested analysis outputs.

Parameters:
  • ols_result (RegressionResultsWrapper)

  • formula (str)

  • results (dict[str, Any])

ols_result: RegressionResultsWrapper = None#
formula: str = ''#
results: dict[str, Any]#

process_improve.experiments.analysis.analyze_experiment(design_matrix, responses=None, model=None, analysis_type='anova', significance_level=0.05, transform=None, coding='coded', new_points=None, observed_at_new=None, response_column=None)[source]#

Fit models, run ANOVA, compute effects, diagnose residuals.

Parameters:
  • design_matrix (DataFrame) – Factor settings per run. May also contain the response column(s).

  • responses (DataFrame, Series, or None) – Response column(s). If None, response_column must name a column already present in design_matrix.

  • model (str or None) – "main_effects", "interactions", "quadratic", an explicit formula, or None (defaults to "interactions").

  • analysis_type (str or list[str]) – One or more of: "anova", "effects", "coefficients", "significance", "residual_diagnostics", "lack_of_fit", "curvature_test", "model_selection", "box_cox", "lenth_method", "confidence_intervals", "prediction", "confirmation_test".

  • significance_level (float) – Default 0.05.

  • transform (str or None) – "log", "sqrt", "inverse", "box_cox", or None.

  • coding (str) – "coded" or "actual".

  • new_points (DataFrame or None) – For prediction or confirmation testing.

  • observed_at_new (list[float] or None) – Observed values at new_points (for confirmation testing).

  • response_column (str or None) – Name of the response column when it lives inside design_matrix.

Returns:

Results keyed by analysis type. Always includes "model_summary" with R², adj-R², pred-R², and adequate precision.

Return type:

dict[str, Any]

Examples

>>> import pandas as pd
>>> from process_improve.experiments.analysis import analyze_experiment
>>> df = pd.DataFrame({
...     "A": [-1, 1, -1, 1], "B": [-1, -1, 1, 1],
...     "y": [28, 36, 18, 31],
... })
>>> result = analyze_experiment(df, response_column="y", analysis_type="coefficients")
>>> result["coefficients"][0]["term"]
'Intercept'
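
For a balanced two-level design like the one above, the "effects" output reduces to simple contrast arithmetic: the mean response at the high level minus the mean at the low level. A plain-Python illustration on the same data:

```python
A = [-1, 1, -1, 1]
y = [28, 36, 18, 31]

high = [yi for a, yi in zip(A, y) if a == 1]
low = [yi for a, yi in zip(A, y) if a == -1]
effect_A = sum(high) / len(high) - sum(low) / len(low)
# (36 + 31)/2 - (28 + 18)/2 = 33.5 - 23.0 = 10.5
```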

Strategy Recommender#

Multi-stage experimental strategy recommender.

Given a DOE problem specification (factors, responses, budget, constraints, domain, prior knowledge), recommend a multi-stage experimental strategy using deterministic decision rules from Montgomery, NIST, and Stat-Ease SCOR.

Quick start:

from process_improve.experiments.strategy import recommend_strategy

result = recommend_strategy(
    factors=[Factor(name="A", low=0, high=100), ...],
    responses=[Response(name="Yield", goal="maximize")],
    budget=40,
    domain="fermentation",
)

process_improve.experiments.strategy.recommend_strategy(*, factors, responses=None, budget=None, constraints=None, hard_to_change_factors=None, prior_knowledge=None, existing_data=None, domain=None, detail_level='intermediate')[source]#

Recommend a multi-stage experimental strategy.

Given a DOE problem description, apply deterministic decision rules to recommend a staged experimental plan (screening → optimisation → confirmation).

Parameters:
  • factors (list[Factor]) – All candidate experimental factors.

  • responses (list[Response] or None) – Response variables with optimisation goals.

  • budget (int or None) – Total run budget across all stages. None = no constraint.

  • constraints (list[Constraint] or None) – Factor-space constraints (linear or nonlinear).

  • hard_to_change_factors (list[str] or None) – Factor names that are expensive to reset between runs.

  • prior_knowledge (str or None) – Free-text description of what the user already knows.

  • existing_data (DataFrame or None) – Prior experimental data (summary extracted internally).

  • domain (str or None) – Application domain (e.g. "fermentation"). Defaults to "general".

  • detail_level (str) – "novice" or "intermediate" (default).

Returns:

JSON-serialisable dictionary with the ExperimentalStrategy fields.

Return type:

dict

Examples

>>> from process_improve.experiments.factor import Factor, Response
>>> factors = [Factor(name=chr(65+i), low=0, high=100) for i in range(7)]
>>> result = recommend_strategy(factors=factors, budget=40, domain="fermentation")
>>> result["total_estimated_runs"] <= 40
True

Deterministic rule engine for DOE strategy recommendation.

Implements ~50 decision rules from Montgomery, NIST, and Stat-Ease SCOR to recommend multi-stage experimental strategies. No LLM calls and no randomness: identical inputs always produce identical outputs.

process_improve.experiments.strategy.engine.recommend_strategy(*, factors, responses=None, budget=None, constraints=None, hard_to_change_factors=None, prior_knowledge=None, existing_data=None, domain=None, detail_level='intermediate')[source]#

Recommend a multi-stage experimental strategy.

Given a DOE problem description, apply deterministic decision rules to recommend a staged experimental plan (screening → optimisation → confirmation).

Parameters:
  • factors (list[Factor]) – All candidate experimental factors.

  • responses (list[Response] or None) – Response variables with optimisation goals.

  • budget (int or None) – Total run budget across all stages. None = no constraint.

  • constraints (list[Constraint] or None) – Factor-space constraints (linear or nonlinear).

  • hard_to_change_factors (list[str] or None) – Factor names that are expensive to reset between runs.

  • prior_knowledge (str or None) – Free-text description of what the user already knows.

  • existing_data (DataFrame or None) – Prior experimental data (summary extracted internally).

  • domain (str or None) – Application domain (e.g. "fermentation"). Defaults to "general".

  • detail_level (str) – "novice" or "intermediate" (default).

Returns:

JSON-serialisable dictionary with the ExperimentalStrategy fields.

Return type:

dict

Examples

>>> from process_improve.experiments.factor import Factor, Response
>>> factors = [Factor(name=chr(65+i), low=0, high=100) for i in range(7)]
>>> result = recommend_strategy(factors=factors, budget=40, domain="fermentation")
>>> result["total_estimated_runs"] <= 40
True

Pydantic models for the DOE strategy recommender.

Defines the input specification (DOEProblemSpec), the output (ExperimentalStrategy, ExperimentalStage, TransitionRule), and supporting types (DomainType, PriorKnowledge).

class process_improve.experiments.strategy.models.DomainType(value)[source]#

Bases: str, Enum

Application domain for domain-specific strategy adjustments.

pharma_formulation = 'pharma_formulation'#
fermentation = 'fermentation'#
food_science = 'food_science'#
extraction = 'extraction'#
analytical_method = 'analytical_method'#
cell_culture = 'cell_culture'#
bioprocess = 'bioprocess'#
general = 'general'#
class process_improve.experiments.strategy.models.PriorKnowledge(*, raw_text='', confidence=0.0, known_significant_factors=<factory>, known_ranges_reliable=False, has_supporting_data=False)[source]#

Bases: BaseModel

Parsed prior knowledge with a confidence score.

Parameters:
  • raw_text (str) – The original free-text description provided by the user.

  • confidence (float) – Confidence score between 0.0 (no knowledge) and 1.0 (confirmed).

  • known_significant_factors (list[str]) – Factor names identified as significant in the prior knowledge.

  • known_ranges_reliable (bool) – Whether the user’s factor ranges are informed by prior data.

  • has_supporting_data (bool) – Whether the prior knowledge is backed by experimental data.

raw_text: str#
confidence: float#
known_significant_factors: list[str]#
known_ranges_reliable: bool#
has_supporting_data: bool#
model_config = {}#

Configuration for the model; should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class process_improve.experiments.strategy.models.TransitionRule(*, condition, action, fallback)[source]#

Bases: BaseModel

Rule governing the transition between consecutive experimental stages.

Parameters:
  • condition (str) – Human-readable condition, e.g. "2-5 significant factors identified".

  • action (str) – Action to take when the condition is met, e.g. "proceed_to_rsm".

  • fallback (str) – Action if the condition is not met, e.g. "broaden_factor_ranges".

condition: str#
action: str#
fallback: str#
model_config = {}#

Configuration for the model; should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class process_improve.experiments.strategy.models.ExperimentalStage(*, stage_number, stage_name, design_type, design_params=<factory>, factors=<factory>, estimated_runs=0, purpose='', success_criteria=<factory>, transition_rules=<factory>)[source]#

Bases: BaseModel

One stage in a multi-stage experimental strategy.

Parameters:
  • stage_number (int) – 1-based stage index.

  • stage_name (str) – Human-readable name, e.g. "Screening", "Optimization".

  • design_type (str) – Design type key, e.g. "plackett_burman", "ccd", "bbd".

  • design_params (dict) – Design-specific parameters (resolution, center_points, alpha, etc.).

  • factors (list[str]) – Factor names involved in this stage.

  • estimated_runs (int) – Estimated number of experimental runs.

  • purpose (str) – Brief description of what this stage accomplishes.

  • success_criteria (dict) – Criteria for deeming this stage successful.

  • transition_rules (list[TransitionRule]) – Rules governing the transition to the next stage.

stage_number: int#
stage_name: str#
design_type: str#
design_params: dict[str, Any]#
factors: list[str]#
estimated_runs: int#
purpose: str#
success_criteria: dict[str, Any]#
transition_rules: list[TransitionRule]#
model_config = {}#

Configuration for the model; should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class process_improve.experiments.strategy.models.ExperimentalStrategy(*, strategy_id='', stages=<factory>, total_estimated_runs=0, budget_allocation=<factory>, assumptions=<factory>, risks=<factory>, alternative_strategies=<factory>, domain='general', detail_level='intermediate', reasoning=<factory>)[source]#

Bases: BaseModel

Complete multi-stage experimental strategy recommendation.

Parameters:
  • strategy_id (str) – Deterministic hash of the input specification.

  • stages (list[ExperimentalStage]) – Ordered list of experimental stages.

  • total_estimated_runs (int) – Sum of estimated runs across all stages.

  • budget_allocation (dict[str, int]) – Stage name to allocated run count mapping.

  • assumptions (list[str]) – Key assumptions underlying the recommendation.

  • risks (list[str]) – Risks and potential issues with the strategy.

  • alternative_strategies (list[str]) – Brief descriptions of alternative approaches.

  • domain (str) – The domain used for domain-specific adjustments.

  • detail_level (str) – The detail level used for explanations.

  • reasoning (list[str]) – Step-by-step explanation of the decision logic.

strategy_id: str#
stages: list[ExperimentalStage]#
total_estimated_runs: int#
budget_allocation: dict[str, int]#
assumptions: list[str]#
risks: list[str]#
alternative_strategies: list[str]#
domain: str#
detail_level: str#
reasoning: list[str]#
model_config = {}#

Configuration for the model; should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class process_improve.experiments.strategy.models.DOEProblemSpec(*, factors, responses=<factory>, budget=None, constraints=None, hard_to_change_factors=None, prior_knowledge=None, existing_data_summary=None, domain=DomainType.general, detail_level='intermediate')[source]#

Bases: BaseModel

Validated input specification for the strategy recommender.

Wraps all inputs into a single object for pipeline processing.

Parameters:
  • factors (list[Factor]) – All candidate experimental factors.

  • responses (list[Response]) – Response variables with optimisation goals.

  • budget (int or None) – Total run budget across all stages.

  • constraints (list[Constraint] or None) – Factor-space constraints.

  • hard_to_change_factors (list[str] or None) – Factor names that are expensive to reset between runs.

  • prior_knowledge (PriorKnowledge or None) – Parsed prior knowledge with confidence score.

  • existing_data_summary (dict or None) – Summary of any existing experimental data.

  • domain (DomainType) – Application domain.

  • detail_level (str) – "novice" or "intermediate".

factors: list[Factor]#
responses: list[Response]#
budget: int | None#
constraints: list[Constraint] | None#
hard_to_change_factors: list[str] | None#
prior_knowledge: PriorKnowledge | None#
existing_data_summary: dict[str, Any] | None#
domain: DomainType#
detail_level: Literal['novice', 'intermediate']#
property n_factors: int#

Total number of factors.

property factor_names: list[str]#

Ordered list of factor names.

property n_continuous: int#

Number of continuous factors.

property n_categorical: int#

Number of categorical factors.

property n_mixture: int#

Number of mixture factors.

property has_mixture: bool#

Whether any mixture factors are present.

property has_hard_to_change: bool#

Whether any hard-to-change factors are specified.

property has_constraints: bool#

Whether any constraints are specified.

property goal_includes_optimization: bool#

Whether any response has an optimisation goal.

model_config = {}#

Configuration for the model; should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

Domain-specific strategy templates for DOE recommendations.

Each domain template provides preferred design choices, budget weight adjustments, and domain-specific advice. Templates are Python dicts (not YAML) because they encode algorithmic adjustments, not reference data.

Sources:
  • ICH Q8/Q9/Q10 for pharma QbD

  • Stat-Ease SCOR framework

  • NIST Engineering Statistics Handbook section 5.3.3

  • Montgomery, Design and Analysis of Experiments, 10th ed.

process_improve.experiments.strategy.domain_templates.get_domain_template(domain)[source]#

Return the domain template for the given domain string.

Parameters:

domain (str) – Domain key (e.g. "fermentation"). Falls back to "general" if the key is not recognised.

Returns:

The domain template dictionary.

Return type:

dict

Budget allocation logic for multi-stage DOE strategies.

Implements the 25-40-55-15 framework:
  • Screening: 25-40 % of total budget

  • Optimisation: 40-55 %

  • Confirmation: 5-15 % (minimum 3 runs)

Sources:
  • Montgomery, Design and Analysis of Experiments, 10th ed. (25% rule)

  • Stat-Ease SCOR framework

  • NIST Engineering Statistics Handbook section 5.3.3

process_improve.experiments.strategy.budget.estimate_screening_runs(n_factors, design_type)[source]#

Estimate the number of runs for a screening design.

Parameters:
  • n_factors (int) – Number of factors to screen.

  • design_type (str) – One of "plackett_burman", "definitive_screening", "fractional_factorial", "full_factorial".

Returns:

Estimated run count including center points.

Return type:

int
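
For Plackett-Burman, for instance, run counts come in multiples of 4 and must exceed the number of factors. A sketch of that rule (hypothetical helper; the package function also accounts for center points):

```python
import math

def pb_runs_sketch(n_factors):
    # Smallest multiple of 4 that leaves a degree of freedom per factor
    return 4 * math.ceil((n_factors + 1) / 4)

runs7 = pb_runs_sketch(7)  # 8 runs screen 7 factors
runs8 = pb_runs_sketch(8)  # 12
```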

process_improve.experiments.strategy.budget.estimate_rsm_runs(n_factors, design_type, center_points=3)[source]#

Estimate the number of runs for an RSM design.

Parameters:
  • n_factors (int) – Number of factors (typically 2-5 after screening).

  • design_type (str) – One of "ccd", "box_behnken", "ccd_face_centered", "d_optimal".

  • center_points (int) – Number of center point replicates (default 3).

Returns:

Estimated run count.

Return type:

int
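
For a CCD with a full factorial core, the textbook count is 2^k factorial points plus 2k axial points plus the center replicates. A sketch under that assumption (real implementations switch to fractional cores for larger k):

```python
def ccd_runs_sketch(n_factors, center_points=3):
    # Factorial core + 2k axial (star) points + center replicates
    return 2 ** n_factors + 2 * n_factors + center_points

runs = ccd_runs_sketch(3)  # 8 + 6 + 3 = 17
```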

process_improve.experiments.strategy.budget.estimate_confirmation_runs(min_runs=3)[source]#

Return the number of confirmation runs.

Parameters:

min_runs (int) – Minimum confirmation runs (default 3).

Returns:

Confirmation run count (always at least 3).

Return type:

int

process_improve.experiments.strategy.budget.allocate_budget(total_budget, n_factors, needs_screening, needs_rsm, screening_design='plackett_burman', rsm_design='box_behnken', domain_weights=None, min_confirmation=3, center_points=3)[source]#

Allocate a total run budget across experimental stages.

Parameters:
  • total_budget (int or None) – Total runs across all stages. If None, computes an ideal budget.

  • n_factors (int) – Total number of candidate factors.

  • needs_screening (bool) – Whether a screening stage is needed.

  • needs_rsm (bool) – Whether an RSM optimisation stage is needed.

  • screening_design (str) – Preferred screening design type.

  • rsm_design (str) – Preferred RSM design type.

  • domain_weights (dict or None) – Stage-to-fraction mapping from the domain template.

  • min_confirmation (int) – Minimum confirmation runs (domain-dependent).

  • center_points (int) – Center points for RSM design.

Returns:

Keys: "screening", "optimization", "confirmation", "total", "ideal_total", "is_tight", "warnings".

Return type:

dict
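
The allocation logic can be sketched with illustrative fractions drawn from the framework above (`allocate_sketch` is a hypothetical helper; the real function also validates design feasibility and emits warnings):

```python
def allocate_sketch(total, weights=(0.30, 0.55, 0.15), min_confirmation=3):
    # Split the budget by stage fractions, enforcing a confirmation floor
    screening = round(total * weights[0])
    confirmation = max(min_confirmation, round(total * weights[2]))
    # Optimisation absorbs the remainder so the stages sum to the budget
    optimization = total - screening - confirmation
    return {"screening": screening, "optimization": optimization,
            "confirmation": confirmation, "total": total}

alloc = allocate_sketch(40)
# {'screening': 12, 'optimization': 22, 'confirmation': 6, 'total': 40}
```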