Designed Experiments#
Various factorial designs.
- process_improve.experiments.designs_factorial.full_factorial(nfactors, names=None)[source]#
Create a full factorial (2^k) design for nfactors (integer) factors.
An optional list of names may be provided; its entries should be strings. If not provided, the names are generated automatically.
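A full factorial enumerates every combination of the coded low/high levels. As an illustration only (not the library's implementation, whose return type is not shown here), a 2^k design can be sketched with itertools:

```python
from itertools import product

def sketch_full_factorial(nfactors, names=None):
    """Return one dict per run, mapping factor name to a coded -1/+1 level."""
    if names is None:
        # Default names A, B, C, ... following common DOE convention.
        names = [chr(ord("A") + i) for i in range(nfactors)]
    return [dict(zip(names, levels)) for levels in product((-1, +1), repeat=nfactors)]

runs = sketch_full_factorial(3)
print(len(runs))   # 2^3 = 8 runs
print(runs[0])     # {'A': -1, 'B': -1, 'C': -1}
```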
- process_improve.experiments.models.forg(x, prec=3)[source]#
Yanked from the code for Statsmodels / iolib / summary.py and adjusted.
- class process_improve.experiments.models.Model(OLS_instance, model_spec, aliasing=None, name=None)[source]#
Bases: OLS
Just a thin wrapper around the OLS class from Statsmodels.
- get_parameters(drop_intercept=True)[source]#
Get the parameter values; return them in a Pandas dataframe.
- get_factor_names(level=1)[source]#
Get the factors in a model which correspond to a certain level.
- 1 : pure factors
- 2 : 2-factor interactions and quadratic terms
- 3 : 3-factor interactions and cubic terms
- 4 : etc.
- get_response_name()[source]#
Get the name of the response variable from the model specification.
- Return type:
- get_aliases(aliasing_up_to_level=2, drop_intercept=True, websafe=False)[source]#
Return a list, containing strings, representing the aliases of the fitted effects.
- aliasing_up_to_level: up to which level of interactions aliases are shown.
- drop_intercept: default is True, but sometimes it is interesting to know which effects are aliased with the intercept.
- websafe: default is False; if True, prints the first term in the aliasing in bold, since that is the nominally estimated effect.
- process_improve.experiments.models.predict(model, **kwargs)[source]#
Make predictions from the model.
- process_improve.experiments.models.lm(model_spec, data, name=None, alias_threshold=0.995)[source]#
Create a linear model.
- process_improve.experiments.models.summary(model, show=True, aliasing_up_to_level=3)[source]#
Print a summary of the model to the screen.
Appends, if there is any aliasing, a summary of those aliases, up to the (integer) level of interaction: aliasing_up_to_level.
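The aliasing reported here follows standard fractional-factorial arithmetic: an effect's aliases are obtained by multiplying it by each word of the defining relation, with repeated letters cancelling in pairs. A self-contained sketch of that arithmetic (illustrative only, not the library's code):

```python
def alias(effect: str, defining_words: list[str]) -> set[str]:
    """Effects aliased with `effect`, given defining-relation words.
    Letter multiplication with x*x = I is a symmetric difference of letter sets."""
    e = frozenset(effect)
    return {"".join(sorted(e ^ frozenset(w))) or "I" for w in defining_words}

# In a 2^(4-1) fractional factorial with I = ABCD:
print(alias("A", ["ABCD"]))    # {'BCD'} -- main effect A aliased with BCD
print(alias("AB", ["ABCD"]))   # {'CD'}  -- 2-factor interactions aliased in pairs
```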
Design evaluation: quality metrics for experimental designs.
Provides evaluate_design(), which computes properties and quality metrics
of an existing design matrix. Supported metrics include efficiency values
(D/I/G), prediction variance, VIF, condition number, power analysis, alias
structure, confounding, resolution, defining relation, clear effects, minimum
aberration, and degrees of freedom.
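To illustrate one of these metrics: D-efficiency is commonly defined as 100 · det(X'X/n)^(1/p) for a model matrix with n runs and p model terms, reaching 100 % for an orthogonal design. A numpy-only sketch under that textbook definition (not the library's code):

```python
import numpy as np

def d_efficiency(X):
    """D-efficiency (%) of a model matrix X (n runs x p terms):
    100 * det(X'X / n) ** (1/p); 100% indicates an orthogonal design."""
    n, p = X.shape
    return 100.0 * np.linalg.det(X.T @ X / n) ** (1.0 / p)

# 2^2 full factorial, main-effects model: columns are intercept, A, B
X = np.array([
    [1, -1, -1],
    [1,  1, -1],
    [1, -1,  1],
    [1,  1,  1],
], dtype=float)
print(d_efficiency(X))   # 100.0 -- the columns are mutually orthogonal
```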
Example
>>> from process_improve.experiments import evaluate_design, generate_design, Factor
>>> factors = [Factor(name="A", low=0, high=10), Factor(name="B", low=0, high=10)]
>>> result = generate_design(factors, design_type="full_factorial", center_points=0)
>>> metrics = evaluate_design(result, model="interactions", metric=["d_efficiency", "vif"])
- process_improve.experiments.evaluate.evaluate_design(design_matrix, model=None, metric='d_efficiency', effect_size=None, alpha=0.05, sigma=None)[source]#
Compute quality metrics for an experimental design.
- Parameters:
design_matrix (DataFrame or DesignResult) – The design to evaluate. If a DesignResult is passed, the coded design matrix and any generator / defining-relation metadata are extracted automatically.
model (str or None) – Model type: "main_effects", "interactions", "quadratic", or an explicit patsy formula. None defaults to "interactions".
metric (str or list[str]) – One or more metric names to compute. Valid names: "alias_structure", "confounding", "resolution", "defining_relation", "power", "d_efficiency", "i_efficiency", "g_efficiency", "prediction_variance", "degrees_of_freedom", "vif", "condition_number", "clear_effects", "minimum_aberration".
effect_size (float or None) – Expected effect size for the power calculation. When None, a power curve over a range of effect sizes is returned instead.
alpha (float) – Significance level for the power calculation (default 0.05).
sigma (float or None) – Estimated noise standard deviation. Defaults to 1.0 when needed but not provided.
- Returns:
Results keyed by metric name. The structure of each value depends on the metric; see the individual metric documentation.
- Return type:
Examples
>>> from process_improve.experiments import evaluate_design, generate_design, Factor
>>> factors = [Factor(name="A", low=0, high=10), Factor(name="B", low=0, high=10)]
>>> result = generate_design(factors, design_type="full_factorial", center_points=0)
>>> metrics = evaluate_design(result, model="main_effects", metric="d_efficiency")
>>> metrics["d_efficiency"]
100.0
Experiment analysis: fit models, ANOVA, diagnostics, residuals.
Provides analyze_experiment(), the main analytical workhorse for
designed experiments (Tool 3 in the DOE tool architecture).
Uses statsmodels and scipy for the heavy lifting, with thin custom code for lack-of-fit, curvature test, Lenth’s method, pred-R², adequate precision, and confirmation run testing.
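The pred-R² mentioned here is conventionally derived from the PRESS statistic: PRESS = Σ (eᵢ / (1 − hᵢᵢ))², pred-R² = 1 − PRESS / SS_total. A numpy-only sketch of that textbook formula (the library's thin custom code may differ in detail):

```python
import numpy as np

def pred_r_squared(X, y):
    """Prediction R^2 from PRESS: leave-one-out residuals via leverages h_ii."""
    H = X @ np.linalg.inv(X.T @ X) @ X.T              # hat matrix
    beta = np.linalg.lstsq(X, y, rcond=None)[0]       # OLS fit
    resid = y - X @ beta
    press = np.sum((resid / (1.0 - np.diag(H))) ** 2)
    ss_total = np.sum((y - y.mean()) ** 2)
    return 1.0 - press / ss_total

# 2^2 factorial with a main-effects model (same toy data as the analysis example)
X = np.array([[1, -1, -1], [1, 1, -1], [1, -1, 1], [1, 1, 1]], dtype=float)
y = np.array([28.0, 36.0, 18.0, 31.0])
print(round(pred_r_squared(X, y), 3))   # 0.421
```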
- process_improve.experiments.analysis.build_formula(response, factors, model=None)[source]#
Build a patsy/statsmodels formula string.
- class process_improve.experiments.analysis.AnalysisResult(ols_result=None, formula='', results=<factory>)[source]#
Bases: object
Container returned by analyze_experiment(). Holds the fitted OLS result and all requested analysis outputs.
- ols_result: RegressionResultsWrapper = None#
- process_improve.experiments.analysis.analyze_experiment(design_matrix, responses=None, model=None, analysis_type='anova', significance_level=0.05, transform=None, coding='coded', new_points=None, observed_at_new=None, response_column=None)[source]#
Fit models, run ANOVA, compute effects, diagnose residuals.
- Parameters:
design_matrix (DataFrame) – Factor settings per run. May also contain the response column(s).
responses (DataFrame, Series, or None) – Response column(s). If None, response_column must name a column already present in design_matrix.
model (str or None) – "main_effects", "interactions", "quadratic", an explicit formula, or None (defaults to "interactions").
analysis_type (str or list[str]) – One or more of: "anova", "effects", "coefficients", "significance", "residual_diagnostics", "lack_of_fit", "curvature_test", "model_selection", "box_cox", "lenth_method", "confidence_intervals", "prediction", "confirmation_test".
significance_level (float) – Default 0.05.
transform (str or None) – "log", "sqrt", "inverse", "box_cox", or None.
coding (str) – "coded" or "actual".
new_points (DataFrame or None) – New factor settings for prediction or confirmation testing.
observed_at_new (list[float] or None) – Observed values at new_points (for confirmation testing).
response_column (str or None) – Name of the response column when it lives inside design_matrix.
- Returns:
Results keyed by analysis type. Always includes "model_summary" with R², adj-R², pred-R², and adequate precision.
- Return type:
Examples
>>> import pandas as pd
>>> from process_improve.experiments.analysis import analyze_experiment
>>> df = pd.DataFrame({
...     "A": [-1, 1, -1, 1], "B": [-1, -1, 1, 1],
...     "y": [28, 36, 18, 31],
... })
>>> result = analyze_experiment(df, response_column="y", analysis_type="coefficients")
>>> result["coefficients"][0]["term"]
'Intercept'
Strategy Recommender#
Multi-stage experimental strategy recommender.
Given a DOE problem specification (factors, responses, budget, constraints, domain, prior knowledge), recommend a multi-stage experimental strategy using deterministic decision rules from Montgomery, NIST, and Stat-Ease SCOR.
Quick start:
from process_improve.experiments.strategy import recommend_strategy

result = recommend_strategy(
    factors=[Factor(name="A", low=0, high=100), ...],
    responses=[Response(name="Yield", goal="maximize")],
    budget=40,
    domain="fermentation",
)
- process_improve.experiments.strategy.recommend_strategy(*, factors, responses=None, budget=None, constraints=None, hard_to_change_factors=None, prior_knowledge=None, existing_data=None, domain=None, detail_level='intermediate')[source]#
Recommend a multi-stage experimental strategy.
Given a DOE problem description, apply deterministic decision rules to recommend a staged experimental plan (screening → optimisation → confirmation).
- Parameters:
factors (list[Factor]) – All candidate experimental factors.
responses (list[Response] or None) – Response variables with optimisation goals.
budget (int or None) – Total run budget across all stages. None = no constraint.
constraints (list[Constraint] or None) – Factor-space constraints (linear or nonlinear).
hard_to_change_factors (list[str] or None) – Factor names that are expensive to reset between runs.
prior_knowledge (str or None) – Free-text description of what the user already knows.
existing_data (DataFrame or None) – Prior experimental data (summary extracted internally).
domain (str or None) – Application domain (e.g. "fermentation"). Defaults to "general".
detail_level (str) – "novice" or "intermediate" (default).
- Returns:
JSON-serialisable dictionary with the ExperimentalStrategy fields.
- Return type:
Examples
>>> from process_improve.experiments.factor import Factor, Response
>>> factors = [Factor(name=chr(65 + i), low=0, high=100) for i in range(7)]
>>> result = recommend_strategy(factors=factors, budget=40, domain="fermentation")
>>> result["total_estimated_runs"] <= 40
True
Deterministic rule engine for DOE strategy recommendation.
Implements ~50 decision rules from Montgomery, NIST, and Stat-Ease SCOR to recommend multi-stage experimental strategies. No LLM or randomness: identical inputs always produce identical outputs.
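One simple way to make that determinism checkable (illustrative only; the library's actual strategy_id construction is not shown here) is a stable hash of the canonicalised input specification:

```python
import hashlib
import json

def strategy_id(spec: dict) -> str:
    """Stable short hash of a problem spec: canonical JSON -> SHA-256.
    sort_keys makes the id independent of dict insertion order."""
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]

spec = {"factors": ["A", "B", "C"], "budget": 40, "domain": "fermentation"}
# The same inputs, listed in any order, yield the same id.
assert strategy_id(spec) == strategy_id(dict(reversed(list(spec.items()))))
```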
- process_improve.experiments.strategy.engine.recommend_strategy(*, factors, responses=None, budget=None, constraints=None, hard_to_change_factors=None, prior_knowledge=None, existing_data=None, domain=None, detail_level='intermediate')[source]#
Recommend a multi-stage experimental strategy.
Given a DOE problem description, apply deterministic decision rules to recommend a staged experimental plan (screening → optimisation → confirmation).
- Parameters:
factors (list[Factor]) – All candidate experimental factors.
responses (list[Response] or None) – Response variables with optimisation goals.
budget (int or None) – Total run budget across all stages. None = no constraint.
constraints (list[Constraint] or None) – Factor-space constraints (linear or nonlinear).
hard_to_change_factors (list[str] or None) – Factor names that are expensive to reset between runs.
prior_knowledge (str or None) – Free-text description of what the user already knows.
existing_data (DataFrame or None) – Prior experimental data (summary extracted internally).
domain (str or None) – Application domain (e.g. "fermentation"). Defaults to "general".
detail_level (str) – "novice" or "intermediate" (default).
- Returns:
JSON-serialisable dictionary with the ExperimentalStrategy fields.
- Return type:
Examples
>>> from process_improve.experiments.factor import Factor, Response
>>> factors = [Factor(name=chr(65 + i), low=0, high=100) for i in range(7)]
>>> result = recommend_strategy(factors=factors, budget=40, domain="fermentation")
>>> result["total_estimated_runs"] <= 40
True
Pydantic models for the DOE strategy recommender.
Defines the input specification (DOEProblemSpec), the output
(ExperimentalStrategy, ExperimentalStage, TransitionRule),
and supporting types (DomainType, PriorKnowledge).
- class process_improve.experiments.strategy.models.DomainType(value)[source]#
Application domain for domain-specific strategy adjustments.
- pharma_formulation = 'pharma_formulation'#
- fermentation = 'fermentation'#
- food_science = 'food_science'#
- extraction = 'extraction'#
- analytical_method = 'analytical_method'#
- cell_culture = 'cell_culture'#
- bioprocess = 'bioprocess'#
- general = 'general'#
- class process_improve.experiments.strategy.models.PriorKnowledge(*, raw_text='', confidence=0.0, known_significant_factors=<factory>, known_ranges_reliable=False, has_supporting_data=False)[source]#
Bases: BaseModel
Parsed prior knowledge with a confidence score.
- Parameters:
raw_text (str) – The original free-text description provided by the user.
confidence (float) – Confidence score between 0.0 (no knowledge) and 1.0 (confirmed).
known_significant_factors (list[str]) – Factor names identified as significant in the prior knowledge.
known_ranges_reliable (bool) – Whether the user’s factor ranges are informed by prior data.
has_supporting_data (bool) – Whether the prior knowledge is backed by experimental data.
- model_config = {}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class process_improve.experiments.strategy.models.TransitionRule(*, condition, action, fallback)[source]#
Bases: BaseModel
Rule governing the transition between consecutive experimental stages.
- Parameters:
- model_config = {}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class process_improve.experiments.strategy.models.ExperimentalStage(*, stage_number, stage_name, design_type, design_params=<factory>, factors=<factory>, estimated_runs=0, purpose='', success_criteria=<factory>, transition_rules=<factory>)[source]#
Bases: BaseModel
One stage in a multi-stage experimental strategy.
- Parameters:
stage_number (int) – 1-based stage index.
stage_name (str) – Human-readable name, e.g. "Screening", "Optimization".
design_type (str) – Design type key, e.g. "plackett_burman", "ccd", "bbd".
design_params (dict) – Design-specific parameters (resolution, center_points, alpha, etc.).
estimated_runs (int) – Estimated number of experimental runs.
purpose (str) – Brief description of what this stage accomplishes.
success_criteria (dict) – Criteria for deeming this stage successful.
transition_rules (list[TransitionRule]) – Rules governing the transition to the next stage.
- transition_rules: list[TransitionRule]#
- model_config = {}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class process_improve.experiments.strategy.models.ExperimentalStrategy(*, strategy_id='', stages=<factory>, total_estimated_runs=0, budget_allocation=<factory>, assumptions=<factory>, risks=<factory>, alternative_strategies=<factory>, domain='general', detail_level='intermediate', reasoning=<factory>)[source]#
Bases: BaseModel
Complete multi-stage experimental strategy recommendation.
- Parameters:
strategy_id (str) – Deterministic hash of the input specification.
stages (list[ExperimentalStage]) – Ordered list of experimental stages.
total_estimated_runs (int) – Sum of estimated runs across all stages.
budget_allocation (dict[str, int]) – Stage name to allocated run count mapping.
assumptions (list[str]) – Key assumptions underlying the recommendation.
risks (list[str]) – Risks and potential issues with the strategy.
alternative_strategies (list[str]) – Brief descriptions of alternative approaches.
domain (str) – The domain used for domain-specific adjustments.
detail_level (str) – The detail level used for explanations.
reasoning (list[str]) – Step-by-step explanation of the decision logic.
- stages: list[ExperimentalStage]#
- model_config = {}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class process_improve.experiments.strategy.models.DOEProblemSpec(*, factors, responses=<factory>, budget=None, constraints=None, hard_to_change_factors=None, prior_knowledge=None, existing_data_summary=None, domain=DomainType.general, detail_level='intermediate')[source]#
Bases: BaseModel
Validated input specification for the strategy recommender. Wraps all inputs into a single object for pipeline processing.
- Parameters:
factors (list[Factor]) – All candidate experimental factors.
responses (list[Response]) – Response variables with optimisation goals.
budget (int or None) – Total run budget across all stages.
constraints (list[Constraint] or None) – Factor-space constraints.
hard_to_change_factors (list[str] or None) – Factor names that are expensive to reset between runs.
prior_knowledge (PriorKnowledge or None) – Parsed prior knowledge with confidence score.
existing_data_summary (dict or None) – Summary of any existing experimental data.
domain (DomainType) – Application domain.
detail_level (str) – "novice" or "intermediate".
- prior_knowledge: PriorKnowledge | None#
- domain: DomainType#
- detail_level: Literal['novice', 'intermediate']#
- model_config = {}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
Domain-specific strategy templates for DOE recommendations.
Each domain template provides preferred design choices, budget weight adjustments, and domain-specific advice. Templates are Python dicts (not YAML) because they encode algorithmic adjustments, not reference data.
- Sources:
  - ICH Q8/Q9/Q10 for pharma QbD
  - Stat-Ease SCOR framework
  - NIST Engineering Statistics Handbook, section 5.3.3
  - Montgomery, Design and Analysis of Experiments, 10th ed.
- process_improve.experiments.strategy.domain_templates.get_domain_template(domain)[source]#
Return the domain template for the given domain string.
Budget allocation logic for multi-stage DOE strategies.
- Implements the 25-40-55-15 framework:
  - Screening: 25-40 % of total budget
  - Optimisation: 40-55 %
  - Confirmation: 5-15 % (minimum 3 runs)
- Sources:
  - Montgomery, Design and Analysis of Experiments, 10th ed. (25% rule)
  - Stat-Ease SCOR framework
  - NIST Engineering Statistics Handbook, section 5.3.3
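The split above can be sketched with illustrative midpoint weights (the function name, the exact fractions 0.35/0.50/0.15, and the rounding here are assumptions for illustration, not the library's code):

```python
def sketch_allocate(total, weights=(0.35, 0.50, 0.15), min_confirmation=3):
    """Split a run budget into screening / optimisation / confirmation using
    fractions from the 25-40 / 40-55 / 5-15 ranges, with a confirmation floor."""
    screening = round(total * weights[0])
    confirmation = max(min_confirmation, round(total * weights[2]))
    optimisation = total - screening - confirmation   # remainder keeps the sum exact
    return {"screening": screening, "optimization": optimisation,
            "confirmation": confirmation, "total": total}

print(sketch_allocate(40))
# {'screening': 14, 'optimization': 20, 'confirmation': 6, 'total': 40}
```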
- process_improve.experiments.strategy.budget.estimate_screening_runs(n_factors, design_type)[source]#
Estimate the number of runs for a screening design.
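For a Plackett-Burman screening design, the standard run-count arithmetic is the smallest multiple of 4 that exceeds the factor count (runs ≥ n_factors + 1). A sketch under that assumption (the library's estimate for other design types may differ):

```python
import math

def pb_runs(n_factors: int) -> int:
    """Smallest multiple of 4 that is >= n_factors + 1 (Plackett-Burman sizing)."""
    return 4 * math.ceil((n_factors + 1) / 4)

print(pb_runs(7))    # 8  -- 7 factors fit an 8-run PB design
print(pb_runs(8))    # 12 -- one more factor forces the next size up
```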
- process_improve.experiments.strategy.budget.estimate_rsm_runs(n_factors, design_type, center_points=3)[source]#
Estimate the number of runs for an RSM design.
- process_improve.experiments.strategy.budget.estimate_confirmation_runs(min_runs=3)[source]#
Return the number of confirmation runs.
- process_improve.experiments.strategy.budget.allocate_budget(total_budget, n_factors, needs_screening, needs_rsm, screening_design='plackett_burman', rsm_design='box_behnken', domain_weights=None, min_confirmation=3, center_points=3)[source]#
Allocate a total run budget across experimental stages.
- Parameters:
total_budget (int or None) – Total runs across all stages. If None, computes an ideal budget.
n_factors (int) – Total number of candidate factors.
needs_screening (bool) – Whether a screening stage is needed.
needs_rsm (bool) – Whether an RSM optimisation stage is needed.
screening_design (str) – Preferred screening design type.
rsm_design (str) – Preferred RSM design type.
domain_weights (dict or None) – Stage-to-fraction mapping from the domain template.
min_confirmation (int) – Minimum confirmation runs (domain-dependent).
center_points (int) – Center points for RSM design.
- Returns:
Keys: "screening", "optimization", "confirmation", "total", "ideal_total", "is_tight", "warnings".
- Return type: