Experimental Strategy Recommendation
====================================

Before running any experiments, the most important question is: *"How should I
plan my experimental program?"* The ``recommend_strategy`` function answers
this by generating a complete multi-stage experimental plan - screening,
optimization, and confirmation - using deterministic decision rules from
Montgomery, NIST, and the Stat-Ease SCOR framework.

The recommender is fully deterministic: identical inputs always produce
identical outputs. There is no randomness and no LLM - just ~50 codified rules
that encode best practices from the DOE literature.

When to Use This Tool
---------------------

- **Before your first experiment** - plan the entire workflow upfront so that
  budget and time are spent efficiently.
- **When you have many candidate factors** - the tool decides whether a
  screening stage is needed and which design to use.
- **When budget is limited** - it allocates runs across stages to maximize
  information per experiment.
- **When working in a specialized domain** - domain-specific templates
  (fermentation, cell culture, pharma, etc.) adjust design choices and
  center-point requirements automatically.

Concepts
--------

Multi-stage workflows
~~~~~~~~~~~~~~~~~~~~~

Most experimental programs follow a three-stage sequence:

1. **Screening** - Identify the vital few factors from many candidates.
   Typical designs: Plackett-Burman, Definitive Screening Design (DSD), or
   fractional factorial.
2. **Optimization** - Fit a response surface model for the significant
   factors. Typical designs: Central Composite Design (CCD), Box-Behnken, or
   D-optimal.
3. **Confirmation** - Run replicates at the predicted optimum to verify that
   the model predictions hold.

Each stage has *transition rules* that tell you what to do next based on the
results. For example, after screening:

- 0–1 significant factors found: broaden factor ranges or check the
  measurement system.
- 2–5 significant factors: proceed to optimization.
- 6+ significant factors: sub-group factors or run additional screening.
- Curvature detected at center points: augment the factorial to a CCD.

Budget allocation
~~~~~~~~~~~~~~~~~

When a budget is specified, runs are allocated across stages using the
25-40-55-15 framework (Montgomery / Stat-Ease):

- **Screening**: 25–40% of the budget
- **Optimization**: 40–55% of the budget
- **Confirmation**: 5–15% of the budget

Domain templates can shift these weights. For example, fermentation allocates
more to optimization (50%) because biological variability demands extra center
points for reliable error estimation.
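To make the weight arithmetic concrete, here is a minimal sketch (the
``allocate_budget`` helper is hypothetical, not the engine's internal code)
that converts normalized stage weights into run counts, using the midpoints of
the ranges above:

.. code-block:: python

   # Hypothetical sketch: the real engine also snaps counts to feasible
   # design sizes and applies domain adjustments.
   def allocate_budget(budget: int, weights: dict[str, float]) -> dict[str, int]:
       """Split a run budget across stages in proportion to the given weights."""
       total = sum(weights.values())
       return {stage: round(budget * w / total) for stage, w in weights.items()}

   # Midpoints of the 25-40% / 40-55% / 5-15% ranges.
   print(allocate_budget(40, {"screening": 0.325, "optimization": 0.475, "confirmation": 0.10}))
   # {'screening': 14, 'optimization': 21, 'confirmation': 4}

Because of rounding and design-size constraints, the engine's actual
allocation will not match this naive split exactly.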
Quick Start
-----------

A 7-factor fermentation problem with a budget of 40 runs:

.. code-block:: python

   from process_improve.experiments.factor import Factor, Response
   from process_improve.experiments.strategy import recommend_strategy

   factors = [
       Factor(name="Temperature", low=25, high=40, units="degC"),
       Factor(name="pH", low=5.0, high=7.5),
       Factor(name="Glucose", low=10, high=50, units="g/L"),
       Factor(name="Yeast extract", low=1, high=10, units="g/L"),
       Factor(name="Agitation", low=100, high=400, units="rpm"),
       Factor(name="Aeration", low=0.5, high=2.0, units="vvm"),
       Factor(name="Inoculum", low=2, high=10, units="%v/v"),
   ]
   responses = [Response(name="Yield", goal="maximize", units="g/L")]

   strategy = recommend_strategy(
       factors=factors,
       responses=responses,
       budget=40,
       domain="fermentation",
   )

   for stage in strategy["stages"]:
       print(f"Stage {stage['stage_number']}: {stage['stage_name']}")
       print(f"  Design: {stage['design_type']}, Runs: {stage['estimated_runs']}")
       print(f"  Purpose: {stage['purpose']}")

This outputs::

   Stage 1: Screening
     Design: plackett_burman, Runs: 8
     Purpose: Screen 7 candidate factors to identify the vital few.
   Stage 2: Optimization
     Design: ccd, Runs: 19
     Purpose: Fit quadratic response surface model for the 3 significant factors.
     ...
   Stage 3: Confirmation
     Design: replicates_at_optimum, Runs: 3
     Purpose: Run replicates at the predicted optimum to verify the model predictions.
     ...

The engine selected Plackett-Burman screening (the fermentation domain
default), a CCD for response surface optimization, and 3 confirmation
replicates - all within the 40-run budget.

Interpreting the Output
-----------------------

``recommend_strategy`` returns a dictionary with these keys:

.. list-table::
   :widths: 25 75
   :header-rows: 1

   * - Key
     - Description
   * - ``stages``
     - Ordered list of experimental stages. Each stage contains
       ``stage_number``, ``stage_name``, ``design_type``, ``design_params``,
       ``factors``, ``estimated_runs``, ``purpose``, ``success_criteria``, and
       ``transition_rules``.
   * - ``total_estimated_runs``
     - Sum of estimated runs across all stages.
   * - ``budget_allocation``
     - Dictionary mapping stage names to allocated run counts.
   * - ``reasoning``
     - Step-by-step explanation of the decision logic.
   * - ``assumptions``
     - Key assumptions underlying the recommendation (e.g. factor ranges are
       wide enough, measurement system is adequate).
   * - ``risks``
     - Potential issues and warnings (e.g. tight budget, split-plot
       requirements).
   * - ``alternative_strategies``
     - Brief descriptions of other approaches worth considering.
   * - ``strategy_id``
     - Deterministic hash of the input - same inputs always produce the same
       ID.
   * - ``domain``
     - The application domain used.
   * - ``detail_level``
     - ``"novice"`` or ``"intermediate"``.

To inspect transition rules after screening:

.. code-block:: python

   for rule in strategy["stages"][0]["transition_rules"]:
       print(f"If {rule['condition']}:")
       print(f"  -> {rule['action']}")
       print(f"  Otherwise -> {rule['fallback']}")
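Because the recommendation is deterministic, ``strategy_id`` can serve as a
cache key or a regression check. A minimal sketch (the ``kwargs`` variable is
just local shorthand): repeated calls with identical inputs should reproduce
the same ID:

.. code-block:: python

   # Identical inputs must yield the identical strategy_id.
   kwargs = dict(factors=factors, responses=responses, budget=40, domain="fermentation")
   s_a = recommend_strategy(**kwargs)
   s_b = recommend_strategy(**kwargs)
   assert s_a["strategy_id"] == s_b["strategy_id"]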
Working with Budget Constraints
-------------------------------

The ``budget`` parameter controls how many total runs are available. The
engine adjusts stage complexity accordingly:

.. code-block:: python

   for b in [60, 40, 20, None]:
       result = recommend_strategy(factors=factors, budget=b, domain="fermentation")
       print(f"Budget={str(b):>4s}: {result['total_estimated_runs']:>2d} runs, "
             f"{len(result['stages'])} stages")

::

   Budget=  60: 30 runs, 3 stages
   Budget=  40: 30 runs, 3 stages
   Budget=  20: 18 runs, 3 stages
   Budget=None: 30 runs, 3 stages

With a tight budget, the engine reduces center points, chooses more economical
designs, and may issue warnings in ``result["risks"]`` about underpowered
designs. When ``budget=None``, the ideal allocation is used without
constraint.

Using Prior Knowledge
---------------------

If you already know something about which factors matter, pass a free-text
description via the ``prior_knowledge`` parameter. The engine parses keywords
to set a confidence level:

- **High confidence** (0.9): "confirmed", "validated", "published",
  "well-established"
- **Medium confidence** (0.7): "literature suggests", "preliminary data",
  "pilot study"
- **Low confidence** (0.4): "suspect", "expected", "based on theory"
- **No knowledge** (0.1): "no prior data", "first time", "exploratory"

High confidence (>= 0.8 with supporting data) skips the screening stage
entirely:

.. code-block:: python

   # No prior knowledge - full screening
   s1 = recommend_strategy(factors=factors, budget=40, domain="fermentation")
   print(f"No prior: {len(s1['stages'])} stages")

   # Low confidence - still screens
   s2 = recommend_strategy(
       factors=factors,
       budget=40,
       domain="fermentation",
       prior_knowledge="We suspect Temperature and pH are important.",
   )
   print(f"Low confidence: {len(s2['stages'])} stages")

   # High confidence - screening skipped
   s3 = recommend_strategy(
       factors=factors,
       budget=40,
       domain="fermentation",
       prior_knowledge=(
           "Published and validated results confirm Temperature "
           "and pH are significant."
       ),
   )
   print(f"High confidence: {len(s3['stages'])} stages")

::

   No prior: 3 stages
   Low confidence: 3 stages
   High confidence: 1 stages
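Conceptually, the keyword parsing behaves like a highest-match lookup over the
keyword families listed above. A minimal sketch of that idea (hypothetical
code - ``CONFIDENCE_KEYWORDS`` and ``estimate_confidence`` are illustrative
names, not the library's actual parser):

.. code-block:: python

   # Hypothetical mapping built from the documented keyword families.
   CONFIDENCE_KEYWORDS = {
       0.9: ("confirmed", "validated", "published", "well-established"),
       0.7: ("literature suggests", "preliminary data", "pilot study"),
       0.4: ("suspect", "expected", "based on theory"),
       0.1: ("no prior data", "first time", "exploratory"),
   }

   def estimate_confidence(text: str) -> float:
       """Return the highest confidence level whose keywords appear in the text."""
       lowered = text.lower()
       for level in sorted(CONFIDENCE_KEYWORDS, reverse=True):
           if any(kw in lowered for kw in CONFIDENCE_KEYWORDS[level]):
               return level
       return 0.1  # default when nothing matches

   print(estimate_confidence("We suspect Temperature and pH are important."))  # 0.4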
Domain-Specific Strategies
--------------------------

The ``domain`` parameter selects a domain template that adjusts screening
design preferences, RSM design choices, center-point counts, and budget
weights. Eight domains are available:

.. list-table::
   :widths: 22 30 48
   :header-rows: 1

   * - Domain
     - Screening / RSM preference
     - Notes
   * - ``"fermentation"``
     - Plackett-Burman / CCD
     - Extra center points (5+) for biological variability.
   * - ``"cell_culture"``
     - DSD / Box-Behnken
     - Minimizes runs for expensive, slow experiments (14–21 days).
   * - ``"pharma_formulation"``
     - DSD / Face-centered CCD
     - ICH QbD framework; design space definition for regulatory submissions.
   * - ``"food_science"``
     - Fractional factorial / BBD
     - Mixture handling; avoids extreme factor combinations.
   * - ``"extraction"``
     - Fractional factorial / CCD
     - Rotatable CCD for good boundary prediction.
   * - ``"analytical_method"``
     - Fractional factorial / CCD
     - AQbD / ICH Q2/Q14; includes robustness study stage.
   * - ``"bioprocess"``
     - Plackett-Burman / CCD
     - Scale-up considerations for bench-to-production transfer.
   * - ``"general"``
     - Rule-engine defaults
     - No domain-specific adjustments.

Comparing two domains on the same factors shows how design choices differ:

.. code-block:: python

   for domain in ["fermentation", "cell_culture"]:
       result = recommend_strategy(factors=factors, budget=40, domain=domain)
       screening = result["stages"][0]
       print(f"{domain:>15s}: {screening['design_type']}, "
             f"{screening['estimated_runs']} screening runs")

::

      fermentation: plackett_burman, 8 screening runs
      cell_culture: definitive_screening, 15 screening runs

Fermentation uses Plackett-Burman (efficient, many-factor screening), while
cell culture uses a Definitive Screening Design because it combines screening
and curvature detection in a single stage - saving an entire experimental
cycle when each run takes 2–3 weeks.

Hard-to-Change Factors
----------------------

When some factors are expensive or time-consuming to reset between runs
(e.g. reactor temperature, equipment configuration), flag them with
``hard_to_change_factors``. The engine wraps affected stages in a split-plot
structure:

.. code-block:: python

   result = recommend_strategy(
       factors=factors,
       budget=40,
       domain="fermentation",
       hard_to_change_factors=["Temperature"],
   )
   for stage in result["stages"]:
       params = stage["design_params"]
       if params.get("split_plot"):
           print(f"{stage['stage_name']}: split-plot design")
           print(f"  Whole-plot (hard to change): {params['whole_plot_factors']}")
           print(f"  Sub-plot (easy to change): {params['subplot_factors']}")

::

   Screening: split-plot design
     Whole-plot (hard to change): ['Temperature']
     Sub-plot (easy to change): ['pH', 'Glucose', 'Yeast extract', 'Agitation', ...]

With split-plot designs, runs are grouped within whole-plot factor levels to
minimize the number of hard-to-change factor resets. ``result["risks"]`` will
include a reminder that standard ANOVA gives incorrect p-values for split-plot
experiments - a restricted maximum likelihood (REML) analysis is needed
instead.

Multiple Responses
------------------

When optimizing for more than one response, define each with its own goal:

.. code-block:: python

   responses = [
       Response(name="Yield", goal="maximize", units="g/L"),
       Response(name="Purity", goal="maximize", units="%"),
       Response(name="Cost", goal="minimize", units="USD/kg"),
   ]

   result = recommend_strategy(
       factors=factors,
       responses=responses,
       budget=40,
       domain="fermentation",
   )

The strategy structure is the same - the engine plans the experimental stages
needed to build models for all responses simultaneously. After running the
experiments, use :func:`~process_improve.experiments.optimize_responses` with
desirability functions to find the best trade-off across responses.

See Also
--------

- :doc:`/api/experiments` - Full API reference for all DOE functions.
- :func:`~process_improve.experiments.generate_design` - Generate the actual
  design matrix once you know which design to use.
- :func:`~process_improve.experiments.analyze_experiment` - Analyze the
  results after running experiments.
- :func:`~process_improve.experiments.optimize_responses` - Find optimal
  factor settings for single or multiple responses.