Authoring an MCP tool

Note

This page is the step-by-step guide to adding a new agent-callable tool. It tracks ENG-26. See Architecture overview for how the tool layer fits together.

Anatomy of a tool

Every tool is three things in one module:

a pydantic input contract - a BaseModel with model_config = ConfigDict(extra="forbid") so unknown keys are rejected;
a wrapper function decorated with @tool_spec(...) that takes the parsed model as its single positional argument and returns a JSON-serialisable dict;
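a registration: the module must be imported (it is either the subpackage's tools.py itself or is imported by that aggregator) so the decorator runs and discovery can find the tool.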

The @tool_spec decorator (process_improve/tool_spec.py) attaches the JSON-schema spec (derived from the input_model) and registers the function in the global _TOOL_REGISTRY. get_tool_specs() returns specs in registry order; discover_tools() imports each subpackage’s tools module so the decorators run.
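
To make the mechanism concrete, here is a minimal, stdlib-only sketch of a decorator-based registry. It is not the code in process_improve/tool_spec.py: toy_tool_spec, toy_get_tool_specs and _TOY_REGISTRY are hypothetical names, and where the real decorator derives a JSON schema from input_model, this sketch only stores the class. What it does show is why "registry order" means "decoration order": a Python dict preserves insertion order.

```python
from collections.abc import Callable
from typing import Any

# Hypothetical, simplified stand-in for _TOOL_REGISTRY (not the real one).
_TOY_REGISTRY: dict[str, dict[str, Any]] = {}


def toy_tool_spec(*, name: str, description: str, input_model: type, **extra: Any):
    """Attach a spec to the decorated function and register it (sketch only)."""

    def decorator(func: Callable[..., dict]) -> Callable[..., dict]:
        # The real decorator derives a JSON schema from input_model; this
        # sketch just keeps a reference to the class.
        spec = {"name": name, "description": description, "input_model": input_model, **extra}
        func.spec = spec  # attach the spec to the function object
        _TOY_REGISTRY[name] = spec  # dicts preserve insertion order
        return func

    return decorator


def toy_get_tool_specs() -> list[dict[str, Any]]:
    """Return specs in registry order, i.e. the order the decorators ran."""
    return list(_TOY_REGISTRY.values())


class FirstInput:
    pass


class SecondInput:
    pass


@toy_tool_spec(name="first_tool", description="Registered first.", input_model=FirstInput)
def first_tool(spec: FirstInput) -> dict:
    return {"tool": "first"}


@toy_tool_spec(name="second_tool", description="Registered second.", input_model=SecondInput)
def second_tool(spec: SecondInput) -> dict:
    return {"tool": "second"}
```

Because decoration happens at import time, whichever module is imported first contributes its specs first, which is why import order in a subpackage's tools.py matters.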

Step by step

Pick a home for the tool. Domain tools live in <subpackage>/tools.py for small subpackages. Where the tool surface is large, put one module per tool under <subpackage>/_tools/<tool_name>.py and use tools.py as the aggregator; this is the pattern in experiments/ (ENG-02).

Define the input model. Declare each field with Field(...), giving it a description plus validation such as ge/le or min_length, and use Literal[...] annotations for fixed choices. The descriptions become part of the tool's JSON schema, which is what the LLM reads, so write them for the caller.

from pydantic import BaseModel, ConfigDict, Field

class SummariseInput(BaseModel):
    model_config = ConfigDict(extra="forbid")
    data: list[float] = Field(..., min_length=1, description="The values to summarise.")
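
Assuming pydantic v2, a quick check shows what this contract buys you. The model is repeated so the snippet runs on its own, and the helper error_type is hypothetical: unknown keys fail with the extra_forbidden error type, an empty list fails min_length with too_short, and extra="forbid" shows up in the generated JSON schema as additionalProperties: false, the property asserted under Verifying.

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict, Field, ValidationError


class SummariseInput(BaseModel):
    model_config = ConfigDict(extra="forbid")
    data: list[float] = Field(..., min_length=1, description="The values to summarise.")


def error_type(**arguments) -> Optional[str]:
    """Hypothetical helper: first pydantic error type for these arguments, or None."""
    try:
        SummariseInput(**arguments)
    except ValidationError as exc:
        return exc.errors()[0]["type"]
    return None


parsed = SummariseInput(data=[1, 2, 3])  # ints are coerced to floats
schema = SummariseInput.model_json_schema()  # JSON schema derived from the model
```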

Write the wrapper and decorate it. Narrow the except clause to the canonical set of expected exceptions (see Error-Handling Style Guide) and pass the result through clean(...) so numpy and pandas types serialise:

import logging

from process_improve.tool_spec import clean, tool_spec

logger = logging.getLogger(__name__)

@tool_spec(
    name="summarise_values",
    description="Return the mean and standard deviation of a list of numbers.",
    input_model=SummariseInput,
    examples='# "summarise [1, 2, 3]" -> ``summarise_values(data=[1, 2, 3])``',
    category="univariate",
)
def summarise_values(spec: SummariseInput) -> dict:
    try:
        import numpy as np  # noqa: PLC0415 - keep heavy imports lazy

        arr = np.asarray(spec.data, dtype=float)
        return clean({"mean": arr.mean(), "std": arr.std(ddof=1)})
    except (ValueError, TypeError) as exc:
        logger.exception("Tool summarise_values failed")
        return {"error": str(exc)}

Register it. Importing the module must run the decorator. With the per-tool layout, the subpackage's tools.py imports each tool module in a fixed order, which fixes the order in which specs are emitted, and tracks the tool names; tools defined inline in tools.py register in source order. Do not reorder existing imports: tests assert that the tool-spec output is stable.
Confirm discovery. tool_spec.discover_tools() imports your subpackage’s tools module. If the subpackage is new, add its dotted ...tools path to the discovery list in tool_spec.py.
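
Both of the last two steps come down to Python's import machinery: discovery imports modules by dotted path, and each import runs that module's @tool_spec decorators in source order. The sketch below is a stdlib-only illustration, not the real discover_tools(); TOOL_MODULES and toy_discover_tools are hypothetical names, and stdlib modules stand in for the subpackage tools modules.

```python
import importlib
from types import ModuleType

# Hypothetical discovery list. The real list in tool_spec.py holds each
# subpackage's dotted tools path; stdlib modules stand in so this runs anywhere.
TOOL_MODULES: list[str] = ["json.decoder", "email.mime.text"]


def toy_discover_tools(module_paths: list[str]) -> list[ModuleType]:
    """Import each module in list order; importing is what runs the decorators."""
    loaded = []
    for dotted_path in module_paths:
        # A repeat import of the same path is served from sys.modules, so the
        # module body (and any decorators in it) does not run a second time.
        loaded.append(importlib.import_module(dotted_path))
    return loaded


modules = toy_discover_tools(TOOL_MODULES)
```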

Conventions

Keep heavy imports (numpy, pandas, statsmodels, the domain algorithm) inside the wrapper function (# noqa: PLC0415) so importing the tools module stays cheap.
Return {"error": "..."} for expected failures; let unexpected exceptions propagate (the server redacts them).
Always wrap the payload in clean(...).
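
The first two conventions give every wrapper the same shape. The stdlib-only sketch below avoids process_improve, pydantic and numpy, so it omits @tool_spec and clean(...) and uses a plain dataclass where a pydantic model would go; StdevInput and sample_stdev are hypothetical names.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)


@dataclass
class StdevInput:
    """Hypothetical stand-in for a pydantic input model."""

    data: list[float]


def sample_stdev(spec: StdevInput) -> dict:
    """Hypothetical wrapper: expected failures become {"error": ...}; others propagate."""
    try:
        # Lazy import inside the wrapper, mirroring how numpy/pandas are imported.
        import statistics

        return {"stdev": statistics.stdev(spec.data)}
    except (ValueError, TypeError) as exc:
        # statistics.StatisticsError subclasses ValueError, so it lands here.
        logger.exception("Tool sample_stdev failed")
        return {"error": str(exc)}
```

With these definitions, sample_stdev(StdevInput(data=[1.0])) returns an error dict (the sample standard deviation needs at least two values), while passing an object with no data attribute raises AttributeError, which is left for the server to redact.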

Verifying

from process_improve.tool_spec import discover_tools, get_tool_specs

discover_tools()  # import each subpackage's tools module so the decorators run
specs = {s["name"]: s for s in get_tool_specs()}
assert "summarise_values" in specs
assert specs["summarise_values"]["input_schema"]["additionalProperties"] is False

Add a test under tests/ that drives the tool through the same path the MCP server uses, plus an assertion that the spec is present and well-formed (see the existing tests/test_experiments_tools.py and tests/test_tool_spec.py).