v0.1 · Production capture + replay

Stress-test agents. Capture production.
Replay incidents on demand.

Tool Pouch is the reliability layer for AI agents. Catch silent failures pre-deploy with tool-pouch scan. Capture every production request with tool_pouch.wrap_openai. Replay any incident on demand under chaos so you know whether it reproduces, before you ship a fix.

Apache 2.0 · Sub-ms capture overhead · PII-redacted by default · No account, no telemetry · Works with OpenAI & Anthropic
The 3am problem

It's 3am. Your agent did something stupid. Now what?

Your agent worked yesterday. Today, a customer says it told them their refund had been processed when nothing actually happened. Your eval suite is green. Your dashboard shows healthy latency. The model returns "everything looks good." So what broke?

The bug lives between the surfaces your existing tools watch. The model called a tool, the tool returned an empty object, and the model confidently pretended otherwise. You can't repro it. You can't even see it. By the time it shows up in your observability stack, the customer has already left.

This is what Tool Pouch is for. Stress-test the adversarial cases pre-deploy. Capture every production request so you have the trace when something does slip through. Replay that trace under chaos in one command, and find out whether it's a one-time fluke or a 78%-of-the-time hazard waiting for its next victim.

Three paths. Five-minute integration.

Pick the layer your agent needs.

01

Pre-deploy

tool-pouch scan injects 12+ tool failures (timeouts, malformed JSON, rate limits, prompt injection, unicode chaos) and reports the silent hallucinations. Auto-discovers @tool_pouch.tool functions, generates test inputs from your docstrings.

tool-pouch init && tool-pouch scan --quick
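
As a sketch of what scan picks up: the @tool_pouch.tool decorator is Tool Pouch's, but the function itself is purely illustrative.

import tool_pouch

@tool_pouch.tool
def check_refund_status(order_id: str) -> dict:
    """Look up the refund status for an order by its ID."""
    # Illustrative body. scan generates test inputs from the
    # docstring, then re-runs the call under injected failures.
    return {"order_id": order_id, "status": "refunded"}
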
02

Production

tool_pouch.wrap_openai(client) intercepts every chat.completions.create call. Captures input, output, tool calls, latency. Sub-millisecond overhead. PII-redacted at capture. Pipes to your existing log aggregator out of the box.

client = tool_pouch.wrap_openai(OpenAI())
03

Incident response

tool-pouch replay <trace_id> re-runs a captured trace under chaos. Pair with --repeat 100 to surface flaky failure rates. Use --frozen for a deterministic walk-through, --frozen-tools to isolate model behavior.

tool-pouch replay abc12345 --repeat 100
Production capture

One line. Every request. Zero surprises.

Wrap your OpenAI or Anthropic client and Tool Pouch captures every subsequent call to a destination of your choice. No framework lock-in. No agent rewrite.

The proxy enqueues to a background writer thread on a bounded, non-blocking queue. The hot path is the queue put; serialization, redaction, truncation, and destination IO all run off the request thread.
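
In spirit, the hot path looks something like this sketch (the internal names here are illustrative, not Tool Pouch's actual code):

import json
import queue
import sys
import threading

_q: "queue.Queue[dict]" = queue.Queue(maxsize=10_000)  # bounded: memory can't grow without limit

def capture(event: dict) -> None:
    """Hot path: a single non-blocking enqueue on the request thread."""
    try:
        _q.put_nowait(event)
    except queue.Full:
        pass  # fail-open: drop the event rather than stall the request

def _writer() -> None:
    """Background thread: serialization, redaction, and sink IO live here."""
    while True:
        event = _q.get()
        record = json.dumps(event)  # stand-in for serialize/redact/truncate
        try:
            sys.stderr.write(record + "\n")  # stand-in for a destination
        except Exception:
            pass  # a misbehaving sink never propagates to the app

threading.Thread(target=_writer, daemon=True).start()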

PII redaction runs at capture time by default (emails, phone numbers, SSNs, credit cards, IPs, API keys) and extends with your own regex.

import tool_pouch
from openai import OpenAI

client = tool_pouch.wrap_openai(
    OpenAI(),
    agent_name="support_bot",
    request_id=lambda **kw: kw.get("user"),
    redact=tool_pouch.redact.builtin(extra_patterns=[
        r"acct_\d{6}",
    ]),
    destinations=[tool_pouch.JSONLogger()],
)

# Use it exactly like before.
client.chat.completions.create(...)
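
The Anthropic path mirrors this. wrap_anthropic is named in the FAQ below; assuming it takes the same keyword arguments as wrap_openai (that mirroring is an assumption):

import tool_pouch
from anthropic import Anthropic

client = tool_pouch.wrap_anthropic(
    Anthropic(),
    agent_name="support_bot",
)

# Captured the same way as the OpenAI client above.
client.messages.create(...)
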
Replay

"Would it reproduce?" Answered in one command.

Most production incidents look obvious in hindsight and impossible to reproduce. Tool Pouch turns the captured trace into a runnable scenario and lets you ask, against your real model and tools, what happens this time.

Layered modes give you control: --frozen for review, --frozen-tools to isolate model variability, full chaos for a true re-run with fault injection. --repeat N aggregates verdicts across runs so you can see flaky failure rates as percentages.

# Walk through what actually happened. No API calls.
tool-pouch replay abc12345 --frozen

# Re-call your model; stub tools with captured outputs.
tool-pouch replay abc12345 --frozen-tools

# Default: full chaos. Real model, real tools, injected scenarios.
tool-pouch replay abc12345

# 100 chaos replays → "this fails 78% of the time on timeouts."
tool-pouch replay abc12345 --repeat 100

Destinations

Capture once. Pipe anywhere.

LocalStore

SQLite at ~/.tool_pouch/tool_pouch.db. Default. Ideal for dev and staging. WAL mode + multi-process safe.

JSONLogger

NDJSON to any writable stream; defaults to stderr. Pipes natively into Datadog, Honeycomb, Loki, CloudWatch via your existing log agent.

HTTPSink

Batched POST to a URL you control. Configurable batch size, flush interval, retries. Drop-in for in-house observability backends.
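
A sketch wiring all three together. LocalStore, JSONLogger, and HTTPSink are the destinations above; the HTTPSink keyword names (batch_size, flush_interval, max_retries) are assumptions read off its description:

import tool_pouch
from openai import OpenAI

client = tool_pouch.wrap_openai(
    OpenAI(),
    destinations=[
        tool_pouch.LocalStore(),    # SQLite at ~/.tool_pouch/ (the default)
        tool_pouch.JSONLogger(),    # NDJSON, defaults to stderr
        tool_pouch.HTTPSink(        # assumed kwargs, per the description above
            "https://obs.internal.example/traces",
            batch_size=100,
            flush_interval=5.0,
            max_retries=3,
        ),
    ],
)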

A future CloudStore destination will arrive with Tool Pouch Cloud. The wrap API stays unchanged. Pass the new destination, traces flow.

Cloud: coming soon

Replay incidents from anywhere. Triage with your team.

Tool Pouch Cloud is the optional hosted layer. Push captured traces from any environment, search by request_id, replay across your team, retain for compliance. Until it ships, the OSS path is already production-ready via JSONLogger and HTTPSink.

One email at launch. No spam, no sharing your address.

FAQ

Common questions.

How is this different from observability tools I already pay for?
Datadog and Honeycomb watch your traffic and tell you when latency or error rates spike. They can't replay a captured request through your agent under chaos to test whether a fix actually fixes anything. Tool Pouch captures the same telemetry (and pipes to those tools) but adds the replay primitive on top of it.
Does the wrap slow my requests down?
The proxy hot path is a non-blocking queue put. Sub-millisecond p99 in our perf suite. All serialization, redaction, truncation, and destination IO happen on the writer thread, off the request path entirely.
What happens if a destination crashes?
The writer logs to stderr and continues. A misbehaving sink cannot stop captures from flowing or affect your application. Fail-open is the contract.
How does PII redaction work?
tool_pouch.redact.builtin() ships with regex for emails, phones, SSNs, credit cards, IPs, OpenAI/Anthropic keys, AWS/GitHub tokens, and bearer tokens. Pass extra_patterns=[r"acct_\d{6}"] to extend, or pass any callable as redact= to fully customize. Defaults to redaction at capture time so PII never enters the queue.
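A minimal sketch of a fully custom redactor, assuming the callable receives and returns the text being captured (that signature is an assumption):

import re
import tool_pouch
from openai import OpenAI

def redact_account_ids(text: str) -> str:
    # Hypothetical custom redactor: mask internal account IDs
    # before anything enters the capture queue.
    return re.sub(r"acct_\d{6}", "[REDACTED_ACCT]", text)

client = tool_pouch.wrap_openai(OpenAI(), redact=redact_account_ids)
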
Can I disable wrap in CI or tests?
Set TOOL_POUCH_DISABLE_WRAP=1 and every wrap_openai / wrap_anthropic call becomes a no-op passthrough.
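For example, in a CI job (pytest here is just an illustration):

TOOL_POUCH_DISABLE_WRAP=1 pytest tests/
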
Which Python versions are supported?
Python 3.10, 3.11, 3.12, and 3.13. CI matrix runs against the pinned and latest OpenAI & Anthropic SDKs.
What about MCP, LangGraph, or my custom orchestrator?
For pre-deploy testing, tool-pouch run takes any agent that exposes agent_fn + real_tool_fn + tools + test_inputs. For production capture, point your underlying OpenAI/Anthropic client through wrap_openai / wrap_anthropic. The orchestration layer doesn't matter.
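
A sketch of that harness shape. The four names (agent_fn, real_tool_fn, tools, test_inputs) come from Tool Pouch; the module layout and bodies are assumptions:

# agent_harness.py: hypothetical module handed to tool-pouch run
import tool_pouch

@tool_pouch.tool
def get_order(order_id: str) -> dict:
    """Fetch an order record by ID."""
    return {"order_id": order_id, "status": "shipped"}

def real_tool_fn(name: str, args: dict):
    # Dispatch a model-issued tool call to the real implementation.
    return {"get_order": get_order}[name](**args)

def agent_fn(user_input: str, tools: list):
    # Your orchestration goes here: LangGraph, MCP, a hand-rolled loop.
    ...

tools = [get_order]
test_inputs = ["Where is order 81723?"]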