LLM Workbench
Frequently asked questions
What is LLM Workbench?
It turns every run of your LLM agent into a tamper-evident, model-agnostic, human-gated bundle: trace events, artifacts, gates, and cost — signed, exportable, and replayable. Instead of opaque API calls scattered across logs, each run becomes a self-contained record you own.
How is this different from LangSmith, Langfuse, or Helicone?
Those are hosted observability dashboards, so your telemetry lives in their database. LLM Workbench is protocol-first: each run is a self-contained, cryptographically signed bundle (with a SHA-256 integrity hash) you can export, verify, and replay anywhere. Human approval gates and run replay/fork are first-class, not add-ons.
What's a "run bundle"?
One portable artifact capturing a whole run — the workflow, every trace event (model I/O, tool calls, gate decisions), the artifacts produced, the rule set, token usage and cost — plus an integrity hash so you can prove it wasn't altered.
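To make the integrity-hash idea concrete, here is a minimal sketch assuming a simplified bundle shape. The `RunBundle` and `TraceEvent` types and the `canonicalize`, `hashBundle`, `seal`, and `verify` helpers are hypothetical names for illustration; the actual bundle schema and hashing rules are not described in this FAQ and may differ.

```typescript
import { createHash } from "node:crypto";

// Hypothetical, simplified shapes; not the real LLM Workbench schema.
interface TraceEvent {
  step: number;
  kind: "model_io" | "tool_call" | "gate_decision";
  payload: unknown;
}

interface RunBundle {
  workflow: string;
  events: TraceEvent[];
  usage: { inputTokens: number; outputTokens: number; costUsd: number };
  integrity?: string; // hex-encoded SHA-256 of the canonical body
}

// Serialize with sorted object keys so equal content always yields equal bytes.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalize).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .filter((key) => obj[key] !== undefined)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonicalize(obj[key])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

// Hash everything except the integrity field itself.
function hashBundle(bundle: RunBundle): string {
  const body = {
    workflow: bundle.workflow,
    events: bundle.events,
    usage: bundle.usage,
  };
  return createHash("sha256").update(canonicalize(body)).digest("hex");
}

// Attach the hash to a copy of the bundle.
function seal(bundle: RunBundle): RunBundle {
  return { ...bundle, integrity: hashBundle(bundle) };
}

// Recompute the hash and compare it with the stored one.
function verify(bundle: RunBundle): boolean {
  return bundle.integrity !== undefined && bundle.integrity === hashBundle(bundle);
}
```

Changing any field after `seal` makes `verify` return `false`. A bare hash can be recomputed by anyone who edits the bundle, which is why the hash is paired with a signature; signing is left out of this sketch.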
How do I add it to my code?
One import. Swap `generateText` for `tracedGenerateText` from `@llm-workbench/ai-sdk`, pass a session handle, and every call emits trace events, spans, artifacts, and cost automatically — your returned result is unchanged.
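The wrapping pattern behind that swap can be sketched with local stand-ins. Everything below (`Session`, `TraceRecord`, `withTracing`, `fakeGenerateText`) is hypothetical and exists only to illustrate the idea; it is not the `@llm-workbench/ai-sdk` implementation, and the real option name for passing the session handle may differ.

```typescript
// Hypothetical stand-in types; the real package's types may differ.
interface TraceRecord {
  kind: "model_call";
  startedAt: number;
  durationMs: number;
  prompt: string;
  text: string;
}

interface Session {
  id: string;
  events: TraceRecord[];
}

type GenerateFn<O extends { prompt: string }, R extends { text: string }> = (
  options: O,
) => Promise<R>;

// Wrap a generate function so each call appends a trace record to the
// session while the caller still receives the original result object.
function withTracing<O extends { prompt: string }, R extends { text: string }>(
  generate: GenerateFn<O, R>,
  session: Session,
): GenerateFn<O, R> {
  return async (options) => {
    const startedAt = Date.now();
    const result = await generate(options);
    session.events.push({
      kind: "model_call",
      startedAt,
      durationMs: Date.now() - startedAt,
      prompt: options.prompt,
      text: result.text,
    });
    return result; // returned untouched
  };
}

// Fake model call so the sketch runs without a provider or API key.
async function fakeGenerateText(options: { prompt: string }): Promise<{ text: string }> {
  return { text: `echo: ${options.prompt}` };
}
```

Calling `withTracing(fakeGenerateText, session)` yields a function with the same signature as the original, so call sites only change at the import.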
Which models and providers does it support?
Model-agnostic — anything you call through the Vercel AI SDK (OpenAI, Anthropic, others). The bundle records provider/model per step, so one run can span multiple models with a single unified trace.
What are "human gates"?
Policy-defined pause points (PAUSE_BEFORE, PAUSE_AFTER, CHECKPOINT) where a run halts for a human to approve, reject, or edit before continuing — and the decision is recorded in the bundle.
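As a rough illustration of how a gated step could behave, the sketch below pauses before and after a unit of work and logs each decision. The names `Reviewer`, `GateDecision`, `passGate`, and `runGatedStep` are hypothetical, and `CHECKPOINT` appears only in the type because its exact semantics are not described here.

```typescript
type GateKind = "PAUSE_BEFORE" | "PAUSE_AFTER" | "CHECKPOINT";
type Verdict = "approve" | "reject" | "edit";

interface ReviewOutcome {
  verdict: Verdict;
  editedValue?: string;
}

// One recorded decision, as it might be stored alongside the trace.
interface GateDecision extends ReviewOutcome {
  gate: GateKind;
  step: string;
}

// Hypothetical callback standing in for the human reviewer.
type Reviewer = (gate: GateKind, step: string, value: string) => Promise<ReviewOutcome>;

// Ask the reviewer, record the decision, then continue, substitute, or stop.
async function passGate(
  gate: GateKind,
  step: string,
  value: string,
  review: Reviewer,
  log: GateDecision[],
): Promise<string> {
  const outcome = await review(gate, step, value);
  log.push({ gate, step, ...outcome });
  if (outcome.verdict === "reject") {
    throw new Error(`${gate} rejected at step "${step}"`);
  }
  if (outcome.verdict === "edit" && outcome.editedValue !== undefined) {
    return outcome.editedValue;
  }
  return value;
}

// Run one step, gating its input and/or output according to the policy.
async function runGatedStep(
  step: string,
  input: string,
  gates: GateKind[],
  work: (input: string) => Promise<string>,
  review: Reviewer,
  log: GateDecision[],
): Promise<string> {
  const effectiveInput = gates.includes("PAUSE_BEFORE")
    ? await passGate("PAUSE_BEFORE", step, input, review, log)
    : input;
  const output = await work(effectiveInput);
  return gates.includes("PAUSE_AFTER")
    ? passGate("PAUSE_AFTER", step, output, review, log)
    : output;
}
```

A rejection throws before any further work runs, and the rejected decision is still in the log, which mirrors the idea that every gate outcome ends up in the bundle.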
Can I replay or fork a run?
Yes — the signed bundle lets you replay a run deterministically, or fork from any step to explore a different path, with full lineage tracked.
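One way to picture replay and fork is with recorded steps and a parent pointer. This is a sketch under assumptions: the `Run` and `RecordedStep` shapes and the `replay`, `fork`, and `lineage` helpers are hypothetical, and it assumes replay works by returning recorded outputs rather than calling a model again.

```typescript
interface RecordedStep {
  index: number;
  input: string;
  output: string;
}

interface Run {
  id: string;
  steps: RecordedStep[];
  parent?: { runId: string; forkedAtStep: number };
}

// Replay: return the recorded outputs in order, with no model calls.
function replay(run: Run): string[] {
  return run.steps.map((step) => step.output);
}

// Fork: copy the steps before `atStep` into a new run that remembers
// which run it came from and where it branched.
function fork(run: Run, atStep: number, newId: string): Run {
  if (!Number.isInteger(atStep) || atStep < 0 || atStep > run.steps.length) {
    throw new RangeError(`atStep must be an integer between 0 and ${run.steps.length}`);
  }
  return {
    id: newId,
    steps: run.steps.slice(0, atStep).map((step) => ({ ...step })),
    parent: { runId: run.id, forkedAtStep: atStep },
  };
}

// Lineage: walk parent pointers from a run back to the oldest known ancestor.
function lineage(run: Run, runsById: Map<string, Run>): string[] {
  const chain: string[] = [run.id];
  let current: Run = run;
  while (current.parent !== undefined) {
    const parent = runsById.get(current.parent.runId);
    if (parent === undefined) break; // ancestor not available locally
    chain.push(parent.id);
    current = parent;
  }
  return chain;
}
```

The forked run starts with copies of the shared steps, so new steps appended to it never alter the original run.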
Where does my data go? Is it private?
The public demo runs entirely in your browser — no account, no persistence. Authenticated runs persist to your own database, and because every run is an exportable bundle, you're never locked in.
Is it open source? Is it a product?
LLM Workbench is a proprietary platform (the source isn't public). You can use the full platform via the live demo and playground; commercial licensing details are in COMMERCIAL.md.
How do I try it?
Click "View a demo run" at /runs/demo. No sign-up is needed, and the demo rotates through seeded agent runs. Sign in to open the playground and build your own.