LLM Workbench

Frequently asked questions

  1. What is LLM Workbench?

    It turns every run of your LLM agent into a tamper-evident, model-agnostic, human-gated bundle: trace events, artifacts, gates, and cost — signed, exportable, and replayable. Instead of opaque API calls scattered across logs, each run becomes a self-contained record you own.

  2. How is this different from LangSmith, Langfuse, or Helicone?

    Those are hosted observability dashboards, so your telemetry lives in their database. LLM Workbench is protocol-first: each run is a self-contained, cryptographically signed bundle (with a SHA-256 integrity hash) that you can export, verify, and replay anywhere. Human approval gates and run replay/fork are first-class, not add-ons.

  3. What's a "run bundle"?

    One portable artifact capturing a whole run — the workflow, every trace event (model I/O, tool calls, gate decisions), the artifacts produced, the rule set, token usage and cost — plus an integrity hash so you can prove it wasn't altered.
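
    To make the integrity-hash idea concrete, here is a minimal sketch of sealing and verifying a bundle with SHA-256. The `RunBundle` and `TraceEvent` shapes and the `seal`/`verify` helpers are hypothetical, invented for illustration; the real bundle schema is not described here, and signing (as opposed to hashing) is left out.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shapes for illustration only; not the real bundle schema.
interface TraceEvent {
  step: number;
  kind: "model_io" | "tool_call" | "gate_decision";
  payload: unknown;
}

interface RunBundle {
  workflow: string;
  events: TraceEvent[];
  artifacts: string[];
  usage: { inputTokens: number; outputTokens: number; costUsd: number };
  integrity?: string; // hex SHA-256 over every field except this one
}

// Serialize with sorted object keys so identical content always
// produces identical bytes, regardless of property insertion order.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) {
    return "[" + value.map(canonicalize).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const keys = Object.keys(obj)
      .filter((k) => obj[k] !== undefined)
      .sort();
    const body = keys.map((k) => JSON.stringify(k) + ":" + canonicalize(obj[k]));
    return "{" + body.join(",") + "}";
  }
  return JSON.stringify(value);
}

// Hash the bundle content, excluding any previously stored hash.
function computeIntegrity(bundle: RunBundle): string {
  const content: RunBundle = { ...bundle };
  delete content.integrity;
  return createHash("sha256").update(canonicalize(content)).digest("hex");
}

// Return a copy of the bundle with its integrity hash attached.
function seal(bundle: RunBundle): RunBundle {
  return { ...bundle, integrity: computeIntegrity(bundle) };
}

// A bundle verifies only if its stored hash matches its current content.
function verify(bundle: RunBundle): boolean {
  return bundle.integrity !== undefined && bundle.integrity === computeIntegrity(bundle);
}
```

    Changing any field after `seal` makes `verify` return `false`. A bare hash only detects accidental or naive edits; proving who produced the bundle would additionally need a signature over that hash.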

  4. How do I add it to my code?

    One import. Swap `generateText` for `tracedGenerateText` from `@llm-workbench/ai-sdk`, pass a session handle, and every call emits trace events, spans, artifacts, and cost automatically — your returned result is unchanged.
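
    The wrapper pattern behind this can be sketched as follows. This is not the `@llm-workbench/ai-sdk` API: `withTracing`, `Session`, `SessionEvent`, and `fakeGenerate` are hypothetical stand-ins, and the model call is faked so the sketch runs without network access or API keys.

```typescript
// Hypothetical stand-ins; none of these names come from the real package.
interface GenerateArgs {
  model: string;
  prompt: string;
}

interface GenerateResult {
  text: string;
  usage: { inputTokens: number; outputTokens: number };
}

interface SessionEvent {
  type: "model_call";
  model: string;
  prompt: string;
  text: string;
  inputTokens: number;
  outputTokens: number;
  durationMs: number;
}

interface Session {
  id: string;
  events: SessionEvent[];
}

type Generate = (args: GenerateArgs) => Promise<GenerateResult>;

// Wrap a generate function so that every call appends one trace event
// to the session, then hands back the underlying result untouched.
function withTracing(generate: Generate, session: Session): Generate {
  return async (args) => {
    const startedAt = Date.now();
    const result = await generate(args);
    session.events.push({
      type: "model_call",
      model: args.model,
      prompt: args.prompt,
      text: result.text,
      inputTokens: result.usage.inputTokens,
      outputTokens: result.usage.outputTokens,
      durationMs: Date.now() - startedAt,
    });
    return result; // the caller sees exactly what the wrapped function returned
  };
}

// Fake model call used in place of a real provider.
const fakeGenerate: Generate = async ({ prompt }) => ({
  text: "echo: " + prompt,
  usage: { inputTokens: prompt.length, outputTokens: prompt.length + 6 },
});
```

    Because the wrapper has the same signature as the function it wraps, calling code only changes which function it calls and which session it passes in.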

  5. Which models and providers does it support?

    Model-agnostic — anything you call through the Vercel AI SDK (OpenAI, Anthropic, others). The bundle records provider/model per step, so one run can span multiple models with a single unified trace.

  6. What are "human gates"?

    Policy-defined pause points (PAUSE_BEFORE, PAUSE_AFTER, CHECKPOINT) where a run halts for a human to approve, reject, or edit before continuing — and the decision is recorded in the bundle.
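
    A gate loop of this kind might look roughly like the sketch below. Only the three gate names come from the description above; `Step`, `Reviewer`, `runWithGates`, and the choice to treat CHECKPOINT like PAUSE_AFTER are assumptions made for illustration, and the reviewer is a synchronous callback to keep the example short.

```typescript
// Gate names are from the FAQ; all other names and semantics are hypothetical.
type GateKind = "PAUSE_BEFORE" | "PAUSE_AFTER" | "CHECKPOINT";

type Decision =
  | { action: "approve" }
  | { action: "reject" }
  | { action: "edit"; value: string };

interface Step {
  name: string;
  gate?: GateKind;
  run: (input: string) => string;
}

interface GateRecord {
  step: string;
  gate: GateKind;
  action: Decision["action"];
}

interface RunOutcome {
  status: "completed" | "rejected";
  value: string;
  decisions: GateRecord[];
}

type Reviewer = (step: string, gate: GateKind, value: string) => Decision;

function runWithGates(steps: Step[], input: string, review: Reviewer): RunOutcome {
  const decisions: GateRecord[] = [];
  let value = input;

  // Ask the reviewer, record the decision, apply edits.
  // Returns false when the run must stop.
  const consult = (step: Step, gate: GateKind): boolean => {
    const decision = review(step.name, gate, value);
    decisions.push({ step: step.name, gate, action: decision.action });
    if (decision.action === "edit") value = decision.value;
    return decision.action !== "reject";
  };

  for (const step of steps) {
    if (step.gate === "PAUSE_BEFORE" && !consult(step, step.gate)) {
      return { status: "rejected", value, decisions };
    }
    value = step.run(value);
    // Assumption: CHECKPOINT is reviewed after the step, like PAUSE_AFTER.
    if ((step.gate === "PAUSE_AFTER" || step.gate === "CHECKPOINT") && !consult(step, step.gate)) {
      return { status: "rejected", value, decisions };
    }
  }
  return { status: "completed", value, decisions };
}
```

    The `decisions` array is the part that would end up in the bundle: one record per gate, including rejections that stop the run.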

  7. Can I replay or fork a run?

    Yes — the signed bundle lets you replay a run deterministically, or fork from any step to explore a different path, with full lineage tracked.
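
    One way to picture replay and fork is the sketch below. It assumes replay means returning the recorded outputs rather than calling a model again, and that a fork keeps the steps before the fork point and re-runs the chosen step; the `Run`, `RecordedStep`, `replay`, `fork`, and `lineage` names are all hypothetical.

```typescript
// Hypothetical shapes; the real bundle and lineage format are not shown in this FAQ.
interface RecordedStep {
  index: number;
  input: string;
  output: string;
}

interface Run {
  id: string;
  parentId?: string;
  forkedAtStep?: number;
  steps: RecordedStep[];
}

// Replay: return the recorded outputs in order, without calling any model.
function replay(run: Run): string[] {
  return run.steps.map((s) => s.output);
}

// Fork: keep every step before `atStep`, then re-run that step with a
// different function. The new run records where it came from.
function fork(
  parent: Run,
  atStep: number,
  newId: string,
  continueWith: (input: string) => string,
): Run {
  const original = parent.steps.find((s) => s.index === atStep);
  if (!original) {
    throw new Error("step " + atStep + " not found in run " + parent.id);
  }
  const kept = parent.steps.filter((s) => s.index < atStep);
  const replaced: RecordedStep = {
    index: atStep,
    input: original.input,
    output: continueWith(original.input),
  };
  return { id: newId, parentId: parent.id, forkedAtStep: atStep, steps: [...kept, replaced] };
}

// Walk parent links to list a run's ancestry, oldest first.
function lineage(run: Run, byId: Map<string, Run>): string[] {
  const chain = [run.id];
  let current = run;
  while (current.parentId !== undefined) {
    const parent = byId.get(current.parentId);
    if (!parent) break;
    chain.unshift(parent.id);
    current = parent;
  }
  return chain;
}
```

    Replaying recorded outputs is what makes the replay deterministic in this sketch; a fork is the point where new, non-recorded model output enters.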

  8. Where does my data go? Is it private?

    The public demo runs entirely in your browser — no account, no persistence. Authenticated runs persist to your own database, and because every run is an exportable bundle, you're never locked in.

  9. Is it open source? Is it a product?

    LLM Workbench is a proprietary platform (the source isn't public). You can try the full platform through the live demo and playground; commercial licensing details are in COMMERCIAL.md.

  10. How do I try it?

    Click "View a demo run" at /runs/demo. No sign-up is needed, and the demo rotates through seeded agent runs. Sign in to open the playground and build your own.