● validated AI, from input to output
AI engineering is hard.
Your pipeline shouldn't be.
Forge turns LLM work into typed, cached, validated pipeline steps. Define structures, wire steps together, A/B test strategies on cheap open models, and ship a drag-and-drop inference page — no deploy required.
- ▸3×: typical overspend when frontier models are used to patch one-time engineering problems
- ▸$0.00: cost of a cached step, since finished work is never recomputed or re-billed
- ▸100%: share of step I/O that is schema-validated, implicitly, on every run
the whole loop, one tool
Structure → Pipeline → Test → Ship
Seven windows per project. Each one removes a piece of AI engineering you'd otherwise hand-roll.
01 / Structure
Define the shape of every answer.
Build Pydantic / Zod models in a UI or in code. Every step's input and output is validated against a structure: schema validation is implicit on all I/O, never a separate step you can forget.
- ▸UI builder and code view of the same models
- ▸`text` shown in place of `string`, so non-technical reviewers can read the models
- ▸`tuple[text, quote]` links extractions back to the source
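
Forge's own step API isn't shown on this page, so here is a minimal Python sketch of what implicit I/O validation amounts to. It swaps in standard-library dataclasses for Pydantic / Zod models, and `Text`, `validate` and `step` are hypothetical names.

```python
from dataclasses import dataclass, fields
from typing import Any, Callable

# Hypothetical alias: the UI's `text` type is modelled as a plain str.
Text = str


class ValidationError(Exception):
    """Raised when a value does not match its declared structure."""


def validate(model: type, value: Any) -> Any:
    """Check that `value` is a `model` instance and that every field with a
    plain class annotation (str, int, ...) holds a value of that class."""
    if not isinstance(value, model):
        raise ValidationError(f"expected {model.__name__}, got {type(value).__name__}")
    for f in fields(model):
        # Generic annotations such as list[str] are skipped in this sketch.
        if type(f.type) is type and not isinstance(getattr(value, f.name), f.type):
            raise ValidationError(f"{model.__name__}.{f.name}: expected {f.type.__name__}")
    return value


def step(input_model: type, output_model: type) -> Callable:
    """Decorator: every call validates the input before and the output after."""
    def wrap(fn: Callable[[Any], Any]) -> Callable[[Any], Any]:
        def run(value: Any) -> Any:
            validate(input_model, value)
            return validate(output_model, fn(value))
        return run
    return wrap


@dataclass
class Document:
    body: Text


@dataclass
class Summary:
    headline: Text
    word_count: int


@step(Document, Summary)
def summarize(doc: Document) -> Summary:
    words = doc.body.split()
    return Summary(headline=" ".join(words[:3]), word_count=len(words))
```

Because validation lives in the wrapper, a step author cannot forget it: passing the wrong structure in, or returning a malformed one, raises before anything downstream runs.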

02 / Pipeline
Linear pipelines that flag their own breakages.
Each step declares its input and output model. When an output flows into a step expecting something else, the UI marks the pipeline invalid and run buttons go dark — before you spend a cent on inference.
- ▸PDF, Word, Excel, JSON and web inputs
- ▸Custom TypeScript function steps and batch LLM steps
- ▸Per-step caching: never recompute or re-bill a finished step
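
A sketch of the breakage check, assuming each step declares its input and output structure by name and the pipeline is a plain list. `StepSpec`, `find_breakages` and `can_run` are hypothetical, and the step names only loosely mirror the default legal project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepSpec:
    """Hypothetical step declaration: a name plus the names of its structures."""
    name: str
    input_model: str
    output_model: str


def find_breakages(steps: list[StepSpec]) -> list[str]:
    """List every link where a step's output is not the next step's input.
    An empty list means the linear pipeline is valid."""
    problems = []
    for upstream, downstream in zip(steps, steps[1:]):
        if upstream.output_model != downstream.input_model:
            problems.append(
                f"{upstream.name} -> {downstream.name}: "
                f"{upstream.output_model} != {downstream.input_model}"
            )
    return problems


def can_run(steps: list[StepSpec]) -> bool:
    """Run buttons stay enabled only while there are no breakages."""
    return not find_breakages(steps)


pipeline = [
    StepSpec("parse_pdf", "PdfFile", "DocumentText"),
    StepSpec("extract_rules", "DocumentText", "RuleList"),
    StepSpec("write_report", "RuleList", "Report"),
]
broken = pipeline[:2] + [StepSpec("write_report", "CompanyData", "Report")]
```

Comparing structure names is the simplest possible rule; a real checker might compare fields instead. Either way the check needs no inference, which is why it can run before any money is spent.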

03 / A/B Test
Benchmark strategies, not vibes.
Run the same task as a single LLM call, a Recursive Language Model (RLM), a frontier coding agent, or an agent loop on a cheap open model. An LLM judge picks the winner on accuracy, schema pass rate, latency and cost.
- ▸Cheap Chinese models first, frontier only if needed
- ▸Syntax-highlighted logs from RLM and agent runs
- ▸Accuracy / schema pass rate / latency / cost on every strategy card
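
One way to read "cheap models first, frontier only if needed" is a simple model cascade: try models in ascending price order and stop at the first output that passes schema validation. The sketch assumes that rule; the model names, prices and stub callables are made up, and it does not reproduce Forge's LLM-judge comparison.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class Attempt:
    model: str
    output: str
    cost: float
    passed: bool


def cheapest_passing(
    models: list[tuple[str, float, Callable[[str], str]]],
    task: str,
    is_valid: Callable[[str], bool],
) -> list[Attempt]:
    """Try models in ascending cost order and stop at the first output that
    passes validation. Every attempt is returned so its cost stays visible."""
    attempts = []
    for name, cost, call in sorted(models, key=lambda m: m[1]):
        output = call(task)
        attempts.append(Attempt(name, output, cost, is_valid(output)))
        if attempts[-1].passed:
            break
    return attempts


def has_rules(output: str) -> bool:
    """Example validator: output must be a JSON object with a "rules" list."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and isinstance(data.get("rules"), list)


# Stub "models" with hypothetical names and per-call prices.
models = [
    ("frontier-model", 1.00, lambda task: '{"rules": ["r1", "r2"]}'),
    ("small-open-model", 0.02, lambda task: "Sure! Here are the rules..."),
    ("large-open-model", 0.10, lambda task: '{"rules": ["r1", "r2"]}'),
]
attempts = cheapest_passing(models, "extract rules", has_rules)
```

Here the frontier model is never called: the cheapest model fails validation, the next one passes, and the run stops.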

04 / Inference
Drag a document on. No deploy.
Run the live pipeline by dropping a file onto an input slot, then share a link to just this page. Cached steps return instantly at $0.00; progress streams in as each step completes.
- ▸Shareable run page — nothing to deploy
- ▸Per-step run buttons, dark when the pipeline is invalid
- ▸Validated output with click-to-source quotes
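
Per-step caching can be sketched as a lookup keyed on the step name plus a canonical encoding of its input, so a repeat run of an unchanged step is billed $0.00. `StepCache` and the `(output, cost)` return convention are assumptions, not Forge's API.

```python
import hashlib
import json
from typing import Any, Callable


class StepCache:
    """In-memory cache keyed by step name plus canonical JSON of the input.
    A hit returns the stored output and bills nothing."""

    def __init__(self) -> None:
        self._store: dict[str, Any] = {}

    @staticmethod
    def key(step_name: str, step_input: Any) -> str:
        # sort_keys makes logically equal inputs serialize identically.
        payload = json.dumps({"step": step_name, "input": step_input}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def run(
        self,
        step_name: str,
        step_input: Any,
        fn: Callable[[Any], tuple[Any, float]],
    ) -> tuple[Any, float]:
        """Return (output, cost). `fn` is only called on a cache miss."""
        k = self.key(step_name, step_input)
        if k in self._store:
            return self._store[k], 0.0
        output, cost = fn(step_input)
        self._store[k] = output
        return output, cost
```

In practice the key would likely also need the model and prompt, so that editing a step invalidates its cached output.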

05 / Logging
Every run, fully traced.
Each inference run is recorded with a per-step trace, token counts and cost. You get detailed technical logs with syntax highlighting, plus a high-level HTML view with progress bars for everyone else.
- ▸Filter by model, step or run id
- ▸Download any run's logs
- ▸`_latest` plus timestamped artifacts for every step
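
The `_latest` plus timestamped convention can be sketched as writing each step's output twice. The file naming (`<step>_<UTC stamp>.json`, `<step>_latest.json`) and `save_artifact` are hypothetical; the page does not specify Forge's layout.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Optional


def save_artifact(
    run_dir: Path, step_name: str, data: Any, now: Optional[datetime] = None
) -> tuple[Path, Path]:
    """Write a step's output twice: once under a UTC timestamp (history) and
    once as `<step>_latest.json`, which each new run overwrites."""
    if now is None:
        now = datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    run_dir.mkdir(parents=True, exist_ok=True)
    body = json.dumps(data, indent=2, sort_keys=True)
    stamped = run_dir / f"{step_name}_{stamp}.json"
    latest = run_dir / f"{step_name}_latest.json"
    stamped.write_text(body, encoding="utf-8")
    latest.write_text(body, encoding="utf-8")
    return stamped, latest
```

Readers who only want the current result open `_latest`; the timestamped files keep every earlier run available for download and comparison.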

06 / Documents
Content-addressed. Never pay twice.
Every uploaded document hashes to one ID, so identical files dedupe automatically and cached work is reused. PDFs get both a PDF view and a text view, and each run stores its inputs and outputs alongside the document.
- ▸Hash-based dedupe across inference and training uploads
- ▸Previous runs stored as typed JSON with their structure attached
- ▸Inference / training labels on every file
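
Content addressing means the document ID is derived from the bytes themselves. Here is a minimal sketch using SHA-256 (an assumption; the page does not name the hash), with a hypothetical `DocumentStore` that also tracks inference / training labels.

```python
import hashlib


class DocumentStore:
    """Content-addressed store: a document's ID is the SHA-256 of its bytes,
    so identical uploads collapse to one stored copy."""

    def __init__(self) -> None:
        self._docs: dict[str, bytes] = {}
        self._labels: dict[str, set[str]] = {}

    def add(self, data: bytes, label: str) -> str:
        """Store `data` (if new) under its hash and record the upload label."""
        doc_id = hashlib.sha256(data).hexdigest()
        self._docs.setdefault(doc_id, data)
        self._labels.setdefault(doc_id, set()).add(label)
        return doc_id

    def labels(self, doc_id: str) -> set[str]:
        return set(self._labels[doc_id])

    def __len__(self) -> int:
        return len(self._docs)


store = DocumentStore()
first = store.add(b"policy v1 bytes", "inference")
again = store.add(b"policy v1 bytes", "training")
other = store.add(b"policy v2 bytes", "inference")
```

Because the hash covers raw bytes, only byte-identical files dedupe; a PDF re-exported with new metadata gets a new ID.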

07 / Billing
See exactly where spend goes.
Cost is attributed per model and per processing service. Watch the cheap/open vs frontier split, daily spend by model, and how much caching saved you versus recomputing everything.
- ▸Daily spend stacked by model
- ▸Open / cheap vs frontier model split
- ▸Cache savings computed against a no-cache baseline
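
The billing views reduce to simple aggregations over per-step cost records. In this hypothetical sketch, each record carries both the amount billed and the full recompute cost, so cache savings fall out as baseline minus actual; the field names and model names are illustrative.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class StepRecord:
    day: str          # ISO date, e.g. "2024-05-01"
    model: str
    cost: float       # amount actually billed (0.0 on a cache hit)
    full_cost: float  # what the step would cost if recomputed from scratch
    frontier: bool    # True for frontier models, False for cheap/open ones


def daily_spend_by_model(records: list[StepRecord]) -> dict[str, dict[str, float]]:
    """Billed spend grouped by day, then by model (the stacked chart)."""
    spend: dict = defaultdict(lambda: defaultdict(float))
    for r in records:
        spend[r.day][r.model] += r.cost
    return {day: dict(by_model) for day, by_model in spend.items()}


def cache_savings(records: list[StepRecord]) -> float:
    """No-cache baseline (every step recomputed) minus what was actually billed."""
    return sum(r.full_cost for r in records) - sum(r.cost for r in records)


def frontier_share(records: list[StepRecord]) -> float:
    """Fraction of billed spend that went to frontier models."""
    total = sum(r.cost for r in records)
    return sum(r.cost for r in records if r.frontier) / total if total else 0.0


records = [
    StepRecord("2024-05-01", "open-a", 0.25, 0.25, False),
    StepRecord("2024-05-01", "open-a", 0.0, 0.25, False),  # cache hit
    StepRecord("2024-05-01", "frontier-x", 2.0, 2.0, True),
    StepRecord("2024-05-02", "open-a", 0.5, 0.5, False),
]
```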

✓ validated text
Click any extracted quote.
Jump to its source.
Every span an LLM extracts is validated against the original document and stored with LangExtract-style character offsets. One click takes a reviewer from the model's claim to the exact highlighted passage, which is what makes the output trustworthy enough to act on.
`text → list[tuple[text, quote]]` · closest-match fallback · offsets stored with the run
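
A closest-match fallback can be sketched with Python's `difflib`: try an exact `str.find` first, otherwise score every same-length window of the source and keep the best one above a threshold. This is an assumed approach, not LangExtract's or Forge's actual alignment; `locate_quote` and the 0.8 threshold are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Optional


def locate_quote(
    source: str, quote: str, min_ratio: float = 0.8
) -> Optional[tuple[int, int, float]]:
    """Return (start, end, score) character offsets of `quote` in `source`.

    An exact match scores 1.0. Otherwise every window of the same length is
    scored with difflib's ratio() and the best window is kept if it reaches
    `min_ratio`; below that the quote is treated as unsupported (None).
    """
    if not quote or len(quote) > len(source):
        return None
    start = source.find(quote)
    if start != -1:
        return start, start + len(quote), 1.0
    n = len(quote)
    matcher = SequenceMatcher(autojunk=False)
    matcher.set_seq2(quote)  # the quote is fixed; only the window changes
    best_start, best_ratio = -1, 0.0
    for i in range(len(source) - n + 1):
        matcher.set_seq1(source[i:i + n])
        ratio = matcher.ratio()
        if ratio > best_ratio:
            best_start, best_ratio = i, ratio
    if best_ratio < min_ratio:
        return None
    return best_start, best_start + n, best_ratio
```

A quote with a one-character slip ("dayz" for "days") still resolves to the real passage, while a fabricated quote returns `None` and can be flagged. Scanning every window is slow on long documents; a production version would likely narrow the search first.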

default project
Company Legal Evaluation
Forge ships with a working example: ingest a legal PDF, extract atomic rules with source quotes, combine them with company data, and output a legal considerations report, with every step typed and cached.
Stop paying frontier prices
for pipeline problems.
Cheap open models, properly structured and validated, beat an expensive model on a messy prompt. Forge is how you get there.
⚡ Book a demo