● validated AI, from input to output

AI engineering is hard.
Your pipeline shouldn't be.

Forge turns LLM work into typed, cached, validated pipeline steps. Define structures, wire steps together, A/B test strategies on cheap open models, and ship a drag-and-drop inference page — no deploy required.

built at the Antler hackathon · independent, not funded
forge — pipeline builder

typical overspend from using frontier models to patch one-time engineering problems

$0.00

cost of a cached step — finished work is never recomputed or re-billed

100%

of step I/O schema-validated, implicitly, on every run

the whole loop, one tool

Structure → Pipeline → Test → Ship

Seven windows per project. Each one removes a piece of AI engineering you'd otherwise hand-roll.

01 / Structure

Define the shape of every answer.

Build Pydantic or Zod models in a UI or in code. Every step's input and output is validated against a structure, so schema validation is implicit on all I/O and never a separate step you can forget.

  • UI builder and code view of the same models
  • `text` instead of `string` for non-technical reviewers
  • `tuple[text, quote]` links extractions back to the source
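
To make the `tuple[text, quote]` idea concrete, here is a minimal sketch in plain Python (dataclasses standing in for Pydantic). The `Text` and `Quote` aliases, the fields of `AtomicRule`, and the `validate_rules` helper are assumptions made for illustration, not Forge's actual API.

```python
from dataclasses import dataclass
from typing import NewType

# Hypothetical aliases: `Text` mirrors the `text` label shown to reviewers,
# `Quote` is a verbatim span that must appear in the source document.
Text = NewType("Text", str)
Quote = NewType("Quote", str)


@dataclass(frozen=True)
class AtomicRule:
    """One extracted rule paired with the quote that supports it (assumed fields)."""
    rule: Text
    quote: Quote


def validate_rules(raw: list, source: str) -> list[AtomicRule]:
    """Check shape and quote grounding; raise ValueError on the first problem."""
    rules = []
    for i, item in enumerate(raw):
        if not (isinstance(item, (list, tuple)) and len(item) == 2):
            raise ValueError(f"item {i}: expected a (text, quote) pair")
        rule, quote = item
        if not (isinstance(rule, str) and isinstance(quote, str)):
            raise ValueError(f"item {i}: both fields must be strings")
        if quote not in source:
            raise ValueError(f"item {i}: quote not found in source")
        rules.append(AtomicRule(Text(rule), Quote(quote)))
    return rules
```

Rejecting a pair whose quote is absent from the source is what ties each extraction back to the document, so a reviewer never sees an ungrounded claim.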
forge — structure

02 / Pipeline

Linear pipelines that flag their own breakages.

Each step declares its input and output model. When an output flows into a step expecting something else, the UI marks the pipeline invalid and run buttons go dark — before you spend a cent on inference.
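
The check described above can be sketched in a few lines. This is an illustration only: the `Step` class and `find_breakages` function are hypothetical names, models are compared by name for simplicity, and the three-step pipeline deliberately omits the combine step so that one link breaks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    input_model: str   # name of the structure the step consumes
    output_model: str  # name of the structure the step produces


def find_breakages(steps: list[Step]) -> list[str]:
    """Return one message per adjacent pair whose models do not line up."""
    problems = []
    for prev, nxt in zip(steps, steps[1:]):
        if prev.output_model != nxt.input_model:
            problems.append(
                f"{prev.name} outputs {prev.output_model}, "
                f"but {nxt.name} expects {nxt.input_model}"
            )
    return problems


pipeline = [
    Step("Doc to Text", "SourceDocument", "SourceDocument"),
    Step("Extract Atomic Rules", "SourceDocument", "list[AtomicRule]"),
    Step("Legal Considerations", "EvaluationInput", "LegalConsiderations"),
]
problems = find_breakages(pipeline)
runnable = not problems  # a UI could disable run buttons when this is False
```

Because the check only reads declared types, it runs before any model is called, which is why an invalid pipeline costs nothing.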

  • PDF, Word, Excel, JSON and web inputs
  • Custom TypeScript function steps and batch LLM steps
  • Per-step caching: never recompute or re-bill a finished step
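
One way per-step caching could work is to key each result on everything that can change it. The sketch below assumes an in-memory dictionary and hypothetical helper names (`cache_key`, `run_cached`); a real store would be persistent, and how Forge keys its cache is not stated here.

```python
import hashlib
import json

_cache: dict[str, object] = {}  # in-memory stand-in for a persistent store


def cache_key(step_name: str, config: dict, payload: object) -> str:
    """Hash everything that can change the result of a step."""
    blob = json.dumps([step_name, config, payload], sort_keys=True)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def run_cached(step_name, config, payload, fn):
    """Return (result, was_cached); call `fn` only on a cache miss."""
    key = cache_key(step_name, config, payload)
    if key in _cache:
        return _cache[key], True
    result = fn(payload)
    _cache[key] = result
    return result, False
```

Including the step configuration in the key means that changing the model or prompt invalidates the entry, while re-running unchanged work returns the stored result without another billable call.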
forge — pipeline

03 / A/B Test

Benchmark strategies, not vibes.

Run the same task as a single LLM call, a Recursive Language Model, a frontier coding agent, or an agent loop on a cheap open model. An LLM judge picks the winner on accuracy, schema pass-rate, latency and cost.

  • Cheap Chinese models first, frontier only if needed
  • Syntax-highlighted logs from RLM and agent runs
  • Accuracy / pass / latency / cost on every strategy card
forge — a/b test

04 / Inference

Drag a document on. No deploy.

Run the live pipeline by dropping a file onto an input slot, then share a link to just this page. Cached steps return instantly at $0.00; progress streams in as each step completes.

  • Shareable run page — nothing to deploy
  • Per-step run buttons, dark when the pipeline is invalid
  • Validated output with click-to-source quotes
forge — inference

05 / Logging

Every run, fully traced.

Each inference run is recorded with a per-step trace, token counts and cost. Engineers get detailed technical logs with syntax highlighting; everyone else gets a high-level HTML view with progress bars.

  • Filter by model, step or run id
  • Download any run's logs
  • `_latest` plus timestamped artifacts for every step
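
The `_latest` plus timestamped pattern can be sketched as follows. The directory layout, the `.json` extension, the timestamp format, and the `save_artifact` name are all assumptions for illustration.

```python
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path


def save_artifact(step_dir: Path, payload: dict) -> Path:
    """Write a timestamped JSON artifact and refresh the `_latest` copy."""
    step_dir.mkdir(parents=True, exist_ok=True)
    # UTC timestamp with microseconds so successive runs get distinct names.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    stamped = step_dir / f"{stamp}.json"
    stamped.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    # `_latest.json` always mirrors the most recent run of this step.
    shutil.copyfile(stamped, step_dir / "_latest.json")
    return stamped
```

Keeping both files gives tools a stable path to read from while the timestamped copies preserve history for debugging.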
forge — logging

06 / Documents

Content-addressed. Never pay twice.

Every uploaded document hashes to one id, so identical files dedupe automatically and cached work is reused. PDFs get both a PDF view and a text view; runs save their inputs and outputs alongside.

  • Hash-based dedupe across inference and training uploads
  • Previous runs stored as typed JSON with their structure attached
  • Inference / training labels on every file
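
Content addressing means the id is derived from the bytes themselves. A minimal in-memory sketch, assuming SHA-256 as the hash and a hypothetical `DocumentStore` class (the source does not say which hash Forge uses):

```python
import hashlib


class DocumentStore:
    """In-memory stand-in: identical bytes always map to the same id."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}
        self._labels: dict[str, set[str]] = {}

    def put(self, data: bytes, label: str) -> str:
        doc_id = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(doc_id, data)  # stored once, however often uploaded
        self._labels.setdefault(doc_id, set()).add(label)
        return doc_id

    def labels(self, doc_id: str) -> set[str]:
        return set(self._labels[doc_id])

    def __len__(self) -> int:
        return len(self._blobs)
```

Uploading the same file once for inference and once for training yields a single stored document carrying both labels, so any cached work keyed on that id is reused.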
forge — documents

07 / Billing

See exactly where spend goes.

Cost is attributed per model and per processing service. Watch the cheap/open vs frontier split, daily spend by model, and how much caching saved you versus recomputing everything.

  • Daily spend stacked by model
  • Open / cheap vs frontier model split
  • Cache savings computed against a no-cache baseline
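
Cache savings against a no-cache baseline reduce to simple arithmetic: sum what every step would have cost, then subtract what was actually billed. The `StepRun` record and `billing_summary` function below are hypothetical names used for the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepRun:
    model: str
    cost_usd: float  # what this step costs when actually executed
    cached: bool     # True if the result came from the cache


def billing_summary(runs: list[StepRun]) -> dict:
    """Compare billed spend with a baseline in which nothing was cached."""
    baseline = sum(r.cost_usd for r in runs)
    actual = sum(r.cost_usd for r in runs if not r.cached)
    by_model: dict[str, float] = {}
    for r in runs:
        if not r.cached:
            by_model[r.model] = by_model.get(r.model, 0.0) + r.cost_usd
    return {
        "baseline": baseline,
        "actual": actual,
        "saved": baseline - actual,
        "by_model": by_model,
    }
```

Grouping the same records by day as well as by model would give the stacked daily view, and tagging each model as open or frontier would give the split.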
forge — billing

✓ validated text

Click any extracted quote.
Jump to its source.

Every span an LLM extracts is validated against the original document and stored with langextract-style character offsets. One click takes a reviewer from the model's claim to the exact highlighted passage — that's what makes the output trustworthy enough to act on.

text → list[tuple[text, quote]] · closest-match fallback · offsets stored with the run
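
Locating a quote and falling back to the closest match can be sketched with the standard library. This assumes the fallback is "longest shared contiguous block", found with `difflib.SequenceMatcher`; Forge's actual matching strategy is not specified, and `locate_quote` is a hypothetical name.

```python
from difflib import SequenceMatcher


def locate_quote(source: str, quote: str) -> tuple[int, int, bool]:
    """Return (start, end, exact) character offsets of `quote` in `source`."""
    start = source.find(quote)
    if start != -1:
        return start, start + len(quote), True
    # Fallback: longest contiguous block shared by source and quote.
    matcher = SequenceMatcher(None, source, quote, autojunk=False)
    m = matcher.find_longest_match(0, len(source), 0, len(quote))
    if m.size == 0:
        raise ValueError("quote shares no text with the source")
    return m.a, m.a + m.size, False
```

Storing the offsets and the `exact` flag with the run lets the viewer highlight the passage and signal when the model's quote was only approximately found.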

forge — source document viewer

default project

Company Legal Evaluation

Forge ships with a working example: ingest a legal PDF, extract atomic rules with source quotes, combine with company data, and output a legal considerations report — every step typed and cached.

01 / Legal Document · pdf → SourceDocument
02 / Doc → Text · SourceDocument → SourceDocument
03 / Extract Atomic Rules · SourceDocument → list[AtomicRule]
04 / Sort Rules by Severity · custom ts fn · $0 LLM cost
05 / Company Data + Combine · word → EvaluationInput
06 / Legal Considerations · EvaluationInput → LegalConsiderations

Stop paying frontier prices
for pipeline problems.

Cheap open models, properly structured and validated, beat an expensive model on a messy prompt. Forge is how you get there.

⚡ Book a demo