The control plane for inference spend.
o10 sits in the path above Vercel AI Gateway, OpenRouter, and Bedrock. Set a budget envelope and a quality floor — o10 routes every call to the cheapest compliant model that clears it, and enforces the number instead of just reporting it.
Your AI spend is a surprise,
not a plan.
A prompt change doubles the bill
A small edit to a system prompt or a retry policy can double inference cost overnight. Nobody signs off on it.
Spend is fragmented
Calls scatter across gateways and cloud accounts. No single ledger, no single owner, no single number.
The bill lands after the fact
The dashboards you have report what was spent — a month late, when the money is already gone.
You can see the leak. You can't pull a lever on it.
o10 enforces.
Cost dashboards tell you what you spent. o10 sits in the request path and changes what you spend — no engineering sprint, no model migration.
One plane, every venue.
The same model is often priced differently across venues. o10 routes every call to the cheapest compliant supply — and starts in shadow mode before it ever changes a route.
The same model is priced differently across venues — $9.40 here, $1.85 on committed Bedrock capacity. o10 routes every call to the cheapest one that clears your floor. Start in shadow — o10 mirrors traffic and shows what would route and save, with evals that prove the cheaper model clears your quality floor — then flip to enforce. That ramp is the trust story.
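As a rough illustration of the routing rule described above, here is a minimal Python sketch. It is not o10's implementation: the `Offer` type, the model and venue names, the eval scores, and the pricing unit are all assumptions made for the example. Only the two prices ($9.40 and $1.85) come from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Offer:
    model: str         # hypothetical model name
    venue: str         # hypothetical venue label
    price: float       # USD per unit of work (the unit is an assumption)
    eval_score: float  # score from your own eval suite, 0-100 (assumed scale)


def cheapest_compliant(offers, quality_floor: float) -> Optional[Offer]:
    """Return the lowest-priced offer whose eval score clears the floor."""
    eligible = [o for o in offers if o.eval_score >= quality_floor]
    # default=None means "no route clears the floor", so the caller can fall back.
    return min(eligible, key=lambda o: o.price, default=None)


offers = [
    Offer("model-a", "per-token-api", 9.40, 91.0),
    Offer("model-a", "bedrock-committed", 1.85, 91.0),
    Offer("model-b", "per-token-api", 0.60, 72.0),  # cheaper, but below the floor
]

choice = cheapest_compliant(offers, quality_floor=85.0)
print(choice.venue, choice.price)  # bedrock-committed 1.85
```

The cheapest offer overall (`model-b`) is skipped because it misses the floor, which is the difference between "cheapest" and "cheapest compliant".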
Answer the four CFO questions
in under a minute. Then act on the answers.
Every answer comes with a lever. The use case that can't defend its unit economics gets auto-rightsized or capped at the control plane — not added to a slide for next quarter.
Procure inference like capital.
The make-vs-buy decision, modelled.
At what volume does a workload pay to move off a volatile per-token API onto committed Bedrock capacity or self-hosted open-weight models? o10 finds the crossover for each use case.
Routing through Bedrock also draws down committed cloud spend the company has already signed — turning a sunk commitment into realized value.
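The crossover question above can be sketched with a simple linear cost model. This is an assumption, not o10's model: it treats the per-token API as a pure variable cost and committed or self-hosted capacity as a fixed monthly cost plus a lower variable cost. The function name, parameters, and all figures are hypothetical.

```python
import math


def crossover_volume(api_price_per_m: float,
                     fixed_monthly: float,
                     committed_price_per_m: float = 0.0) -> float:
    """Monthly volume (millions of tokens) at which committed capacity breaks even.

    Assumed cost model:
      per-token API cost = api_price_per_m * volume
      committed cost     = fixed_monthly + committed_price_per_m * volume
    """
    saving_per_m = api_price_per_m - committed_price_per_m
    if saving_per_m <= 0:
        # Committed capacity never pays back if its marginal price is not lower.
        return math.inf
    return fixed_monthly / saving_per_m


# Hypothetical numbers: $10 per million tokens on the API,
# $5,000 per month fixed plus $2 per million on committed capacity.
print(crossover_volume(10.0, 5000.0, 2.0))  # 625.0
```

Below 625 million tokens a month the per-token API is cheaper in this example; above it, the committed option wins. Real workloads add utilization, burst, and latency constraints that this sketch ignores.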
Control which model sees which data.
Zero-retention, no-training
Enforce zero-data-retention and no-training at the policy layer. Sensitive data routes only to approved models that honour it.
enforced per call
Jurisdiction-aware routing
Route in-region by policy. Data classified for the UK or KSA never leaves an approved venue in that jurisdiction.
UK · KSA
Immutable audit trail
Every call leaves an immutable record — model, venue, policy, jurisdiction, cost. A defensible ledger, per request.
per-call record
The same answer, 638× the price.
Pick a workload and a quality floor. o10 estimates what you'd have saved by routing to the cheapest compliant model. The embarrassing number that books the meeting.
traffic in the audit for your number.
Start in shadow. Pay from what you save.
Governance fee
The control plane, policy engine, audit ledger, and routing across every venue. Priced like infrastructure, not a percentage of your bill.
Share of verified savings
A share of the savings o10 proves against your own shadow baseline. o10 earns only when you save: no savings, no gainshare.
o10 mirrors traffic and shows what would have routed and saved. Nothing changes.
A verified savings figure against your own baseline, per use case.
Flip the switch. o10 holds the envelope in the path, on Monday.
Finance sets the envelope.
Engineering keeps the keys.
Owns the number.
- The budget envelope, set and enforced.
- The unit-economic floor each use case must clear.
- The kill criteria when it can't.
- A forecast tied to a business volume driver.
Keeps the keys.
- Model selection, latency, and reliability.
- Data residency and approved venues.
- Architecture — no six-week ticket per target change.
- Full visibility into every routed call.
See what you're overpaying.
Paste a week of traffic. Get the number that books the audit.
See what you're overpaying →
Common questions
What is o10?
o10 is the control plane for inference spend. It routes every AI inference call to the cheapest model that clears your quality floor — across Vercel AI Gateway, OpenRouter, Amazon Bedrock, and owned capacity. Shadow mode proves savings without changing production; enforce mode holds budget envelopes in the path. Evals define per-use-case quality floors; KYI governs the supply chain for board reporting; an immutable ledger records model, venue, policy, and cost on every call.
What is the difference between shadow and enforce mode?
Shadow mode mirrors live traffic and shows what would have routed and saved — without changing production responses. Enforce mode places o10 in the request path and actively routes each call to the cheapest model that clears your quality floor, holding budget envelopes and policy on every request. Teams always start in shadow to build a verified per-use-case baseline; finance signs off before enforce flips. Both modes write to the immutable ledger; only enforce changes spend.
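The difference between the two modes can be sketched in a few lines of Python. This is an illustrative assumption about the behaviour described above, not o10's code: `Mode`, `handle_call`, and the ledger record fields are hypothetical names.

```python
from enum import Enum


class Mode(Enum):
    SHADOW = "shadow"
    ENFORCE = "enforce"


def handle_call(mode: Mode, current_route: str, proposed_route: str, ledger: list) -> str:
    """Pick the route that actually serves the call and record the decision.

    In shadow mode production is untouched: the current route serves the call
    and the proposed route is only logged. In enforce mode the proposed route
    serves the call.
    """
    served = proposed_route if mode is Mode.ENFORCE else current_route
    # Both modes write a record; only enforce changes what is served.
    ledger.append({"mode": mode.value, "served": served, "proposed": proposed_route})
    return served


ledger = []
print(handle_call(Mode.SHADOW, "expensive-route", "cheaper-route", ledger))   # expensive-route
print(handle_call(Mode.ENFORCE, "expensive-route", "cheaper-route", ledger))  # cheaper-route
```

Comparing `served` with `proposed` across the shadow records is what produces the would-have-saved baseline.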
How much can inference routing save?
o10 has observed up to a 638× compliant price spread for the same quality floor across venues in June 2026 benchmarks. Workload-specific monthly savings typically range from 40% to 94% depending on use case, eval floor, and venue mix; RAG and batch at lean floors show the largest percentages. These are not guarantees: shadow mode proves your organization's number against your traffic. Teams without routing in the path leave an estimated 40–70% of compliant savings uncaptured.
What is Know Your Inference (KYI)?
Know Your Inference (KYI) is a governance framework by Shen Pandi that scores inference systems across five weighted pillars: Performance (25%), Economics (25%), Integration (20%), Strategy (20%), and Risk (10%). Each pillar scores 0–100; the composite rolls into a confidence level and board-signable recommendation. KYI runs continuously in the o10 control plane, not as a one-off audit, so every routed call and eval updates the score. A composite score below the floor of 65 triggers enforcement levers: cap, rightsizing, or sunset per policy.
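A worked example of the weighted composite, using the pillar weights stated above. The pillar scores are made up, the function name is hypothetical, and reading the floor of 65 as "a composite below 65 triggers a lever" is an assumption about how the threshold applies.

```python
# Pillar weights in percent, as stated in the text (they sum to 100).
KYI_WEIGHTS = {
    "performance": 25,
    "economics": 25,
    "integration": 20,
    "strategy": 20,
    "risk": 10,
}
COMPOSITE_FLOOR = 65  # assumed reading: composites below this trigger a lever


def kyi_composite(scores: dict) -> float:
    """Weighted average of the five pillar scores, each on a 0-100 scale."""
    if set(scores) != set(KYI_WEIGHTS):
        raise ValueError("scores must cover exactly the five KYI pillars")
    for name, value in scores.items():
        if not 0 <= value <= 100:
            raise ValueError(f"{name} score must be between 0 and 100")
    return sum(KYI_WEIGHTS[name] * scores[name] for name in KYI_WEIGHTS) / 100


# Hypothetical pillar scores for one use case.
scores = {"performance": 80, "economics": 50, "integration": 70, "strategy": 60, "risk": 40}
composite = kyi_composite(scores)
print(composite)                    # 62.5
print(composite < COMPOSITE_FLOOR)  # True
```

Here strong performance does not offset weak economics and risk, so the use case lands under the floor.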
Does o10 replace my AI gateway?
No. o10 does not replace your AI gateway or developer-facing APIs. It sits above gateways and clouds, adding spend enforcement, eval-gated routing, policy, and a CFO-grade ledger rather than proxy compatibility. Teams keep Vercel AI Gateway, OpenRouter, or LiteLLM for access; o10 changes which model and venue serve each request based on cost, eval floor, and governance rules. The split is intentional: gateways provide doors; control planes enforce economics.
What is a quality floor?
A quality floor is the minimum eval score a model must achieve for a specific use case before o10 routes production traffic to it. Floors are per workload — support, RAG, code, and batch clear at different bars — and measured by replaying representative traffic through eval suites, not assumed from vendor benchmarks. Once a cheaper candidate passes the floor, o10 can route to it in shadow (proof) or enforce (live). Floors without evals are hopes; evals without floors are expensive defaults.
Which providers does o10 support?
o10 unifies routing policy and ledger across Vercel AI Gateway (per-token API), OpenRouter (multi-provider aggregator), Amazon Bedrock (per-token and committed capacity), and owned or open-weight infrastructure. A single control plane sits above all venues — you do not need separate dashboards per provider. o10 selects the cheapest compliant supply per call while honoring data residency, zero-retention, and model approval rules. Committed Bedrock drawdown and open-weight routing are first-class venues, not afterthoughts.
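To make "cheapest compliant supply" concrete, here is a small Python sketch of a policy filter applied before price selection. Everything in it is an assumption for illustration: the `Venue` and `Policy` types, the venue names, regions, and prices are hypothetical and do not describe o10's policy engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Venue:
    name: str             # hypothetical venue label
    region: str           # where the data is processed
    zero_retention: bool  # whether the venue honours zero data retention
    price: float          # USD per unit of work (unit is an assumption)


@dataclass(frozen=True)
class Policy:
    allowed_regions: frozenset
    approved_venues: frozenset
    require_zero_retention: bool = True


def is_compliant(venue: Venue, policy: Policy) -> bool:
    """A venue must pass every rule: approval, residency, and retention."""
    if venue.name not in policy.approved_venues:
        return False
    if venue.region not in policy.allowed_regions:
        return False
    if policy.require_zero_retention and not venue.zero_retention:
        return False
    return True


def route(venues, policy: Policy) -> Venue:
    """Cheapest venue among those that satisfy the policy; fail closed otherwise."""
    compliant = [v for v in venues if is_compliant(v, policy)]
    if not compliant:
        raise LookupError("no compliant venue; refusing to route")
    return min(compliant, key=lambda v: v.price)


venues = [
    Venue("gateway-us", "us", True, 1.0),
    Venue("cloud-uk", "uk", True, 3.0),
    Venue("aggregator-uk", "uk", False, 2.0),
]
uk_policy = Policy(
    allowed_regions=frozenset({"uk"}),
    approved_venues=frozenset({"gateway-us", "cloud-uk", "aggregator-uk"}),
)
print(route(venues, uk_policy).name)  # cloud-uk
```

The cheapest venue is out of region and the next cheapest does not honour zero retention, so the call goes to the most expensive option. Raising an error when nothing qualifies is a design choice in this sketch: sensitive data is never sent to a non-compliant venue as a fallback.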
How are savings verified?
Savings are verified against your own shadow baseline per use case — not industry averages or vendor marketing claims. o10 mirrors a week or more of production traffic, segments by workload, and compares what you actually spent versus what you would have spent on the cheapest eval-passing route at the same quality floor. Finance signs off on the delta before enforce mode flips. Gainshare pricing ties o10 fees to this verified number, so savings must be real and auditable.
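The comparison described above can be sketched as a per-use-case aggregation over mirrored calls. This is a simplified illustration, not o10's verification method: the record fields, the function names, the sample costs, and the 25% gainshare rate are all hypothetical.

```python
from collections import defaultdict


def verified_savings(calls):
    """Aggregate actual vs. shadow cost per use case.

    Each call record is assumed to carry:
      use_case    - workload label
      actual_cost - what production actually paid
      shadow_cost - cost of the cheapest eval-passing route at the same floor
    """
    totals = defaultdict(lambda: {"actual": 0.0, "shadow": 0.0})
    for call in calls:
        bucket = totals[call["use_case"]]
        bucket["actual"] += call["actual_cost"]
        bucket["shadow"] += call["shadow_cost"]

    report = {}
    for use_case, t in totals.items():
        savings = t["actual"] - t["shadow"]
        pct = 100.0 * savings / t["actual"] if t["actual"] else 0.0
        report[use_case] = {"actual": t["actual"], "shadow": t["shadow"],
                            "savings": savings, "pct": pct}
    return report


def gainshare_fee(savings: float, share: float) -> float:
    """Fee as a share of verified savings; zero when there are no savings."""
    return max(savings, 0.0) * share


calls = [
    {"use_case": "rag", "actual_cost": 4.0, "shadow_cost": 1.0},
    {"use_case": "rag", "actual_cost": 4.0, "shadow_cost": 1.0},
    {"use_case": "support", "actual_cost": 2.0, "shadow_cost": 2.0},
]
report = verified_savings(calls)
print(report["rag"]["savings"], report["rag"]["pct"])  # 6.0 75.0
print(gainshare_fee(report["rag"]["savings"], 0.25))   # 1.5
print(gainshare_fee(report["support"]["savings"], 0.25))  # 0.0
```

Segmenting by use case matters because a blended figure can hide a workload, like `support` here, where routing saves nothing.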