o10

The control plane for inference spend.

o10 sits in the request path above Vercel AI Gateway, OpenRouter, and Amazon Bedrock. Set a budget envelope and a quality floor: o10 routes every call to the cheapest compliant model that clears the floor, and enforces the budget instead of just reporting it.

In the path above 3 venues · 638× price spread observed · Live in a day
The Book · cost per 1M tokens, from cheapest compliant to most expensive (interactive: drag the budget envelope to see the routed model, quality floor, routed spend, and monthly saving vs. current routing)
01 · The problem

Your AI spend is a surprise,
not a plan.

A prompt change doubles the bill

A small edit to a system prompt or a retry policy can double inference cost overnight. Nobody signs off on it.

4+ · Spend is fragmented

Calls scatter across gateways and cloud accounts. No single ledger, no single owner, no single number.

T+30 · The bill lands after the fact

Your dashboards report what was spent, a month late, after the money is gone.

You can see the leak. You can't pull a lever on it.

02 · The shift
Dashboards observe.
o10 enforces.

Cost dashboards tell you what you spent. o10 sits in the request path and changes what you spend — no engineering sprint, no model migration.

in the path · not after the fact
03 · How it works

One plane, every venue.

The same model is often priced differently across venues. o10 routes every call to the cheapest compliant supply — and starts in shadow mode before it ever changes a route.

Works with Vercel AI Gateway · OpenRouter · Amazon Bedrock · owned / open-weight
Diagram · enforce mode, in the path
Your app (live traffic) → o10 control plane (quality floor · evals · data & jurisdiction · route → cheapest) → cheapest compliant supply:
  • Gateway · per-token API · $9.40 / 1M tokens
  • Aggregator · multi-provider · $8.10 / 1M tokens
  • Committed capacity · reserved / owned · $1.85 / 1M tokens ← o10 would route here
Saving shown: $312K/mo

The same model is priced differently across venues: $9.40 per 1M tokens through a per-token gateway, $1.85 on committed Bedrock capacity. o10 routes every call to the cheapest venue that clears your quality floor. Start in shadow: o10 mirrors traffic and shows what would route and what it would save, with evals proving the cheaper model clears your floor. Then flip to enforce. That ramp is the trust story.
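A minimal sketch of the selection rule in Python. The `Offer` shape, the model and venue names, and the eval scores are hypothetical; only the three per-1M-token prices come from the diagram. The rule: drop every offer whose eval score is below the use case's quality floor, then take the cheapest of what remains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """One model served at one venue (hypothetical shape, for illustration)."""
    model: str
    venue: str
    usd_per_1m_tokens: float
    eval_score: float  # score on this use case's eval suite, 0-100


def route(offers: list[Offer], quality_floor: float) -> Offer:
    """Return the cheapest offer whose eval score clears the quality floor."""
    compliant = [o for o in offers if o.eval_score >= quality_floor]
    if not compliant:
        # Nothing clears the floor: fail loudly rather than silently downgrade.
        raise LookupError("no offer clears the quality floor")
    return min(compliant, key=lambda o: o.usd_per_1m_tokens)


offers = [
    Offer("model-a", "gateway", 9.40, 91.0),
    Offer("model-a", "aggregator", 8.10, 91.0),
    Offer("model-a", "committed", 1.85, 91.0),
    Offer("model-b", "open-weight", 0.60, 78.0),  # cheaper, but below an 85 floor
]

print(route(offers, quality_floor=85.0).venue)  # committed
```

Lowering the floor to 75 would let `model-b` win; raising it above 91 raises `LookupError`, which a real deployment might map to a fallback route instead.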

04 · The four questions

Answer the four CFO questions
in under a minute. Then act on the answers.

Q1 · The fully loaded cost of each AI use case in production. → live ledger per use case
Q2 · The cost per business outcome for each one. → efficiency ratio, not token totals
Q3 · Which use cases can't defend their unit economics. → flagged against the floor
Q4 · The forecast for next quarter, tied to a volume driver. → bound to a business metric
THE o10 DIFFERENCE

Every answer comes with a lever. The use case that can't defend its unit economics gets auto-rightsized or capped at the control plane — not added to a slide for next quarter.

05 · Capex / opex

Procure inference like capital.

Chart · cost vs. monthly volume: per-token API (opex) vs. committed / owned (capex)

The make-vs-buy decision, modelled.

At what monthly volume does it pay to move a workload off a volatile per-token API onto committed Bedrock capacity or self-hosted open-weight models? o10 finds the crossover for each use case.
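One way to model the crossover, assuming a simple linear cost model: the per-token API has no fixed fee, while committed or owned capacity costs a fixed monthly amount plus a smaller marginal rate. The crossover is where the two lines meet, V* = F / (p_api - p_committed). The function names and all figures below are hypothetical.

```python
from typing import Optional


def monthly_cost_api(volume_m: float, api_usd_per_1m: float) -> float:
    """Per-token API (opex): pay only for the millions of tokens used."""
    return volume_m * api_usd_per_1m


def monthly_cost_committed(volume_m: float, fixed_usd: float, marginal_usd_per_1m: float) -> float:
    """Committed / owned capacity: fixed monthly cost plus a marginal rate."""
    return fixed_usd + volume_m * marginal_usd_per_1m


def crossover_volume(fixed_usd: float, api_usd_per_1m: float, marginal_usd_per_1m: float) -> Optional[float]:
    """Monthly volume (millions of tokens) above which committed capacity is cheaper."""
    spread = api_usd_per_1m - marginal_usd_per_1m
    if spread <= 0:
        return None  # committed never wins on unit cost under this model
    return fixed_usd / spread


# Hypothetical workload: $15,000/month commitment, $9.40 vs $0.40 per 1M tokens.
v_star = crossover_volume(15_000, 9.40, 0.40)
print(round(v_star))  # 1667 (million tokens per month)
```

Below about 1.67B tokens a month the per-token API stays cheaper under these assumptions; above it, the commitment pays for itself.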

Routing through Bedrock also draws down committed cloud spend the company has already signed — turning a sunk commitment into realized value.

A capital-allocation decision a CFO owns — the thing no FinOps dashboard touches.
06 · Governance & sovereignty

Control which model sees which data.

/ POLICY

Zero-retention, no-training

Enforce zero-data-retention and no-training at the policy layer. Sensitive data routes only to approved models that honour it.

enforced per call
/ RESIDENCY

Jurisdiction-aware routing

Route in-region by policy. Data classified for the UK or KSA never leaves an approved venue in that jurisdiction.

UK · KSA
/ LEDGER

Immutable audit trail

Every call leaves an immutable record — model, venue, policy, jurisdiction, cost. A defensible ledger, per request.

per-call record
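A sketch of how the three cards could fit together, assuming hypothetical `Venue`, `Policy`, and `Ledger` shapes: policy filters venues before any price comparison, and every call appends a record to a hash-chained log. Hash chaining is one common way to make a ledger tamper-evident (editing any past record breaks the chain); it is an illustrative choice here, not a description of o10's storage.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Venue:
    name: str
    jurisdiction: str  # e.g. "UK", "KSA", "US"
    zero_retention: bool
    no_training: bool


@dataclass(frozen=True)
class Policy:
    allowed_jurisdictions: frozenset
    require_zero_retention: bool
    require_no_training: bool


def eligible(venues: list[Venue], policy: Policy) -> list[Venue]:
    """Keep venues in an approved jurisdiction that honour retention and training rules."""
    return [
        v for v in venues
        if v.jurisdiction in policy.allowed_jurisdictions
        and (v.zero_retention or not policy.require_zero_retention)
        and (v.no_training or not policy.require_no_training)
    ]


GENESIS = "0" * 64


def _digest(fields: dict, prev: str) -> str:
    body = json.dumps({**fields, "prev": prev}, sort_keys=True)
    return hashlib.sha256(body.encode()).hexdigest()


class Ledger:
    """Append-only, hash-chained list of per-call records."""

    def __init__(self) -> None:
        self.records: list[dict] = []

    def append(self, **fields) -> dict:
        prev = self.records[-1]["hash"] if self.records else GENESIS
        record = {**fields, "prev": prev, "hash": _digest(fields, prev)}
        self.records.append(record)
        return record

    def verify(self) -> bool:
        """Recompute every hash; any edited record breaks the chain."""
        prev = GENESIS
        for r in self.records:
            fields = {k: v for k, v in r.items() if k not in ("prev", "hash")}
            if r["prev"] != prev or r["hash"] != _digest(fields, prev):
                return False
            prev = r["hash"]
        return True


venues = [
    Venue("uk-committed", "UK", True, True),
    Venue("us-gateway", "US", False, True),
    Venue("uk-aggregator", "UK", False, False),
]
uk_sensitive = Policy(frozenset({"UK"}), True, True)
print([v.name for v in eligible(venues, uk_sensitive)])  # ['uk-committed']

ledger = Ledger()
ledger.append(model="model-a", venue="uk-committed", policy="uk_sensitive",
              jurisdiction="UK", cost_usd=0.0042)
print(ledger.verify())  # True
```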
07 · Value-add · Know Your Inference

o10 enforces the spend.
KYI governs the chain.

Routing controls the bill. Know Your Inference is the framework above it — scoring every use case across performance, economics, integration, strategy, and risk, so your AI supply chain is sustainable, governable, and defensible to a board.

Explore the KYI framework
Sample KYI score · 76/100 · Recommended
Performance (25%): 81
Economics (25%): 86
Integration (20%): 74
Strategy (20%): 58
Risk (10%): 79
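The sample composite is consistent with a plain weighted average of the five pillars: 0.25×81 + 0.25×86 + 0.20×74 + 0.20×58 + 0.10×79 = 76.05, shown as 76. A minimal sketch, assuming a linear weighted mean (KYI's exact aggregation, confidence levels, and recommendation bands are not specified here); the composite floor of 65 is the one the KYI FAQ entry describes.

```python
KYI_WEIGHTS = {
    "Performance": 0.25,
    "Economics": 0.25,
    "Integration": 0.20,
    "Strategy": 0.20,
    "Risk": 0.10,
}
COMPOSITE_FLOOR = 65  # below this, enforcement levers apply per policy


def kyi_composite(scores: dict[str, float]) -> float:
    """Weighted mean of pillar scores (each 0-100); the weights sum to 1."""
    return sum(weight * scores[pillar] for pillar, weight in KYI_WEIGHTS.items())


sample = {"Performance": 81, "Economics": 86, "Integration": 74, "Strategy": 58, "Risk": 79}
composite = kyi_composite(sample)
print(round(composite))             # 76
print(composite < COMPOSITE_FLOOR)  # False: no cap, rightsize, or sunset triggered
```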
08 · The spread

The same answer, 638× the price.

Pick a workload and a quality floor. o10 estimates what you'd have saved by routing to the cheapest compliant model. The embarrassing number that books the meeting.

Calculator (interactive): choose a workload and a quality floor to see the estimated monthly saving as a share of current spend at the same quality floor, current spend vs. o10-routed spend, and the price spread.
Estimate only · paste a week of real traffic into the audit for your number.
09 · Pricing & adoption

Start in shadow. Pay from what you save.

/ CONTROL

Governance fee

flat platform subscription

The control plane, policy engine, audit ledger, and routing across every venue. Priced like infrastructure, not a percentage of your bill.

/ GAINSHARE

Share of verified savings

measured against shadow baseline

A share of the savings o10 proves against your own shadow baseline. o10 wins only when you do: no savings, no gainshare.

01
Shadow

o10 mirrors traffic and shows what would have routed and saved. Nothing changes.

02
Prove

A verified savings figure against your own baseline, per use case.

03
Enforce

Flip the switch. o10 holds the envelope in the path, on Monday.

10 · Built for the compact

Finance sets the envelope.
Engineering keeps the keys.

The CFO / board

Owns the number.

  • The budget envelope, set and enforced.
  • The unit-economic floor each use case must clear.
  • The kill criteria when it can't.
  • A forecast tied to a business volume driver.
The CIO / platform

Keeps the keys.

  • Model selection, latency, and reliability.
  • Data residency and approved venues.
  • Architecture — no six-week ticket per target change.
  • Full visibility into every routed call.
o10 enforces the line
Set the envelope. o10 holds it.

See what you're overpaying.

Paste a week of traffic. Get the number that books the audit.

verified savings to date · $48.2M across customer ledgers
FAQ · Frequently asked questions

Common questions

What is o10?

o10 is the control plane for inference spend. It routes every AI inference call to the cheapest model that clears your quality floor — across Vercel AI Gateway, OpenRouter, Amazon Bedrock, and owned capacity. Shadow mode proves savings without changing production; enforce mode holds budget envelopes in the path. Evals define per-use-case quality floors; KYI governs the supply chain for board reporting; an immutable ledger records model, venue, policy, and cost on every call.

What is the difference between shadow and enforce mode?

Shadow mode mirrors live traffic and shows what would have routed and saved — without changing production responses. Enforce mode places o10 in the request path and actively routes each call to the cheapest model that clears your quality floor, holding budget envelopes and policy on every request. Teams always start in shadow to build a verified per-use-case baseline; finance signs off before enforce flips. Both modes write to the immutable ledger; only enforce changes spend.
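A minimal sketch of the difference, assuming hypothetical `router`, `call_model`, and `record` callables: both modes compute the route and write a ledger entry, but only enforce mode changes which model actually serves the request. The sketch omits replaying mirrored traffic to the candidate model for evals, which shadow mode would also do.

```python
from typing import Callable


def handle(
    request: str,
    mode: str,
    current_model: str,
    router: Callable[[str], str],
    call_model: Callable[[str, str], str],
    record: Callable[[dict], None],
) -> str:
    """Serve one request in shadow or enforce mode (illustrative only)."""
    if mode not in ("shadow", "enforce"):
        raise ValueError(f"unknown mode: {mode}")
    routed_model = router(request)  # cheapest model that clears the floor
    served_model = routed_model if mode == "enforce" else current_model
    response = call_model(served_model, request)
    # Both modes write to the ledger; only enforce changes what is served (and spent).
    record({"mode": mode, "current": current_model,
            "routed": routed_model, "served": served_model})
    return response


log: list[dict] = []
for mode in ("shadow", "enforce"):
    handle("summarise this ticket", mode, "big-model",
           router=lambda req: "small-model",
           call_model=lambda model, req: f"{model} answered",
           record=log.append)

print([entry["served"] for entry in log])  # ['big-model', 'small-model']
```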

How much can inference routing save?

In June 2026 benchmarks, o10 observed compliant price spreads of up to 638× across venues for the same quality floor. Workload-specific monthly savings typically range from 40% to 94% depending on use case, eval floor, and venue mix; RAG and batch workloads at lean floors show the largest percentages. These are not guarantees: shadow mode proves your organization's number against your own traffic. Teams without routing in the path leave an estimated 40% to 70% of compliant savings uncaptured.

What is Know Your Inference (KYI)?

Know Your Inference (KYI) is a governance framework by Shen Pandi that scores inference systems across five weighted pillars: Performance (25%), Economics (25%), Integration (20%), Strategy (20%), and Risk (10%). Each pillar is scored 0–100, and the weighted composite rolls up into a confidence level and a board-signable recommendation. KYI runs continuously in the o10 control plane, not as a one-off audit, so every routed call and eval updates the score. A composite below the floor of 65 triggers enforcement levers per policy: cap, rightsize, or sunset.

Does o10 replace my AI gateway?

No. o10 does not replace your AI gateway or developer-facing APIs. It sits above gateways and clouds and adds spend enforcement, eval-gated routing, policy, and a CFO-grade ledger, rather than competing on proxy compatibility. Teams keep Vercel AI Gateway, OpenRouter, or LiteLLM for access, while o10 decides which model and venue serve each request based on cost, eval floor, and governance rules. The split is intentional: gateways provide the doors; the control plane enforces the economics.

What is a quality floor?

A quality floor is the minimum eval score a model must achieve for a specific use case before o10 routes production traffic to it. Floors are per workload (support, RAG, code, and batch clear at different bars) and are measured by replaying representative traffic through eval suites, not assumed from vendor benchmarks. Once a cheaper candidate passes the floor, o10 can simulate routing to it in shadow (proof) or route live traffic to it in enforce. Floors without evals are hopes; evals without floors are expensive defaults.

Which providers does o10 support?

o10 unifies routing policy and the ledger across Vercel AI Gateway (per-token API), OpenRouter (multi-provider aggregator), Amazon Bedrock (per-token and committed capacity), and owned or open-weight infrastructure. A single control plane sits above all venues, so you do not need separate dashboards per provider. o10 selects the cheapest compliant supply per call while honouring data residency, zero-retention, and model approval rules. Committed Bedrock drawdown and open-weight routing are first-class venues, not afterthoughts.

How are savings verified?

Savings are verified against your own shadow baseline per use case — not industry averages or vendor marketing claims. o10 mirrors a week or more of production traffic, segments by workload, and compares what you actually spent versus what you would have spent on the cheapest eval-passing route at the same quality floor. Finance signs off on the delta before enforce mode flips. Gainshare pricing ties o10 fees to this verified number, so savings must be real and auditable.
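The comparison can be sketched as a per-use-case delta between actual spend and the counterfactual cost of the cheapest eval-passing route over the same mirrored traffic. The record shape and the `share` rate are hypothetical; clipping at zero is one reading of "no savings, no gainshare", and netting across use cases (rather than per use case) is a contract choice this sketch makes arbitrarily.

```python
from collections import defaultdict
from typing import Iterable


def verified_savings(records: Iterable[tuple[str, float, float]]) -> dict[str, float]:
    """Sum (actual - counterfactual) spend per use case.

    Each record is (use_case, actual_usd, counterfactual_usd), where the
    counterfactual is what the same call would have cost on the cheapest
    route that passed the use case's eval floor during shadow mode.
    """
    totals: dict[str, float] = defaultdict(float)
    for use_case, actual_usd, counterfactual_usd in records:
        totals[use_case] += actual_usd - counterfactual_usd
    return dict(totals)


def gainshare_fee(savings_by_use_case: dict[str, float], share: float) -> float:
    """Fee is a share of net verified savings; no savings, no gainshare."""
    return share * max(0.0, sum(savings_by_use_case.values()))


shadow_week = [
    ("support", 100.0, 30.0),
    ("support", 50.0, 20.0),
    ("rag", 10.0, 12.0),  # cheapest eval-passing route costs more than today
]
savings = verified_savings(shadow_week)
print(savings)                      # {'support': 100.0, 'rag': -2.0}
print(gainshare_fee(savings, 0.2))  # hypothetical 20% share
```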