KYI · Know Your Inference

Know your inference.

A framework for governing your AI supply chain — not just the bill. KYI evaluates every inference system across five dimensions, so a board can see what creates value, what carries risk, and what to do about it.

01 The premise
Cheaper tokens miss the point.

The industry optimises per-token cost. But up to 90% of an AI system's operational life is inference — where value, reliability, and risk are actually decided. Just as electricity's impact came not from cheaper lighting but from the industries it enabled, inference's value lies in more valuable, reliable, governable capabilities — not a smaller unit price.

the electricity parallel · value is created downstream, not at the meter
02 The framework

Five pillars. One score.
A recommendation a board can sign.

KYI scores every inference use case across performance, economics, integration, strategy, and risk — then rolls them into a single weighted score, a confidence level, and a recommendation. Pick a use case to see its profile.

Interactive scorecard: select a use case to see its composite KYI score out of 100. The dashed ring marks the unit-economic / quality floor (65); scores below the floor are flagged in red, and the five pillars are weighted into a single number.
Performance
25% · beyond speed
  • Accuracy & quality
  • Latency & throughput
  • Reliability & consistency
  • Operational excellence
Economics
25% · TCO & value
  • Direct cost analysis
  • Total cost of ownership
  • Value creation
  • Return on investment
Integration
20% · harmony
  • Technical integration
  • Data integration
  • Process alignment
  • Org change mgmt
Strategy
20% · advantage
  • Competitive advantage
  • Market positioning
  • Strategic options
  • Capability building
Risk
10% · mitigation
  • Technical risk
  • Business risk
  • Regulatory & compliance
  • Operational risk
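
A minimal sketch of how the five pillar weights could roll up into one composite number. It assumes a plain linear weighted average, which the page does not confirm; the function name `composite_score` and the example pillar scores are hypothetical.

```python
# Pillar weights as stated on the page (they sum to 100%).
WEIGHTS = {
    "performance": 0.25,
    "economics": 0.25,
    "integration": 0.20,
    "strategy": 0.20,
    "risk": 0.10,
}
FLOOR = 65  # unit-economic / quality floor shown on the scorecard


def composite_score(pillars: dict[str, float]) -> float:
    """Weighted average of five pillar scores, each on a 0-100 scale.

    Assumption: the roll-up is linear; the real method may differ.
    """
    missing = WEIGHTS.keys() - pillars.keys()
    if missing:
        raise ValueError(f"missing pillar scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * pillars[name] for name in WEIGHTS)


# Hypothetical pillar scores for an illustrative use case.
example = {"performance": 80, "economics": 60, "integration": 70, "strategy": 50, "risk": 90}
score = composite_score(example)
print(round(score, 1), "clears floor" if score >= FLOOR else "below floor")
```

With these made-up inputs the weighted terms are 20 + 15 + 14 + 10 + 9, so the composite lands just above the floor even though two pillars sit below it.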
03 Evals

A floor you can't measure is just a hope.

o10 routes to the cheapest model that clears your quality floor. Evals are what make that floor real — they replay your traffic against every candidate model and score it, so "good enough" is measured, not asserted. The cheapest model that passes is the one o10 routes to.

Interactive eval table: eval suite, quality floor (85), the model o10 routes to, and cost / 1M.
The same eval suite runs continuously in production; if the routed model drifts below the floor, o10 re-routes automatically.
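
The routing rule described here, "the cheapest model that passes", can be sketched in a few lines. This is an illustration under stated assumptions, not o10's implementation: the `Candidate` type, the model names, the scores, and the prices are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    model: str          # hypothetical model name
    eval_score: float   # score from the use case's eval suite, 0-100
    cost_per_1m: float  # price per 1M, in dollars (made-up figures)


def cheapest_passing(candidates: list[Candidate], floor: float) -> Optional[Candidate]:
    """Return the lowest-cost candidate whose eval score meets the floor, or None."""
    passing = [c for c in candidates if c.eval_score >= floor]
    return min(passing, key=lambda c: c.cost_per_1m, default=None)


candidates = [
    Candidate("model-large", eval_score=93.0, cost_per_1m=15.00),
    Candidate("model-medium", eval_score=88.0, cost_per_1m=3.00),
    Candidate("model-small", eval_score=79.0, cost_per_1m=0.40),
]
choice = cheapest_passing(candidates, floor=85)
print(choice.model if choice else "no candidate clears the floor")
```

The cheapest candidate overall is rejected because it misses the floor, so the selection falls to the cheapest one that passes. Returning `None` when nothing passes leaves the fallback decision to the caller.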
/ DEFINE

Define the floor

Per use case, not a global average. A support bot and a code copilot clear at different bars — set each from a real eval suite.

/ PROVE

Prove equivalence in shadow

Before any switch, evals show the cheaper model clears your floor on your own traffic. The savings number ships with its proof.

/ CATCH DRIFT

Catch drift, automatically

Models and prompts shift. Continuous evals flag regressions the moment a routed model slips below the floor — and re-route.
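
One way to picture drift detection is a rolling check of eval scores against the floor. The sketch below assumes a rolling mean over a fixed window; the page does not say how o10 aggregates scores, and `DriftMonitor` and its parameters are hypothetical.

```python
from collections import deque


class DriftMonitor:
    """Tracks a rolling mean of eval scores for the routed model."""

    def __init__(self, floor: float, window: int = 50):
        self.floor = floor
        self.scores = deque(maxlen=window)  # oldest scores fall off automatically

    def record(self, eval_score: float) -> bool:
        """Add a score; return True when the rolling mean is below the floor."""
        self.scores.append(eval_score)
        return sum(self.scores) / len(self.scores) < self.floor


monitor = DriftMonitor(floor=85, window=3)
flags = [monitor.record(s) for s in (90, 88, 84, 80, 79)]
print(flags)  # [False, False, False, True, True]
```

A single score of 84 does not trip the flag because the window mean is still above 85. The flag fires once the mean itself slips, which is the point where a re-route would be triggered.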

04 The AI supply chain

Govern the whole chain, not just the invoice.

Inference is a supply chain: capacity sourced across venues, routed in the path, scored for value and risk, and reported to the board. o10 is the control plane; KYI is the layer that makes it governable and sustainable.

Layer 5 · Assurance
Board & regulator

Cost per outcome, the KYI score, the recommendation, and an immutable audit trail — a defensible record the board and the regulator can sign.

Layer 4 · Govern
KYI framework
Performance · Economics · Integration · Strategy · Risk

Every use case scored across five pillars, against a floor — sustainable, governable, and tied to value creation rather than unit price.

Layer 3 · Enforce
o10 control plane

In the request path: routes every call to the cheapest compliant model, holds the budget envelope, and enforces the data & jurisdiction policy.

Layer 2 · Prove
Evals

The quality floor, made measurable. Evals replay traffic against every candidate model, prove equivalence before a switch, and run continuously to catch drift.

Layer 1 · Source
Inference supply
Vercel AI Gateway · OpenRouter · Amazon Bedrock · Owned / open-weight
05 Architecture · delivered as a service

Your supply chain, mapped.

o10 watches which prompts run on which models, classifies every call by purpose, and maps your real AI supply chain — purpose → model → venue. Then it re-sources each purpose to the cheapest model that clears its floor. Toggle to see o10's recommended architecture; click any purpose to trace it.

Interactive architecture map: shows routes as deployed today alongside observed spend, optimized spend, and savings per month. Selecting a purpose shows how o10 re-sources it: the model it runs on today, the cheapest model that clears its floor, and the venue.
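
The purpose → model → venue mapping can be thought of as two route tables, one observed and one optimized, compared per purpose. The sketch below is illustrative only: the `Route` type, the purposes, model and venue names, and the dollar figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    purpose: str
    model: str
    venue: str
    monthly_spend: float  # dollars per month (made-up figures)


def savings_by_purpose(observed: list[Route], optimized: list[Route]) -> dict[str, float]:
    """Monthly saving per purpose: observed spend minus optimized spend."""
    target = {r.purpose: r.monthly_spend for r in optimized}
    return {
        r.purpose: r.monthly_spend - target[r.purpose]
        for r in observed
        if r.purpose in target  # purposes without a proposed route are skipped
    }


observed = [
    Route("support", "model-large", "venue-a", 12000.0),
    Route("code", "model-large", "venue-a", 8000.0),
]
optimized = [
    Route("support", "model-medium", "venue-b", 3000.0),
    Route("code", "model-large", "venue-c", 6500.0),
]
saved = savings_by_purpose(observed, optimized)
print(saved, sum(saved.values()))
```

Note that a purpose can save money by changing venue alone, as the hypothetical "code" route does while staying on the same model.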
06 In the product

Not an audit. A live instrument.

KYI isn't a one-off engagement or a slide deck — it runs inside the control plane. Every routed call and every eval feeds the score, so the recommendation is current the moment a board asks.

OBSERVE

Live telemetry & evals

Every call in the path streams cost, latency, policy, and eval scores — continuously, not sampled after the fact.

SCORE

Five pillars, recomputed

Performance, economics, integration, strategy, and risk update from real evidence as conditions change.

RECOMMEND

A verdict, always current

The composite score, confidence, and recommendation are live — ready for the board or the regulator on demand.

ENFORCE

Act on the answer

A use case that slips below its floor gets auto-rightsized or capped at the control plane — no ticket, no sprint.

↻ continuous · scored on every call, governed without a consulting engagement
07 The output

One number. Four verdicts.

The composite KYI score maps to a clear recommendation — the line between an investment that defends its economics and one that gets rightsized or capped at the control plane.

80–100: Strongly recommended
65–79: Recommended
50–64: Conditional
0–49: Not recommended
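
The four bands translate directly into a lookup. A minimal sketch, assuming integer band edges apply as lower bounds (so a fractional score such as 79.5 falls in the lower band, which the page does not specify); the name `verdict` is hypothetical.

```python
# (lower bound, label) pairs taken from the bands listed on the page.
BANDS = [
    (80, "Strongly recommended"),
    (65, "Recommended"),
    (50, "Conditional"),
    (0, "Not recommended"),
]


def verdict(score: float) -> str:
    """Map a composite KYI score (0-100) to one of the four recommendation bands."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    # Check from the top band down and return the first band the score reaches.
    for lower_bound, label in BANDS:
        if score >= lower_bound:
            return label
    raise AssertionError("unreachable: the last band starts at 0")


print([verdict(s) for s in (92, 65, 64, 12)])
```

The boundary at 65 is the one that matters most in practice, since it coincides with the floor: 65 is "Recommended", 64 is "Conditional".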
Framework · Deep dive

KYI methodology & governance

Expanded methodology and governance detail — beyond the interactive scorecard above.

What is Know Your Inference?

Know Your Inference (KYI) is a five-pillar governance framework, and the layer where o10 enforces spend instead of observing it after the fact.

KYI weights performance and economics at 25% each; strategy and risk expose whether cheaper tokens create durable value.

Why does Know Your Inference matter now?

Enterprises run inference across fragmented venues without a single ledger. Know Your Inference becomes a control problem when prompts, models, and retries change faster than finance can react.

How is Know Your Inference different from a cost dashboard?

Dashboards tell you what you spent last month. o10 sits in the request path and changes what you spend on the next call, with shadow-mode proof before enforcement.

What savings are available for Know Your Inference?

o10 benchmarks show a material spread, which varies by workload, between default routes and the cheapest compliant supply at the same quality floor.

What is a quality floor in Know Your Inference?

A measured eval bar per use case. The cheapest model that passes is the route o10 selects — not the most expensive default.

How do you prove Know Your Inference savings safely?

Shadow mode mirrors traffic, shows what would have routed, and builds a verified baseline. Enforce mode flips only after per-use-case proof.

09 Deep dive

The Know Your Inference landscape in 2026

Production AI teams run inference across multiple providers without a unified policy.

Gateways simplify API access. Aggregators multiply model choice. Committed cloud capacity sits underutilized while per-token APIs absorb live traffic.

Finance receives invoices after spend accrues. Platform teams lack a single control point to hold envelopes when product changes land.

  • Fragmented venues and ledgers
  • Prompt and retry drift without sign-off
  • Model defaults that overshoot the quality floor
  • No shadow proof before switching routes
10 Deep dive

How o10 controls Know Your Inference

o10 is the inference spend control plane — above gateways, not replacing them.

For Know Your Inference, o10 routes every call to the cheapest compliant model, enforces data and residency policy, and records an immutable per-call ledger.

KYI scores the supply chain above routing so boards see value, risk, and recommendation — not token totals alone.

11 Deep dive

What CFOs should ask about Know Your Inference

Four questions — each with a lever, not a slide.

Fully loaded cost per use case. Cost per business outcome. Which use cases fail unit economics. Forecast tied to a volume driver.

o10 answers each in the control plane and auto-rightsizes or caps use cases that breach the floor.
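
Two of the four questions, cost per business outcome and which use cases fail unit economics, reduce to simple arithmetic over per-use-case totals. The sketch below assumes "failing" means cost per outcome exceeds the value of an outcome; that definition, the function names, and the figures are all hypothetical.

```python
def cost_per_outcome(fully_loaded_cost: float, outcomes: int) -> float:
    """Fully loaded cost divided by the number of business outcomes delivered."""
    if outcomes <= 0:
        raise ValueError("outcomes must be positive")
    return fully_loaded_cost / outcomes


def failing_unit_economics(use_cases: dict[str, tuple[float, int, float]]) -> list[str]:
    """Use cases whose cost per outcome exceeds the value of one outcome.

    Each value is (fully_loaded_cost, outcomes, value_per_outcome).
    Assumption: this is one plausible reading of "fails unit economics".
    """
    return sorted(
        name
        for name, (cost, outcomes, value) in use_cases.items()
        if cost_per_outcome(cost, outcomes) > value
    )


# Hypothetical monthly totals per use case.
use_cases = {
    "support": (5000.0, 10000, 2.0),    # $0.50 per resolved ticket vs $2.00 value
    "summarizer": (4000.0, 1000, 3.0),  # $4.00 per summary vs $3.00 value
}
print(failing_unit_economics(use_cases))
```

Here the more expensive use case in absolute terms is the healthy one, which is why the ranking has to be per outcome rather than per invoice line.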

How-to · Operational steps

Implementing Know Your Inference with o10

  1. Paste a week of traffic
     Segment by use case. See current model, venue, and blended $/1M.

  2. Define eval floors
     Per workload (support, RAG, code, batch), not one global number.

  3. Run shadow mode
     Prove savings and equivalence against your baseline.

  4. Enforce + govern
     Flip enforce. KYI and ledger stay live for board and regulator.
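
The first step, segmenting a week of traffic and reading off a blended $/1M per use case, can be sketched as a small aggregation. It assumes "1M" means one million tokens and that each call record carries a use case, a token count, and a cost; the record shape and the figures are hypothetical.

```python
from collections import defaultdict


def blended_cost_per_1m(calls: list[tuple[str, int, float]]) -> dict[str, float]:
    """Blended dollars per 1M tokens for each use case.

    Each call is (use_case, tokens, cost_usd). Assumption: "1M" means tokens.
    """
    tokens: dict[str, int] = defaultdict(int)
    cost: dict[str, float] = defaultdict(float)
    for use_case, n_tokens, cost_usd in calls:
        tokens[use_case] += n_tokens
        cost[use_case] += cost_usd
    # Use cases with zero recorded tokens are left out to avoid dividing by zero.
    return {uc: cost[uc] * 1_000_000 / tokens[uc] for uc in tokens if tokens[uc] > 0}


# Hypothetical call records from one week of traffic.
week = [
    ("support", 1_000_000, 2.0),
    ("support", 3_000_000, 10.0),
    ("code", 500_000, 7.5),
]
print(blended_cost_per_1m(week))  # {'support': 3.0, 'code': 15.0}
```

Blending is done on totals, not by averaging per-call rates, so large calls weigh proportionally more in the result.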

Source · Methodology

o10 Know Your Inference hub content. Benchmarks from State of Inference Spend 2026. Framework by Shen Pandi.

FAQ · Frequently asked questions

Common questions

What is Know Your Inference?

Know Your Inference is a five-pillar governance framework for enterprise AI. o10 treats inference as a control problem, not a reporting metric: spend and policy must be enforced on the next request, not explained on last month's invoice. The operational layer is inference, where models meet live traffic, tokens accrue, and governance either holds or fails. KYI weights performance and economics at 25% each; strategy and risk expose whether cheaper tokens create durable value.

How do you reduce cost for Know Your Inference?

Route each use case to the cheapest model that clears your eval-defined quality floor, never the most expensive default. Start in shadow mode to prove savings per workload against your baseline, then flip enforce mode to hold budget envelopes in the path. Segment support, RAG, code, and batch independently; floors and compliant tiers differ. o10 benchmarks show a material spread, which varies by workload, between default routes and the cheapest compliant supply.

What is shadow mode for Know Your Inference?

Shadow mode mirrors live inference traffic through o10 without changing production routes. For every request, o10 evaluates candidate models against your per-use-case quality floors and records which route would have been cheapest and compliant, along with the cost delta, while the original provider still serves the response. Engineering sees proof without production risk; finance gets a verified savings figure tied to your traffic, not industry averages. Most teams run shadow for 7–14 days segmented by use case (support, RAG, code, batch) before flipping enforce mode. Use shadow to validate Know Your Inference routing economics before any production change.
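
The bookkeeping behind a shadow-mode savings figure can be sketched as a per-request cost delta summed by use case. This is an illustration, not o10's data model: the `ShadowRecord` type, its field names, and the costs are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShadowRecord:
    use_case: str
    served_cost: float  # what the original provider charged for the call
    shadow_cost: float  # cost of the cheapest candidate that cleared the floor


def verified_savings(records: list[ShadowRecord]) -> dict[str, float]:
    """Sum the cost delta per use case. Nothing here changes a production route."""
    totals: dict[str, float] = {}
    for r in records:
        delta = r.served_cost - r.shadow_cost
        totals[r.use_case] = totals.get(r.use_case, 0.0) + delta
    return totals


# Hypothetical mirrored requests.
records = [
    ShadowRecord("support", served_cost=0.50, shadow_cost=0.25),
    ShadowRecord("support", served_cost=0.75, shadow_cost=0.25),
    ShadowRecord("code", served_cost=1.00, shadow_cost=1.00),  # already on the cheapest passing route
]
print(verified_savings(records))  # {'support': 0.75, 'code': 0.0}
```

Keeping the totals per use case matters because a zero delta, as in the hypothetical "code" workload, is itself a finding: that route has nothing to gain from a switch.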

What is enforce mode for Know Your Inference?

Enforce mode places o10 in the request path. On every call, o10 selects the cheapest model and venue that clears your eval-defined quality floor, holds the budget envelope, and applies residency and retention policy before the request reaches the provider. Failed eval candidates are never routed. Each enforced call writes an immutable ledger entry: model, venue, policy, jurisdiction, and fully loaded cost. Enforce without shadow proof is possible but discouraged: shadow establishes trust with engineering and finance first. Enforce is how Know Your Inference policy becomes spend reality on every live call.
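
A rough sketch of a single enforce-mode decision: filter by eval floor, jurisdiction, and remaining budget, pick the cheapest survivor, and emit a ledger entry with the fields listed above. All names (`Option`, `LedgerEntry`, `enforce`), the policy label, and the figures are hypothetical. A frozen dataclass only prevents in-process mutation; it is not a stand-in for an audit-grade immutable ledger.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Option:
    model: str
    venue: str
    jurisdiction: str
    eval_score: float
    cost: float  # fully loaded cost of this call, dollars


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class LedgerEntry:
    model: str
    venue: str
    policy: str
    jurisdiction: str
    cost: float


def enforce(options: list[Option], floor: float, allowed: set[str],
            budget_left: float, policy: str) -> Optional[LedgerEntry]:
    """Pick the cheapest option that passes every gate, or None if none does."""
    compliant = [
        o for o in options
        if o.eval_score >= floor        # failed eval candidates are never routed
        and o.jurisdiction in allowed   # residency policy
        and o.cost <= budget_left       # budget envelope
    ]
    if not compliant:
        return None
    best = min(compliant, key=lambda o: o.cost)
    return LedgerEntry(best.model, best.venue, policy, best.jurisdiction, best.cost)


options = [
    Option("model-a", "venue-1", "eu", eval_score=90.0, cost=0.020),
    Option("model-b", "venue-2", "us", eval_score=92.0, cost=0.005),  # cheaper, wrong jurisdiction
    Option("model-c", "venue-1", "eu", eval_score=80.0, cost=0.001),  # cheapest, fails the floor
]
entry = enforce(options, floor=85, allowed={"eu"}, budget_left=1.0, policy="eu-only")
print(entry)
```

The example is arranged so that both cheaper options are excluded for different reasons, which shows why "cheapest compliant" and "cheapest" can diverge.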

Does o10 replace gateways for Know Your Inference?

No. o10 does not replace your AI gateway or developer-facing APIs. It sits above gateways and clouds, adding spend enforcement, eval-gated routing, policy, and a CFO-grade ledger, not proxy compatibility. Teams keep Vercel AI Gateway, OpenRouter, or LiteLLM for access; o10 changes which model and venue serve each request based on cost, eval floor, and governance rules. The split is intentional: gateways provide doors; control planes enforce economics. For Know Your Inference, keep your gateway; add o10 above it for enforcement and KYI governance.

What is Know Your Inference?

Know Your Inference (KYI) is a governance framework by Shen Pandi that scores inference systems across five weighted pillars: Performance (25%), Economics (25%), Integration (20%), Strategy (20%), and Risk (10%). Each pillar scores 0–100; the composite rolls into a confidence level and board-signable recommendation. KYI runs continuously in the o10 control plane, not as a one-off audit, so every routed call and eval updates the score. A composite score below the floor of 65 triggers enforcement levers: cap, rightsizing, or sunset per policy.

How is Know Your Inference measured?

Per-use-case ledger entries, continuous eval scores, and unit economics, not blended token averages. o10 records model, venue, policy, jurisdiction, and fully loaded cost on every call. KYI rolls pillar scores into a composite recommendation boards can sign. Know Your Inference measurement stays live; it does not wait for month-end close.

What venues support Know Your Inference?

o10 unifies routing policy and ledger across Vercel AI Gateway (per-token API), OpenRouter (multi-provider aggregator), Amazon Bedrock (per-token and committed capacity), and owned or open-weight infrastructure. A single control plane sits above all venues — you do not need separate dashboards per provider. o10 selects the cheapest compliant supply per call while honoring data residency, zero-retention, and model approval rules. Committed Bedrock drawdown and open-weight routing are first-class venues, not afterthoughts.

What is a quality floor?

A quality floor is the minimum eval score a model must achieve for a specific use case before o10 routes production traffic to it. Floors are per workload — support, RAG, code, and batch clear at different bars — and measured by replaying representative traffic through eval suites, not assumed from vendor benchmarks. Once a cheaper candidate passes the floor, o10 can route to it in shadow (proof) or enforce (live). Floors without evals are hopes; evals without floors are expensive defaults.

How fast can Know Your Inference go live?

Most stacks connect o10 in shadow mode within a day: point traffic through the control plane, segment by use case, and start the verified savings clock. Enforce mode follows after per-use-case eval equivalence is proven — typically one to two weeks for enterprises with multiple workloads. No six-week gateway migration is required; o10 sits above existing gateways and clouds. KYI scoring and the immutable ledger stay live from day one in shadow.

What is the 638× spread?

The 638× figure is the observed ratio between the most and least expensive compliant routing options for identical enterprise workloads at the same per-use-case quality floor across venues — not a guarantee for every team. o10 measured this across Vercel AI Gateway, OpenRouter, Amazon Bedrock committed capacity, and owned open-weight in June 2026. Actual savings depend on your venue mix, volumes, and eval floors; shadow mode proves your organization's number against your baseline.
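
As defined here, a spread is simply the ratio between the most and least expensive compliant options for the same workload. A small sketch of that arithmetic follows; the function name and the costs are hypothetical and are not the figures behind the 638× number.

```python
def compliant_spread(costs: list[float]) -> float:
    """Ratio of the most expensive to the least expensive compliant option."""
    if not costs or min(costs) <= 0:
        raise ValueError("need at least one cost, and all costs must be positive")
    return max(costs) / min(costs)


# Hypothetical costs for one workload across three compliant routes.
hypothetical_costs = [12.0, 3.0, 0.5]
print(compliant_spread(hypothetical_costs))  # 24.0
```

Only options that already clear the quality floor and policy belong in the list; including non-compliant routes would inflate the ratio.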

Where is the research?

Benchmarks and spread methodology are documented in the State of Inference Spend 2026 report at o10.io/research/state-of-inference-spend-2026, including venue price tables, workload savings models, and the 638× compliant spread calculation. The KYI framework whitepaper at o10.io/research/kyi-whitepaper provides the governance methodology cited across glossary and hub content. Both are primary sources designed for search snippets and AI answer engine citation.

KYI · Know your inference

Score your AI supply chain.

Put KYI in the path on your top use cases — the score, the risks, and the lever, continuously.

See it on your traffic