Question 1

What is Know Your Inference?

Accepted Answer

Know Your Inference (KYI) is a five-pillar governance framework for enterprise AI inference. o10 treats it as a control problem, not a reporting metric: spend and policy must be enforced on the next request, not explained on last month's invoice. Inference is the operational layer, where models meet live traffic, tokens accrue, and governance either holds or fails. KYI weights performance and economics at 25% each; the strategy and risk pillars expose whether cheaper tokens create durable value.

Question 2

How do you reduce cost with Know Your Inference?

Accepted Answer

Route each use case to the cheapest model that clears your eval-defined quality floor, never to the most expensive default. Start in shadow mode to prove savings per workload against your baseline, then switch to enforce mode to hold budget envelopes in the request path. Segment support, RAG, code, and batch independently, because floors and compliant tiers differ. o10 benchmarks show a material spread, varying by workload, between default routes and the cheapest compliant supply.
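As a rough illustration of the routing rule above, the Python sketch below picks the cheapest candidate that clears a per-use-case floor. The `Candidate` type, the model names, the prices, and the eval scores are all hypothetical; this is not o10's API or data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    model: str                  # hypothetical model name
    cost_per_1k_tokens: float   # hypothetical fully loaded price, USD
    eval_score: float           # score from your own eval suite, 0-100


def cheapest_above_floor(candidates: list[Candidate], floor: float) -> Optional[Candidate]:
    """Return the cheapest candidate whose eval score meets the floor, or None."""
    passing = [c for c in candidates if c.eval_score >= floor]
    return min(passing, key=lambda c: c.cost_per_1k_tokens, default=None)


candidates = [
    Candidate("large-default", 0.0300, 92.0),
    Candidate("mid-tier", 0.0040, 88.0),
    Candidate("small-open-weight", 0.0006, 71.0),
]

# Floors differ per use case, so each workload is routed independently.
floors = {"support": 85.0, "batch": 70.0}
routes = {
    use_case: cheapest_above_floor(candidates, floor)
    for use_case, floor in floors.items()
}
```

With these made-up numbers, support traffic lands on the mid-tier model while batch can drop to the small open-weight model, which is why segmenting by use case matters.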

Question 3

What is shadow mode in Know Your Inference?

Accepted Answer

Shadow mode mirrors live inference traffic through o10 without changing production routes. For every request, o10 evaluates candidate models against your per-use-case quality floors and records which route would have been cheapest and compliant, along with the cost delta, while the original provider still serves the response. Engineering sees proof without production risk; finance gets a verified savings figure tied to your traffic, not industry averages. Most teams run shadow for 7–14 days, segmented by use case (support, RAG, code, batch), before switching to enforce mode. Use shadow to validate Know Your Inference routing economics before any production change.
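To make the shadow record concrete, here is a minimal sketch of what one could log per request and how the cost deltas might roll up per use case. The `ShadowRecord` fields and all figures are assumptions for illustration, not o10's actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ShadowRecord:
    use_case: str
    served_model: str    # model that actually answered in production
    served_cost: float   # hypothetical cost of the served call, USD
    shadow_model: str    # cheapest compliant candidate found in shadow
    shadow_cost: float   # what that candidate would have cost, USD

    @property
    def cost_delta(self) -> float:
        """Savings that would have been realized on this request."""
        return self.served_cost - self.shadow_cost


def savings_by_use_case(records: list[ShadowRecord]) -> dict[str, float]:
    """Sum the per-request deltas separately for each use case."""
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        totals[record.use_case] += record.cost_delta
    return dict(totals)


records = [
    ShadowRecord("support", "large-default", 0.50, "mid-tier", 0.25),
    ShadowRecord("support", "large-default", 0.75, "mid-tier", 0.25),
    ShadowRecord("batch", "large-default", 1.00, "small-open-weight", 0.125),
]
savings = savings_by_use_case(records)
```

Production traffic is untouched in this model: the served fields describe what happened, and the shadow fields describe what would have happened.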

Question 4

What is enforce mode in Know Your Inference?

Accepted Answer

Enforce mode places o10 in the request path. On every call, o10 selects the cheapest model and venue that clears your eval-defined quality floor, holds the budget envelope, and applies residency and retention policy before the request reaches the provider. Failed eval candidates are never routed. Each enforced call writes an immutable ledger entry: model, venue, policy, jurisdiction, and fully loaded cost. Enforce without shadow proof is possible but discouraged, because shadow establishes trust with engineering and finance first. Enforce is how Know Your Inference policy becomes spend reality on every live call.
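The enforce-mode decision described above can be sketched roughly as follows. Everything here is illustrative: the `Route` and `LedgerEntry` types, the venue names, and the policy label are hypothetical, and a plain append-only list stands in for a real immutable ledger.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    model: str
    venue: str
    jurisdiction: str
    cost: float         # hypothetical fully loaded cost of this call, USD
    passed_eval: bool   # True if the model cleared the use case's quality floor


@dataclass(frozen=True)  # frozen: fields cannot be changed after the entry is written
class LedgerEntry:
    model: str
    venue: str
    policy: str
    jurisdiction: str
    fully_loaded_cost: float


def enforce(routes, allowed_jurisdictions, budget_remaining, policy, ledger) -> Optional[LedgerEntry]:
    """Pick the cheapest route that passed eval, meets residency, and fits the budget."""
    eligible = [
        r for r in routes
        if r.passed_eval
        and r.jurisdiction in allowed_jurisdictions
        and r.cost <= budget_remaining
    ]
    if not eligible:
        return None  # nothing compliant fits; the caller decides how to fail
    chosen = min(eligible, key=lambda r: r.cost)
    entry = LedgerEntry(chosen.model, chosen.venue, policy, chosen.jurisdiction, chosen.cost)
    ledger.append(entry)
    return entry


ledger: list[LedgerEntry] = []
routes = [
    Route("large-default", "gateway-a", "eu", 0.50, True),
    Route("mid-tier", "gateway-b", "us", 0.10, True),        # cheapest passing, wrong region
    Route("mid-tier", "cloud-c", "eu", 0.20, True),
    Route("small-open-weight", "owned", "eu", 0.05, False),  # failed eval: never routed
]
entry = enforce(routes, {"eu"}, budget_remaining=1.00, policy="eu-residency-v1", ledger=ledger)
```

In this toy run the cheapest option overall is excluded by the eval gate and the next cheapest by residency, so the call goes to the mid-tier model in the EU venue.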

Question 5

Does o10 replace gateways for Know Your Inference?

Accepted Answer

No. o10 does not replace your AI gateway or developer-facing APIs. It sits above gateways and clouds, adding spend enforcement, eval-gated routing, policy, and a CFO-grade ledger rather than proxy compatibility. Teams keep Vercel AI Gateway, OpenRouter, or LiteLLM for access; o10 changes which model and venue serve each request based on cost, eval floor, and governance rules. The split is intentional: gateways provide doors; control planes enforce economics. For Know Your Inference, keep your gateway and add o10 above it for enforcement and KYI governance.

Question 6

What is Know Your Inference?

Accepted Answer

Know Your Inference (KYI) is a governance framework by Shen Pandi that scores inference systems across five weighted pillars: Performance (25%), Economics (25%), Integration (20%), Strategy (20%), and Risk (10%). Each pillar scores 0–100; the composite rolls into a confidence level and a board-signable recommendation. KYI runs continuously in the o10 control plane, not as a one-off audit, so every routed call and eval updates the score. A composite floor of 65 triggers enforcement levers: cap, rightsizing, or sunset, per policy.
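The pillar weights above sum to 100%, so one plausible reading of the composite is a weighted average of the five pillar scores. The sketch below assumes exactly that, and also assumes the levers apply when the composite falls below 65; neither the aggregation formula nor the direction of the comparison is spelled out here, and the pillar scores are invented.

```python
# Pillar weights in percent, as listed in the answer above.
WEIGHTS = {
    "performance": 25,
    "economics": 25,
    "integration": 20,
    "strategy": 20,
    "risk": 10,
}
COMPOSITE_FLOOR = 65  # assumption: levers apply when the composite is below this


def composite_score(pillar_scores: dict[str, float]) -> float:
    """Weighted average of pillar scores (assumed aggregation), on a 0-100 scale."""
    missing = WEIGHTS.keys() - pillar_scores.keys()
    if missing:
        raise ValueError(f"missing pillar scores: {sorted(missing)}")
    for pillar in WEIGHTS:
        if not 0 <= pillar_scores[pillar] <= 100:
            raise ValueError(f"{pillar} score must be within 0-100")
    return sum(WEIGHTS[p] * pillar_scores[p] for p in WEIGHTS) / 100


def below_floor(pillar_scores: dict[str, float]) -> bool:
    """True when the composite falls under the floor (assumed trigger direction)."""
    return composite_score(pillar_scores) < COMPOSITE_FLOOR


scores = {"performance": 80, "economics": 60, "integration": 70, "strategy": 50, "risk": 40}
result = composite_score(scores)   # (2000 + 1500 + 1400 + 1000 + 400) / 100
triggered = below_floor(scores)
```

Here the composite works out to 63, just under the floor, so under the stated assumptions this system would be a candidate for a cap, rightsizing, or sunset.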

Question 7

How is Know Your Inference measured?

Accepted Answer

Know Your Inference is measured through per-use-case ledger entries, continuous eval scores, and unit economics, not blended token averages. o10 records model, venue, policy, jurisdiction, and fully loaded cost on every call. KYI rolls pillar scores into a composite recommendation that boards can sign. Measurement stays live; it does not wait for month-end close.
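A small sketch of why per-use-case unit economics differ from a blended average: the function below groups hypothetical ledger costs by use case and compares the result with a single blended figure. The entry shape (use case, cost) and the numbers are assumptions for illustration only.

```python
from collections import defaultdict


def unit_cost_by_use_case(entries: list[tuple[str, float]]) -> dict[str, float]:
    """Average cost per call, computed separately for each use case."""
    totals: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for use_case, cost in entries:
        totals[use_case] += cost
        counts[use_case] += 1
    return {use_case: totals[use_case] / counts[use_case] for use_case in totals}


# Hypothetical (use_case, fully loaded cost in USD) pairs from a ledger.
entries = [
    ("support", 0.25),
    ("support", 0.75),
    ("batch", 0.125),
    ("batch", 0.125),
]
per_use_case = unit_cost_by_use_case(entries)
blended = sum(cost for _, cost in entries) / len(entries)
```

The blended figure sits between the two workloads and describes neither of them, which is the problem with reporting a single average.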

Question 8

What venues support Know Your Inference?

Accepted Answer

o10 unifies routing policy and ledger across Vercel AI Gateway (per-token API), OpenRouter (multi-provider aggregator), Amazon Bedrock (per-token and committed capacity), and owned or open-weight infrastructure. A single control plane sits above all venues — you do not need separate dashboards per provider. o10 selects the cheapest compliant supply per call while honoring data residency, zero-retention, and model approval rules. Committed Bedrock drawdown and open-weight routing are first-class venues, not afterthoughts.

Question 9

What is a quality floor?

Accepted Answer

A quality floor is the minimum eval score a model must achieve for a specific use case before o10 routes production traffic to it. Floors are per workload — support, RAG, code, and batch clear at different bars — and measured by replaying representative traffic through eval suites, not assumed from vendor benchmarks. Once a cheaper candidate passes the floor, o10 can route to it in shadow (proof) or enforce (live). Floors without evals are hopes; evals without floors are expensive defaults.
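One way to picture a floor check is to replay a set of representative samples through a candidate and compare the aggregate eval score with the floor. In the sketch below the exact-match grader, the mean aggregation, the stub model, and the sample data are all assumptions chosen for illustration; a real eval suite would be workload-specific.

```python
from statistics import mean
from typing import Callable

Sample = tuple[str, str]  # (prompt, expected answer)


def exact_match_grader(output: str, expected: str) -> float:
    """Toy grader: 100 for an exact match (ignoring case and spaces), else 0."""
    return 100.0 if output.strip().lower() == expected.strip().lower() else 0.0


def clears_floor(samples: list[Sample], model_fn: Callable[[str], str], floor: float) -> bool:
    """Replay every sample through the model and compare the mean score to the floor."""
    scores = [exact_match_grader(model_fn(prompt), expected) for prompt, expected in samples]
    return mean(scores) >= floor


# Stub standing in for a cheaper candidate model; it gets one answer wrong.
CANNED_ANSWERS = {
    "2 + 2": "4",
    "capital of France": "Paris",
    "opposite of hot": "cold",
    "largest planet": "Saturn",
}


def candidate_model(prompt: str) -> str:
    return CANNED_ANSWERS.get(prompt, "")


samples = [
    ("2 + 2", "4"),
    ("capital of France", "Paris"),
    ("opposite of hot", "cold"),
    ("largest planet", "Jupiter"),
]
passes_batch_floor = clears_floor(samples, candidate_model, floor=70.0)
passes_support_floor = clears_floor(samples, candidate_model, floor=80.0)
```

The same candidate scores 75 on this replay, so it clears a floor of 70 but not a floor of 80, mirroring how one model can be acceptable for one workload and not another.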

Question 10

How fast can Know Your Inference go live?

Accepted Answer

Most stacks connect o10 in shadow mode within a day: point traffic through the control plane, segment by use case, and start the verified savings clock. Enforce mode follows after per-use-case eval equivalence is proven — typically one to two weeks for enterprises with multiple workloads. No six-week gateway migration is required; o10 sits above existing gateways and clouds. KYI scoring and the immutable ledger stay live from day one in shadow.

Question 11

What is the 638× spread?

Accepted Answer

The 638× figure is the observed ratio between the most and least expensive compliant routing options for identical enterprise workloads at the same per-use-case quality floor across venues — not a guarantee for every team. o10 measured this across Vercel AI Gateway, OpenRouter, Amazon Bedrock committed capacity, and owned open-weight in June 2026. Actual savings depend on your venue mix, volumes, and eval floors; shadow mode proves your organization's number against your baseline.
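The spread described above is a simple ratio: the cost of the most expensive compliant option divided by the cost of the least expensive one, for the same workload at the same quality floor. The sketch below computes that ratio for invented venue labels and prices; it does not reproduce the 638× figure or its underlying data.

```python
from typing import Iterable


def compliant_spread(costs: Iterable[float]) -> float:
    """Ratio of the most to the least expensive compliant option."""
    costs = list(costs)
    if not costs:
        raise ValueError("need at least one compliant option")
    if min(costs) <= 0:
        raise ValueError("costs must be positive")
    return max(costs) / min(costs)


# Hypothetical cost of serving the same workload on each compliant venue, USD.
costs_by_venue = {
    "gateway-per-token": 12.0,
    "committed-capacity": 3.0,
    "owned-open-weight": 0.5,
}
spread = compliant_spread(costs_by_venue.values())
```

Only options that already clear the floor belong in the input; including non-compliant routes would inflate the ratio.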

Question 12

Where is the research?

Accepted Answer

Benchmarks and spread methodology are documented in the State of Inference Spend 2026 report at o10.io/research/state-of-inference-spend-2026, including venue price tables, workload savings models, and the 638× compliant spread calculation. The KYI framework whitepaper at o10.io/research/kyi-whitepaper provides the governance methodology cited across glossary and hub content. Both are primary sources designed for search snippets and AI answer engine citation.

Five pillars. One score. A recommendation a board can sign.

A floor you can't measure is just a hope.

Define the floor

Prove equivalence in shadow

Catch drift, automatically

Govern the whole chain, not just the invoice.

Your supply chain, mapped.

Not an audit. A live instrument.

Live telemetry & evals

Five pillars, recomputed

A verdict, always current

Act on the answer

One number. Four verdicts.

KYI methodology & governance

What is Know Your Inference?

Why does Know Your Inference matter now?

How is Know Your Inference different from a cost dashboard?

What savings are available for Know Your Inference?

What is a quality floor in Know Your Inference?

How do you prove Know Your Inference savings safely?

The Know Your Inference landscape in 2026

How o10 controls Know Your Inference

What CFOs should ask about Know Your Inference

Implementing Know Your Inference with o10

Paste a week of traffic

Define eval floors

Run shadow mode

Enforce + govern

Score your AI supply chain.
