Eval drift in production
Model quality drifts over time; replaying the eval suite weekly on sampled production traffic verifies that each route still meets its quality floor.
For identical workloads at the same quality floor, the most and least expensive compliant routes differ in cost by up to 638× (o10 State of Inference Spend 2026).
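
As a rough illustration of the weekly replay check described above, the sketch below scores a random subset of logged production samples and compares the mean score to a quality floor. Everything named here is hypothetical and not taken from the source: `replay_eval`, `score_fn`, `quality_floor`, `sample_size`, and the shape of a sample record are all assumptions. Scheduling (for example a weekly cron job) is left out.

```python
import random
from statistics import mean
from typing import Any, Callable, Dict, Sequence


def replay_eval(
    samples: Sequence[Any],
    score_fn: Callable[[Any], float],
    quality_floor: float,
    sample_size: int = 200,
    seed: int = 0,
) -> Dict[str, Any]:
    """Score a random subset of production samples against a quality floor.

    All names and the record format are illustrative assumptions.
    """
    if not samples:
        raise ValueError("no production samples to replay")

    # Fixed seed so a given week's replay is reproducible.
    rng = random.Random(seed)
    k = min(sample_size, len(samples))
    subset = rng.sample(list(samples), k)

    # score_fn is assumed to return a score in [0, 1] for one sample.
    avg = mean(score_fn(s) for s in subset)

    return {
        "n": k,
        "mean_score": avg,
        "passes_floor": avg >= quality_floor,
    }
```

A run that returns `passes_floor` as `False` would be the signal that a route has drifted below its floor and should be re-evaluated before it is treated as compliant.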