For the past two years, GitHub has been quietly rewriting its API in Rust. Meanwhile, the teams behind Anthropic's Claude, OpenAI's GPT-5, and DeepMind's Gemini are all wrestling with the same problem: how do you serve billions of tokens per day without your latency falling off a cliff? The answer, it turns out, isn't a magic algorithm. It's three things, in order: better caching, smarter batching, and a relentless focus on P99 tail latency, the slowest 1% of requests. We talked to engineers at Cloudflare, Vercel, and Fly.io about why it isn't 2019 anymore, and what changed. Spoiler: ONNX, WebGPU, and the boring miracle of standardized JSON-RPC endpoints.
