TurboQuant: Google's KV Cache Compression That's Rewriting the Rules of LLM Inference
A deep dive into TurboQuant, Google's algorithm that combines PolarQuant and QJL to compress LLM key-value caches to 3 bits with zero accuracy loss, a 6x memory reduction, and 8x faster attention on H100 GPUs. What it means for agentic AI, local inference, and the memory economics of production LLM systems.