TurboQuant: Google's KV Cache Compression That's Rewriting the Rules of LLM Inference
A deep dive into TurboQuant, Google's algorithm that combines PolarQuant and QJL to compress LLM key-value caches to 3 bits with zero accuracy loss, a 6x memory reduction, and 8x faster attention on H100 GPUs. What it means for agentic AI, local inference, and the memory economics of production LLM systems.