株式会社オブライト
AI | 2026-05-17

KV Cache (Key-Value Cache)

Also known as: KV Cache / Key-Value Cache / キーバリューキャッシュ

A memory-level optimization that caches the Key and Value vectors computed during Transformer attention, avoiding recomputation of earlier tokens and speeding up autoregressive inference.


Overview

In Transformer self-attention, generating each new token normally requires recomputing Key and Value projections for all prior tokens. Because decoding is causal, the Key and Value vectors of past tokens never change, so KV Cache can store them in GPU memory after the first computation and reuse them in every subsequent generation step. The benefit grows with context length, but so does VRAM consumption.
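The mechanism can be illustrated with a minimal single-head sketch (NumPy, hypothetical weights and dimensions): each decode step computes K/V only for the new token, appends them to the cache, and the result matches full causally-masked attention over the whole sequence.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
d = 8  # head dimension (illustrative)
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

k_cache, v_cache = [], []  # grows by one row per generated token

def decode_step(x):
    """Attend from one new token embedding x, reusing cached K/V for prior tokens."""
    q = x @ Wq
    k_cache.append(x @ Wk)  # K/V are computed for the NEW token only
    v_cache.append(x @ Wv)
    K = np.stack(k_cache)   # earlier rows come straight from the cache
    V = np.stack(v_cache)
    return softmax(q @ K.T / np.sqrt(d)) @ V

def full_attention(X):
    """Reference: recompute Q/K/V for the whole sequence with a causal mask."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(d)
    mask = np.tril(np.ones((len(X), len(X))))
    scores = np.where(mask == 1, scores, -np.inf)
    return softmax(scores) @ V

tokens = rng.standard_normal((5, d))
cached_out = np.stack([decode_step(x) for x in tokens])
assert np.allclose(cached_out, full_attention(tokens))
```

The cached path does O(1) projection work per step instead of reprojecting the whole prefix, at the cost of keeping two (sequence length × head dimension) tensors per layer resident in memory.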

Prefix caching

Prefix caching extends KV Cache across requests that share a common prefix (such as a System Prompt or a large document). Anthropic's Prompt Caching and OpenAI's equivalent feature use this mechanism to significantly cut cost and latency when the same context is reused across many queries.
