Context Window
Also known as: Context Length
The maximum number of tokens an LLM can process in a single inference call. Larger windows support longer documents and conversation histories but increase compute and memory costs.
Overview
The context window sets the total token budget that an LLM can process in one call, covering the prompt, conversation history, retrieved documents, and generated output. Exceeding it truncates older content or returns an error. As of 2026, Claude 3.5 supports 200K tokens, Gemini models support 1M-2M, and Qwen 3.6 Plus supports 1M.
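To see how that budget plays out in practice, here is a minimal sketch of how a client might trim conversation history to fit a fixed window. The 4-characters-per-token estimate and every name in it (CONTEXT_WINDOW, trim_history, and so on) are illustrative assumptions, not any vendor's tokenizer or API.

```python
# Minimal sketch: keep the newest conversation turns that fit a token budget.
# The 4-chars-per-token heuristic and all names here are illustrative
# assumptions, not a specific vendor's tokenizer or API.

CONTEXT_WINDOW = 200_000   # total token budget for the model (assumed)
RESERVED_OUTPUT = 4_000    # tokens reserved for the model's generated reply

def estimate_tokens(text: str) -> int:
    """Rough token estimate; real systems use the model's own tokenizer."""
    return max(1, len(text) // 4)

def trim_history(system_prompt: str, turns: list[str]) -> list[str]:
    """Drop the oldest turns until the prompt fits the context window."""
    budget = CONTEXT_WINDOW - RESERVED_OUTPUT - estimate_tokens(system_prompt)
    kept: list[str] = []
    used = 0
    for turn in reversed(turns):   # walk from newest to oldest
        cost = estimate_tokens(turn)
        if used + cost > budget:
            break                  # older turns are truncated away
        kept.append(turn)
        used += cost
    return list(reversed(kept))    # restore chronological order
```

This is why older messages silently disappear from long conversations: the client drops them to stay under the window before the model ever sees the prompt.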
Relationship to RAG
An infinite context window would allow injecting an entire document corpus directly into the prompt, but cost and latency make RAG, which selectively retrieves only the relevant chunks, the practical architecture. Even with very-long-context models, RAG remains valuable for retrieval precision and cost control.
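As a toy illustration of that retrieval step, the sketch below greedily packs the highest-scoring chunks into a token budget. The word-overlap scoring and all names are assumptions for demonstration; real RAG pipelines use embedding similarity and the model's own tokenizer.

```python
# Minimal sketch of the RAG idea: instead of stuffing a whole corpus into
# the context window, score chunks for relevance and keep only the best
# ones that fit a token budget. Word-overlap scoring is a stand-in for
# embedding similarity; all names here are illustrative assumptions.

def score(query: str, chunk: str) -> float:
    """Toy relevance score: fraction of query words present in the chunk."""
    q = set(query.lower().split())
    c = set(chunk.lower().split())
    return len(q & c) / max(1, len(q))

def select_chunks(query: str, chunks: list[str], token_budget: int) -> list[str]:
    """Greedily pick the most relevant chunks that fit the budget."""
    ranked = sorted(chunks, key=lambda ch: score(query, ch), reverse=True)
    picked: list[str] = []
    used = 0
    for ch in ranked:
        cost = max(1, len(ch) // 4)   # same rough token estimate as above
        if used + cost > token_budget:
            continue                   # skip chunks that would overflow
        picked.append(ch)
        used += cost
    return picked
```

The budget-aware selection is the key design point: even if the model could hold the whole corpus, retrieving a few relevant chunks keeps per-call cost and latency roughly constant as the corpus grows.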