株式会社オブライト
AI · 2026-05-17

Context Window

Also known as: Context Window / コンテキストウィンドウ / Context Length / コンテキスト長

The maximum number of tokens an LLM can process in a single inference call. Larger windows support longer documents and conversation histories but increase compute and memory costs.


Overview

The context window sets the total number of tokens — prompt, conversation history, retrieved documents, and generated output — that an LLM can process in one call. Exceeding it truncates older content or returns an error. As of 2026, Claude 3.5 supports 200K tokens, Gemini 1M-2M, and Qwen 3.6 Plus 1M.
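The overflow behavior described above — dropping the oldest content to stay within the token budget — can be sketched as follows. This is a minimal illustration, not any vendor's API: `count_tokens` is a crude whitespace heuristic (real systems use the model's own tokenizer), and `fit_to_window` is a hypothetical helper name.

```python
def count_tokens(text: str) -> int:
    # Rough approximation: ~1 token per word. Real tokenizers
    # (e.g. the model's own BPE tokenizer) will differ.
    return len(text.split())

def fit_to_window(messages: list[str], max_tokens: int) -> list[str]:
    """Drop the oldest messages until the history fits the context window."""
    kept: list[str] = []
    used = 0
    for msg in reversed(messages):  # walk newest-first
        cost = count_tokens(msg)
        if used + cost > max_tokens:
            break  # budget exhausted: everything older is truncated
        kept.append(msg)
        used += cost
    return list(reversed(kept))  # restore chronological order

history = [
    "hello there",
    "how are you today",
    "fine thanks",
    "tell me about context windows",
]
print(fit_to_window(history, max_tokens=8))
```

A production system would also reserve part of the budget for the model's generated output, since the window covers prompt and completion combined.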

Relationship to RAG

An infinite context window would allow injecting entire document corpora directly, but cost and latency make RAG — selectively retrieving only relevant chunks — the practical architecture. Even with very long context models, RAG remains valuable for retrieval precision and cost control.
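The selective-retrieval idea above can be sketched in a few lines. This is a deliberately naive illustration: the scoring uses plain word overlap, whereas real RAG pipelines rank chunks with vector embeddings; the function names are hypothetical.

```python
def overlap_score(query: str, chunk: str) -> int:
    # Naive relevance: count shared words (embeddings would be used in practice).
    return len(set(query.lower().split()) & set(chunk.lower().split()))

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return only the top-k most relevant chunks instead of the whole corpus."""
    ranked = sorted(chunks, key=lambda ch: overlap_score(query, ch), reverse=True)
    return ranked[:k]

corpus = [
    "context window is the maximum tokens an LLM can process",
    "transformers use attention over all input tokens",
    "RAG retrieves relevant chunks before generation",
]
print(retrieve("what is a context window", corpus, k=1))
```

Only the retrieved chunks are injected into the prompt, so token cost scales with `k` rather than with corpus size — which is why RAG stays economical even when the model's window could technically hold far more.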
