Glossary: AI

Benchmark

Benchmark / AI評価指標 / ベンチマーク評価

A standardized set of tasks or datasets used to measure and compare AI model capabilities. MMLU, HumanEval, SWE-bench, and HLE are prominent examples covering reasoning, coding, and frontier difficulty.

BPE (Byte Pair Encoding)

BPE / Byte Pair Encoding / バイトペアエンコーディング

A subword tokenization algorithm that starts from individual characters or bytes and repeatedly merges the most frequent adjacent pair of symbols into a new symbol, building a vocabulary of common subwords. Byte-level BPE variants underlie the tokenizers of many major LLMs, including the GPT family.
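A minimal sketch of the BPE merge-learning loop, assuming a toy whitespace-split corpus and operating on characters rather than bytes. Production tokenizers also handle pre-tokenization, special tokens, and end-of-word markers, all omitted here; `get_pair_counts`, `merge_pair`, and `learn_bpe` are illustrative names, not a library API.

```python
from collections import Counter


def get_pair_counts(words):
    """Count adjacent symbol pairs across the corpus, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(words, pair):
    """Rewrite every word so each occurrence of `pair` becomes one merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = merged.get(tuple(out), 0) + freq
    return merged


def learn_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a whitespace-split toy corpus."""
    # Each word starts as a tuple of single characters, mapped to its frequency.
    words = dict(Counter(tuple(w) for w in corpus.split()))
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(words)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair; first seen wins ties
        merges.append(best)
        words = merge_pair(words, best)
    return merges


print(learn_bpe("low low low lower lowest", 2))  # [('l', 'o'), ('lo', 'w')]
```

Each learned merge adds one symbol to the vocabulary; encoding new text replays the merges in the order they were learned.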

Chain-of-Thought (CoT)

Chain-of-Thought / CoT / 思考の連鎖

A prompting technique that elicits step-by-step intermediate reasoning from an LLM before it gives a final answer, often substantially improving accuracy on multi-step math, logic, and planning tasks. Simply adding a phrase such as 'Let's think step by step' to the prompt can elicit CoT reasoning (zero-shot CoT).

Constitutional AI

Constitutional AI / CAI / 憲法的AI

Anthropic's alignment approach in which an AI critiques and revises its own outputs according to a set of written principles ('constitution'), reducing harmful responses with less reliance on human labeling.

Context Window

Context Window / コンテキストウィンドウ / Context Length

The maximum number of tokens an LLM can process in a single inference call. Larger windows support longer documents and conversation histories but increase compute and memory costs.

Dense Model

Dense Model / Dense Transformer / 高密度モデル

A standard Transformer model in which all parameters participate in processing every token, in contrast to the sparse expert selection of MoE models. Per-token compute therefore scales roughly linearly with parameter count.

Distillation (Knowledge Distillation)

Distillation / Knowledge Distillation / 知識蒸留

Training a smaller 'student' model to mimic the output distribution of a larger 'teacher' model, compressing capabilities into a lighter-weight model suited for edge deployment or cost reduction.
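A minimal sketch of the classic soft-target distillation loss, assuming raw logits from both models are available: the KL divergence between temperature-softened teacher and student distributions, scaled by T² so gradient magnitudes stay comparable across temperatures. `distillation_loss` is a hypothetical helper; in practice this term is usually mixed with an ordinary cross-entropy loss on hard labels.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax (max-subtracted for numerical stability)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) between softened distributions, scaled by T**2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    return kl * temperature ** 2
```

A higher temperature exposes more of the teacher's relative preferences among wrong answers, which is much of what the student learns from.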

DPO (Direct Preference Optimization)

DPO / Direct Preference Optimization / 直接選好最適化

An alignment method that optimizes an LLM directly on human preference pairs without training a separate reward model, offering simpler implementation and more stable training than RLHF.
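A minimal sketch of the per-pair DPO loss, assuming the summed log-probabilities of the chosen and rejected responses have already been computed under both the policy being trained and a frozen reference model (DPO drops the separate reward model but still uses this reference). `dpo_loss` and the default `beta=0.1` are illustrative choices.

```python
import math


def neg_log_sigmoid(x):
    """-log(sigmoid(x)) = log(1 + exp(-x)), computed stably for either sign of x."""
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair, from summed response log-probabilities."""
    # Log-ratios against the reference act as implicit rewards (scaled by beta).
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    return neg_log_sigmoid(beta * (chosen_logratio - rejected_logratio))
```

The loss starts at log 2 when the policy equals the reference and falls as the policy raises the chosen response's likelihood relative to the rejected one.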

Embedding

Embedding / 埋め込み表現 / ベクトル埋め込み

The transformation of text or other data into high-dimensional vectors in which semantically similar items lie close together. It is the core representation technique underlying RAG, semantic search, and recommendation systems.

Eval Harness

Eval Harness / Evaluation Harness / 評価ハーネス

A framework for evaluating LLM performance across multiple benchmarks in a unified pipeline. EleutherAI's LM Evaluation Harness is the most widely used, supporting hundreds of tasks and custom evaluations.

Few-shot

Few-shot Learning / Few-shot Prompting / 少数ショット学習

A prompting technique that includes a small number of input-output examples (typically 2-10) in the prompt so the LLM infers the desired task format and replicates the pattern.
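A minimal sketch of assembling a few-shot prompt, assuming a simple "Input:/Output:" layout; the format and the `build_few_shot_prompt` helper are hypothetical, and any consistent pattern the model can imitate serves the same purpose.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Instruction, then worked input/output pairs, then the new input left open."""
    lines = [instruction, ""]
    for source, target in examples:
        lines += [f"Input: {source}", f"Output: {target}", ""]
    # End on an open "Output:" so the model continues the established pattern.
    lines += [f"Input: {query}", "Output:"]
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    "Classify the sentiment of each review as positive or negative.",
    [("Great battery life.", "positive"), ("Screen cracked in a week.", "negative")],
    "Fast shipping and works perfectly.",
)
print(prompt)
```

Keeping every example in exactly the same shape matters more than the specific labels used.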

Fine-tuning

Fine-tuning / ファインチューニング / 微調整

Additional training of a pre-trained foundation model on task- or domain-specific data to specialize its behavior or style. LoRA and QLoRA have made fine-tuning accessible without full parameter updates.

Foundation Model

Foundation Model / 基盤モデル

A large model pre-trained on broad data that can be adapted to many downstream tasks via fine-tuning or prompting. The category includes LLMs, vision models, audio models, and multimodal systems.

GraphRAG

GraphRAG / Graph Retrieval-Augmented Generation / グラフRAG

An extension of RAG that combines knowledge graphs with vector search to capture entity relationships, enabling more contextually rich answers than pure vector similarity can provide.

Grounding

Grounding / グラウンディング / 事実根拠付け

The practice of anchoring LLM outputs to verifiable external sources (documents, databases, search results) to reduce hallucination. RAG is the dominant technical approach to grounding.

Hallucination

Hallucination / 幻覚 / AI幻覚

The phenomenon in which an LLM confidently generates factually incorrect content, such as fabricated citations, wrong figures, or nonexistent APIs. It is one of the most significant risks in production LLM deployments.

Inference

Inference / 推論 / モデル推論

The process of running a trained AI model on new inputs to produce predictions or generated outputs. In LLMs, this is the text-generation step, as distinct from the training process.

KV Cache (Key-Value Cache)

KV Cache / Key-Value Cache / キーバリューキャッシュ

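An inference optimization that stores the Key and Value vectors already computed for earlier tokens in Transformer attention, so each new decoding step computes K and V only for the newest token instead of recomputing them for the whole prefix. This speeds up autoregressive inference at the cost of extra memory that grows with sequence length.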
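A minimal single-head sketch, assuming the query, key, and value vectors for each token are already given (in a real model they come from learned projections of the hidden state). At each decoding step only the new token's key and value are appended; attention then reads every cached entry instead of recomputing it. `KVCache` and `attend` are illustrative names.

```python
import math


def attend(q, keys, values):
    """Scaled dot-product attention of one query over all given keys/values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dim)]


class KVCache:
    """Append-only store of past keys and values for a single attention head."""

    def __init__(self):
        self.keys, self.values = [], []

    def step(self, q, k, v):
        # Only the newest token's key/value are added; earlier ones are reused, not recomputed.
        self.keys.append(k)
        self.values.append(v)
        return attend(q, self.keys, self.values)
```

The cached result is identical to recomputing attention over the full prefix; the saving is that the K/V projections for past tokens are never redone.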

LLM (Large Language Model)

Large Language Model / 大規模言語モデル

A neural-network language model with billions to trillions of parameters, capable of text generation, translation, summarization, and code synthesis — the foundation of modern generative AI.

LoRA (Low-Rank Adaptation)

LoRA / Low-Rank Adaptation / 低ランク適応

A parameter-efficient fine-tuning method that freezes the original model weights and learns only small low-rank adapter matrices, drastically cutting memory and compute requirements.
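A minimal sketch of the LoRA forward pass on plain Python lists, assuming a frozen weight matrix W (d × k) and trainable adapters A (r × k) and B (d × r), with the low-rank update scaled by alpha / r as in the original LoRA formulation. `lora_forward` and `param_counts` are hypothetical helpers.

```python
def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def lora_forward(W, A, B, x, alpha, r):
    """h = W x + (alpha / r) * B (A x); W stays frozen, only A and B are trained."""
    base = matvec(W, x)               # frozen pretrained path, W is d x k
    update = matvec(B, matvec(A, x))  # low-rank path: A is r x k, B is d x r
    return [b + (alpha / r) * u for b, u in zip(base, update)]


def param_counts(d, k, r):
    """Trainable parameters: full update of W versus a rank-r adapter."""
    return d * k, r * (d + k)


print(param_counts(4096, 4096, 8))  # (16777216, 65536)
```

With d = k = 4096 and r = 8, the adapter trains 65,536 values instead of about 16.8 million. Because B is typically initialized to zero, the adapted model starts out identical to the base model.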

MoE (Mixture of Experts)

MoE / Mixture of Experts / 混合エキスパート

A model architecture with multiple specialized sub-networks (experts), of which only a sparse subset is activated for each token by a learned router. This lets the total parameter count grow without a proportional increase in per-token compute.
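A minimal sketch of top-k expert routing, assuming the router's gate logits for the current token are already computed (in a real model a learned layer produces them) and that each expert is a plain Python function returning a vector of the same size as its input. This follows one common variant, a softmax over only the selected experts; other designs such as top-1 routing differ in detail. All names are illustrative.

```python
import math


def top_k_route(gate_logits, k):
    """Select the k highest-scoring experts and softmax-normalize their gate logits."""
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    m = max(gate_logits[i] for i in top)
    exps = {i: math.exp(gate_logits[i] - m) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def moe_forward(x, experts, gate_logits, k=2):
    """Weighted sum of only the selected experts' outputs; the rest are never called."""
    out = [0.0] * len(x)
    for i, w in top_k_route(gate_logits, k).items():
        y = experts[i](x)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out


# Four toy experts that just scale their input by 1, 2, 3, and 4.
experts = [lambda x, s=s: [s * xi for xi in x] for s in (1.0, 2.0, 3.0, 4.0)]
print(moe_forward([1.0, 2.0], experts, gate_logits=[0.0, 0.0, 5.0, 5.0], k=2))  # [3.5, 7.0]
```

Compute per token depends on k, not on the total number of experts, which is the property the definition describes.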

Multimodal

Multimodal AI / マルチモーダルAI / Multimodal Model

An AI model or system that handles multiple modalities (text, images, audio, and video) within a single architecture. GPT-4o and Gemini are representative examples.

Post-training

Post-training / ポストトレーニング / 事後学習

The collective term for all training phases after pre-training (SFT, RLHF, DPO, and other alignment methods) that transform a raw language model into a helpful, safe assistant.

Pre-training

Pre-training / 事前学習 / プレトレーニング

The initial large-scale training phase in which an LLM learns from vast text corpora via next-token prediction, establishing the general language and world knowledge that downstream fine-tuning and alignment build upon.

Prompt Engineering

Prompt Engineering / プロンプトエンジニアリング / プロンプト設計

The practice of designing and refining LLM input text to reliably elicit desired outputs, covering instruction structuring, few-shot examples, role assignment, and output-format specification.

QLoRA (Quantized LoRA)

QLoRA / Quantized Low-Rank Adaptation / 量子化LoRA

A fine-tuning method that trains LoRA adapters on top of a frozen base model quantized to 4-bit NormalFloat (NF4), enabling fine-tuning of a 65B-parameter model on a single 48 GB GPU.

Quantization

Quantization / 量子化 / モデル量子化

Converting model weights from 32-bit or 16-bit floats to lower-precision formats (8-bit, 4-bit, etc.) to reduce model size and memory footprint, enabling faster inference and local execution.
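A minimal sketch of symmetric per-tensor int8 quantization, assuming one shared scale factor for the whole tensor. Real toolkits often use per-channel or per-group scales and formats such as NF4, which this does not implement; `quantize_int8` and `dequantize` are hypothetical helpers.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8: w ≈ scale * q, with q an integer in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the integers and the shared scale."""
    return [qi * scale for qi in q]


weights = [0.42, -1.27, 0.003, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

Each weight now needs 8 bits plus one shared float per tensor, and rounding to the nearest step bounds the per-weight error by half a step (scale / 2).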

RAG (Retrieval-Augmented Generation)

Retrieval-Augmented Generation / 検索拡張生成

A technique that retrieves relevant information from an external knowledge base and grounds an LLM response on it — the mainstream approach for connecting LLMs to up-to-date or proprietary data.

ReAct (Reasoning and Acting)

ReAct / Reasoning and Acting / 推論と行動

An agent framework that interleaves LLM reasoning traces with external actions (tool calls, searches, code execution), enabling the model to gather information and verify intermediate steps before concluding.

Red Teaming

Red Teaming / AI Red Teaming / レッドチーミング

A safety evaluation practice in which testers deliberately probe an AI model with adversarial, harmful, or manipulative prompts to surface vulnerabilities before deployment.

RLHF (Reinforcement Learning from Human Feedback)

RLHF / Reinforcement Learning from Human Feedback / 人間のフィードバックからの強化学習

A training paradigm in which human raters compare model outputs, a reward model is trained on those preferences, and the LLM is then optimized via RL to match human intent — the technique that made ChatGPT conversationally useful.
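A minimal sketch of the pairwise loss commonly used to train the reward model in the RLHF pipeline described above, assuming the reward model has already produced scalar scores for the human-preferred ("chosen") and dispreferred ("rejected") responses. The subsequent RL stage (commonly PPO with a KL penalty toward the initial model) is not shown; `reward_model_loss` is a hypothetical helper.

```python
import math


def reward_model_loss(r_chosen, r_rejected):
    """Pairwise (Bradley-Terry style) loss: -log(sigmoid(r_chosen - r_rejected))."""
    diff = r_chosen - r_rejected
    # Stable evaluation of log(1 + exp(-diff)) for either sign of diff.
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))
```

Minimizing this pushes the reward model to score the response humans preferred above the one they rejected; only the score difference matters.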

Self-Consistency

Self-Consistency / 自己整合性

A decoding strategy that samples multiple Chain-of-Thought reasoning paths for the same prompt and selects the most frequent final answer by majority vote, improving reliability over greedy decoding.
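A minimal sketch of the majority-vote step, assuming a hypothetical `sample_answer` callable that runs one temperature-sampled Chain-of-Thought completion and returns only its extracted final answer (answer extraction is left out). The stand-in sampler below replays canned answers instead of calling a model.

```python
from collections import Counter


def self_consistency(sample_answer, prompt, n=5):
    """Draw n sampled reasoning paths and return the most common final answer."""
    answers = [sample_answer(prompt) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]


# Hypothetical stand-in sampler: replays canned final answers for illustration.
canned = iter(["42", "41", "42", "17", "42"])
print(self_consistency(lambda prompt: next(canned), "What is 6 * 7?", n=5))  # 42
```

The vote is taken over final answers only, so different reasoning paths that reach the same answer reinforce each other.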

Semantic Search

Semantic Search / 意味検索 / セマンティック検索

Search that ranks results by semantic similarity rather than keyword overlap, implemented via embedding vectors and approximate nearest-neighbor retrieval.
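A minimal brute-force sketch, assuming documents have already been embedded. The three-dimensional vectors below are made up for illustration; real embedding models produce hundreds or thousands of dimensions, and vector databases replace this exhaustive scan with approximate nearest-neighbor indexes.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def semantic_search(query_vec, docs, top_n=2):
    """Exact (brute-force) ranking of (doc_id, vector) pairs by cosine similarity."""
    scored = sorted(((cosine(query_vec, vec), doc_id) for doc_id, vec in docs), reverse=True)
    return [doc_id for _, doc_id in scored[:top_n]]


# Toy 3-dimensional "embeddings" standing in for real model output.
docs = [("cat care", [0.9, 0.1, 0.0]), ("dog training", [0.8, 0.3, 0.0]),
        ("tax filing", [0.0, 0.1, 0.9])]
print(semantic_search([1.0, 0.2, 0.0], docs))  # ['cat care', 'dog training']
```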

SLM (Small Language Model)

Small Language Model / 小規模言語モデル

A language model in the hundreds-of-millions to low-billions parameter range. Key advantages are local execution, lower cost, and low latency — ideal for edge devices and domain-specific tasks.

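Speculative Decoding

Speculative Decoding / 投機的デコーディング / スペキュレイティブデコーディング

An inference acceleration technique in which a small draft model proposes several candidate tokens that a large target model then verifies in a single forward pass. With the standard acceptance rule, the output distribution matches that of the target model alone; reported speedups are typically around 2-3x, depending on how often draft tokens are accepted.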
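A minimal sketch of the per-token verification rule used in standard speculative sampling, assuming the target (p) and draft (q) next-token probabilities for one position are given as dictionaries. A drafted token is accepted with probability min(1, p/q); on rejection, a replacement is sampled from the normalized residual max(0, p - q), which is what keeps the overall output distribution equal to the target model's. Drafting several tokens, stopping at the first rejection, and the extra token after full acceptance are omitted; `verify_draft_token` is a hypothetical helper.

```python
import random


def verify_draft_token(token, p_target, q_draft, rng):
    """Accept a drafted token with prob. min(1, p/q); on rejection, resample from max(0, p - q)."""
    p, q = p_target.get(token, 0.0), q_draft[token]  # q > 0 because the draft sampled it
    if rng.random() < min(1.0, p / q):
        return token, True
    # Residual distribution: where the target puts more mass than the draft did.
    residual = {t: max(0.0, pt - q_draft.get(t, 0.0)) for t, pt in p_target.items()}
    tokens = list(residual)
    return rng.choices(tokens, weights=[residual[t] for t in tokens])[0], False


rng = random.Random(0)
print(verify_draft_token("the", {"the": 0.5, "a": 0.5}, {"the": 0.8, "a": 0.2}, rng))
```

When draft and target agree closely, most tokens are accepted and several tokens are emitted per target forward pass, which is where the speedup comes from.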

System Prompt

System Prompt / システムプロンプト / システム指示

An initial instruction sent to an LLM before the user conversation begins, defining the model's role, tone, constraints, and output format. It is the primary control surface for application-level behavior.

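Temperature

Temperature / 温度パラメータ / サンプリング温度

A sampling hyperparameter that controls randomness in LLM text generation by dividing the model's logits by T before the softmax. Values near 0 approach greedy decoding and produce near-deterministic, consistent outputs; higher values flatten the distribution and yield more diverse and creative responses.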
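A minimal sketch of temperature scaling, assuming raw logits as input: the logits are divided by T and then passed through a softmax. `softmax_with_temperature` is a hypothetical helper; T = 0 is treated as greedy (argmax) decoding rather than computed directly, since dividing by zero is undefined.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Divide logits by T, then softmax: T < 1 sharpens, T > 1 flattens."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0 (treat T = 0 as greedy argmax)")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]
for t in (0.2, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

Running the loop shows the top token's probability shrinking as T grows, which is the sharpening and flattening the definition describes.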

Tokenization

Tokenization / トークン化 / トークナイゼーション

The pre-processing step that splits text into tokens (sub-word units, characters, or symbols) that the LLM operates on. Tokenization design affects model performance, cost, and multilingual capability.

Top-k Sampling

Top-k / Top-k Sampling / 上位Kサンプリング

A decoding method that restricts next-token sampling to the k highest-probability tokens. It is simple to implement, but because k is fixed regardless of the shape of the probability distribution, it keeps too few candidates when the distribution is flat and too many when it is sharply peaked.
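A minimal sketch of top-k sampling, assuming raw logits as input and a seeded `random.Random` instance for reproducibility; `top_k_sample` is a hypothetical helper.

```python
import math
import random


def top_k_sample(logits, k, rng):
    """Sample a token index from only the k highest-logit tokens, renormalized by softmax."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - m) for i in top]  # unnormalized softmax over the top k
    return rng.choices(top, weights=weights)[0]


rng = random.Random(0)
print([top_k_sample([3.0, 2.5, 0.1, -1.0], k=2, rng=rng) for _ in range(10)])
```

With k = 1 this reduces to greedy decoding.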

Top-p (Nucleus Sampling)

Top-p / Nucleus Sampling / 核サンプリング

A decoding method that samples from the smallest set of tokens whose cumulative probability exceeds p (the 'nucleus'), dynamically adjusting sampling width and balancing quality with diversity.
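A minimal sketch of nucleus sampling, assuming raw logits as input: tokens are sorted by probability and added to the nucleus until the cumulative probability reaches p, then one token is sampled from that set in proportion to its probability. `top_p_sample` is a hypothetical helper.

```python
import math
import random


def top_p_sample(logits, p, rng):
    """Nucleus sampling: sample from the smallest top set whose probability mass reaches p."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    ranked = sorted(((e / total, i) for i, e in enumerate(exps)), reverse=True)
    nucleus, mass = [], 0.0
    for prob, i in ranked:
        nucleus.append((prob, i))
        mass += prob
        if mass >= p:  # stop as soon as the nucleus covers probability p
            break
    return rng.choices([i for _, i in nucleus], weights=[pr for pr, _ in nucleus])[0]


rng = random.Random(0)
logits = [math.log(0.6), math.log(0.3), math.log(0.1)]  # probabilities 0.6, 0.3, 0.1
print([top_p_sample(logits, p=0.85, rng=rng) for _ in range(10)])
```

Here p = 0.85 admits the first two tokens (0.6 + 0.3 ≥ 0.85), while p = 0.5 would admit only the first; the set size adapts to the distribution rather than being fixed as in top-k.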

Training

Training / 学習 / モデル学習

The process of optimizing model parameters by learning patterns from data. In the LLM context it encompasses pre-training, fine-tuning, and alignment training phases.

Tree-of-Thought (ToT)

Tree-of-Thought / ToT / 思考の木

A reasoning framework that explores multiple intermediate reasoning steps ('thoughts') as branches of a tree, evaluating and pruning nodes to find promising solutions. It extends Chain-of-Thought to problems that benefit from search and backtracking.

Vector Database

Vector Database / ベクトルデータベース / Vector DB

A database purpose-built to store high-dimensional embedding vectors and return nearest neighbors via approximate nearest-neighbor (ANN) search. It is the core storage layer in most RAG architectures.