Top-p (Nucleus Sampling)
Also known as: Top-p / Nucleus Sampling / 核サンプリング
A decoding method that samples from the smallest set of tokens whose cumulative probability exceeds p (the 'nucleus'), dynamically adjusting sampling width and balancing quality with diversity.
Overview
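Top-p (Nucleus Sampling) sorts tokens by probability, accumulates their probabilities from most to least likely, and samples only from the smallest set whose cumulative probability exceeds p. With p=0.9, the candidates are the most likely tokens that together hold just over 90% of the probability mass; all other tokens are discarded and the remaining probabilities are renormalized. Unlike Top-k, which always keeps a fixed number of tokens, the candidate set shrinks when the model is confident and grows when the distribution is flat.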
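The procedure above can be sketched in a few lines of plain Python. This is a minimal illustration, not any library's implementation: the function names `top_p_filter` and `sample_top_p` and the two toy distributions are made up for the example, and the cutoff uses `>=` so that a set whose mass lands exactly on p is accepted (implementations differ on that boundary case).

```python
import random

def top_p_filter(probs, p):
    """Return (token_index, renormalized_prob) pairs for the nucleus.

    Hypothetical helper for illustration; `probs` is a list of
    probabilities that sums to 1.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    # Rank token indices from most to least likely.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, cumulative = [], 0.0
    for i in ranked:
        nucleus.append(i)
        cumulative += probs[i]
        if cumulative >= p:  # assumption: stop as soon as the mass reaches p
            break
    # Renormalize so the kept probabilities sum to 1.
    return [(i, probs[i] / cumulative) for i in nucleus]

def sample_top_p(probs, p, rng=random):
    """Draw one token index from the nucleus."""
    nucleus = top_p_filter(probs, p)
    indices = [i for i, _ in nucleus]
    weights = [w for _, w in nucleus]
    return rng.choices(indices, weights=weights, k=1)[0]

# Toy 5-token distributions (invented for the example).
peaked = [0.85, 0.08, 0.04, 0.02, 0.01]  # model is confident
flat = [0.30, 0.25, 0.20, 0.16, 0.09]    # model is uncertain

print(len(top_p_filter(peaked, 0.9)))  # nucleus size for the peaked case
print(len(top_p_filter(flat, 0.9)))    # nucleus size for the flat case
print(sample_top_p(flat, 0.9))
```

With the same p, the peaked distribution keeps fewer candidates than the flat one, which is the dynamic width that a fixed Top-k cannot provide.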
Temperature vs Top-p
Most LLM APIs expose both temperature and top-p, but the usual recommendation is to adjust only one of them at a time. OpenAI's API documentation, for example, recommends altering temperature or top-p but not both. p=0.95 is a common default in production settings.
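One way to see why the two parameters are usually tuned separately: temperature reshapes the distribution before the top-p cutoff is applied, so changing it also changes how many tokens fall inside the nucleus. The sketch below assumes temperature is applied to the logits first and top-p second, which is a common ordering but not guaranteed for every API. The logits and the helper names `softmax` and `nucleus_size` are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax (numerically stabilized)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nucleus_size(probs, p):
    """Number of top-ranked tokens needed for cumulative mass >= p."""
    cumulative = 0.0
    for n, prob in enumerate(sorted(probs, reverse=True), start=1):
        cumulative += prob
        if cumulative >= p:
            return n
    return len(probs)

# Hypothetical logits for a 5-token vocabulary.
logits = [4.0, 3.0, 2.0, 1.0, 0.0]

# Assumption: temperature is applied first, then the top-p cutoff.
sizes = {t: nucleus_size(softmax(logits, t), 0.9) for t in (0.5, 1.0, 2.0)}
for t, n in sizes.items():
    print(f"temperature={t}: nucleus holds {n} tokens at p=0.9")
```

For these logits the nucleus grows as temperature rises even though p stays fixed, so adjusting both at once makes it hard to attribute a change in output to either setting.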