株式会社オブライト
AI 2026-05-17

Pre-training

Also known as: Pre-training / 事前学習 / プレトレーニング

The initial large-scale training phase in which an LLM learns from vast text corpora via next-token prediction, establishing the general language and world knowledge that downstream fine-tuning and alignment build upon.


Overview

Pre-training is the first LLM development phase: self-supervised next-token prediction on trillions of tokens from web pages, books, and code. The model acquires language structure, grammar, world knowledge, and coding ability. The enormous GPU hours required mean almost all organizations start from an existing pre-trained model.
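The self-supervised objective above can be illustrated with a toy sketch: every position in the text supplies a (context, next-token) pair, so the raw corpus is its own training signal. This is a minimal bigram-count stand-in for real pre-training; the tiny corpus and whitespace tokenizer are illustrative assumptions, not part of any actual LLM pipeline.

```python
import math
from collections import Counter, defaultdict

# Hypothetical miniature corpus standing in for trillions of web/book/code tokens.
corpus = "the cat sat on the mat the cat ate".split()

# "Train" by counting: estimate P(next | current) from adjacent token pairs.
# No labels are needed beyond the text itself -- this is the self-supervised part.
pair_counts = defaultdict(Counter)
for cur, nxt in zip(corpus, corpus[1:]):
    pair_counts[cur][nxt] += 1

def next_token_prob(cur, nxt):
    total = sum(pair_counts[cur].values())
    return pair_counts[cur][nxt] / total if total else 0.0

# Pre-training minimizes the average negative log-likelihood of the next token;
# here we just evaluate it on the training text.
nll = -sum(math.log(next_token_prob(c, n))
           for c, n in zip(corpus, corpus[1:])) / (len(corpus) - 1)
print(round(nll, 3))  # → 0.412
```

A real LLM replaces the count table with a transformer and minimizes the same loss by gradient descent over trillions of tokens, which is where the enormous GPU cost comes from.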

Frontier vs open models

OpenAI, Anthropic, and Google conduct proprietary large-scale pre-training and offer API access. Open-weight models like Llama and Qwen release pre-trained weights publicly, enabling organizations to fine-tune from a capable base without pre-training costs.
