株式会社オブライト
AI Glossary | 2026-05-17

QLoRA (Quantized LoRA)

Also known as: QLoRA / Quantized Low-Rank Adaptation / 量子化LoRA

A fine-tuning method that combines LoRA with 4-bit NF4 quantization, enabling fine-tuning of 65B-parameter models on a single 48 GB GPU.


Overview

Introduced by researchers at the University of Washington in 2023 (Dettmers et al.), QLoRA loads the frozen base model in 4-bit NF4 (NormalFloat4) quantization to minimize GPU memory, then trains LoRA adapters in bfloat16 on top of it. Combined with double quantization of the per-block scales and paged optimizers, this let the paper fine-tune a 65B-parameter model on a single 48 GB GPU, democratizing large-model customization.
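The core NF4 idea can be sketched in plain Python: weights are split into small blocks, each block is scaled by its absolute maximum, and every value snaps to the nearest of 16 fixed levels. The codebook and block handling below are illustrative simplifications of what bitsandbytes does internally, not the exact implementation.

```python
# Illustrative 16-level codebook in the spirit of NF4: roughly the
# quantiles of a standard normal, normalized to [-1, 1]. The exact
# NF4 levels in bitsandbytes differ slightly from these values.
NF4_LEVELS = [
    -1.0, -0.6962, -0.5251, -0.3949, -0.2844, -0.1848, -0.0911, 0.0,
    0.0796, 0.1609, 0.2461, 0.3379, 0.4407, 0.5626, 0.7230, 1.0,
]

def quantize_block(weights, levels=NF4_LEVELS):
    """Quantize one block: scale by absmax, snap each value to the nearest level."""
    absmax = max(abs(w) for w in weights) or 1.0
    indices = [
        min(range(len(levels)), key=lambda i: abs(w / absmax - levels[i]))
        for w in weights
    ]
    # Storage per block: one 4-bit index per weight + one floating-point scale.
    return indices, absmax

def dequantize_block(indices, absmax, levels=NF4_LEVELS):
    """Reconstruct approximate weights from indices and the block scale."""
    return [levels[i] * absmax for i in indices]

block = [0.31, -0.12, 0.05, -0.44]   # a tiny example block
idx, scale = quantize_block(block)
recovered = dequantize_block(idx, scale)
```

During a QLoRA forward pass, blocks are dequantized on the fly to the compute dtype (bfloat16); only the small LoRA adapter matrices receive gradients.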

Difference from LoRA

Standard LoRA keeps the frozen base model in float16/bfloat16; QLoRA stores it in 4-bit NF4 instead, cutting base-model memory roughly 4x further. NF4 is designed so that its 16 levels match the quantiles of normally distributed weights, minimizing accuracy loss from quantization.
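The memory arithmetic behind that roughly 4x figure can be made concrete. The sketch below assumes one fp32 scale per 64-weight block, matching the bitsandbytes-style default (double quantization shrinks this scale overhead further); the function and its parameters are illustrative, not a real API.

```python
def model_memory_gb(n_params, bits_per_param, block_size=64, scale_bytes=4):
    """Approximate base-model weight memory in GB (decimal gigabytes)."""
    weight_bytes = n_params * bits_per_param / 8
    # 4-bit blockwise quantization also stores one scale per block.
    scale_overhead = (n_params / block_size) * scale_bytes if bits_per_param == 4 else 0
    return (weight_bytes + scale_overhead) / 1e9

fp16 = model_memory_gb(65e9, 16)  # 65B params in bfloat16: 130 GB
nf4  = model_memory_gb(65e9, 4)   # 4-bit weights (32.5 GB) + per-block scales
```

With the fp16 baseline at 130 GB, the quantized model fits in well under 40 GB, which is why a single 48 GB GPU suffices once the small bfloat16 LoRA adapters and optimizer state are added on top.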
