株式会社オブライト

Articles tagged "DiffusionGemma"

1 article

DiffusionGemma Deep Dive — Google DeepMind's June 10, 2026 Open-Weight Text-Diffusion LLM, Same Backbone as Gemma 4 26B (A4B MoE), Up to 4× Faster Than AR Counterparts, Apache 2.0, With an Honest "Quality Trails AR" Disclosure

A primary-source deep dive on **DiffusionGemma** (`google/diffusiongemma-26B-A4B-it`, a MoE model with 25.2B total / 3.8B active parameters), released June 10, 2026 by Google DeepMind in coordination with NVIDIA. The article is grounded in the [official Google blog](https://blog.google/innovation-and-ai/technology/developers-tools/diffusion-gemma-faster-text-generation/), the [ai.google.dev model card](https://ai.google.dev/gemma/docs/diffusiongemma/model_card), the [Hugging Face card](https://huggingface.co/google/diffusiongemma-26B-A4B-it), and [NVIDIA's blog](https://blogs.nvidia.com/blog/rtx-ai-garage-local-gemma-diffusion/). Where autoregressive (AR) models generate one token at a time, left to right, diffusion language models (DLMs) **denoise a 256-token canvas in parallel into final text**. Each forward pass commits 15–20 tokens, with up to 48 denoising steps. Reported throughput is 1,000+ tok/sec on an H100 and 700+ tok/sec on an RTX 5090, roughly 3.5–4× that of the AR Gemma 4 counterpart. Crucially, Google **openly states that quality lags AR** (DiffusionGemma vs. the AR model): MMLU Pro 77.6 vs 82.6, GPQA 73.2 vs 82.3, MMMU Pro 54.3 vs 73.8. The model is licensed under Apache 2.0 and distributed via Hugging Face, Vertex AI, and NVIDIA NIM, and the article describes it as the first large-scale open-weight diffusion LLM in the industry. The column also covers practical implications for Japanese enterprises (on-prem internal agents, code editing, low-latency workflows) and positioning against Mercury (Inception Labs), LLaDA, and Gemini Diffusion.
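To make the canvas arithmetic in the summary concrete, here is a minimal toy sketch of a parallel "commit a few tokens per pass" loop. It is not DiffusionGemma's actual sampler: the function names (`passes_needed`, `toy_denoise`) are hypothetical, and the per-position confidence score is a random placeholder standing in for whatever the real model computes. Only the three numbers (256-token canvas, 15–20 commits per pass, 48-step cap) come from the summary.

```python
import math
import random

CANVAS_TOKENS = 256  # canvas size quoted in the summary
MAX_STEPS = 48       # upper bound on denoising steps quoted in the summary


def passes_needed(canvas_tokens: int, commits_per_pass: int) -> int:
    """Forward passes needed if every pass commits the same number of tokens."""
    return math.ceil(canvas_tokens / commits_per_pass)


def toy_denoise(canvas_tokens: int, commits_per_pass: int,
                max_steps: int, seed: int = 0) -> tuple[int, int]:
    """Toy loop: each step commits the highest-scoring masked positions.

    Returns (steps_used, positions_still_masked). The scores are random
    placeholders (an assumption), not real model confidences.
    """
    rng = random.Random(seed)
    masked = set(range(canvas_tokens))
    steps = 0
    while masked and steps < max_steps:
        scores = {pos: rng.random() for pos in masked}
        best = sorted(masked, key=scores.get, reverse=True)[:commits_per_pass]
        masked.difference_update(best)  # these positions are now final
        steps += 1
    return steps, len(masked)


if __name__ == "__main__":
    for k in (15, 20):
        print(k, passes_needed(CANVAS_TOKENS, k),
              toy_denoise(CANVAS_TOKENS, k, MAX_STEPS))
```

Under these assumptions, committing 15 to 20 tokens per pass fills the 256-token canvas in 18 to 13 passes, well inside the 48-step cap. An AR decoder would need 256 sequential passes for the same span, which gives a sense of where the throughput gain comes from, although per-pass cost differs between the two designs.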

Tags: Google DeepMind, Gemma 4, DiffusionGemma