Articles tagged "Google DeepMind"

5 articles

AI2026-07-14

[Cerebras Inference Runs Multimodal Gemma 4 31B](https://www.cerebras.ai/blog/gemma-4-on-cerebras-the-fastest-inference-is-now-multimodal) Deep Dive (Announced 2026-06-29) — 1,851 Output Tokens per Second (35× a Typical GPU Endpoint), Cerebras's First Multimodal Model Accepting Images / Screenshots / Charts / UI States, the First Google DeepMind Model on the Platform, Apache 2.0 Open Weights, 18× Faster Than Claude Haiku 4.5 at a Comparable Intelligence Index of 29 Unlocking Practical "Computer Use / Image-Driven Agents / UI Debugging / Dashboard Analysis" at Wafer-Scale

**On 2026-06-29 Cerebras launched [Gemma 4 31B on Cerebras Inference](https://www.cerebras.ai/blog/gemma-4-on-cerebras-the-fastest-inference-is-now-multimodal)** — the platform's **first multimodal model** and the **first Google DeepMind model** available on it, in public preview. **Performance**: **1,851 output tokens/second** (**35× a typical GPU endpoint**), **≤1.5 s to first token including reasoning**, **[Artificial Analysis Intelligence Index of 29](https://artificialanalysis.ai/)** (comparable to Claude Haiku 4.5's 30), and **18× the speed of Haiku on Cerebras**. **Model specs**: **[Gemma 4 31B](../columns/gemma-4-technical-report-2026-07) dense architecture** (not MoE), **Apache 2.0 open weights**, long-context capable, and image understanding (screenshots, charts, documents, UI states, diagrams, scanned pages, forms). **Distinctive value**: unlocks multimodal workloads that used to be impractical on GPUs — **computer use, image-driven agentic loops, UI debugging with code-patch generation, real-time dashboard analysis, long-document summarization** — at wafer-scale speed. **Position**: the [Gemma 4 Technical Report](../columns/gemma-4-technical-report-2026-07) covered Google DeepMind's open-weights strategy; combined with **Cerebras's wafer-scale hardware, it reaches a new "multimodal × fast inference × open weights" infrastructure stack** — the open-weights + fast-inference camp's counter to the closed-model camp underlying [OpenAI GPT-5.6 + ChatGPT Work](../columns/openai-chatgpt-work-launch-2026-07) and [Claude Cowork](../columns/claude-cowork-web-mobile-launch-2026-07). Paired with [Nous Portal's 300+ neutral models](../columns/nous-portal-cloud-hermes-agent-2026-07) or [local LLM deployments](../columns/local-llm-landscape-2026-june-update), it becomes part of the **late-2026 open-weights-practical AI infrastructure stack**. **Cerebras Inference Cloud public preview** (limited time), with details in the [API docs](https://inference-docs.cerebras.ai/models/gemma-4-31b).

CerebrasGemma 4Wafer-Scale

AI2026-07-08

Gemma 4 Technical Report Deep Dive — Google DeepMind's Open-Weight, Natively Multimodal 2.3B–31B LLMs with an Encoder-Free 12B Unified Design and Built-In Reasoning Mode [arXiv:2607.02770](https://arxiv.org/abs/2607.02770), Published 2026-07-02, 300+ Authors, Both Dense and MoE Variants

**Google DeepMind's Gemma Team released the Gemma 4 Technical Report as [arXiv:2607.02770](https://arxiv.org/abs/2607.02770) on 2026-07-02**. The paper introduces **2.3B / 12B / 31B parameter models**, **both Dense and MoE variants**, **natively multimodal (text / image / audio)**, a **12B encoder-free unified design** (raw audio and image patches processed directly without separate encoders), a **built-in reasoning (thinking) mode**, **improved vision / audio encoders**, **architectural refinements for inference speed, memory efficiency, and long context**, and **competitive performance against larger open models on STEM, multimodal, and long-context benchmarks**. Over 300 authors contributed. Open weights allow commercial use, distributed via Hugging Face and Ollama. Sits alongside [Qwen 3.6-35B-A3B](../columns/qwen36-35b-a3b-uncensored-abliterated-2026-07) and the [Local LLM June 2026 update](../columns/local-llm-landscape-2026-june-update) as a new chapter at the open-weights frontier. **A milestone in Google's open-weights strategy**; the encoder-free unified design departs from Qwen / Llama / DeepSeek multimodality (separate vision encoder + projection). The **reasoning mode** mirrors the extended-thinking modes of Anthropic and OpenAI closed models — the open ecosystem catching up. Caveats: commercial-license fine print, potential systemic-risk classification (EU AI Act's 10^25 FLOPs threshold), and heavy Google Cloud Vertex AI integration bias.

Gemma 4Google DeepMindOpen Weight

AI2026-06-11

DiffusionGemma Deep Dive — Google DeepMind's June 10, 2026 Open-Weight Text-Diffusion LLM, Same Backbone as Gemma 4 26B (A4B MoE), Up to 4× Faster Than AR Counterparts, Apache 2.0, With an Honest "Quality Trails AR" Disclosure

A primary-source deep dive on **DiffusionGemma** (`google/diffusiongemma-26B-A4B-it`, 25.2B total / 3.8B active MoE), released June 10, 2026 by Google DeepMind in coordination with NVIDIA. Grounded in the [official Google blog](https://blog.google/innovation-and-ai/technology/developers-tools/diffusion-gemma-faster-text-generation/), [ai.google.dev model card](https://ai.google.dev/gemma/docs/diffusiongemma/model_card), [Hugging Face card](https://huggingface.co/google/diffusiongemma-26B-A4B-it), and [NVIDIA's blog](https://blogs.nvidia.com/blog/rtx-ai-garage-local-gemma-diffusion/). Where autoregressive (AR) models generate one token at a time left-to-right, diffusion language models (DLMs) **denoise a 256-token canvas in parallel into final text**. 15-20 tokens commit per forward pass, up to 48 denoising steps, 1,000+ tok/sec on H100, 700+ on RTX 5090, ~3.5–4× the throughput of the AR Gemma 4 counterpart. Crucially, Google **openly states that quality lags AR**: MMLU Pro 77.6 vs 82.6, GPQA 73.2 vs 82.3, MMMU Pro 54.3 vs 73.8. Apache 2.0, distributed via Hugging Face / Vertex AI / NVIDIA NIM — the first large-scale open-weight diffusion LLM in the industry. The column covers practical implications for Japanese enterprises (on-prem internal agents, code editing, low-latency workflows) and positioning against Mercury (Inception Labs), LLaDA, and Gemini Diffusion.

Google DeepMindGemma 4DiffusionGemma

AI2026-06-04

Gemma 4 12B Deep Dive — The Encoder-Free Multimodal LLM That Runs on a 16GB Laptop Under Apache 2.0 (June 3, 2026)

A deep dive into Gemma 4 12B, released by Google DeepMind on June 3, 2026, grounded in the [official announcement](https://blog.google/innovation-and-ai/technology/developers-tools/introducing-gemma-4-12b/) and [Developer Guide](https://developers.googleblog.com/gemma-4-12b-the-developer-guide/). The standout property is **encoder-free multimodal architecture** — replacing the prior vision encoder (~550M parameters) with a 35M-parameter lightweight embedder plus a single matrix multiplication, and removing the 12-layer Conformer audio encoder entirely by projecting raw audio straight into the LLM's embedding space. Runs on a 16GB VRAM laptop (Copilot+ PC or Apple Silicon Mac), shipped under Apache 2.0, available through Hugging Face / Ollama / LM Studio / MLX / Vertex AI on day one. Covers the architectural rationale, the "approaches 26B MoE at less than half the memory" benchmark claim, positioning within the Gemma 4 family (E2B / E4B / 26B / 31B), competitive comparison against Llama 4 / Qwen 3.5 / Phi-5, and the fit with Japanese enterprise on-prem AI, voice workflows, and data-sovereignty requirements.

Gemma 4Gemma 4 12BGoogle DeepMind

AI2026-04-24

Gemini 3.1 Pro × Deep Research / Deep Research Max — Google's New Autonomous Research Agents [April 2026]

Summary of Google's Deep Research and Deep Research Max, announced April 21, 2026, built on Gemini 3.1 Pro: MCP support, native visualizations, long-horizon research workflows, DeepSearchQA 93.3% / Humanity's Last Exam 54.6%, and paid preview availability via the Gemini API — based on official sources.

Gemini 3.1 ProDeep ResearchDeep Research Max