株式会社オブライト

Articles tagged "Blackwell"

1 article

Local LLM June 2026 Update — Two Months After Our April Landscape: GLM-5.2 Leads Open Weights at Intelligence Index v4.1 51, MiniMax M3 Ships 1M Context + SWE-Bench Pro 59%, NVIDIA Nemotron 3 Ultra 550B, Blackwell Native MXFP4 Pushes RTX 5090 Into the 30-70B Practical Zone, Japan's SI Market Matures (Intec ¥5M+, Ricoh On-Prem Starter Kit Won the Nikkei Grand Prize, PFN PLaMo Selected for the Digital Agency 'Gennai' Platform), EU AI Act GPAI Enforcement Starts August 2, 2026

Two months after our [April 2026 local-LLM landscape column](../columns/local-llm-landscape-2026-april-comprehensive-comparison), here is a primary-source update on what has changed. There are **three big shifts**.

**(1) Open-weight models have closed the gap with closed-source ones.** [GLM-5.2](https://simonwillison.net/2026/Jun/17/glm-52/) (Z.ai, MIT license, released June 16, 2026) leads open-weight models on the Intelligence Index v4.1 with a score of **51** (MiniMax M3 44 / DeepSeek V4 Pro 44 / Kimi K2.6 43). [MiniMax M3](https://kilo.ai/open-source-models) ships with **1M context, native multimodality, SWE-Bench Pro 59.0%, Terminal-Bench 2.1 66.0%, and MCP Atlas 74.2%**. [NVIDIA Nemotron 3 Ultra](https://research.nvidia.com/labs/nemotron/Nemotron-3/) (revealed by Jensen Huang at Computex 2026) is a **550B-parameter** model and the leading US-developed open-weight release. [VibeThinker-3B](https://arxiv.org/pdf/2606.16140) (WeiboAI, MIT license, a Qwen2.5-Coder-3B fine-tune) reaches **frontier-reasoner parity at 3B parameters**.

**(2) Blackwell makes 30–70B models practical on consumer GPUs.** The RTX 5090 has **32GB of GDDR7 and 1,792 GB/s of bandwidth** (+77% vs. the RTX 4090) with **native MXFP4 support, so GGUF Q4 runs with zero emulation overhead**. It hits **5,841 tok/s** on Qwen2.5-Coder-7B at batch 8 (2.6× an A100 80GB). The RTX PRO 6000 Blackwell reaches **~8,425 tok/s** on a 30B model, and the B200 ships with **192GB of HBM3e at 8 TB/s** (4–5× H100).

**(3) Japan's SI market is maturing.** **Intec** (TIS group) launched a local-LLM deployment SI service on January 29, 2026, with a **minimum engagement of 1 month and pricing from ¥5,000,000 excluding tax**, targeting manufacturing and finance. **Ricoh's 'RICOH On-Prem LLM Starter Kit'** (based on Qwen2.5-VL-32B-Instruct) won the **grand prize of the 2025 Nikkei Excellent Product/Service Award**. PFN's [PLaMo 3.0 Prime](../columns/plamo-3-0-prime-pfn-japanese-llm-2026-06) was selected for the Japanese **Digital Agency's 'Gennai'** common generative-AI platform, joining the earlier precedent of Mizuho and Lion running Qwen on domestic infrastructure.

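The "30–70B practical zone" claim can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is not from the column: it assumes weights dominate memory (KV cache and activations are ignored), that MXFP4 costs about 4.25 bits per weight (4-bit elements plus one 8-bit scale shared by each 32-element block), and that single-stream decoding reads every weight once per token. The function names are hypothetical.

```python
# Rough sizing sketch under the assumptions stated above; real deployments
# also need room for KV cache, activations, and runtime overhead.

RTX_5090_VRAM_GB = 32          # from the article
RTX_5090_BANDWIDTH_GB_S = 1792  # from the article
MXFP4_BITS_PER_WEIGHT = 4.25    # assumption: 4-bit values + 8-bit scale per 32 weights


def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def decode_ceiling_tok_s(bandwidth_gb_s: float, footprint_gb: float) -> float:
    """Bandwidth-bound upper limit for single-stream decoding (tokens/second)."""
    return bandwidth_gb_s / footprint_gb


if __name__ == "__main__":
    for size_b in (30, 70):
        gb = weight_footprint_gb(size_b, MXFP4_BITS_PER_WEIGHT)
        fits = gb <= RTX_5090_VRAM_GB
        ceiling = decode_ceiling_tok_s(RTX_5090_BANDWIDTH_GB_S, gb)
        print(f"{size_b}B: {gb:.1f} GB weights, fits in 32 GB: {fits}, "
              f"decode ceiling ~{ceiling:.0f} tok/s")
```

Under these assumptions a 30B model fits comfortably on one card, while a 70B model exceeds 32GB and would need offloading, a second GPU, or a lower bit width. Batched throughput figures such as those quoted above are not bounded by this single-stream ceiling, since a batch reuses each weight read across several sequences.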
The column also covers concurrent releases ([Kimi K2.7-Code](../columns/kimi-k2-7-code-moonshot-ai-2026-06), [Sakana Fugu](../columns/sakana-fugu-orchestration-model-2026-06), [DiffusionGemma](../columns/diffusiongemma-google-text-diffusion-2026-06), and [Liquid AI LFM2.5-J](../columns/liquid-ai-lfm25-japanese-models-2026-06)); inference-engine selection (**AWQ + vLLM for GPU, GGUF + llama.cpp for CPU/edge, SGLang for agents, TensorRT-LLM for NVIDIA clusters**); quantization (BitNet 1.58-bit / MXFP4 / AWQ); regulation (**EU AI Act GPAI enforcement from August 2, 2026, with a systemic-risk threshold of 10^25 FLOPs**, the US [Fable 5 export-control precedent](../columns/claude-fable-5-export-control-suspension-2026-06), and cross-border data concerns with Chinese models); typical GPU configurations by workload; and a three-step adoption path recommended by Oflight. The article closes with **three direct inquiry channels** for local-LLM evaluation, build, and ongoing maintenance.
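To give a feel for the 10^25 FLOPs systemic-risk threshold, here is a minimal sketch that is not taken from the column. It uses the common rule of thumb that training a dense transformer costs roughly 6 × parameters × training tokens in FLOPs. The parameter and token counts are hypothetical placeholders, not figures for any model named above, and for mixture-of-experts models the active parameter count would be the relevant input.

```python
# Rule-of-thumb estimate only; an actual regulatory assessment would rely on
# the provider's measured cumulative training compute.

EU_SYSTEMIC_RISK_FLOPS = 1e25  # threshold cited in the article


def training_flops(params: float, tokens: float) -> float:
    """Approximate training compute with the 6 * N * D heuristic."""
    return 6.0 * params * tokens


def exceeds_threshold(params: float, tokens: float) -> bool:
    """True if the estimate is above the systemic-risk threshold."""
    return training_flops(params, tokens) > EU_SYSTEMIC_RISK_FLOPS


if __name__ == "__main__":
    # Hypothetical configurations, chosen only to land on either side of the line.
    examples = {
        "70B params, 15T tokens": (70e9, 15e12),
        "400B params, 15T tokens": (400e9, 15e12),
    }
    for label, (n, d) in examples.items():
        print(f"{label}: ~{training_flops(n, d):.2e} FLOPs, "
              f"above threshold: {exceeds_threshold(n, d)}")
```

With these placeholder inputs, the smaller configuration lands below the threshold and the larger one above it, which suggests why the threshold mainly concerns the largest training runs.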

Local LLM / Open Weight / Self-hosted