Oflight Co., Ltd.
AI · 2026-04-24

DeepSeek V4 Preview Released — 1.6T MoE / 1M-Token Context Open-Weight Model [April 2026]

Overview of DeepSeek V4 Preview, released on April 24, 2026: two open-weight Mixture-of-Experts variants (V4-Pro at 1.6T total / 49B active and V4-Flash at 284B total / 13B active), 1-million-token context, weights on Hugging Face, and rollout via API and chat — based on official information.


DeepSeek V4 Preview at a glance

Hangzhou-based DeepSeek released DeepSeek V4 Preview on April 24, 2026, publishing weights on Hugging Face. After their attention-grabbing model in 2025, V4 continues the open-weight Mixture-of-Experts (MoE) line. The same day, the API and chat.deepseek.com Expert / Instant modes went live.

Two variants: V4-Pro and V4-Flash

Two MoE models shipped:

| Model | Total params | Active params | Intended use |
|---|---|---|---|
| V4-Pro | 1.6T | 49B | High-difficulty reasoning, long-document processing |
| V4-Flash | 284B | 13B | Fast responses, cost-sensitive real-time workloads |

Both use MoE, so only a subset of experts activates for each token — per-token compute and memory bandwidth are far below what the headline parameter count suggests, although the full weights must still be stored (or offloaded) at serving time. Both support 1-million-token context, suiting long documents, large codebases, and multi-doc cross-reasoning.
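To make the total-versus-active distinction concrete, here is a rough, illustrative calculation (assumptions: bf16 weights at 2 bytes per parameter, and 1 GB = 1e9 bytes; the helper name `weight_gb` is ours, not anything from DeepSeek's stack):

```python
# Illustrative arithmetic only: in an MoE model, all weights must be
# resident (or offloaded), but per-token compute touches only the
# *active* subset of parameters.
def weight_gb(n_params: float, bytes_per_param: float = 2.0) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9

for name, total, active in [("V4-Pro", 1.6e12, 49e9),
                            ("V4-Flash", 284e9, 13e9)]:
    print(f"{name}: ~{weight_gb(total):.0f} GB of bf16 weights, "
          f"~{weight_gb(active):.0f} GB of those touched per token")
```

The gap between the two numbers is why MoE models can be far cheaper to run per token than a dense model of the same headline size, while still demanding a large memory pool to hold the weights.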

Open weights and licensing

V4-Pro and V4-Flash weights are posted to Hugging Face. As Preview releases, final license terms, commercial-use boundaries, and fine-tuning policy should be verified on the official repository. The strategy of offering near-frontier performance under open weights, in contrast to the closed-weight approach of OpenAI, Anthropic, and Google, continues from the prior generation, and makes this a strong choice for teams pursuing local or self-hosted deployments.

Operational notes

Full-precision 1.6T MoE typically exceeds standard enterprise inference budgets; quantized weights and tuned serving stacks (vLLM, TGI) are the realistic baseline. V4-Flash (284B / 13B active) is more tractable — production-speed inference is feasible on A100 80GB or single H100 nodes. For consumer-grade hardware (e.g., Mac mini), V4 is impractical at full precision; pair quantized V4-Flash with smaller open models (Gemma 4, Qwen 3.6-27B).
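Some back-of-the-envelope arithmetic shows why quantization alone does not shrink V4-Flash onto a single GPU, and why multi-GPU serving nodes are the realistic baseline (illustrative only; it counts weight bytes and ignores KV cache, activations, and serving overhead):

```python
# Weight footprint of V4-Flash (284B total params) at common
# quantization widths. Rough estimate: KV cache and activations
# for a 1M-token context would add substantially on top of this.
TOTAL_PARAMS = 284e9  # V4-Flash headline size

def footprint_gb(n_params: float, bits: int) -> float:
    return n_params * bits / 8 / 1e9

for fmt, bits in [("bf16", 16), ("int8", 8), ("int4", 4)]:
    print(f"{fmt}: ~{footprint_gb(TOTAL_PARAMS, bits):.0f} GB of weights")
```

Even at 4-bit, the weights alone exceed a single 80 GB card, so tensor-parallel serving across a node's GPUs (as vLLM and TGI support) is the practical deployment shape.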

How Oflight uses it

We're building templates so OpenClaw can route lightweight tasks to V4-Flash in local / hybrid setups, while V4-Pro is reserved for cloud-API integration. See AI Consulting for tailored deployment advice.
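The routing idea can be sketched as follows. This is a hypothetical illustration, not an actual OpenClaw template: the model identifiers, the length threshold, and the `needs_deep_reasoning` flag are all our assumptions.

```python
# Hypothetical router sketch (names and thresholds are assumptions):
# send short, lightweight prompts to V4-Flash in a local / hybrid
# setup, and escalate long or high-difficulty tasks to V4-Pro
# behind the cloud API.
def pick_model(prompt: str, needs_deep_reasoning: bool = False) -> str:
    if needs_deep_reasoning or len(prompt) > 20_000:
        return "deepseek-v4-pro"    # cloud-API tier
    return "deepseek-v4-flash"      # local / hybrid tier

print(pick_model("Summarize this paragraph."))
print(pick_model("Audit this 500-file codebase.", needs_deep_reasoning=True))
```

In practice the routing signal would come from task metadata rather than raw prompt length, but the two-tier shape is the same.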

FAQ

Q1: How does Preview differ from GA?
A: Preview ships before final licensing and benchmark validation. Specs and terms may shift, so production adopters may prefer to wait for GA.

Q2: Can V4-Pro run on a Mac mini?
A: Not at full precision, and even aggressive quantization is impractical at this size. On Mac mini-class hardware, prefer quantized V4-Flash or open models like Gemma 4 / Qwen 3.5-9B.

Q3: Is it usable commercially?
A: Verify that the latest official license matches your requirements before commercial deployment.
