Oflight Co., Ltd. (株式会社オブライト)
AI · 2026-04-10

Mistral Small 4 Complete Guide — Unified Reasoning, Multimodal & Code in 119B MoE [2026]

Mistral Small 4, released March 2026, unifies reasoning, multimodal vision, and agentic coding in a 119B MoE model under Apache 2.0. Supports 11 languages including Japanese. Full specs, setup guide, and model comparisons.


What is Mistral Small 4? A 119B MoE that unifies reasoning, vision, and code under Apache 2.0

Mistral Small 4, released on March 16, 2026, is a large language model from Mistral AI featuring a 119B total-parameter MoE architecture with 128 experts, activating only 6.5B parameters per inference. It is the first Mistral model to unify three distinct capabilities in a single checkpoint: chain-of-thought reasoning (Magistral), multimodal vision understanding (Pixtral), and agentic coding (Devstral). The model is released under the Apache 2.0 license, allowing unrestricted commercial use, modification, and redistribution.

Unified Architecture: Three Capabilities in One Model

(Diagram: Magistral-style reasoning, Pixtral-style vision understanding, and Devstral-style agentic coding unified in a single 119B MoE checkpoint.)

Detailed Specifications

Key specifications of Mistral Small 4:

| Specification | Value |
|---|---|
| Total Parameters | 119B |
| Architecture | MoE (Mixture of Experts) |
| Number of Experts | 128 |
| Active Parameters | 6.5B per inference |
| Context Length | 256K tokens |
| Reasoning Mode | Configurable effort (low/medium/high) |
| License | Apache 2.0 |
| Release Date | March 16, 2026 |

11-Language Support Including Japanese

Mistral Small 4 officially supports 11 languages: English (EN), French (FR), Spanish (ES), German (DE), Italian (IT), Portuguese (PT), Dutch (NL), Chinese (ZH), Japanese (JA), Korean (KO), and Arabic (AR). For Japanese enterprises, this means multilingual applications can be built on the base model without additional fine-tuning for language coverage, though domain-specific fine-tuning may still help with specialized terminology.

Hardware Requirements

Hardware requirements vary depending on quantization level:

| Configuration | VRAM / Unified Memory | Recommended Hardware |
|---|---|---|
| Full precision (FP16) | ~240 GB | 3× H100 80GB or more |
| Q4 quantization | ~60 GB | 1× A100 or H100 80GB |
| Q5 quantization | ~70 GB | 1× A100 80GB |
| Mac Studio (Apple Silicon) | 64–192 GB unified memory | M3 Ultra recommended |

An Apple Silicon Mac Studio with up to 192 GB of unified memory is a cost-effective on-premise option for running the Q4/Q5 quantized versions. Pricing starts at around $3,999 for the 64 GB configuration, though 64 GB is tight for Q4 (~60 GB of weights plus KV cache), so 128 GB or more leaves comfortable headroom.
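The table's VRAM figures follow from simple bytes-per-parameter arithmetic. A minimal sketch (weights only; the function name is ours, and real quantized GGUF files carry extra metadata, so treat these as lower bounds):

```python
def estimate_weight_memory_gb(total_params_billions: float, bits_per_param: float) -> float:
    """Rough weight-only memory estimate: parameters x bits per parameter, in decimal GB.

    Ignores KV cache, activations, and runtime overhead, which add to the total.
    """
    bytes_total = total_params_billions * 1e9 * bits_per_param / 8
    return bytes_total / 1e9

# 119B total parameters at different precisions:
fp16 = estimate_weight_memory_gb(119, 16)  # 238.0 -> matches the ~240 GB row
q5 = estimate_weight_memory_gb(119, 5)     # 74.375
q4 = estimate_weight_memory_gb(119, 4)     # 59.5 -> matches the ~60 GB row
```

The same arithmetic explains why only the 6.5B active parameters matter for compute per token, while all 119B must still fit in memory.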

Model Comparison

How does Mistral Small 4 stack up against comparable open-source models?

| Model | Parameters (Total/Active) | Context | License | Key Strength |
|---|---|---|---|---|
| Mistral Small 4 | 119B / 6.5B | 256K | Apache 2.0 | Unified reasoning + vision + code |
| Qwen 3.5-9B | 9B / 9B (dense) | 128K | Apache 2.0 | Compact, high performance |
| Gemma 4 26B MoE | 26B / ~6.5B | 128K | Gemma Terms | Google-developed |
| Llama 4 Scout | 109B / 17B | 1M | Llama 4 Community | Ultra-long context |

Mistral Small 4 stands out with its combination of 256K context and native multimodal support in a single model.

Setup Guide: vLLM, TGI, and Ollama

The recommended deployment methods are vLLM and Text Generation Inference (TGI). With vLLM:

```bash
pip install vllm
vllm serve mistralai/Mistral-Small-4 \
  --tensor-parallel-size 4 \
  --max-model-len 65536
```
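Once the server is up, vLLM exposes an OpenAI-compatible API (port 8000 by default). A minimal client sketch that only builds the request body; the `reasoning_effort` field follows this article's description of configurable effort and should be verified against your deployment before relying on it:

```python
import json

def build_chat_request(prompt: str, effort: str = "medium") -> dict:
    """Build an OpenAI-compatible chat request body for the vLLM server.

    `reasoning_effort` is assumed from this article's description of
    configurable effort levels (low/medium/high).
    """
    assert effort in ("low", "medium", "high")
    return {
        "model": "mistralai/Mistral-Small-4",
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,  # trade off cost vs. accuracy per task
    }

body = build_chat_request("Summarize the MoE routing trade-offs.", effort="high")
print(json.dumps(body, indent=2))
# POST this to http://localhost:8000/v1/chat/completions with
# Content-Type: application/json
```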

TGI (Docker):

```bash
docker run --gpus all \
  -e HF_TOKEN=<TOKEN> \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id mistralai/Mistral-Small-4
```

Ollama (GGUF support rolling out): once the GGUF build lands, `ollama run mistral-small-4` will work; check the official Ollama library for availability.

Use Case: End-to-End Complex Workflows

Mistral Small 4 shines in workflows that previously required multiple models:

1. Image analysis: accept UI screenshots, ER diagrams, or charts as input (Pixtral)
2. Reasoning: analyze and reason over the extracted information (Magistral)
3. Code generation: generate implementation code from the analysis (Devstral)

This single-model pipeline reduces API costs, lowers latency, and simplifies architecture compared to chaining separate specialized models.
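With a unified model, the whole screenshot-to-code pipeline collapses into a single chat request. A sketch of that request body, assuming OpenAI-style multimodal content parts and the `reasoning_effort` field described in this article (both are assumptions to verify against the actual API):

```python
import base64

def build_screenshot_to_code_request(image_bytes: bytes, instruction: str) -> dict:
    """One request covering all three steps: vision input, reasoning, code output.

    Images may be passed as URLs or Base64 strings; Base64 is shown here.
    """
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": "mistralai/Mistral-Small-4",
        "reasoning_effort": "high",  # complex multi-step task
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
                {"type": "text", "text": instruction},
            ],
        }],
    }

req = build_screenshot_to_code_request(
    b"\x89PNG...",  # raw PNG bytes of a UI screenshot (placeholder here)
    "Analyze this UI mock-up and generate implementation code for it.",
)
```

A single request replaces three model hops, which is where the cost and latency savings come from.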

Mistral 3 Family Overview

The full Mistral AI model lineup as of 2026:

| Model | Role | Key Feature |
|---|---|---|
| Mistral Small 4 | General-purpose integrated model | Reasoning + multimodal + coding |
| Mistral Large 3 | Flagship high-performance | Maximum accuracy, large tasks |
| Devstral 2 | Coding specialist | Optimized for agentic software engineering |
| Voxtral TTS | Voice synthesis | Text-to-speech generation |

Small 4 occupies the center of the lineup, balancing cost efficiency with broad capability coverage.

Frequently Asked Questions

Q1. Can Mistral Small 4 be used commercially?
Yes. Under Apache 2.0, you can use, modify, and redistribute the model for any commercial purpose without restrictions.

Q2. How do I enable reasoning mode?
Set the `reasoning_effort` parameter to `low`, `medium`, or `high` in your API request. This lets you trade off cost vs. accuracy per task.

Q3. What image formats does the multimodal feature support?
JPEG, PNG, and WebP are supported. Images can be passed as URLs or Base64-encoded strings in the API request.

Q4. How good is Japanese language quality?
Japanese is an officially supported language, so natural Japanese text generation and comprehension works well for production use cases.

Q5. When will Ollama GGUF support be available?
As of April 2026, GGUF support is rolling out. Monitor the official Ollama library page for updates.

Q6. How does Small 4 compare to Mistral Large 3?
Small 4 prioritizes cost efficiency and versatility; Large 3 targets maximum accuracy for demanding tasks. Start with Small 4 and upgrade to Large 3 only if accuracy is insufficient.

Q7. Where can I get help deploying Mistral Small 4 in my organization?
Oflight provides enterprise AI consulting covering model selection, integration, prompt engineering, and cost optimization. Visit `/services/ai-consulting` to learn more.

AI Deployment Support from Oflight

Oflight helps enterprises integrate cutting-edge models like Mistral Small 4 into production systems. From PoC design and architecture review to prompt optimization, security compliance, and ongoing operations — our AI engineers support every stage of your deployment journey. Contact us at `/services/ai-consulting` to get started.
