Mistral Small 4 Complete Guide — Unified Reasoning, Multimodal & Code in 119B MoE [2026]
Mistral Small 4, released March 2026, unifies reasoning, multimodal vision, and agentic coding in a 119B MoE model under Apache 2.0. Supports 11 languages including Japanese. Full specs, setup guide, and model comparisons.
What is Mistral Small 4? A 119B MoE that unifies reasoning, vision, and code under Apache 2.0
Mistral Small 4, released on March 16, 2026, is a large language model from Mistral AI featuring a 119B total-parameter MoE architecture with 128 experts, activating only 6.5B parameters per inference. It is the first Mistral model to unify three distinct capabilities in a single checkpoint: chain-of-thought reasoning (Magistral), multimodal vision understanding (Pixtral), and agentic coding (Devstral). The model is released under the Apache 2.0 license, allowing unrestricted commercial use, modification, and redistribution.
Unified Architecture: Three Capabilities in One Model
Detailed Specifications
Key specifications of Mistral Small 4:
| Specification | Value |
|---|---|
| Total Parameters | 119B |
| Architecture | MoE (Mixture of Experts) |
| Number of Experts | 128 |
| Active Parameters | 6.5B per inference |
| Context Length | 256K tokens |
| Reasoning Mode | Configurable effort (low/medium/high) |
| License | Apache 2.0 |
| Release Date | March 16, 2026 |
11-Language Support Including Japanese
Mistral Small 4 officially supports 11 languages: English (EN), French (FR), Spanish (ES), German (DE), Italian (IT), Portuguese (PT), Dutch (NL), Chinese (ZH), Japanese (JA), Korean (KO), and Arabic (AR). Japanese enterprises can confidently deploy this model for multilingual applications without additional fine-tuning.
Hardware Requirements
Hardware requirements vary depending on quantization level:
| Configuration | VRAM / Unified Memory | Recommended Hardware |
|---|---|---|
| Full precision (FP16) | ~240 GB | 3× H100 80GB or more |
| Q4 quantization | ~60 GB | 1× A100 or H100 80GB |
| Q5 quantization | ~70 GB | 1× A100 80GB |
| Mac Studio (Apple Silicon) | 64–192 GB unified memory | M3 Ultra recommended |
An Apple Silicon Mac Studio with up to 192 GB of unified memory is a cost-effective on-premise option for running the Q4/Q5 quantized versions; pricing starts at around $3,999 for the 64 GB configuration.
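As a rough sanity check on the table above, weight memory scales linearly with bits per parameter (total parameters × bits ÷ 8). The sketch below reproduces the table's estimates. Note that it counts weights only, ignoring KV cache and activation overhead, and real GGUF quants mix precisions per tensor, so treat the result as a lower bound rather than a sizing guarantee.

```python
def estimate_weight_gb(total_params_billion: float, bits_per_param: float) -> float:
    """Weight-only memory estimate in GB: parameters x bits / 8.

    Ignores KV cache, activations, and framework overhead.
    """
    return total_params_billion * bits_per_param / 8

# 119B parameters at various precisions:
print(estimate_weight_gb(119, 16))  # FP16: 238.0 GB (~240 GB in the table)
print(estimate_weight_gb(119, 4))   # Q4:    59.5 GB (~60 GB)
print(estimate_weight_gb(119, 5))   # Q5:   ~74.4 GB (table says ~70 GB; real Q5 mixes precisions)
```

The gap between the naive Q5 estimate and the table is expected: GGUF quantization schemes keep some tensors at higher precision and compress others harder, so the effective bits-per-parameter rarely matches the nominal quant level exactly.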
Model Comparison
How does Mistral Small 4 stack up against comparable open-source models?
| Model | Parameters (Total/Active) | Context | License | Key Strength |
|---|---|---|---|---|
| Mistral Small 4 | 119B / 6.5B | 256K | Apache 2.0 | Unified reasoning + vision + code |
| Qwen 3.5-9B | 9B / 9B (Dense) | 128K | Apache 2.0 | Compact, high performance |
| Gemma 4 26B MoE | 26B / ~6.5B | 128K | Gemma Terms | Google-developed |
| Llama 4 Scout | 109B / 17B | 1M | Llama 4 Community | Ultra-long context |
Mistral Small 4 stands out with its combination of 256K context and native multimodal support in a single model.
Setup Guide: vLLM, TGI, and Ollama
The recommended deployment methods are vLLM or Text Generation Inference (TGI).

**vLLM:**

```bash
pip install vllm
vllm serve mistralai/Mistral-Small-4 \
  --tensor-parallel-size 4 \
  --max-model-len 65536
```

**TGI (Docker):**

```bash
docker run --gpus all \
  -e HF_TOKEN=<TOKEN> \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id mistralai/Mistral-Small-4
```

**Ollama (GGUF support rolling out):** Once available: `ollama run mistral-small-4`. Check the official Ollama library for availability.
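Once a vLLM or TGI server is running, it exposes an OpenAI-compatible endpoint that you can call with plain HTTP. The sketch below builds a chat-completion payload including the configurable `reasoning_effort` setting; the default base URL (`http://localhost:8000/v1`) and the assumption that the server passes `reasoning_effort` through unchanged are mine, not from the official docs, so verify both against your deployment.

```python
import json
import urllib.request

def build_chat_request(prompt: str, effort: str = "medium") -> dict:
    """Build an OpenAI-style chat-completion payload with configurable reasoning effort."""
    return {
        "model": "mistralai/Mistral-Small-4",
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,  # low / medium / high (assumed pass-through name)
        "max_tokens": 512,
    }

def chat(prompt: str, base_url: str = "http://localhost:8000/v1") -> str:
    """Send the payload to a locally running OpenAI-compatible server."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Using only the standard library keeps the example dependency-free; in production you would typically use the `openai` client pointed at the same base URL.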
Use Case: End-to-End Complex Workflows
Mistral Small 4 shines in workflows that previously required multiple models:

1. **Image analysis**: Accept UI screenshots, ER diagrams, or charts as input (Pixtral)
2. **Reasoning**: Analyze and reason over the extracted information (Magistral)
3. **Code generation**: Generate implementation code from the analysis (Devstral)

This single-model pipeline reduces API costs, lowers latency, and simplifies architecture compared to chaining separate specialized models.
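The three-step pipeline can be expressed as a single multimodal request: the image goes in as a Base64 data URL, and one prompt asks the model to analyze, reason, and emit code in a single pass. The message shape below follows the common OpenAI-style `image_url` content-part convention, which is an assumption on my part; confirm the exact content format against the Mistral API reference before relying on it.

```python
import base64

def build_workflow_request(image_path: str, task: str) -> dict:
    """One request covering image analysis, reasoning, and code generation."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    return {
        "model": "mistralai/Mistral-Small-4",
        "reasoning_effort": "high",  # spend more reasoning budget on the analysis step
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
                {"type": "text",
                 "text": f"Analyze this diagram, reason about the structure, then {task}."},
            ],
        }],
    }
```

Compared with chaining a vision model into a separate reasoning model and then a code model, this keeps the extracted image context in a single context window, which is where the latency and cost savings come from.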
Mistral 3 Family Overview
The full Mistral AI model lineup as of 2026:
| Model | Role | Key Feature |
|---|---|---|
| Mistral Small 4 | General-purpose integrated model | Reasoning + multimodal + coding |
| Mistral Large 3 | Flagship high-performance | Maximum accuracy, large tasks |
| Devstral 2 | Coding specialist | Optimized for agentic software engineering |
| Voxtral TTS | Voice synthesis | Text-to-speech generation |
Small 4 occupies the center of the lineup, balancing cost efficiency with broad capability coverage.
Frequently Asked Questions
**Q1. Can Mistral Small 4 be used commercially?**
Yes. Under Apache 2.0, you can use, modify, and redistribute the model for any commercial purpose without restrictions.

**Q2. How do I enable reasoning mode?**
Set the `reasoning_effort` parameter to `low`, `medium`, or `high` in your API request. This lets you trade off cost vs. accuracy per task.

**Q3. What image formats does the multimodal feature support?**
JPEG, PNG, and WebP are supported. Images can be passed as URLs or Base64-encoded strings in the API request.

**Q4. How good is Japanese language quality?**
Japanese is an officially supported language, so natural Japanese text generation and comprehension works well for production use cases.

**Q5. When will Ollama GGUF support be available?**
As of April 2026, GGUF support is rolling out. Monitor the official Ollama library page for updates.

**Q6. How does Small 4 compare to Mistral Large 3?**
Small 4 prioritizes cost efficiency and versatility; Large 3 targets maximum accuracy for demanding tasks. Start with Small 4 and upgrade to Large 3 only if accuracy is insufficient.

**Q7. Where can I get help deploying Mistral Small 4 in my organization?**
Oflight provides enterprise AI consulting covering model selection, integration, prompt engineering, and cost optimization. Visit `/services/ai-consulting` to learn more.
AI Deployment Support from Oflight
Oflight helps enterprises integrate cutting-edge models like Mistral Small 4 into production systems. From PoC design and architecture review to prompt optimization, security compliance, and ongoing operations — our AI engineers support every stage of your deployment journey. Contact us at `/services/ai-consulting` to get started.