Mistral Small 4 Complete Guide — Unified Reasoning, Multimodal & Code in 119B MoE [2026]
Mistral Small 4, released March 2026, unifies reasoning, multimodal vision, and agentic coding in a 119B MoE model under Apache 2.0. Supports 11 languages including Japanese. Full specs, setup guide, and model comparisons.
What is Mistral Small 4? A 119B MoE that unifies reasoning, vision, and code under Apache 2.0
Mistral Small 4, released on March 16, 2026, is a large language model from Mistral AI featuring a 119B total-parameter MoE architecture with 128 experts, activating only 6.5B parameters per inference. It is the first Mistral model to unify three distinct capabilities in a single checkpoint: chain-of-thought reasoning (Magistral), multimodal vision understanding (Pixtral), and agentic coding (Devstral). The model is released under the Apache 2.0 license, allowing unrestricted commercial use, modification, and redistribution.
Unified Architecture: Three Capabilities in One Model
Detailed Specifications
Key specifications of Mistral Small 4:
| Specification | Value |
|---|---|
| Total Parameters | 119B |
| Architecture | MoE (Mixture of Experts) |
| Number of Experts | 128 |
| Active Parameters | 6.5B per inference |
| Context Length | 256K tokens |
| Reasoning Mode | Configurable effort (low/medium/high) |
| License | Apache 2.0 |
| Release Date | March 16, 2026 |
11-Language Support Including Japanese
Mistral Small 4 officially supports 11 languages: English (EN), French (FR), Spanish (ES), German (DE), Italian (IT), Portuguese (PT), Dutch (NL), Chinese (ZH), Japanese (JA), Korean (KO), and Arabic (AR). Japanese enterprises can confidently deploy this model for multilingual applications without additional fine-tuning.
Hardware Requirements
Hardware requirements vary depending on quantization level:
| Configuration | VRAM / Unified Memory | Recommended Hardware |
|---|---|---|
| Full precision (FP16) | ~240 GB | 3× H100 80GB or more |
| Q4 quantization | ~60 GB | 1× A100 or H100 80GB |
| Q5 quantization | ~70 GB | 1× A100 80GB |
| Mac Studio (Apple Silicon) | 64–192 GB unified memory | M3 Ultra recommended |
An Apple Silicon Mac Studio with up to 192 GB of unified memory is a cost-effective on-premise option for running the Q4/Q5 quantized versions; pricing starts at around $3,999 for the 64 GB configuration.
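As a rough sanity check on the table above, weight memory scales linearly with bits per parameter (total parameters × bits ÷ 8). The sketch below reproduces the table's estimates. Note that it counts weights only, ignoring KV cache and activation overhead, and real GGUF quants mix precisions per tensor, so treat the result as a lower bound rather than a sizing guarantee.

```python
def estimate_weight_gb(total_params_billion: float, bits_per_param: float) -> float:
    """Weight-only memory estimate in GB: parameters x bits / 8.

    Ignores KV cache, activations, and framework overhead.
    """
    return total_params_billion * bits_per_param / 8

# 119B parameters at various precisions:
print(estimate_weight_gb(119, 16))  # FP16: 238.0 GB (~240 GB in the table)
print(estimate_weight_gb(119, 4))   # Q4:    59.5 GB (~60 GB)
print(estimate_weight_gb(119, 5))   # Q5:   ~74.4 GB (table says ~70 GB; real Q5 mixes precisions)
```

The gap between the naive Q5 estimate and the table is expected: GGUF quantization schemes keep some tensors at higher precision and compress others harder, so the effective bits-per-parameter rarely matches the nominal quant level exactly.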
Model Comparison
How does Mistral Small 4 stack up against comparable open-source models?
| Model | Parameters (Total/Active) | Context | License | Key Strength |
|---|---|---|---|---|
| Mistral Small 4 | 119B / 6.5B | 256K | Apache 2.0 | Unified reasoning + vision + code |
| Qwen 3.5-9B | 9B / 9B (Dense) | 128K | Apache 2.0 | Compact, high performance |
| Gemma 4 26B MoE | 26B / ~6.5B | 128K | Gemma Terms | Google-developed |
| Llama 4 Scout | 109B / 17B | 1M | Llama 4 Community | Ultra-long context |
Mistral Small 4 stands out with its combination of 256K context and native multimodal support in a single model.
Setup Guide: vLLM, TGI, and Ollama
The recommended deployment methods are vLLM or Text Generation Inference (TGI).

**vLLM:**

```bash
pip install vllm
vllm serve mistralai/Mistral-Small-4 \
  --tensor-parallel-size 4 \
  --max-model-len 65536
```

**TGI (Docker):**

```bash
docker run --gpus all \
  -e HF_TOKEN=<TOKEN> \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id mistralai/Mistral-Small-4
```

**Ollama (GGUF support rolling out):** Once available: `ollama run mistral-small-4`. Check the official Ollama library for availability.
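Once a vLLM or TGI server is running, it exposes an OpenAI-compatible endpoint that you can call with plain HTTP. The sketch below builds a chat-completion payload including the configurable `reasoning_effort` setting; the default base URL (`http://localhost:8000/v1`) and the assumption that the server passes `reasoning_effort` through unchanged are mine, not from the official docs, so verify both against your deployment.

```python
import json
import urllib.request

def build_chat_request(prompt: str, effort: str = "medium") -> dict:
    """Build an OpenAI-style chat-completion payload with configurable reasoning effort."""
    return {
        "model": "mistralai/Mistral-Small-4",
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,  # low / medium / high (assumed pass-through name)
        "max_tokens": 512,
    }

def chat(prompt: str, base_url: str = "http://localhost:8000/v1") -> str:
    """Send the payload to a locally running OpenAI-compatible server."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Using only the standard library keeps the example dependency-free; in production you would typically use the `openai` client pointed at the same base URL.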
Use Case: End-to-End Complex Workflows
Mistral Small 4 shines in workflows that previously required multiple models:

1. **Image analysis**: Accept UI screenshots, ER diagrams, or charts as input (Pixtral)
2. **Reasoning**: Analyze and reason over the extracted information (Magistral)
3. **Code generation**: Generate implementation code from the analysis (Devstral)

This single-model pipeline reduces API costs, lowers latency, and simplifies architecture compared to chaining separate specialized models.
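The three-step pipeline can be expressed as a single multimodal request: the image goes in as a Base64 data URL, and one prompt asks the model to analyze, reason, and emit code in a single pass. The message shape below follows the common OpenAI-style `image_url` content-part convention, which is an assumption on my part; confirm the exact content format against the Mistral API reference before relying on it.

```python
import base64

def build_workflow_request(image_path: str, task: str) -> dict:
    """One request covering image analysis, reasoning, and code generation."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    return {
        "model": "mistralai/Mistral-Small-4",
        "reasoning_effort": "high",  # spend more reasoning budget on the analysis step
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
                {"type": "text",
                 "text": f"Analyze this diagram, reason about the structure, then {task}."},
            ],
        }],
    }
```

Compared with chaining a vision model into a separate reasoning model and then a code model, this keeps the extracted image context in a single context window, which is where the latency and cost savings come from.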
Mistral 3 Family Overview
The full Mistral AI model lineup as of 2026:
| Model | Role | Key Feature |
|---|---|---|
| Mistral Small 4 | General-purpose integrated model | Reasoning + multimodal + coding |
| Mistral Large 3 | Flagship high-performance | Maximum accuracy, large tasks |
| Devstral 2 | Coding specialist | Optimized for agentic software engineering |
| Voxtral TTS | Voice synthesis | Text-to-speech generation |
Small 4 occupies the center of the lineup, balancing cost efficiency with broad capability coverage.
Frequently Asked Questions
**Q1. Can Mistral Small 4 be used commercially?**
Yes. Under Apache 2.0, you can use, modify, and redistribute the model for any commercial purpose without restrictions.

**Q2. How do I enable reasoning mode?**
Set the `reasoning_effort` parameter to `low`, `medium`, or `high` in your API request. This lets you trade off cost vs. accuracy per task.

**Q3. What image formats does the multimodal feature support?**
JPEG, PNG, and WebP are supported. Images can be passed as URLs or Base64-encoded strings in the API request.

**Q4. How good is Japanese language quality?**
Japanese is an officially supported language, so natural Japanese text generation and comprehension works well for production use cases.

**Q5. When will Ollama GGUF support be available?**
As of April 2026, GGUF support is rolling out. Monitor the official Ollama library page for updates.

**Q6. How does Small 4 compare to Mistral Large 3?**
Small 4 prioritizes cost efficiency and versatility; Large 3 targets maximum accuracy for demanding tasks. Start with Small 4 and upgrade to Large 3 only if accuracy is insufficient.

**Q7. Where can I get help deploying Mistral Small 4 in my organization?**
Oflight provides enterprise AI consulting covering model selection, integration, prompt engineering, and cost optimization. Visit `/services/ai-consulting` to learn more.
AI Deployment Support from Oflight
Oflight helps enterprises integrate cutting-edge models like Mistral Small 4 into production systems. From PoC design and architecture review to prompt optimization, security compliance, and ongoing operations — our AI engineers support every stage of your deployment journey. Contact us at `/services/ai-consulting` to get started.