Oflight Inc.
AI · 2026-04-10

MiniMax M2.5 Complete Guide — Lightning Attention Achieves 80.2% SWE-bench [2026]

MiniMax M2.5 achieves 80.2% on SWE-bench Verified using proprietary Lightning Attention in a 230B MoE model. Full breakdown of architecture, benchmarks, license terms, and setup instructions.


What is MiniMax M2.5? The open MoE model achieving 80.2% on SWE-bench with Lightning Attention

MiniMax M2.5, released by Chinese AI company MiniMax on February 11, 2026, is an open-weight large language model with 230B total parameters and 10B active parameters per inference, using a MoE architecture that selects 8 experts per token. Its defining innovation is Lightning Attention, a proprietary linearized attention mechanism. The model achieves 80.2% on SWE-bench Verified — one of the highest scores among open-source models globally, trailing Claude Opus 4.6 (80.8%) by just 0.6 percentage points.

What is Lightning Attention?

Lightning Attention is MiniMax's proprietary linearized attention technology. Standard Transformer attention has O(N²) memory and compute complexity with respect to sequence length N. Lightning Attention reduces this to approximately O(N), delivering several key advantages:

- Dramatically lower memory consumption: VRAM usage no longer explodes with long inputs
- Superior long-context handling: efficiently processes documents with tens of thousands of tokens
- Higher throughput: better parallelism during batch inference

This gives MiniMax M2.5 a structural edge on engineering tasks that involve large codebases and long documents.
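Lightning Attention's exact kernels are proprietary, but the core idea behind linearized attention can be sketched with a generic kernel feature map (a commonly published formulation, not MiniMax's actual implementation). The trick is reassociating (QKᵀ)V as Q(KᵀV), which replaces the O(N²) attention matrix with a small d×d state:

```python
import numpy as np

def linear_attention(Q, K, V):
    """Generic linearized attention (non-causal sketch, NOT MiniMax's kernel).

    Uses a positive feature map phi(x) = elu(x) + 1, then reassociates
    the matmuls: Q @ (K^T V) costs O(N * d^2) instead of the
    O(N^2 * d) of materializing the full N x N attention matrix.
    """
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1
    Qf, Kf = phi(Q), phi(K)
    kv = Kf.T @ V                    # (d, d) summary state, size independent of N
    z = Qf @ Kf.sum(axis=0)          # (N,) per-query normalizer
    return (Qf @ kv) / z[:, None]

def quadratic_equivalent(Q, K, V):
    """The same math done the O(N^2) way, for comparison."""
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))
    A = phi(Q) @ phi(K).T            # (N, N) attention matrix
    A = A / A.sum(axis=1, keepdims=True)
    return A @ V

rng = np.random.default_rng(0)
N, d = 64, 8
Q, K, V = rng.normal(size=(3, N, d))
# Both paths compute identical outputs; only the cost differs.
assert np.allclose(linear_attention(Q, K, V), quadratic_equivalent(Q, K, V))
```

Because the KᵀV state has a fixed size regardless of sequence length, memory stays flat as inputs grow, which is the property behind the long-context claims above.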

M2.5 Architecture

(Architecture diagram: 230B-parameter MoE with 10B active parameters and 8 experts selected per token)
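MiniMax has not published M2.5's router internals, but the spec (8 experts selected per token, ~10B of 230B parameters active) describes a standard top-k softmax-gated MoE layer. The sketch below is a generic illustration of that pattern; the expert count, dimensions, and expert functions here are toy values, not the real model's:

```python
import numpy as np

def moe_layer(x, gate_w, experts, k=8):
    """Generic top-k MoE forward pass for a single token (illustrative).

    x:       (d,) token hidden state
    gate_w:  (d, n_experts) router weights
    experts: list of callables, each (d,) -> (d,)

    Only the k highest-scoring experts run, which is how a 230B-parameter
    model can activate only ~10B parameters per token.
    """
    scores = x @ gate_w                       # (n_experts,) router logits
    top = np.argsort(scores)[-k:]             # indices of the top-k experts
    w = np.exp(scores[top] - scores[top].max())
    w /= w.sum()                              # softmax over the selected experts
    return sum(wi * experts[i](x) for wi, i in zip(w, top))

rng = np.random.default_rng(1)
d, n_experts = 16, 32
gate_w = rng.normal(size=(d, n_experts))
# Toy experts: random linear maps standing in for the expert FFNs.
mats = rng.normal(size=(n_experts, d, d))
experts = [lambda x, M=M: M @ x for M in mats]
y = moe_layer(rng.normal(size=d), gate_w, experts, k=8)
```

The compute saving comes entirely from the `top` selection: 24 of the 32 toy experts are never evaluated for this token.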

Benchmark Comparison

MiniMax M2.5 benchmark results compared to leading models:

| Benchmark | MiniMax M2.5 | Claude Opus 4.6 | Kimi K2.5 | GLM-5.1 |
| --- | --- | --- | --- | --- |
| SWE-bench Verified | 80.2% | 80.8% | 76.8% | |
| Multi-SWE-bench | 51.3% | | | |
| SWE-bench Pro | 57.3% | 58.4% | | |

On SWE-bench Verified, MiniMax M2.5 comes within 0.6 points of Claude Opus 4.6, making it one of the most capable open-weight models for software engineering tasks available today.

License: Credit Attribution Required for Commercial Use

MiniMax M2.5 uses a Modified MIT License. Unlike standard MIT, commercial use requires explicit credit attribution — specifically, you must include a reference to "MiniMax M2.5" (e.g., "Powered by MiniMax M2.5") in products, services, or publications that use the model. This is more restrictive than Apache 2.0 models. Enterprises should review the full license terms before production deployment. For personal and research use, the model is freely available with standard MIT-style permissions.

Hardware Requirements and Setup

Hardware requirements by configuration:

| Configuration | VRAM Required | Hardware |
| --- | --- | --- |
| Full precision (BF16) | ~460 GB | 6× H100 80GB or more |
| Q4 quantization | ~120 GB | 2× H100 / 3× A100 |
| Q8 quantization | ~230 GB | Large server environment |
| Ollama lightweight | 24–48 GB | 1–2× RTX 4090 |
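The VRAM figures above follow directly from bytes-per-parameter arithmetic on the 230B total parameter count (weights only; KV cache and activations add overhead on top, which is why the table rounds upward). A quick sanity check:

```python
TOTAL_PARAMS = 230e9  # MiniMax M2.5 total parameter count

def weight_vram_gb(bits_per_param):
    """Approximate VRAM for the model weights alone, in GB."""
    return TOTAL_PARAMS * bits_per_param / 8 / 1e9

bf16 = weight_vram_gb(16)  # BF16: 2 bytes/param -> 460 GB
q8 = weight_vram_gb(8)     # Q8:   1 byte/param  -> 230 GB
q4 = weight_vram_gb(4)     # Q4:   0.5 byte/param -> 115 GB (~120 GB row)
```

Sub-4-bit quantizations shrink this further, which is how the Ollama row fits in 24–48 GB, at the cost of some output quality.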

Ollama setup:

```bash
ollama run minimax-m2.5
```

Ollama automatically selects the optimal quantization level based on available VRAM, making consumer GPU deployment accessible. A single RTX 4090 ($1,599 MSRP) can run lighter quantized versions.

Near-Claude Opus 4.6 Coding Performance

SWE-bench Verified measures a model's ability to autonomously resolve real GitHub issues. An 80.2% score means MiniMax M2.5 resolves roughly four out of five of the benchmark's real-world issues, nearly identical to Claude Opus 4.6 at 80.8%. The practical implication: for bulk code generation, review, debugging, and refactoring workflows, MiniMax M2.5 delivers near-frontier coding quality with no per-token API cost when self-hosted. This makes it a compelling choice for teams running high-volume code automation.

Road to MiniMax M2.7

MiniMax has announced that M2.7 will incorporate a self-evolving architecture. The key target: M2.7 should be capable of autonomously executing 30–50% of reinforcement learning (RL) research tasks. Rather than simply improving benchmark scores, MiniMax is positioning M2.7 as a model that accelerates AI research itself. A release is expected in the second half of 2026.

Quick Reference Specs

| Specification | Value |
| --- | --- |
| Total Parameters | 230B |
| Active Parameters | 10B per inference |
| Experts Selected per Token | 8 |
| Attention Mechanism | Lightning Attention (linearized) |
| Release Date | February 11, 2026 |
| License | Modified MIT (credit required for commercial use) |
| SWE-bench Verified | 80.2% |
| Multi-SWE-bench | 51.3% |

Frequently Asked Questions

Q1. Is MiniMax M2.5 free to use?
The model weights are publicly available at no cost under the Modified MIT License. Commercial use is allowed with credit attribution; no usage fees are charged.

Q2. Is Lightning Attention used in other models?
Lightning Attention is a proprietary MiniMax technology, currently exclusive to the MiniMax model family.

Q3. What is SWE-bench?
SWE-bench is a benchmark that evaluates a model's ability to autonomously resolve real GitHub issues from open-source repositories. It is widely regarded as one of the most practically meaningful coding benchmarks available.

Q4. Can I run it on consumer hardware?
Yes. Via Ollama's automatic quantization, it can run on 1–2× RTX 4090 GPUs (24 GB VRAM each) at reduced precision.

Q5. How is Japanese language quality?
MiniMax M2.5 is primarily trained on English and Chinese. Japanese quality may trail models with official Japanese language support (e.g., Mistral Small 4, Qwen). Pre-production testing is recommended for Japanese-language tasks.

Q6. Can I get help deploying MiniMax M2.5?
Oflight provides enterprise consulting for open-source LLM deployments including MiniMax M2.5. We cover everything from requirements definition to production operations. Visit `/services/ai-consulting` to get started.

AI Deployment Support from Oflight

Whether you want to integrate MiniMax M2.5 into code review pipelines, build RAG systems, or automate software engineering workflows — Oflight's AI engineers are ready to help. We handle model selection, infrastructure design, prompt engineering, and security compliance end-to-end. Reach out at `/services/ai-consulting` to discuss your project.

Feel free to contact us

Contact Us