Complete Guide to Rakuten AI 3.0 Architecture: Next-Gen Japanese LLM with MoE
A comprehensive analysis of Rakuten AI 3.0's Mixture of Experts architecture with 700B parameters. Explore the 8-expert configuration, 40B active parameter efficiency, and technical background behind achieving 8.88 on Japanese MT-Bench.
Rakuten AI 3.0: Overview of Next-Generation LLM Development
Rakuten AI 3.0 is a large language model with approximately 700 billion parameters, developed by Rakuten Group as part of the GENIAC project promoted by Japan's Ministry of Economy, Trade and Industry (METI) and NEDO. Set for release in spring 2026 under the Apache 2.0 license via Hugging Face, it has garnered attention as an open-source LLM specialized in Japanese language processing. Its key feature is a Mixture of Experts (MoE) architecture that activates only about 40 billion parameters (roughly 6% of the total) during inference, enabling fast and efficient processing. On the Japanese MT-Bench it scored 8.88, surpassing GPT-4o. Trained on Rakuten's internal multi-node GPU cluster, it will be deployed across Rakuten's ecosystem services through the Rakuten AI Gateway. The model excels at text generation, code generation, and document analysis and extraction tasks tailored to the Japanese language.
How Mixture of Experts Architecture Works
Mixture of Experts (MoE) is an architecture that combines multiple specialized neural networks (experts) and dynamically selects the optimal expert for each input token. Unlike traditional Dense models where all parameters participate in every computation, MoE employs a routing mechanism (gate network) that selects the most relevant experts for each token. This maintains the model's overall representational capacity while drastically reducing actual computational costs. In Rakuten AI 3.0, the optimal expert is selected from 8 specialized experts for each token, plus one shared expert that remains active for all token processing. This design provides the representational power of approximately 700 billion parameters while operating with only about 40 billion parameters during inference, dramatically improving latency and cost-efficiency. The MoE approach represents a paradigm shift in how large-scale models balance capability and efficiency.
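The gating step described above can be sketched in a few lines of NumPy. This is a toy illustration, not Rakuten's published implementation: the function names, dimensions, and the top_k value are assumptions chosen for clarity.

```python
import numpy as np

def route_tokens(token_states, router_weights, top_k=2):
    """Toy MoE router: pick top_k experts per token via a softmax gate.

    token_states:   (num_tokens, hidden_dim) token representations
    router_weights: (hidden_dim, num_experts) learned gate matrix
    Returns chosen expert indices and their renormalized gate weights.
    """
    logits = token_states @ router_weights             # (tokens, experts)
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)         # softmax over experts
    top_idx = np.argsort(-probs, axis=-1)[:, :top_k]   # best experts per token
    top_w = np.take_along_axis(probs, top_idx, axis=-1)
    top_w /= top_w.sum(axis=-1, keepdims=True)         # renormalize gate weights
    return top_idx, top_w

rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 16))   # 4 tokens, illustrative hidden size 16
gate = rng.normal(size=(16, 8))     # gate over 8 routed experts
idx, w = route_tokens(tokens, gate)
print(idx.shape, w.shape)           # (4, 2) (4, 2)
```

Because each token carries only its top_k expert indices and gate weights forward, the remaining experts contribute no computation for that token, which is the source of MoE's efficiency.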
8-Expert Configuration and Specialization Strategy
Rakuten AI 3.0's eight experts automatically specialize in different tasks and domains through the training process. Each expert optimizes for distinct language patterns, contexts, and task types, handling specialized areas such as text generation, code generation, document analysis, and data extraction. The routing mechanism analyzes input token context and selects the most appropriate expert combination to generate high-quality outputs. Furthermore, one constantly active shared expert participates in all token processing, facilitating knowledge sharing among experts and maintaining consistency. This configuration enables Rakuten AI 3.0 to flexibly handle diverse tasks while executing complex language processing with high precision, including Japanese-specific grammatical structures, honorific expressions, and business document creation. The specialization strategy allows the model to achieve expert-level performance across multiple domains simultaneously.
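Combining routed experts with an always-on shared expert, as described above, can be sketched as a single layer's forward pass. Again this is a simplified illustration under stated assumptions: real MoE experts are feed-forward networks rather than single matrices, and the top_k value and shapes here are hypothetical.

```python
import numpy as np

def moe_layer(x, routed_experts, shared_expert, gate, top_k=2):
    """Toy MoE layer: each token's output is the shared expert's output
    plus a gate-weighted sum of its top_k routed experts.

    x: (tokens, d); each expert is a plain (d, d) matrix here for brevity.
    """
    logits = x @ gate                                   # (tokens, num_experts)
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    top_idx = np.argsort(-probs, axis=-1)[:, :top_k]
    out = x @ shared_expert                             # shared expert sees every token
    for t in range(x.shape[0]):                         # routed experts: per-token choice
        for e in top_idx[t]:
            out[t] += probs[t, e] * (x[t] @ routed_experts[e])
    return out

rng = np.random.default_rng(0)
d, n_experts = 16, 8
x = rng.normal(size=(4, d))
experts = rng.normal(size=(n_experts, d, d)) / np.sqrt(d)
shared = rng.normal(size=(d, d)) / np.sqrt(d)
y = moe_layer(x, experts, shared, rng.normal(size=(d, n_experts)))
print(y.shape)  # (4, 16)
```

The shared expert's contribution is identical in structure for every token, which is what lets it carry common knowledge while the routed experts specialize.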
Dense vs MoE: Efficiency of 40B Active Parameters
Compared to traditional Dense models, the efficiency of the MoE architecture is striking. A 700 billion parameter Dense model must compute with all of its parameters on every inference step, consuming enormous GPU memory bandwidth and compute time. In contrast, Rakuten AI 3.0 activates only about 40 billion parameters per token, cutting per-token computation by roughly 94% and significantly improving inference speed; note that all experts must still be resident in memory, so the savings come from compute rather than total parameter storage. Rakuten announced achieving up to 90% cost reduction compared to third-party frontier models, attributed to MoE's computational efficiency. Moreover, 40 billion active parameters would constitute a substantial model even as a standalone Dense network, and combined with MoE specialization this yields quality comparable to much larger Dense models. This efficiency is critically important for enterprises deploying AI in production environments, and the cost-performance ratio makes advanced AI accessible to organizations with limited computational resources.
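The parameter arithmetic behind these percentages works out as follows. The headline figures come from the article itself; the breakdown is a back-of-envelope sketch, not an official cost model.

```python
# Back-of-envelope comparison of active vs. total parameters.
# Figures are the article's headline numbers; everything else is arithmetic.
TOTAL_PARAMS = 700e9   # full MoE parameter count
ACTIVE_PARAMS = 40e9   # parameters activated per token

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
compute_reduction = 1 - active_fraction
print(f"active fraction:   {active_fraction:.1%}")    # ~5.7%
print(f"compute reduction: {compute_reduction:.1%}")  # ~94.3%

# Caveat: per-token FLOPs shrink by this factor, but all 700B parameters
# still need to be resident (typically sharded across GPUs) so that any
# routing decision can be served.
```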
Training Infrastructure: Rakuten's Internal GPU Cluster Capabilities
Training Rakuten AI 3.0 utilized Rakuten's internally built multi-node GPU cluster. Efficiently training a 700 billion parameter MoE model requires advanced distributed training techniques and optimized infrastructure. Rakuten's GPU cluster implements 3D parallel training, combining data parallelism, model parallelism, and pipeline parallelism to efficiently synchronize parameters and gradients across multiple GPU nodes. It also addresses MoE-specific challenges such as expert load balancing (keeping token assignments spread evenly so no expert is over- or under-utilized) and minimizing the communication overhead of routing tokens between devices. With support from the GENIAC project, Rakuten built a proprietary training pipeline and conducted pre-training on large-scale datasets centered on Japanese corpora. This internal infrastructure enables rapid development and continuous improvement of Rakuten AI 3.0, and the investment in dedicated training infrastructure demonstrates Rakuten's long-term commitment to AI development.
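Expert load balancing is commonly encouraged during training with an auxiliary loss term. The Switch-Transformer-style formulation below is one widely used choice; the article does not state Rakuten's exact method, so treat this as an illustrative assumption.

```python
import numpy as np

def load_balancing_loss(router_probs, num_experts):
    """One common MoE auxiliary loss (Switch-Transformer style), shown as
    an illustration; Rakuten's actual formulation is not public.

    router_probs: (tokens, experts) softmax outputs of the gate.
    Penalizes uneven top-1 token assignment across experts.
    """
    top1 = np.argmax(router_probs, axis=-1)
    # f_i: fraction of tokens whose top-1 choice is expert i
    f = np.bincount(top1, minlength=num_experts) / router_probs.shape[0]
    # p_i: mean routing probability mass assigned to expert i
    p = router_probs.mean(axis=0)
    # Lower bound is 1.0, achieved when assignments are perfectly uniform.
    return num_experts * float(np.dot(f, p))

rng = np.random.default_rng(1)
logits = rng.normal(size=(32, 8))          # 32 tokens, 8 experts
probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
probs /= probs.sum(axis=-1, keepdims=True)
loss = load_balancing_loss(probs, 8)
print(f"aux loss: {loss:.3f}")
```

Adding a small multiple of this loss to the language-modeling loss nudges the router toward even expert utilization, which keeps all GPUs busy and the all-to-all communication balanced.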
Achieving 8.88 on Japanese MT-Bench: Technical Factors
Rakuten AI 3.0's achievement of 8.88 on the Japanese MT-Bench stems from multiple technical factors. First is the quality and quantity of training data. Rakuten collected vast amounts of high-quality Japanese corpora covering diverse domains including business documents, technical documentation, and conversational data. Second is specialization through MoE architecture. Eight experts optimized for different task types deliver high performance across MT-Bench's diverse evaluation axes, including text generation, logical reasoning, and mathematical problem-solving. Third is handling Japanese-specific linguistic structures. The model intensively learns challenging aspects of Japanese processing, such as honorific expressions, particle usage, and context-dependent semantic interpretation. These elements combine to achieve Japanese language performance surpassing GPT-4o, generating high-quality outputs in practical business scenarios. The benchmark results validate Rakuten AI 3.0's readiness for real-world deployment.
Summary: Message for Enterprises Considering AI Adoption
Rakuten AI 3.0 represents a next-generation large language model combining MoE architecture efficiency with Japanese-specialized performance. Its release under the Apache 2.0 license enables a wide range of organizations, including small and medium enterprises, to leverage this advanced technology. Oflight Inc., based in Shinagawa Ward, Tokyo, provides AI adoption support and consulting services for regional businesses centered in Shinagawa, Minato, Shibuya, Setagaya, Meguro, and Ota wards, focusing on cutting-edge AI technologies including Rakuten AI 3.0. From understanding technical architecture to implementation support and business integration, we offer comprehensive assistance to realize corporate DX initiatives and AI utilization. As a partner to maximize Rakuten AI 3.0's potential and connect it to business value, we welcome your inquiries.