Oflight Inc.
AI · 2026-03-17

Rakuten AI 3.0 vs GPT-4o: Japanese Performance Comparison and Cost Analysis

A comprehensive comparison of Rakuten's latest AI 'Rakuten AI 3.0' and OpenAI's 'GPT-4o' based on Japanese MT-Bench scores, task-specific performance, and deployment costs. We provide concrete decision criteria for enterprises choosing between these models.


Performance Gap in Japanese MT-Bench Scores

Rakuten AI 3.0 achieved a high score of 8.88 on the Japanese MT-Bench, surpassing GPT-4o. MT-Bench is a standard benchmark that evaluates large language models across diverse task categories including writing, roleplay, reasoning, mathematics, coding, information extraction, STEM, and humanities. GPT-4o's reported Japanese score is around 8.5, so Rakuten AI 3.0 represents roughly a 4.5% improvement. The gap is particularly evident in understanding Japanese-specific context, honorific expressions, and business document creation. The performance difference stems from Rakuten Group's training dataset optimized for the Japanese market and a Mixture of Experts (MoE) architecture with approximately 700 billion parameters.
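The cited improvement can be sanity-checked with a one-line calculation; both scores are taken from the figures above:

```python
# Relative improvement of Rakuten AI 3.0 over GPT-4o on the
# Japanese MT-Bench, using the scores cited in this article.
rakuten_score = 8.88
gpt4o_score = 8.5  # reported figure, as stated above

improvement = (rakuten_score - gpt4o_score) / gpt4o_score
print(f"Relative improvement: {improvement:.1%}")  # Relative improvement: 4.5%
```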

Task-Specific Performance: Strengths and Weaknesses

In writing tasks, Rakuten AI 3.0 receives high marks for Japanese business emails, reports, and press releases. Compared with GPT-4o, it is particularly strong at using honorifics appropriately and adjusting tone for internal versus external audiences, producing more natural output. In code generation, both models perform well, but GPT-4o holds a slight edge thanks to its rich training data across many programming languages. In information extraction and document analysis, however, Rakuten AI 3.0 extracts structured data from Japanese contracts, legal documents, and technical specifications with superior accuracy. In roleplay tasks, both models score above 8.5, but Rakuten AI 3.0 tends to generate more appropriate responses in scenarios specific to Japanese culture (customer service, conversations with superiors, and so on).

Detailed API Cost Comparison

GPT-4o API pricing is approximately $0.005 per input token and $0.015 per output token (as of March 2026). For an enterprise processing 1 million tokens per month (input and output combined, assuming an even split), the monthly cost is about $10,000. In contrast, Rakuten AI 3.0 is available as a free download from Hugging Face under the Apache 2.0 license, assuming self-hosted operation. For an on-premise environment with an 8× NVIDIA A100 80GB configuration, the initial investment is approximately $200,000, with monthly operational costs (power, cooling, maintenance) of around $3,000. When hosting on cloud infrastructure (such as AWS p4d.24xlarge), monthly costs are approximately $25,000, but Rakuten claims up to 90% cost reduction compared with third-party frontier models. This is enabled by the MoE architecture, which activates only about 40 billion parameters during inference and thus uses computational resources efficiently.
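The $10,000/month figure can be reproduced with a short sketch. The per-token rates come from the paragraph above; the 50/50 split between input and output tokens is an assumption, since only the combined volume is given:

```python
# Monthly API cost estimate using the cited per-token rates.
# The even input/output split is an assumption for illustration.
INPUT_RATE = 0.005   # USD per input token (as cited)
OUTPUT_RATE = 0.015  # USD per output token (as cited)

def monthly_api_cost(total_tokens: int, input_share: float = 0.5) -> float:
    """Estimate monthly API spend for a combined token volume."""
    input_tokens = total_tokens * input_share
    output_tokens = total_tokens * (1 - input_share)
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

print(monthly_api_cost(1_000_000))  # 10000.0
```

Varying `input_share` shows how sensitive the bill is to workload shape: an input-heavy workload such as document analysis costs considerably less per month than an output-heavy one.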

Self-Hosting vs Cloud API TCO Analysis

Comparing 3-year Total Cost of Ownership (TCO) at 1 million tokens per month, the GPT-4o API comes to approximately $360,000, while a Rakuten AI 3.0 on-premise environment totals $200,000 initial investment + $108,000 in operational costs = $308,000, a reduction of roughly 14%. However, this calculation assumes a constant monthly token volume; if usage fluctuates significantly, the API's pay-as-you-go model may be more advantageous. Self-hosting also requires salaries for specialized MLOps engineers ($100,000-$150,000 annually). On the other hand, for financial, healthcare, and government institutions with strict data privacy and compliance requirements, the benefit of Rakuten AI 3.0's complete on-premise data control becomes substantial. Rakuten itself trains and operates the model on internal multi-node GPU clusters, demonstrating enterprise-level viability.
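The 3-year TCO arithmetic above, restated as a sketch using the same figures (MLOps staffing is deliberately excluded here, matching the comparison in the text):

```python
# Three-year TCO comparison using the figures cited in this article.
MONTHS = 36

api_monthly = 10_000      # GPT-4o API at 1M tokens/month (cited)
onprem_initial = 200_000  # 8x A100 80GB hardware (cited)
onprem_monthly = 3_000    # power, cooling, maintenance (cited)

api_tco = api_monthly * MONTHS
onprem_tco = onprem_initial + onprem_monthly * MONTHS
savings = (api_tco - onprem_tco) / api_tco

print(api_tco, onprem_tco)  # 360000 308000
print(f"Savings: {savings:.1%}")  # Savings: 14.4%
```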

Selection Guidelines by Company Size

For startups and small businesses (under 100,000 tokens monthly), the GPT-4o API, with no initial investment and pay-as-you-go pricing, is optimal. When technical teams are small and infrastructure management resources are limited, OpenAI's managed service minimizes operational burden. Mid-sized companies (1-10 million tokens monthly) should decide based on token volume and data privacy requirements: if Japanese tasks are central and customer or confidential data cannot be sent externally, self-hosting Rakuten AI 3.0 becomes a strong option. For large enterprises (over 10 million tokens monthly), Rakuten AI 3.0's cost advantage becomes pronounced through economies of scale. In particular, for companies planning integration with the Rakuten ecosystem or utilizing Rakuten AI Gateway, adopting the same architecture keeps integration straightforward.
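As a rough guide for these size brackets, one can estimate the break-even token volume at which self-hosting overtakes the API, using the article's figures. The blended rate of $0.01 per token (from $10,000 per 1 million tokens) and the 3-year amortization window are assumptions for illustration:

```python
# Break-even monthly token volume: API pay-as-you-go vs. amortized
# self-hosting, using the cost figures cited in this article.
BLENDED_API_RATE = 0.01                # USD per token ($10,000 / 1M tokens)
ONPREM_MONTHLY = 200_000 / 36 + 3_000  # hardware amortized over 3 years + opex

break_even = ONPREM_MONTHLY / BLENDED_API_RATE
print(f"Break-even: ~{break_even:,.0f} tokens/month")  # ~855,556

# Adding MLOps staffing ($100k-$150k/year, per the TCO section)
# shifts the threshold into the 1.7M-2.1M tokens/month range.
with_staff_low = (ONPREM_MONTHLY + 100_000 / 12) / BLENDED_API_RATE
with_staff_high = (ONPREM_MONTHLY + 150_000 / 12) / BLENDED_API_RATE
print(f"With staffing: ~{with_staff_low:,.0f} to ~{with_staff_high:,.0f} tokens/month")
```

Note how including staffing pushes the break-even point squarely into the mid-sized bracket described above, which is why data privacy, rather than cost alone, is often the deciding factor there.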

Technical Requirements and Ecosystem Considerations

Deploying Rakuten AI 3.0 requires familiarity with MoE-compatible inference engines (vLLM, Text Generation Inference, etc.) and skills in building and operating GPU environments. To run a model with approximately 700 billion parameters efficiently, a configuration of at least 4× A100 80GB is recommended, and even with quantization (GPTQ, AWQ), substantial infrastructure investment is required. In contrast, GPT-4o can be used immediately via REST API, with enterprise support available through Azure OpenAI Service. From a development-ecosystem perspective, GPT-4o has abundant toolchains including LangChain, LlamaIndex, and the official OpenAI SDK, while Rakuten AI 3.0 is also compatible with major open-source tools built on Hugging Face Transformers. After the official release in spring 2026, community-driven fine-tuning examples and plugin development are expected to pick up.
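To give a sense of the scale involved, here is a back-of-the-envelope estimate of weight memory at different precisions. This is illustrative arithmetic only: it counts weights alone, excluding KV cache and activation memory, and MoE routing reduces compute per token but not the memory needed to hold all experts:

```python
# Rough weight-memory footprint of a ~700B-parameter model at
# different precisions. Weights only -- excludes KV cache and
# activation memory, which add substantially on top.
PARAMS = 700e9  # total parameter count, as cited above

estimates = {}
for name, bytes_per_param in [("FP16", 2.0), ("INT8", 1.0), ("INT4 (GPTQ/AWQ)", 0.5)]:
    gb = PARAMS * bytes_per_param / 1e9
    estimates[name] = gb
    print(f"{name}: ~{gb:,.0f} GB of weights")
```

Even at INT4 (~350 GB), the weights exceed the 320 GB available on a 4× A100 80GB node, which is consistent with the point above that quantization alone does not eliminate the need for substantial infrastructure.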

Conclusion: Choosing the Optimal Model

The choice between Rakuten AI 3.0 and GPT-4o depends on company size, technical resources, data privacy requirements, and the weight of Japanese language tasks. Rakuten AI 3.0 is optimal for enterprises prioritizing Japanese performance, planning large-scale token processing, and possessing self-hosting technical capabilities. Conversely, GPT-4o API is suitable when emphasizing rapid deployment, global multilingual support, and operational simplicity. Based in Shinagawa Ward, Tokyo, Oflight Inc. provides AI adoption support and consulting services for the latest AI models including Rakuten AI 3.0 and GPT-4o, primarily serving the Shinagawa, Minato, Shibuya, Setagaya, Meguro, and Ota ward areas. From selecting the optimal AI solution for your business requirements to implementation and operational support, we offer comprehensive assistance. Please feel free to contact us.
