株式会社オブライト

Articles tagged "MoE"

8 articles

AI · 2026-06-15
Kimi K2.7-Code Deep Dive — Moonshot AI's June 12, 2026 Coding-Specialized 1T MoE Open-Weights Model, Modified MIT License, $0.95/$4.00 per 1M, 256K Context — But Japanese Enterprises Face Two Critical Caveats (Cross-Border Data and Unverified Benchmarks)
A primary-source deep dive on **Kimi K2.7-Code**, released June 12, 2026 by Moonshot AI (Beijing), grounded in the [Hugging Face model card](https://huggingface.co/moonshotai/Kimi-K2.7-Code), [MarkTechPost](https://www.marktechpost.com/2026/06/12/moonshot-ai-releases-kimi-k2-7-code-a-coding-model-reporting-21-8-on-kimi-code-bench-v2-over-k2-6/), and [VentureBeat's skepticism piece](https://venturebeat.com/technology/kimi-k2-7-code-cuts-thinking-tokens-30-practitioners-say-benchmarks-dont-check-out). It covers the 1T-total / 32B-active MoE architecture (384 experts, 8 routed + 1 shared), 256K context, MoonViT ~400M vision encoder, native INT4, and forced-on thinking mode. The license is **Modified MIT** (attribution required only above 100M MAU or $20M MRR). API pricing is $0.95 input / $0.19 cache-hit / $4.00 output per 1M tokens, roughly **1/18 of Claude Opus 4.8's output price**. The OpenAI- and Anthropic-compatible endpoints drop straight into Claude Code / Cursor / Aider / Cline / cmux. Moonshot self-reports **+21.8% vs K2.6 on its own Kimi Code Bench v2 and -30% reasoning tokens**, but **all public benchmarks are Moonshot's own proprietary suites; independent SWE-bench Verified / Pro / FrontierCode scores are not yet available as of June 15, 2026** (VentureBeat). For Japanese enterprises the column flags two critical caveats: **(1) both `api.moonshot.cn` and the Singapore-subsidiary-run `api.moonshot.ai` remain exposed to compelled disclosure under Article 7 of the PRC National Intelligence Law (set against Japan's PPC DeepSeek alert of February 3, 2025 and the Digital Agency notice of February 6, 2025), and (2) the only reliable mitigation is self-hosting the Hugging Face weights (~4-8 H100, ~595GB INT4), following the Mizuho / Lion Qwen-on-domestic-infrastructure precedent**.
Moonshot AI · Kimi K2.7-Code · Open Weight LLM
AI · 2026-04-24
DeepSeek V4 Preview Released — 1.6T MoE / 1M-Token Context Open-Weight Model [April 2026]
Overview of DeepSeek V4 Preview, released on April 24, 2026: two open-weight Mixture-of-Experts variants (V4-Pro at 1.6T total / 49B active and V4-Flash at 284B / 13B), 1-million-token context, weights on Hugging Face, and rollout via API and chat — based on official information.
DeepSeek V4 · Open-Source LLM · MoE
AI · 2026-04-10
GLM-5.1 Complete Guide — #1 SWE-bench Pro Open-Source LLM [April 2026]
GLM-5.1 by Z.ai (released April 7, 2026) is the first open-source LLM to top SWE-bench Pro at 58.4%, surpassing GPT-5.4 (57.7%) and Claude Opus 4.6 (57.3%). This guide covers its 744B/40B-active MoE architecture, MIT license, 8-hour autonomous task capability, and setup via Ollama.
GLM-5.1 · Z.ai · SWE-bench
AI · 2026-04-10
Kimi K2.5 Complete Guide — 1 Trillion Parameter MIT-Licensed Open-Source LLM [2026]
Kimi K2.5, released by Moonshot AI on January 27, 2026, is a 1 trillion parameter (32B active) MoE model under the MIT License. It scores 76.8% on SWE-bench, 99.0% on HumanEval, and 87.6% on GPQA Diamond. This guide covers its architecture, hardware requirements, Ollama setup, and practical use cases.
Kimi K2.5 · Moonshot AI · 1 Trillion Parameters
AI · 2026-04-10
Mistral Small 4 Complete Guide — Unified Reasoning, Multimodal & Code in 119B MoE [2026]
Mistral Small 4, released March 2026, unifies reasoning, multimodal vision, and agentic coding in a 119B MoE model under Apache 2.0. Supports 11 languages including Japanese. Full specs, setup guide, and model comparisons.
Mistral Small 4 · MoE · Multimodal
AI · 2026-04-10
MiniMax M2.5 Complete Guide — Lightning Attention Achieves 80.2% SWE-bench [2026]
MiniMax M2.5 achieves 80.2% on SWE-bench Verified using proprietary Lightning Attention in a 230B MoE model. Full breakdown of architecture, benchmarks, license terms, and setup instructions.
MiniMax M2.5 · SWE-bench · Lightning Attention
AI · 2026-03-17
Complete Guide to Rakuten AI 3.0 Architecture: Next-Gen Japanese LLM with MoE
A comprehensive analysis of Rakuten AI 3.0's Mixture of Experts architecture with 700B parameters. Explore the 8-expert configuration, 40B active parameter efficiency, and technical background behind achieving 8.88 on Japanese MT-Bench.
Rakuten AI 3.0 · MoE · Mixture of Experts
AI · 2026-03-17
NemoClaw's NIM Inference Microservices and Nemotron Models — Deployment Strategies from Edge to Cloud
A technical deep dive into NemoClaw's NIM inference microservices and Nemotron model family. We examine containerized API endpoints, elastic scaling, Nemotron 3 Super performance (120B parameters, MoE with 12B active), deployment comparisons across AWS, Azure, GCP, and on-premises, lightweight edge device operations, and partner integration use cases with Salesforce, CrowdStrike, and more.
NemoClaw · NIM · Nemotron