AI | 2026-06-23

Local LLM June 2026 Update — Two Months After Our April Landscape: GLM-5.2 Leads Open Weights at Intelligence Index v4.1 51, MiniMax M3 Ships 1M Context + SWE-Bench Pro 59%, NVIDIA Nemotron 3 Ultra 550B, Blackwell Native MXFP4 Pushes RTX 5090 Into the 30-70B Practical Zone, Japan's SI Market Matures (Intec ¥5M+, Ricoh On-Prem Starter Kit Won the Nikkei Grand Prize, PFN PLaMo Selected for the Digital Agency 'Gennai' Platform), EU AI Act GPAI Enforcement Starts August 2, 2026

Two months after our April 2026 local-LLM landscape column, here is the primary-source update on what has changed. Three big shifts: (1) Open-weights have closed the gap with closed-source. GLM-5.2 (Z.ai, MIT, June 16, 2026) tops the Intelligence Index v4.1 at 51 (MiniMax M3 44 / DeepSeek V4 Pro 44 / Kimi K2.6 43). MiniMax M3 ships 1M context + native multimodality + SWE-Bench Pro 59.0% + Terminal-Bench 2.1 66.0% + MCP Atlas 74.2%. NVIDIA Nemotron 3 Ultra (revealed by Jensen Huang at Computex 2026) is a 550B-parameter US-flag open-weight leader. VibeThinker-3B (WeiboAI, MIT, Qwen2.5-Coder-3B fine-tune) reaches frontier-reasoner parity at 3B. (2) Blackwell makes 30–70B models practical on consumer GPUs. The RTX 5090 has 32GB GDDR7 and 1,792 GB/s bandwidth (+77% vs 4090) with native MXFP4 — GGUF Q4 runs with zero emulation overhead, hitting 5,841 tok/s on Qwen 2.5-Coder-7B at batch 8 (2.6× A100 80GB). The RTX PRO 6000 Blackwell reaches ~8,425 tok/s on 30B; the B200 ships 192GB HBM3e at 8 TB/s (4–5× H100). (3) Japan's SI market is maturing. Intec (TIS group) launched local-LLM deployment SI on January 29, 2026 — minimum 1 month, from ¥5,000,000+ ex tax — targeting manufacturing and finance. Ricoh's 'RICOH On-Prem LLM Starter Kit' won the 2025 Nikkei Excellent Product/Service Award grand prize (Qwen2.5-VL-32B-Instruct base). PFN's PLaMo 3.0 Prime was selected for the Japanese Digital Agency 'Gennai' common generative-AI platform — alongside the Mizuho / Lion Qwen on-domestic-infrastructure precedent. 
The column also covers concurrent moves on Kimi K2.7-Code, Sakana Fugu, DiffusionGemma, and Liquid AI LFM2.5-J, inference-engine selection (AWQ + vLLM for GPU, GGUF + llama.cpp for CPU/edge, SGLang for agents, TensorRT-LLM for NVIDIA clusters), quantization (BitNet 1.58-bit / MXFP4 / AWQ), regulation (EU AI Act GPAI enforcement from August 2, 2026; systemic-risk threshold of 10^25 FLOPs, US Fable 5 export-control precedent, Chinese-model cross-border data), typical GPU configurations by workload, and a three-step Oflight-recommended adoption path. The article closes with three direct inquiry funnels for local-LLM evaluation, build, and ongoing maintenance.

Local LLM Open Weight Self-hosted RTX 5090 Blackwell Enterprise AI GLM-5.2 MiniMax M3 Nemotron 3

TL;DR — Three Big Shifts in Local LLMs Since Our April 2026 Landscape

Two months after our April 2026 local-LLM landscape column, the market has moved hard. This is the differential update.

Three big shifts:

1. Open weights have closed the gap with closed source. GLM-5.2 (Z.ai, MIT, June 16, 2026) now leads the Intelligence Index v4.1 at 51. MiniMax M3 ships 1M context + SWE-Bench Pro 59.0%. NVIDIA Nemotron 3 Ultra is the 550B US-flag open-weight leader.
2. Blackwell makes 30–70B practical on consumer GPUs. RTX 5090 native MXFP4 lets GGUF Q4 run with zero emulation; Qwen 2.5-Coder-7B hits 5,841 tok/s at batch 8. Consumer hardware is now production-adjacent.
3. Japan's SI market is maturing. Intec ships full on-prem LLM SI from ¥5M+, Ricoh's on-prem LLM Starter Kit took the Nikkei grand prize, and PFN's PLaMo 3.0 Prime was selected for the Digital Agency 'Gennai' platform.

We close the column with three direct inquiry funnels for local-LLM evaluation, build, and ongoing maintenance.

Diff Map — April vs June 2026

Dimension	April 2026	June 2026
Open-weights leader	GLM-5.1 / Kimi K2.5	GLM-5.2 (Intelligence Index v4.1 = 51)
Long-context / multimodal	Gemma 4 E4B / Llama 4 series	MiniMax M3 (1M context + native multimodal)
US open-weights flagship	Llama 4 Maverick	NVIDIA Nemotron 3 Ultra (550B)
Lightweight reasoning	Gemma 4 E4B	VibeThinker-3B (3B at frontier parity)
Japanese specialist	Llama-3.1-Swallow / Stockmark	PLaMo 3.0 Prime (256K, Digital Agency adoption) / Liquid AI LFM2.5-J
Coding	GLM-5.1 / Kimi K2.5	Kimi K2.7-Code (1T MoE / 32B active)
Consumer GPU ceiling	RTX 4090 (24GB, Q4 13B)	RTX 5090 (32GB, MXFP4 30B, mixed-quant 70B)
Server GPU	H100 / H200	B200 (192GB HBM3e, 8 TB/s) / B300 / GB300 NVL72
Japan SI market	Nascent	Intec ¥5M+, Ricoh, PFN are the new mainline players
New category	—	Orchestration model (Sakana Fugu)

Major June 2026 Releases

GLM-5.2 (Z.ai, June 16, 2026, MIT) — New Open-Weights Leader

Z.ai shipped GLM-5.2 to coding-plan subscribers on June 13 and dropped MIT open weights on June 16 (Simon Willison, 2026-06-17). On Intelligence Index v4.1 it scores 51, beating MiniMax M3 (44), DeepSeek V4 Pro (44), and Kimi K2.6 (43). Major coding and agentic gains over 5.1; Nous Research integrated it into the Hermes Agent within days.

What this signals: the "open weights at frontier parity comes from China" line keeps strengthening. MIT licensing means free commercial / modification / redistribution — minimal legal friction for Japanese self-hosting. Cross-border data scrutiny still applies (Z.ai is Beijing-based — see our Kimi K2.7-Code coverage).

MiniMax M3 (June 2026) — 1M Context + Native Multimodal

MiniMax's latest open-weights model (kilo.ai 2026 comparison). 1M-token context, native multimodality, SWE-Bench Pro 59.0%, Terminal-Bench 2.1 66.0%, MCP Atlas 74.2%.

Strength: long-context plus image/video make it well-suited to whole-knowledge-base RAG and long-running project-management agents. GPU memory is heavy, but Blackwell B200 or RTX PRO 6000 with quantization brings it into reach.

NVIDIA Nemotron 3 Ultra (June 1, 2026, Computex) — US Open-Weights Flagship

Unveiled by Jensen Huang at Computex 2026 (NVIDIA tech report PDF, NVIDIA Research). 550B parameters, the strongest US open-weights model to date. Sebastian Raschka described it as having an "ultra impressive capability:efficiency ratio."

Why it matters: under US / EU procurement frameworks there's persistent demand for "open weights that aren't from China." Combined with the export-control precedent (below), Nemotron 3 Ultra is positioned to become the first-pick US open-weights model for high-stakes Japanese deployments.

VibeThinker-3B (WeiboAI, June 2026, MIT) — Frontier Reasoning at 3B

arXiv 2606.16140. MIT, Qwen2.5-Coder-3B fine-tune, frontier-reasoner-class math and code performance at 3B parameters.

Implication: "small is the new big" is real. Apple M5, RTX 5060/5070, Snapdragon X Elite Gen 2 — all run frontier-level reasoning now. Consumer, embedded, and offline-business use cases just became massively easier.

Other Concurrent Releases (Already Covered)

- Kimi K2.7-Code (June 12, 2026, Moonshot AI, 1T MoE / 32B active, Modified MIT)
- Sakana Fugu (June 22, 2026, Sakana AI, orchestration model)
- PLaMo 3.0 Prime (June 22, 2026, PFN, 256K context, dual reasoning variants, ¥60 / ¥250 per 1M tokens)
- DiffusionGemma (June 2026, Google, text-diffusion model)
- Liquid AI LFM2.5-J (June 2026, two Japanese-specialized models)

Intelligence Index v4.1 Ranking — GLM-5.2 on Top

Rank	Model	Score	License
1	GLM-5.2	51	MIT
2	MiniMax M3	44	OSS (license TBD)
2	DeepSeek V4 Pro	44	DeepSeek License
4	Kimi K2.6	43	Modified MIT
...	...	...	...

Observation: the top of the open-weights board is all Chinese. US/EU open-weights (Nemotron 3 Ultra, Llama 4, Mistral family) are still half a step behind on aggregate benchmarks. But vertical-specific leaders are different: coding (SWE-Bench Pro / Terminal-Bench 2.1) is Kimi K2.7-Code / MiniMax M3 / Sakana Fugu Ultra; long-context (LongBench v2) is PLaMo 3.0 Prime / MiniMax M3; math / reasoning (AIME / GPQA-Diamond) is DeepSeek V4 / VibeThinker-3B. Index #1 is not always your use-case #1.

Gap to closed source: Claude Opus 4.8 / GPT-5.5 / Gemini 3.1 Pro sit around 55–60 on the same index — 4 to 9 points above open-weights. Down from a 15+ point gap six months ago.

Hardware — Blackwell Changes Everything

RTX 5090 — Native MXFP4 Makes 30B Practical

RTX 5090: 32GB GDDR7, 1,792 GB/s memory bandwidth (+77% vs RTX 4090's 1,008 GB/s), Blackwell architecture.

Native MXFP4 is the revolution (runyard.dev RTX 5090 guide). GGUF Q4 / similar 4-bit formats run with zero emulation overhead. Qwen 2.5-Coder-7B hits 5,841 tok/s (batch 8, 2.6× A100 80GB). LLM generation is memory-bandwidth-bound; bandwidth directly maps to tokens/sec.
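To make the bandwidth argument concrete, here is a minimal back-of-envelope sketch. It assumes single-stream decoding where every generated token reads all weights from VRAM once, and it ignores KV-cache reads, kernel overhead, and compute limits, so the result is a rough ceiling rather than a benchmark. The function name and the exact 4 bits per weight are our illustrative assumptions; batched throughput figures aggregate across sequences and are not directly comparable.

```python
def decode_tokens_per_sec(bandwidth_gb_s: float,
                          params_billion: float,
                          bits_per_weight: float) -> float:
    """Rough ceiling on single-stream decode speed.

    Assumption: each generated token streams all model weights from
    VRAM once, so speed is bandwidth divided by weight size.
    """
    weight_gb = params_billion * bits_per_weight / 8  # weight size in GB
    return bandwidth_gb_s / weight_gb


# A 30B model at 4 bits per weight is about 15 GB of weights.
rtx_4090 = decode_tokens_per_sec(1008, 30, 4)
rtx_5090 = decode_tokens_per_sec(1792, 30, 4)

print(f"RTX 4090 ceiling: {rtx_4090:.1f} tok/s")
print(f"RTX 5090 ceiling: {rtx_5090:.1f} tok/s")
print(f"Speedup: {rtx_5090 / rtx_4090:.2f}x")
```

Under this model the speedup equals the bandwidth ratio (1,792 / 1,008), which is where the +77% figure comes from.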

Practical impact: 30B-class models (Qwen3.5-30B, Gemma 4 31B, Mistral Small 4) run at real-time chat speed at Q4; 70B-class is workable with mixed-quant (Q3-Q4 mix). Consumer hardware now hits near-production performance.

RTX PRO 6000 Blackwell — The Best Single-GPU Workstation

Yotta Labs benchmarks show ~8,425 tok/s on 30B (vLLM), ~1.8× faster than RTX 5090. With 96GB VRAM you can serve 70B at 8-bit (about 70GB of weights; FP16 needs about 140GB and does not fit) and 120B at Q4. The new de-facto standard for serious individual / SMB self-hosting.

B200 / B300 / GB300 NVL72 — The Enterprise Tier

B200: 192GB HBM3e at 8 TB/s, 4–5× H100 throughput (15× optimized). Llama 3.1 70B fits at FP16 on a single GPU with KV-cache headroom.
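The fit claims in this hardware section reduce to simple arithmetic on weight size. The sketch below assumes decimal gigabytes and counts weights only; activations, KV cache, and runtime overhead must come out of the remaining headroom. The helper names are ours, and real Q4 formats store slightly more than 4 bits per weight because of per-block scales.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB (weights only, no KV cache)."""
    return params_billion * bits_per_weight / 8


def headroom_gb(vram_gb: float, params_billion: float,
                bits_per_weight: float) -> float:
    """VRAM left for KV cache and activations; negative means no fit."""
    return vram_gb - weight_gb(params_billion, bits_per_weight)


# Llama 3.1 70B at FP16 on a 192GB B200
b200_70b_fp16 = headroom_gb(192, 70, 16)
# A 120B model at 4 bits per weight on a 96GB RTX PRO 6000 Blackwell
pro6000_120b_q4 = headroom_gb(96, 120, 4)

print(f"B200, 70B FP16: {b200_70b_fp16:.0f} GB headroom")
print(f"RTX PRO 6000, 120B Q4: {pro6000_120b_q4:.0f} GB headroom")
```

How far that headroom goes depends on context length and concurrency, since KV-cache size grows with both.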

B300 (extended) and GB300 NVL72 (72-GPU rack) are now shipping. Hyperscalers and the major Japanese vendors are taking deliveries.

AMD MI350 / Apple M5 / Edge SoCs

AMD Instinct MI350 is shipping into datacenters as the NVIDIA cost alternative.

Apple M5 Ultra + on-device Apple Intelligence transforms local LLM on macOS. MLX is now strong enough to run 30B comfortably on M5.

Edge SoCs: Qualcomm Snapdragon X Elite Gen 2 / Intel Lunar / Panther Lake / NVIDIA Jetson Thor — all run VibeThinker-3B / Gemma 4 E4B / Liquid AI LFM2.5-J offline.

Inference-Engine Selection (June 2026)

Engine	Best fit	Strength
vLLM	GPU servers, production	AWQ + vLLM = fastest (Marlin-AWQ 741 tok/s, Pass@1 51.8%)
llama.cpp (GGUF)	CPU / edge / Apple Silicon	1.58-bit to 8-bit full coverage, minimal deps
SGLang	Agents, tool use	RadixAttention removes redundant shared-prefix compute
TensorRT-LLM	NVIDIA clusters	NVIDIA-optimized, production SLAs
MLX	Apple Silicon	M-series-tuned, Mac dev workflow
Ollama	Personal, PoC	Easy setup, GGUF-based
LM Studio / Jan	Desktop GUI	Non-engineer friendly
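The SGLang row is easier to picture with a toy example. The sketch below is not SGLang's implementation: it is a plain trie over token IDs that counts how many leading tokens of a new request were already seen, which is the portion a prefix-sharing cache could skip recomputing. As we understand it, the real RadixAttention manages KV-cache blocks in a radix tree with eviction; the `PrefixCache` class and the token IDs here are hypothetical.

```python
from typing import Dict, Sequence


class PrefixCache:
    """Toy trie over token IDs that measures shared-prefix reuse."""

    def __init__(self) -> None:
        self.root: Dict[int, dict] = {}

    def add(self, tokens: Sequence[int]) -> int:
        """Insert a request and return how many leading tokens were cached."""
        node = self.root
        reused = 0
        for tok in tokens:
            if tok in node:
                reused += 1  # still walking an existing prefix
            # After the first miss we descend into fresh empty nodes,
            # so no later token can count as reused.
            node = node.setdefault(tok, {})
        return reused


cache = PrefixCache()
system_prompt = [1, 2, 3, 4]  # hypothetical shared agent/system prompt

first = cache.add(system_prompt + [10, 11])       # nothing cached yet
second = cache.add(system_prompt + [20])          # shares the system prompt
third = cache.add(system_prompt + [10, 11, 12])   # shares prompt + first turn

print(first, second, third)
```

Agent and tool-use workloads resend the same system prompt and conversation history on every step, which is why this kind of reuse pays off there.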

Quantization evolution:

- MXFP4 (Blackwell native): 4-bit, hardware-accelerated, zero emulation
- AWQ: 4-bit, fastest on GPU servers (Marlin kernel)
- GGUF: 1.5-bit to 8-bit, CPU/edge standard
- BitNet 1.58-bit: {-1, 0, +1}, full-precision parity at ~1/10 the memory, the next big bet
- GPTQ: older generation, being displaced by AWQ
- INT8 / INT4: standard, minimal quality loss
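For readers wondering where "1.58-bit" comes from, here is a minimal sketch assuming the absmean scheme described for BitNet b1.58: scale the weights by their mean absolute value, round, and clip to {-1, 0, +1}. Three states carry log2(3) ≈ 1.58 bits. The function names and sample weights are illustrative, and BitNet models are trained with this quantization in the loop, so applying it to an existing FP16 model would not give the reported quality.

```python
import math
from typing import List, Tuple


def ternary_quantize(weights: List[float]) -> Tuple[List[int], float]:
    """Absmean ternary quantization: returns values in {-1, 0, +1} and a scale."""
    scale = sum(abs(w) for w in weights) / len(weights)  # mean absolute value
    eps = 1e-8  # guards against division by zero for all-zero weights
    quantized = [max(-1, min(1, round(w / (scale + eps)))) for w in weights]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Reconstruct approximate weights from ternary values."""
    return [q * scale for q in quantized]


weights = [0.9, -0.05, 0.4, -1.2]  # illustrative values
q, scale = ternary_quantize(weights)
bits_per_weight = math.log2(3)  # information content of three states

print(q, round(scale, 4))
print(f"{bits_per_weight:.2f} bits per weight")
print(f"vs FP16: {bits_per_weight / 16:.2f} of the memory")
```

The last line shows why the memory claim lands near one tenth: 1.58 bits against 16 bits per weight.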

Japan's SI Market — From Nascent to Mainstream

Intec (TIS Group) — Full SI From ¥5M

Intec press, January 29, 2026: on-prem generative-AI / local-LLM deployment SI launched the same day, with a minimum engagement of 1 month, from ¥5,000,000+ ex tax, targeting manufacturing and finance. Coverage: IT Leaders.

Significance: this sets the precedent that local LLM is now an SI flagship product line. SMB and mid-market buyers finally have a paved procurement path.

Ricoh — On-Prem LLM Starter Kit Wins the Nikkei Grand Prize

Ricoh official: the 'RICOH On-Prem LLM Starter Kit' won the 2025 Nikkei Excellent Product / Service Award grand prize. Per Nikkei 2026-01, Ricoh built its own multimodal LLM on Qwen2.5-VL-32B-Instruct, tuned for reading Japanese-business documents with embedded diagrams and tables.

Significance: a major copier vendor turning local LLM into a packaged appliance. SMB procurement extends to "add a local LLM when we refresh the office copier."

PFN — PLaMo 3.0 Prime + Digital Agency 'Gennai'

As detailed in our PLaMo 3.0 Prime column, PFN's PLaMo 3.0 Prime went GA on June 22, 2026 and was selected as a trial model in the Digital Agency 'Gennai' common generative-AI platform. That sets a government-procurement precedent for a Japanese model built from scratch.

Hitachi Solutions, Mizuho, Lion, and Others

- Hitachi Solutions: Katsubun local-LLM solutions on-prem
- Mizuho / Lion: Qwen fine-tuned on domestic GPU clouds (pattern documented earlier)
- NTT / KDDI / SoftBank / Fujitsu / NEC: each on either their own model or an OSS fine-tune track

Regulation — The August 2, 2026 Watershed

EU AI Act — GPAI Enforcement Begins August 2, 2026

Per the European Commission digital strategy: the Commission's enforcement powers go live on August 2, 2026 for General Purpose AI Models — investigation, evaluation, market-restriction, and fines.

Reduced burden for open-weights GPAI:

- Models whose parameters / weights / architecture / usage are publicly available under a free open licence have only the copyright-compliance and training-data-summary obligations
- Exception: models classified as systemic risk (cumulative training FLOPs > 10^25) get the full obligation set
- Models placed on the market before August 2, 2025 have until August 2, 2027 to comply
- Providers must notify the Commission within 2 weeks of crossing the threshold
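To get a feel for where the 10^25 FLOPs line sits, the sketch below uses the common rule of thumb that training compute is roughly 6 × parameters × training tokens. That approximation is our assumption, not the Act's measurement method, and the model sizes and token counts are hypothetical rather than figures for any model named in this column. For MoE models the rule is usually applied to active parameters.

```python
SYSTEMIC_RISK_FLOPS = 1e25  # threshold cited in the list above


def estimated_training_flops(params: float, tokens: float) -> float:
    """Rule-of-thumb estimate: about 6 FLOPs per parameter per training token."""
    return 6.0 * params * tokens


def crosses_threshold(params: float, tokens: float) -> bool:
    """True if the estimate exceeds the systemic-risk threshold."""
    return estimated_training_flops(params, tokens) > SYSTEMIC_RISK_FLOPS


# Hypothetical training runs, for illustration only
small_run = crosses_threshold(30e9, 15e12)    # 30B params, 15T tokens
large_run = crosses_threshold(400e9, 15e12)   # 400B dense params, 15T tokens

print(f"30B x 15T tokens: {estimated_training_flops(30e9, 15e12):.1e} FLOPs, over: {small_run}")
print(f"400B x 15T tokens: {estimated_training_flops(400e9, 15e12):.1e} FLOPs, over: {large_run}")
```

A quick estimate like this only tells you whether a model is plausibly near the line; the provider's own compute accounting is what counts for notification.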

Japan-business angle: if you serve or use these models for EU customers, the AI Act applies. This is exactly why Sakana Fugu currently excludes EU/EEA (see Fugu column).

US Export Controls — The Fable 5 Precedent

May 2026's sudden suspension of Claude Fable 5 / Mythos 5 under a US export directive made single-vendor-dependency risk a documented precedent. It also drove Sakana Fugu's orchestration design. Japanese enterprises increasingly default to "sensitive workloads on on-prem with a US open-weight (Nemotron 3 Ultra) or a Japanese open-weight (PLaMo / Liquid AI)."

Chinese Cross-Border Data — Same Diligence Bar

The top open-weights performers are mostly Chinese (GLM-5.2 / MiniMax M3 / DeepSeek V4 / Kimi K2.7-Code), but the compelled-disclosure risk under Article 7 of the PRC National Intelligence Law still applies when you use them through an API. See our Kimi K2.7-Code coverage for the full treatment. Self-hosting on domestic infrastructure remains the only fully reliable mitigation.

Typical GPU Configurations by Workload (June 2026)

Use case	Recommended model	Recommended GPU	Approximate cost
Internal chat / light RAG	Liquid AI LFM2.5-J / Gemma 4 E4B / VibeThinker-3B	Mac M5 / RTX 5070 Ti / RTX 5080	Owned hardware
Coding assist (individual)	Kimi K2.7-Code (Q4) / GLM-5.2 / Qwen3.6-30B	RTX 5090 (32GB)	From ¥400,000 (buy)
Coding assist (team)	Kimi K2.7-Code / GLM-5.2 / Nemotron 3 Ultra	RTX PRO 6000 Blackwell (96GB)	From ¥1,500,000 (buy)
Long-context / multimodal internal KB	MiniMax M3 / PLaMo 3.0 Prime	B200 ×1 (192GB)	From ¥300,000 / month (cloud)
Production scale (finance / healthcare)	Nemotron 3 Ultra / GLM-5.2 (self-hosted)	H200 ×8 or B200 ×4	¥1.5–3M / month (cloud)
Edge / embedded	VibeThinker-3B / Liquid AI LFM2.5-J	Jetson Thor / Snapdragon X Elite Gen 2	Owned hardware

Oflight's View — A Three-Step Adoption Path

What we recommend in our AI consulting practice:

Step 1 — Assessment / requirements (1–2 weeks, from ¥198K): business analysis, compliance check, cross-border-data assessment, candidate-model shortlist, GPU-config estimate. The point is to decide whether you actually need local LLM or whether cloud APIs (e.g. [Sakana Fugu](../columns/sakana-fugu-orchestration-model-2026-06)) suffice.

Step 2 — PoC (4–8 weeks, from ¥498K): fine-tune / prompt-tune the selected model (Nemotron 3 Ultra / Kimi K2.7-Code / GLM-5.2 / PLaMo 3.0 Prime) on your data, pick the inference engine (vLLM / SGLang / llama.cpp), pick quantization (MXFP4 / AWQ / GGUF), measure ROI on a real workload. Decide cloud GPU (Sakura HPC, GMO GPU, AWS Tokyo p5) vs owned GPU.

Step 3 — Production + ongoing maintenance (custom quote): stand up the production environment (on-prem or domestic cloud), train the team, build operations runbooks, and enter a maintenance contract for model updates, quant re-tuning, KPI monitoring, and internal FAQ upkeep.

Talk to Us About Local LLM — Three Inquiry Funnels

We support local-LLM evaluation, build, and ongoing maintenance. Pick the path that fits your stage.

(1) Evaluation & Requirements (from ¥198,000)

"Do we even need a local LLM?" "Which model fits?" "How much GPU do we need?" — answered in 1–2 weeks with a written report.

👉 Contact us — AI consulting (evaluation)

(2) On-Prem Build & PoC (from ¥498,000)

PoC build, fine-tuning, inference-engine setup, quantization, and ROI measurement in 4–8 weeks. Full SI is generally in the ¥5,000,000+ range (industry benchmark: Intec).

👉 Contact us — PoC / production SI

(3) Ongoing Maintenance (¥9,800–¥80,000 / month)

Local LLM needs continuous attention — model updates, quant re-tuning, evaluating new releases, KPI monitoring, internal training. Monthly maintenance contracts available:

- For [OpenClaw-deployed sites](../services/openclaw-setup): Light ¥9,800/mo / Standard ¥19,800/mo / Premium ¥49,800/mo, covering LLM API updates, OS/security updates, and config changes
- AI consulting continuous support: Light ¥30,000/mo (monthly meeting + new-model tracking) / Standard ¥80,000/mo (bi-weekly + prompt tuning + monthly KPI review + employee training & FAQ updates) / Premium on request

👉 Contact us — OpenClaw maintenance

FAQ

Q1. Local LLM vs cloud API: which is cheaper?
A. Depends on volume. Below the crossover of roughly 10–50 million tokens per month, cloud APIs (Claude / GPT / Kimi K2.7-Code) win. At hundreds of millions of tokens per month, or when sensitive data is involved, on-prem is overwhelmingly cheaper and safer.

Q2. What can a single RTX 5090 handle?
A. 30B models (Qwen3.5-30B, Gemma 4 31B, Mistral Small 4) at real-time chat speed at Q4; 70B with mixed-quant is workable. The 5,841 tok/s headline figure is for Qwen 2.5-Coder-7B at batch 8, not for 30B. Plenty for individual coding assist and internal RAG.

Q3. Are Chinese models (GLM-5.2, Kimi K2.7-Code) safe?
A. Self-hosted: yes (download weights from Hugging Face onto domestic infrastructure). API: no, because PRC National Intelligence Law Article 7 risk persists for sensitive workloads. See Kimi K2.7-Code coverage.

Q4. EU AI Act impact?
A. Enforcement starts August 2, 2026. Open-weights GPAI gets the lighter regime (copyright + training-data summary only) unless it is classified as systemic risk. Limited direct impact for Japan-only operations; you must comply if serving EU customers.

Q5. Do US export controls (Fable 5 precedent) affect on-prem?
A. Already-downloaded open weights are safe. The risk lives on cloud APIs: Fable 5 / Mythos 5 showed sudden suspensions can happen. That precedent is exactly what motivates Sakana Fugu-style orchestration and on-prem operations.

Q6. Which quantization should we pick?
A. NVIDIA Blackwell (RTX 5090 / B200) → MXFP4 (native, fastest). GPU servers (H100 / H200) → AWQ + vLLM. CPU / edge → GGUF + llama.cpp. Extreme minification → BitNet 1.58-bit (implementations still limited).

Q7. Where does procurement happen in Japan?
A. Intec ¥5M+ full SI, Ricoh on-prem LLM starter kit, PFN PLaMo 3.0 Prime, Hitachi Solutions, and the domestic GPU clouds (Sakura HPC / GMO / AWS Japan) are the mainline players. For SMB / mid-market, specialist AI consultancies like Oflight are the cost-effective alternative.

Q8. Is full migration from cloud APIs to on-prem realistic?
A. Yes, as of June 2026. Open weights have closed the gap (GLM-5.2 / Nemotron 3 Ultra / MiniMax M3), and Blackwell makes 30–70B work on consumer hardware. Decision axes: cost, data sovereignty, continuity (export-control risk).
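The cost question in Q1 comes down to a break-even calculation, sketched below. Every input is a placeholder you should replace with your own quotes: the ¥400,000 reuses the RTX 5090 purchase figure from the configuration table, while the 36-month amortization, the ¥2,000 monthly running cost, and the ¥250 per 1M tokens API price are assumptions for illustration. Staff time and maintenance contracts are left out, and they push the real crossover higher.

```python
def breakeven_tokens_per_month(hardware_yen: float,
                               amortization_months: int,
                               monthly_opex_yen: float,
                               api_yen_per_million_tokens: float) -> float:
    """Monthly token volume at which on-prem cost equals cloud API cost."""
    monthly_onprem_yen = hardware_yen / amortization_months + monthly_opex_yen
    return monthly_onprem_yen / api_yen_per_million_tokens * 1_000_000


# Placeholder inputs (see the assumptions above)
tokens = breakeven_tokens_per_month(
    hardware_yen=400_000,
    amortization_months=36,
    monthly_opex_yen=2_000,
    api_yen_per_million_tokens=250.0,
)
print(f"Break-even: about {tokens / 1e6:.0f}M tokens per month")
```

The result is sensitive to the API price in particular, so rerun it with the blended input/output price of the model you would actually call.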

Bottom Line

Three realities of the local-LLM market in June 2026:

1. Open weights have closed on closed source (GLM-5.2 = 51 on Intelligence Index v4.1; the gap to Opus 4.8 / GPT-5.5 is down to 4–9 points)
2. Blackwell makes 30–70B practical on consumer GPUs (RTX 5090 native MXFP4 / RTX PRO 6000 as the new SMB default)
3. Japan's SI market is mainstream (Intec, Ricoh, PFN, Hitachi as the mainline players; procurement paths are now paved)

All of this happened in two months. The next watershed is the EU AI Act on August 2, 2026, and the open-weights vendor responses around that date. Continuous tracking matters.

At Oflight, we support local-LLM evaluation, PoC build, and ongoing maintenance end-to-end. Use the three inquiry funnels above to get in touch.

References

Models:

- Simon Willison: GLM-5.2 most powerful text-only open weights LLM
- Z.ai
- Kilo.ai: Best Open-Source & Open-Weight Coding Models 2026
- NVIDIA Nemotron 3 Family
- NVIDIA Nemotron 3 Ultra Technical Report
- Latent Space: NVIDIA Cosmos 3, Nemotron 3 Ultra, RTX Spark
- arXiv 2606.16140: VibeThinker-3B
- Codersera: Open-Source LLMs Landscape May 2026
- HuggingFace: Best Open-Source & Open-Weight LLM Models to Run Locally 2026

Hardware:

- runyard.dev: RTX 5090 Local LLM Blackwell Guide 2026
- Yotta Labs: Best GPUs for LLM Inference 2026
- VRLA Tech: GPU Benchmark for AI LLM 2026
- Spheron: Best NVIDIA GPUs for LLMs 2026
- knightli: RTX 5090/5080 AI Inference Benchmarks

Inference engines / quantization:

- Meta Intelligence: Quantization Guide 2026
- Jarvis Labs: vLLM Quantization Complete Guide
- VRLA Tech: vLLM vs Ollama vs llama.cpp vs SGLang 2026
- Sesame Disk: Local Inference Engines 2026
- GitHub: ggml-org/llama.cpp

Japan:

- Intec press, January 29, 2026
- Nikkei: Intec on-prem LLM SI
- IT Leaders: Intec local-LLM SI
- Ricoh: On-prem LLM Starter Kit grand prize
- Nikkei: Ricoh Qwen2.5-VL-32B multimodal
- Hitachi Solutions: local LLM on-prem
- Biton: Japan-focused local LLM, May 2026

Regulation:

- European Commission: Guidelines for GPAI Providers
- EU AI Act
- Linux Foundation: What Open Source Developers Need to Know
- HuggingFace: Open Source Developers Guide to EU AI Act

Related Oflight columns:

- Local LLM Landscape, April 2026
- Kimi K2.7-Code
- Sakana Fugu
- PLaMo 3.0 Prime
- DiffusionGemma
- Liquid AI LFM2.5-J
- Sakana Marlin
- Claude Fable 5 export-control suspension
- Loop Engineering
- Cognition FrontierCode benchmark

Inquiries:

- AI consulting (evaluation / PoC)
- OpenClaw setup / maintenance
- Custom software development / SI
