AI | 2026-06-10 | 12 min read

Apple AFM Core Advanced Deep Dive

How 20B Sparse MoE Brings Frontier AI to iPhone

AFM Core Advanced, the flagship of Apple's third-generation Foundation Models announced at WWDC 2026, packs a 20B-parameter Sparse MoE with Apple's proprietary IFP technology — enabling frontier-class on-device inference on iPhone 17 Pro. This deep dive covers architectural innovations, A19 Pro specs, device requirements, and the 'fully Apple designed' controversy around Gemini distillation.

Tags: Apple AFM, Apple Foundation Models, WWDC 2026, A19 Pro, On-device AI, Sparse MoE, Apple Intelligence

TL;DR

Apple officially unveiled its third-generation Apple Foundation Models (AFM 3) at WWDC 2026 on June 8-9. The family flagship, AFM 3 Core Advanced, is a 20B-parameter Sparse MoE model that activates only 1-4B parameters per request. Apple's proprietary IFP (Instruction-Following Pruning) technology selects and locks experts during the prefill phase, keeping the full 20B model in NAND flash while loading only the chosen experts into DRAM. The model runs only on devices with at least 12GB of RAM: iPhone 17 Pro / Pro Max (A19 Pro), iPhone Air, iPad (M4 or later), Mac (M3 or later), and Apple Vision Pro (M5). iPhone 16 and iPhone 15 Pro, both with 8GB of RAM, are not supported. While the architecture and inference runtime are entirely Apple-designed, VP of AI Amar Subramanya has confirmed that post-training used Gemini frontier model outputs as teacher signals for knowledge distillation. See the Apple ML Research official announcement for full details.

The Full AFM 3 Family Unveiled at WWDC 2026

The third-generation AFM family consists of five models split between on-device and server-side deployment.

On-Device
- AFM 3 Core (~3B Dense) — broadly compatible with iPhone 16 and later, M-series iPad and Mac
- AFM 3 Core Advanced (20B Sparse MoE) — the focus of this article

Server-Side
- AFM 3 Cloud — general-purpose text processing
- AFM 3 Cloud (Image) — image understanding
- AFM 3 Cloud Pro — flagship for advanced reasoning tasks

The two on-device models run entirely on the device without routing through Private Cloud Compute. The three server models are handled either on Apple's private cloud or, as detailed below, on NVIDIA GPUs running on Google Cloud infrastructure. Craig Federighi (SVP Software Engineering) and Amar Subramanya (VP AI) jointly announced the generational update.

AFM Core Advanced — The Innovation of 20B Sparse MoE

The defining difference between AFM 3 Core Advanced and any previous on-device model is its Sparse Mixture-of-Experts (Sparse MoE) architecture. With 20B total parameters but only 1-4B active per request, it needs roughly 1/20th to 1/5th of the per-token compute of an equivalent dense 20B model, depending on how many parameters are activated.
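
The active-parameter ratio can be checked with a few lines of arithmetic. The sketch below assumes per-token compute scales linearly with active parameters, which ignores attention, routing overhead and any shared layers; the function name is illustrative and not part of any Apple API.

```python
TOTAL_PARAMS_B = 20.0          # total parameters, from the article
ACTIVE_RANGE_B = (1.0, 4.0)    # active parameters per request, from the article


def active_fraction(active_b: float, total_b: float = TOTAL_PARAMS_B) -> float:
    """Fraction of weights that take part in a forward pass.

    Assumption: compute is proportional to active parameters.
    """
    if not 0 < active_b <= total_b:
        raise ValueError("active parameters must be in (0, total]")
    return active_b / total_b


for active in ACTIVE_RANGE_B:
    frac = active_fraction(active)
    print(f"{active:.0f}B active -> {frac:.0%} of dense 20B compute")
```

Under that assumption the 1B case corresponds to 5% (1/20th) of the dense cost and the 4B case to 20% (1/5th).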

This lets the model retain the knowledge capacity of a large model while keeping inference latency and heat output within smartphone-acceptable limits. Native multimodal support — covering audio, Visual Intelligence, and dictation from the ground up — further distinguishes it from the text-focused AFM 3 Core. Full technical details are available on the Apple ML Research page.

IFP (Instruction-Following Pruning) — What It Means Technically

Apple's proprietary IFP (Instruction-Following Pruning) sets itself apart from conventional model pruning by using 'instruction-following capability' as the preservation criterion during expert pruning.

Rather than pruning based on gradient magnitude or random sampling, IFP monitors the model's ability to follow instructed behavior and prunes experts only when doing so does not degrade instruction compliance. The result is a compact expert configuration that prioritizes 'accurately interpreting user intent' over broad factual recall. This is the architectural reason why Siri responses and Writing Tools feel contextually coherent despite running on a pruned model.
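
Since no detailed IFP specification is cited here, the following is only a toy illustration of the idea as described above: drop an expert only if a compliance score does not fall by more than a tolerance. The greedy loop, the `compliance` callback, the `max_drop` threshold and the expert names are all hypothetical and should not be read as Apple's method.

```python
from typing import Callable, FrozenSet, Iterable


def prune_experts(
    experts: Iterable[str],
    compliance: Callable[[FrozenSet[str]], float],
    max_drop: float = 0.01,
) -> FrozenSet[str]:
    """Greedily remove experts while instruction compliance stays within budget."""
    kept = frozenset(experts)
    baseline = compliance(kept)
    for expert in sorted(kept):           # deterministic order for the sketch
        candidate = kept - {expert}
        if not candidate:
            break                         # never prune the last expert
        if baseline - compliance(candidate) <= max_drop:
            kept = candidate              # removal did not hurt compliance
    return kept


# Toy scorer (hypothetical): only "format" and "intent" affect compliance.
def toy_compliance(active: FrozenSet[str]) -> float:
    return 0.5 * ("format" in active) + 0.5 * ("intent" in active)


survivors = prune_experts(["format", "intent", "trivia", "dates"], toy_compliance)
print(sorted(survivors))
```

In this toy setup the experts that only carry factual recall are removed, while the two that drive the compliance score survive, which mirrors the trade-off described in the paragraph above.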

How It Differs from Conventional MoE — NAND Residency to DRAM Lock

In standard MoE models like Mixtral, a gate network dynamically selects experts token by token during generation. On a server with fast HBM memory this works well, but on a smartphone the I/O cost of swapping experts in and out of DRAM every few milliseconds becomes prohibitive.
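
To make the per-token cost concrete, here is a minimal sketch of top-k gating in plain Python. The router logits are invented for illustration, and the softmax over the selected logits follows a common top-k routing formulation rather than any specific production implementation.

```python
import math
from typing import List, Tuple


def top_k_gate(logits: List[float], k: int = 2) -> List[Tuple[int, float]]:
    """Return (expert_index, weight) for the k highest-scoring experts.

    Weights are a softmax over the selected logits only.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Hypothetical router logits for three consecutive tokens over four experts.
per_token_logits = [
    [2.0, 0.1, 1.5, -1.0],
    [0.0, 3.0, 0.2, 2.5],
    [1.0, 0.9, 0.1, 4.0],
]
touched = set()
for logits in per_token_logits:
    touched.update(i for i, _ in top_k_gate(logits))
print(sorted(touched))
```

After only three tokens, every one of the four experts has been needed at least once, which is why per-token routing forces either all experts to stay resident or frequent swapping.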

AFM 3 Core Advanced takes a fundamentally different approach:

1. The full 20B model resides in NAND flash storage at all times
2. During the prefill phase, the task characteristics of the prompt are analyzed and an expert subset (1-4B worth) is selected and locked once
3. The selected experts are loaded into DRAM and held there until token generation is complete — no further NAND-DRAM transfers occur

This 'prefill-lock' strategy eliminates all mid-generation storage I/O, freeing the A19 Pro's 76.8 GB/s DRAM bandwidth entirely for token generation. MacStories covers this mechanism in depth.
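
The three steps can be sketched as a toy runtime that counts flash reads. Everything here is hypothetical: the class name, the expert names and sizes, and the greedy selection by score are stand-ins chosen for illustration, not a description of Apple's runtime.

```python
from typing import Dict, FrozenSet, List


class PrefillLockedRuntime:
    """Toy runtime: experts are chosen once at prefill and then pinned."""

    def __init__(self, expert_sizes_b: Dict[str, float], budget_b: float = 4.0):
        self.expert_sizes_b = expert_sizes_b   # stand-in for weights in flash
        self.budget_b = budget_b               # upper end of the 1-4B range
        self.resident: FrozenSet[str] = frozenset()
        self.flash_reads = 0

    def prefill(self, scores: Dict[str, float]) -> FrozenSet[str]:
        """Pick the highest-scoring experts that fit the budget, load them once."""
        chosen: List[str] = []
        used = 0.0
        for name in sorted(scores, key=scores.get, reverse=True):
            size = self.expert_sizes_b[name]
            if used + size <= self.budget_b:
                chosen.append(name)
                used += size
        self.flash_reads += len(chosen)        # one flash-to-DRAM load per expert
        self.resident = frozenset(chosen)
        return self.resident

    def decode(self, steps: int) -> int:
        """Generate tokens using only resident experts; no flash access."""
        if not self.resident:
            raise RuntimeError("prefill must run before decode")
        return steps                           # placeholder for real generation


rt = PrefillLockedRuntime({"code": 2.0, "prose": 1.5, "math": 2.0, "vision": 1.0})
locked = rt.prefill({"code": 0.9, "prose": 0.7, "math": 0.2, "vision": 0.1})
rt.decode(steps=128)
print(sorted(locked), rt.flash_reads)
```

The point of the sketch is the counter: the number of flash reads is fixed at prefill time and does not grow with the number of decoded tokens.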

Supported Devices — 12GB RAM is the Hard Requirement

Apple's stated minimum requirement for AFM 3 Core Advanced is 12GB RAM. Devices that meet this bar are:

- iPhone 17 Pro / Pro Max (A19 Pro, 12GB)
- iPhone Air
- iPad (M4 or later)
- Mac (M3 or later)
- Apple Vision Pro (M5)

iPhone 16 (8GB) and iPhone 15 Pro (8GB) are explicitly excluded, and iPhone 16 Pro, which also has 8GB, falls short as well. Enterprises that standardized on iPhone 16 Pro in 2024-2025 would need to refresh to iPhone 17 Pro to use AFM Core Advanced, which is a meaningful budget and procurement consideration. See Appleosophy's detailed coverage for more context.
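
For fleet planning, the RAM floor turns into a simple inventory filter. The sketch below uses only the RAM figures quoted in this article; the fleet counts are invented, and devices whose RAM is not stated here are deliberately left out of the table rather than guessed.

```python
from typing import Dict

MIN_RAM_GB = 12  # stated minimum for AFM 3 Core Advanced

# RAM figures as quoted in this article.
KNOWN_RAM_GB: Dict[str, int] = {
    "iPhone 17 Pro": 12,
    "iPhone 17 Pro Max": 12,
    "iPhone 16 Pro": 8,
    "iPhone 16": 8,
    "iPhone 15 Pro": 8,
}


def needs_refresh(fleet: Dict[str, int]) -> Dict[str, int]:
    """Return the models (and unit counts) that fall below the RAM floor."""
    below: Dict[str, int] = {}
    for model, units in fleet.items():
        ram = KNOWN_RAM_GB.get(model)
        if ram is None:
            raise KeyError(f"RAM not listed for {model!r}")
        if ram < MIN_RAM_GB:
            below[model] = units
    return below


# Hypothetical fleet inventory.
fleet = {"iPhone 16 Pro": 300, "iPhone 17 Pro": 50, "iPhone 15 Pro": 120}
print(needs_refresh(fleet))
```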

A19 Pro Chip Specifications

The A19 Pro powering iPhone 17 Pro provides the hardware foundation for AFM 3 Core Advanced:

- Process node: TSMC N3P (3nm)
- CPU: 6 cores (performance cores up to 4.26 GHz)
- GPU: 6 cores with built-in Neural Accelerators (up to 4x GPU performance vs. A18 Pro)
- Neural Engine: 16 cores
- RAM: 12GB LPDDR5X at 76.8 GB/s bandwidth

The built-in Neural Accelerators inside the GPU cores — separate from the Neural Engine — are designed to accelerate AI computation at the GPU level, making them particularly well-matched to the parallel expert computations of a Sparse MoE architecture. Full specifications are available at Apple's iPhone 17 Pro specs page and Notebookcheck's A19 Pro breakdown.
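
The 76.8 GB/s figure also gives a rough ceiling on decode speed, because in a memory-bound regime each generated token has to stream the active weights from DRAM once. The estimate below is a back-of-the-envelope sketch: the 4-bit and 8-bit weight widths are assumptions, not Apple figures, and KV-cache traffic, activations and compute limits are ignored.

```python
DRAM_BANDWIDTH_GBPS = 76.8   # A19 Pro LPDDR5X bandwidth, from the spec list


def decode_ceiling_tokens_per_s(active_params_b: float, bits_per_weight: float) -> float:
    """Upper bound on tokens/s when each token must stream all active weights.

    Real throughput would be lower once other memory traffic is counted.
    """
    if active_params_b <= 0 or bits_per_weight <= 0:
        raise ValueError("inputs must be positive")
    gigabytes_per_token = active_params_b * bits_per_weight / 8.0
    return DRAM_BANDWIDTH_GBPS / gigabytes_per_token


for active_b in (1.0, 4.0):
    for bits in (4, 8):      # assumed quantization widths
        rate = decode_ceiling_tokens_per_s(active_b, bits)
        print(f"{active_b:.0f}B active @ {bits}-bit: <= {rate:.1f} tokens/s")
```

Under these assumptions, 4B active parameters at 4 bits per weight would cap out near 38 tokens per second, while 1B active at the same width would cap out near 154.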

'Fully Apple Designed' — Architecture Yes, Training Partially No

In the WWDC 2026 keynote, Craig Federighi declared 'zero usage of Google Assistant' and positioned AFM as entirely Apple-made. The model architecture and the on-device inference runtime are unambiguously Apple's own work — this is confirmed fact.

However, VP of AI Amar Subramanya acknowledged that post-training used outputs from Gemini frontier models as teacher signals in a knowledge distillation process. This is a standard and widely accepted technique in LLM development: a large 'teacher' model generates high-quality labeled outputs, and a smaller 'student' model is trained on those labels to punch above its parameter count.
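
As a rough illustration of the technique in general, not of Apple's pipeline, the sketch below builds training pairs from a teacher's outputs and scores a student by the negative log-likelihood it assigns to the teacher's tokens. The toy teacher, the two toy students and the position-independent token probabilities are all simplifying assumptions.

```python
import math
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, List[str]]


def build_distillation_set(prompts: List[str],
                           teacher: Callable[[str], List[str]]) -> List[Pair]:
    """Label each prompt with the teacher's output tokens."""
    return [(p, teacher(p)) for p in prompts]


def student_nll(pairs: List[Pair],
                student_probs: Callable[[str], Dict[str, float]]) -> float:
    """Mean negative log-likelihood of teacher tokens under the student."""
    total, count = 0.0, 0
    for prompt, tokens in pairs:
        probs = student_probs(prompt)
        for tok in tokens:
            total -= math.log(max(probs.get(tok, 0.0), 1e-9))
            count += 1
    return total / count


def toy_teacher(prompt: str) -> List[str]:
    return ["yes"] if prompt.endswith("?") else ["ok"]


def untrained_student(prompt: str) -> Dict[str, float]:
    return {"yes": 0.5, "ok": 0.5}


def trained_student(prompt: str) -> Dict[str, float]:
    if prompt.endswith("?"):
        return {"yes": 0.9, "ok": 0.1}
    return {"yes": 0.1, "ok": 0.9}


pairs = build_distillation_set(["ready?", "start"], toy_teacher)
print(student_nll(pairs, untrained_student) > student_nll(pairs, trained_student))
```

Training the student amounts to driving this loss down, so a student that matches the teacher's outputs more closely scores lower than one that has not been trained.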

MacRumors and MacObserver have both clarified the factual record. The honest summary: model design and inference code are Apple's; a portion of training signals came from Gemini outputs. The critique that Apple's 'Fully Apple Designed' framing glosses over the distillation relationship is fair. See also 9to5Mac's Federighi interview.

The Switch to NVIDIA GPUs — Why Apple Is Not Using Google TPUs

Server-side processing for AFM 3 Cloud Pro runs on NVIDIA GPUs hosted on Google Cloud, under a multi-year deal between Apple and Google signed on January 12, 2026. Apple has been explicit that it is using NVIDIA GPUs, not Google TPUs — a distinction that signals Apple is leveraging Google's cloud infrastructure for raw compute capacity rather than depending on Google's own AI silicon stack.

As reported by CNBC, this architecture minimizes vendor lock-in on the inference hardware side while giving Apple access to NVIDIA's high-TFLOPS GPUs for server reasoning tasks. On-device processing via AFM 3 Core / Core Advanced remains completely independent of this cloud arrangement, with no privacy implications for on-device workloads.

Apple Intelligence Features Powered by AFM Core Advanced

Unlocking AFM 3 Core Advanced on a supported device enables the following Apple Intelligence capabilities:

- Expressive Voices: Emotionally nuanced, customizable synthesized speech
- High-Accuracy Dictation: Improved accuracy for long-form and domain-specific transcription
- New Siri (iOS 27 AI Siri): Context-retaining, multi-step instruction-following assistant
- Enhanced Visual Intelligence: Real-time on-device analysis of images and video
- Writing Tools (enhanced): Long-context document generation and rewriting
- Future Agentic Features: Cross-app task automation (roadmap stage)

Many of these capabilities are only partially available on AFM 3 Core and reach full performance only with Core Advanced.

Competitive Positioning — vs. Gemma 4, LFM2.5, and Phi-5

Apple's official benchmarks claim that AFM 3 Core (3B Dense) outperforms Qwen-2.5-3B, Gemma-3-4B, and Gemma-3n-E4B on MMLU and MMMLU.

However, there are no official head-to-head comparisons between AFM 3 Core Advanced (20B Sparse MoE) and other on-device models such as Gemma 4 12B (Google DeepMind's encoder-free multimodal model), Liquid AI's LFM2.5 (including Japanese-optimized variants), Microsoft Phi-5, or Qwen 3.5. Independent third-party benchmarks are still forthcoming.

Structurally, Sparse MoE's ability to retain 20B knowledge capacity at ~4B active compute cost is a differentiated advantage that competing Dense models cannot replicate at the same device TDP. For devices that clear the 12GB RAM requirement, the quality-to-cost tradeoff is compelling.

What This Means for Japanese Enterprise Use

The fact that AFM 3 Core Advanced runs entirely on-device has concrete implications for Japanese organizations.

Privacy and Regulatory Compliance
Japan's revised Personal Information Protection Act and healthcare information guidelines impose strict controls on transmitting PII to external systems. Because AFM 3 Core Advanced processes text, audio, and images entirely on-device, use cases like voice input for medical records, summarizing legal documents, and real-time transcription of financial advisory sessions become feasible without the legal exposure of cloud-based processing.

Device Refresh Cycle Challenges
The 12GB RAM minimum is a real constraint. Many Japanese enterprises refreshed to iPhone 16 Pro in 2024-2025; AFM Core Advanced would require yet another upgrade to iPhone 17 Pro within 1-2 product generations. Any deployment plan should include a careful ROI calculation that weighs device procurement costs against productivity gains.
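
One simple way to frame that ROI calculation is a payback period: total refresh spend divided by the monthly productivity gain across the fleet. The function and every figure below are placeholders for illustration; real inputs would come from procurement quotes and measured time savings.

```python
def payback_months(
    units: int,
    unit_cost: float,
    rollout_cost: float,
    monthly_gain_per_unit: float,
) -> float:
    """Months until cumulative productivity gains cover the refresh spend."""
    if units <= 0:
        raise ValueError("units must be positive")
    if monthly_gain_per_unit <= 0:
        return float("inf")      # the refresh never pays for itself
    total_cost = units * unit_cost + rollout_cost
    return total_cost / (units * monthly_gain_per_unit)


# All figures are placeholders in yen, not market data.
months = payback_months(
    units=500,
    unit_cost=180_000,
    rollout_cost=5_000_000,
    monthly_gain_per_unit=6_000,
)
print(f"payback in about {months:.1f} months")
```

With these placeholder inputs the refresh would take a little under 32 months to pay back, which is longer than a typical two-year device cycle, so the result is sensitive to the assumed per-device gain.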

Japanese Language Support Timeline Uncertainty
Apple has not announced a timeline for Japanese language support for AFM 3 Core Advanced. Based on Apple's historical pattern of phasing in language support months to a year after the English launch, Japanese availability may not arrive until autumn 2026 to spring 2027. IT consulting proposals should account for this timeline uncertainty. See our AI Consulting service for a tailored assessment.

What Could Not Be Confirmed from Official Sources

As of the publication date of this article (June 10, 2026), the following information has not been confirmed in official Apple documentation:

- Direct benchmark comparisons between AFM 3 Core Advanced and Gemma 4 12B / LFM2.5 / Phi-5
- A specific timeline for Japanese language support for AFM 3 Core Advanced
- Detailed architectural specifications for AFM 3 Cloud Pro
- A phased release schedule for Core Advanced features across macOS 27 and iPadOS 27
- An academic paper or detailed technical document covering IFP (Instruction-Following Pruning)

These gaps are expected to be addressed progressively through Apple's release notes, ML Research blog posts, and WWDC session videos.

FAQ

Q1. Can iPhone 16 Pro run AFM 3 Core Advanced?
A. No. iPhone 16 Pro has 8GB of RAM, which falls short of the 12GB minimum requirement. A supported device with at least 12GB of RAM, such as iPhone 17 Pro / Pro Max or iPhone Air, is required.

Q2. Does AFM 3 Core Advanced work completely offline?
A. Yes, as an on-device model it processes text, audio, and visual input on the device itself without a network connection. However, features that rely on AFM 3 Cloud Pro still require internet connectivity.

Q3. If Gemini outputs were used in training, can it really be called 'Apple designed'?
A. The model architecture and inference runtime are unambiguously Apple's own. Gemini frontier outputs served only as teacher signals in knowledge distillation — no Gemini code or weights are present in AFM. That said, the criticism that Apple's 'Fully Apple Designed' claim obscures this training relationship is a reasonable one.

Q4. Doesn't activating only 1-4B of 20B parameters hurt quality?
A. Sparse MoE specializes experts so that fewer active parameters can still produce high-quality outputs for a given task. Combined with IFP prioritizing instruction compliance, Apple claims Core Advanced significantly outperforms dense 3B models on instruction-following benchmarks.

Q5. Is the data leakage risk truly zero for medical or legal workloads?
A. When using AFM 3 Core Advanced for on-device processing, text and audio data are not transmitted externally. However, if an application itself sends data to a server, that depends on the app's implementation. Additionally, features routed through AFM 3 Cloud Pro involve cloud-side processing.

Q6. When will Japanese language support be available?
A. No official timeline has been announced. Based on Apple's historical language rollout patterns — typically months to a year after English launch — Japanese support for AFM 3 Core Advanced is most likely in autumn 2026 to spring 2027.

Q7. Can enterprises develop iOS apps that leverage AFM?
A. Public developer APIs for Apple Intelligence are not currently available. A future developer API is possible but not confirmed as of June 2026, with Apple's own system apps (Siri, Writing Tools) as the primary integration targets at this stage. Monitor the Forward Deployed Engineer guide for related developments.

Q8. Is Sparse MoE a good architecture for enterprise on-premises LLMs?
A. Yes — for organizations running on-premises LLM infrastructure, Sparse MoE can deliver quality comparable to much larger dense models at significantly lower compute cost. The Argent x Gemma 4 on-device agent case study offers relevant precedent.

Conclusion

AFM 3 Core Advanced represents Apple's architectural answer to the question of how to deliver frontier-class AI quality, speed, and privacy simultaneously on a smartphone. The combination of 20B Sparse MoE, IFP, and a prefill-lock DRAM strategy is the crystallization of a design philosophy that pushes server-grade reasoning onto a device that fits in your pocket.

For Japanese enterprises, however, the practical calculus is more nuanced. The 12GB RAM requirement makes existing iPhone 16 fleets ineligible, the Japanese language rollout timeline is undefined, and the distinction between on-device and cloud features requires careful architectural planning.

When evaluating AFM 3 Core Advanced for enterprise deployment, we recommend mapping device refresh costs, Japanese language availability timelines, and the on-device vs. cloud feature split before committing to an investment plan. For a tailored assessment, please consult our AI Consulting service.

References

- Apple ML Research — Introducing the Third Generation of Apple Foundation Models
- Apple iPhone 17 Pro Technical Specifications
- MacRumors — Apple's New AI Contains No Gemini
- 9to5Mac — Craig Federighi Details Apple's Collaboration with Google for Siri AI in iOS 27
- MacStories — The Third Generation of Apple's Foundation Models and AFM Core Advanced
- MacObserver — Apple Confirms AFM Cloud Is Its Own AI Model Trained with Gemini Outputs
- CNBC — Apple, Google, NVIDIA AI Chip Partnership
- Notebookcheck — Apple A19 Pro Processor Benchmarks and Specs
- Appleosophy — Apple's Most Powerful On-Device AI Model Will Only Be Available to Limited Devices
- Related: Gemma 4 12B Encoder-Free Multimodal
- Related: Gemma 4 Benchmark Performance
- Related: Liquid AI LFM2.5 Japanese Models
- Related: Argent x Gemma 4 On-Device Agent for iOS
- Related: Forward Deployed Engineer (FDE) Guide
- Obright AI Consulting
