Qwen3.6-35B-A3B Uncensored / Abliterated Deep Dive — 35B MoE / 3B Active / 262K Context / 3:1 Hybrid Linear+Softmax Attention / Native Text + Image + Video, 0/465 Refusal Rate; the Technique and Ethics of Community Uncensored Variants HauhauCS Aggressive, huihui-ai abliterated, wangzhang abliterated, prithivMLmods and Other Variants Distributed via Hugging Face and Ollama
Qwen3.6-35B-A3B-Uncensored / Abliterated is a family of community-produced derivatives of Alibaba's Qwen 3.6-35B-A3B (a 35B MoE with 3B active parameters, 262K context, and hybrid attention) with refusal behaviors surgically removed (HackerNoon overview / HauhauCS Aggressive / huihui-ai abliterated / wangzhang abliterated / prithivMLmods Aggressive).
Base model specs: 35B total / 3B active parameters (MoE, sparse experts), 40 layers, hybrid attention in a 3:1 ratio (linear + full softmax), native 262K-token context, and native text / image / video multimodal input. Alibaba positions it as a flagship of its open-weights strategy.
The abliteration technique: the refusal direction is removed with LoRA-based steering on attention and MLP projections. It layers on Expert-Granular Abliteration (EGA) (abliterating per-expert down_proj slices per layer) and MoE router suppression (deactivating safety experts at the router stage) — techniques adapted for the MoE architecture. HauhauCS reports 0 refusals across 465 test prompts. The philosophy: preserve 100% of the base Qwen 3.6-35B's capability, remove only refusal.
Available variants:
- HauhauCS-Aggressive (HF / Ollama): the most aggressive refusal removal
- huihui-ai Huihui-Qwen3.6-abliterated (HF / Ollama): from the established huihui-ai team
- wangzhang abliterated (HF)
- prithivMLmods Uncensored-Aggressive (HF)
Each ships with quantization variants (GGUF Q4 / Q5 / Q8 / FP16), spanning consumer GPUs (RTX 5090 32GB) up to H100 servers.
Ethical and legal considerations: abliterated models can produce content that Qwen would normally refuse (illegal drugs, offensive-security code, dangerous-material synthesis, etc.). Legitimate research, jailbreak-resistance testing, roleplay, and adult-content use cases exist, but enterprise and commercial adoption carries meaningful legal risk. Compliance with the EU AI Act (effective August 2026) and Japanese PPC guidance is also in question. The responsibility sits entirely with the user; Alibaba's Qwen team is not involved.
Positioning: alongside our Local LLM June 2026 Update, Kimi K2.7-Code, and Ornith-1.0, this is a case study showing that "safety-stripping techniques" scale into the MoE era in the open-weights ecosystem.
TL;DR — What Are Qwen3.6-35B Uncensored / Abliterated?
Qwen3.6-35B-A3B-Uncensored / Abliterated is a family of community derivatives of Alibaba's Qwen 3.6-35B-A3B (35B MoE / 3B active / 262K context / hybrid attention / multimodal) with refusal behaviors removed via LoRA + MoE-specific abliteration techniques.
Four takeaways:
1. HauhauCS / huihui-ai / wangzhang / prithivMLmods and others independently released variants; distributed via Hugging Face + Ollama
2. Technique: LoRA-based steering + Expert-Granular Abliteration (EGA) + MoE router suppression (MoE-specific)
3. HauhauCS reports 0 refusals across 465 test prompts while claiming 100% preservation of base capability
4. Legal / ethical risk is significant — enterprise adoption is not advisable; legitimate use is confined to research, jailbreak-resistance testing, adult creative writing, etc.
This column sits alongside our Local LLM June 2026 Update, Kimi K2.7-Code, and Ornith-1.0 coverage of the open-weights landscape.
The Base Model: Qwen 3.6-35B-A3B
Alibaba's Qwen 3.6 family is one of the most widely adopted open-weights model lines, and 35B-A3B is the mid-tier MoE.
| Item | Value |
|---|---|
| Total parameters | 35B |
| Active parameters | ~3B per token (sparse MoE) |
| Layers | 40 |
| Attention | Hybrid (linear + full softmax at a 3:1 ratio) |
| Context | 262,144 tokens (native, no RoPE scaling) |
| Modality | Native text / image / video |
| License | Qwen License (open weights with commercial conditions) |
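Why hybrid attention matters: only the full-softmax layers (one in every four) keep a key-value cache that grows with context length, so memory and compute drop relative to an all-softmax stack, while those softmax layers hold long-context retrieval quality. Long-document processing is a particular strength.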
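To make the memory argument concrete, here is a rough key-value cache estimate. It is a sketch under stated assumptions: the layer count and context length come from the table above, the 3:1 ratio is read as three linear layers per softmax layer, and `KV_HEADS` and `HEAD_DIM` are hypothetical values picked only for illustration (the source does not list them). The fixed-size state of the linear layers is ignored.

```python
def kv_cache_bytes(softmax_layers, kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Bytes needed to cache keys and values for the softmax-attention layers."""
    # Factor 2 covers keys plus values; one entry per token, per KV head, per layer.
    return 2 * softmax_layers * kv_heads * head_dim * seq_len * bytes_per_value

TOTAL_LAYERS = 40                    # from the spec table
SOFTMAX_LAYERS = TOTAL_LAYERS // 4   # 3:1 linear:softmax, so 1 layer in 4
CONTEXT = 262_144                    # native context length

# Hypothetical head configuration, NOT taken from the model card.
KV_HEADS, HEAD_DIM = 4, 128

full = kv_cache_bytes(TOTAL_LAYERS, KV_HEADS, HEAD_DIM, CONTEXT)
hybrid = kv_cache_bytes(SOFTMAX_LAYERS, KV_HEADS, HEAD_DIM, CONTEXT)
print(f"all-softmax: {full / 2**30:.1f} GiB, hybrid: {hybrid / 2**30:.1f} GiB")
```

Whatever head configuration is assumed, the growing cache shrinks by the same factor of four, because only 10 of the 40 layers contribute to it.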
What Abliteration Is — Surgical Refusal Removal
Abliteration (a portmanteau of *ablate* + *obliterate*) builds on a technique introduced by Andy Arditi et al. in 2024. It identifies the internal direction that represents "refusal" in the model's activations and modifies weights so that they can no longer write along that direction. This is cheaper than fine-tuning while achieving similar refusal removal.
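The underlying linear algebra is an orthogonal projection, and a toy version fits in a few lines. The sketch below runs on synthetic random vectors, not on a real model: the direction is taken as the normalized difference between two mean activation vectors, and a weight matrix W is replaced by W' = W - r(rᵀW). The function names and all data here are illustrative assumptions, not code from any of the variants discussed.

```python
import math
import random

def mean_vector(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def unit(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def difference_of_means_direction(acts_a, acts_b):
    """Unit vector pointing from the mean of acts_b to the mean of acts_a."""
    mu_a, mu_b = mean_vector(acts_a), mean_vector(acts_b)
    return unit([a - b for a, b in zip(mu_a, mu_b)])

def ablate(weight, direction):
    """Return W' = W - r (r^T W) for W of shape (d_out, d_in) and unit r of length d_out."""
    d_out, d_in = len(weight), len(weight[0])
    # r^T W: how strongly each input column writes along r.
    proj = [sum(direction[i] * weight[i][j] for i in range(d_out)) for j in range(d_in)]
    return [[weight[i][j] - direction[i] * proj[j] for j in range(d_in)] for i in range(d_out)]

random.seed(0)
d = 6
# Synthetic activations: group A is shifted along coordinate 0.
group_a = [[random.gauss(0, 1) + (2.0 if k == 0 else 0.0) for k in range(d)] for _ in range(50)]
group_b = [[random.gauss(0, 1) for _ in range(d)] for _ in range(50)]

r = difference_of_means_direction(group_a, group_b)
W = [[random.gauss(0, 1) for _ in range(d)] for _ in range(d)]
W_abl = ablate(W, r)

# After ablation, no input can produce output along r.
residual = max(abs(sum(r[i] * W_abl[i][j] for i in range(d))) for j in range(d))
print(f"largest remaining component along r: {residual:.2e}")
```

Because r has unit length, rᵀW' is zero up to floating-point error, which is what the final line checks.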
Why Qwen 3.6 needs a special approach: the original abliteration targeted dense models. Qwen 3.6 is MoE, so plain attention / MLP steering is not enough. The MoE-specific recipe layers on:
(1) LoRA-based steering — LoRA adapters on attention and MLP projections cancel the refusal direction (same idea as dense abliteration).
(2) Expert-Granular Abliteration (EGA) — abliteration is applied per expert down_proj slice, per layer. Since individual experts in the MoE may contribute to refusals independently, they are treated separately.
(3) MoE router suppression — at the router (the gate that picks experts), "safety experts" are made less likely to be selected — a structural way to deactivate safety-focused experts.
With those three layers, the community claims that MoE models reach dense-level refusal-removal rates (HauhauCS reports 0/465 = 0%).
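The router idea in item (3) can be pictured with a generic top-k gate. The source does not describe how any of these variants implement router suppression or how experts are selected for it, so the following is only a toy: the logits are made up, and `suppressed` and `penalty` are hypothetical parameters.

```python
def top_k_experts(logits, k, suppressed=(), penalty=0.0):
    """Indices of the k highest-scoring experts after penalizing a chosen subset."""
    adjusted = [
        score - penalty if idx in suppressed else score
        for idx, score in enumerate(logits)
    ]
    ranked = sorted(range(len(adjusted)), key=lambda idx: adjusted[idx], reverse=True)
    return sorted(ranked[:k])

# Made-up router scores for 8 experts on a single token.
router_logits = [1.2, 0.3, 2.5, 0.9, 2.1, -0.4, 1.7, 0.1]

before = top_k_experts(router_logits, k=2)
after = top_k_experts(router_logits, k=2, suppressed={2}, penalty=10.0)
print(before, after)  # [2, 4] [4, 6]
```

Lowering one expert's score does not remove it from the checkpoint. It only changes which experts win the top-k selection, which is why the text calls this a structural change rather than a weight edit inside the expert.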
Variants at a Glance
| Variant | Author | Notes | Distribution |
|---|---|---|---|
| HauhauCS-Aggressive | HauhauCS | Most aggressive; claims 0/465 refusals | HF / Ollama |
| Huihui-Qwen3.6-abliterated | huihui-ai | Long-established abliteration team, with many prior releases | HF / Ollama |
| wangzhang abliterated | wangzhang | Simpler abliteration | HF |
| prithivMLmods Uncensored-Aggressive | prithivMLmods | Popular HF creator | HF |
| (Previous generation) Qwen3.5-35B-A3B Uncensored | HauhauCS | Sibling for the earlier Qwen 3.5 | HF |
How to choose:
- HauhauCS-Aggressive — maximum refusal removal, medium quality-degradation risk
- huihui-ai abliterated — better quality-versus-refusal balance, most established team
- wangzhang / prithivMLmods — second opinions
Quantization and Hardware Requirements
Each variant ships GGUF quantizations on Hugging Face / Ollama, running on a wide range of hardware.
| Quantization | Approx file size | Recommended GPU / VRAM |
|---|---|---|
| FP16 (unquantized, 16-bit) | ~70 GB | H100 80GB / A100 80GB |
| Q8_0 | ~35 GB | 2× RTX 5090 32GB or A100 40GB |
| Q5_K_M | ~24 GB | Single RTX 5090 32GB |
| Q4_K_M | ~20 GB | RTX 4090 24GB, RTX 5090 |
| Q3_K_M | ~16 GB | RTX 4080 16GB, Mac M3/M4 Max 32GB |
| Q2_K | ~12 GB | Heavy quality loss — experiments only |
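As a sanity check on the table above, the file sizes can be turned back into an average number of bits stored per parameter. This is plain arithmetic on the article's own approximate figures (35B total parameters and the listed sizes), so the results are only as precise as those roundings.

```python
TOTAL_PARAMS = 35e9  # total parameter count from the spec table

# Approximate file sizes in GB, copied from the table above.
file_size_gb = {
    "FP16": 70, "Q8_0": 35, "Q5_K_M": 24,
    "Q4_K_M": 20, "Q3_K_M": 16, "Q2_K": 12,
}

def implied_bits_per_weight(size_gb, params=TOTAL_PARAMS):
    """Average bits stored per parameter for a file of the given size."""
    return size_gb * 1e9 * 8 / params

implied = {name: implied_bits_per_weight(gb) for name, gb in file_size_gb.items()}
for name, bits in implied.items():
    print(f"{name:7s} ~{bits:.2f} bits/weight")
```

The weights are only part of the VRAM budget. The context cache and activations need headroom on top, which is why the table pairs a ~20 GB file with a 24 GB card rather than a 20 GB one.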
Ollama one-liners:
ollama run huihui_ai/Qwen3.6-abliterated
# or
ollama run fredrezones55/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive
Because only ~3B parameters are active per token in the MoE, inference is fast for the VRAM footprint: noticeably faster than a dense 30B and closer to a dense 7B in latency. Local execution is genuinely practical.
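A back-of-the-envelope comparison shows where the speed comes from. The sketch assumes the common rule of thumb of about 2 FLOPs per active parameter per generated token and ignores attention cost, routing overhead, and memory bandwidth, so treat the ratios as rough orientation rather than benchmarks.

```python
def flops_per_token(active_params):
    """Rule-of-thumb forward-pass cost: about 2 FLOPs per active parameter."""
    return 2 * active_params

moe_active = 3e9   # ~3B active parameters per token (from the spec table)
dense_30b = 30e9   # dense comparison points named in the text
dense_7b = 7e9

ratio_vs_30b = flops_per_token(dense_30b) / flops_per_token(moe_active)
ratio_vs_7b = flops_per_token(dense_7b) / flops_per_token(moe_active)
print(f"dense 30B needs ~{ratio_vs_30b:.1f}x the FLOPs per token")
print(f"dense 7B needs ~{ratio_vs_7b:.1f}x the FLOPs per token")
```

All 35B parameters still have to sit in memory, since any expert may be selected for the next token. The saving is in compute and weight reads per token, not in footprint.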
Legitimate Use Cases (Within Legal Scope)
Abliterated models have real, defensible uses:
(1) Jailbreak-resistance research: security researchers use already-abliterated models as a control when studying LLM refusal mechanisms — "what would the model say if refusal were absent?"
(2) Academic research: alignment research, AI-safety work, and analysis of internal refusal representations.
(3) Adult fiction and roleplay: publishing and entertainment use cases where over-eager safety filters block creative writing — especially adult-fiction platforms.
(4) Deep medical / legal queries: when a normal LLM says "consult a doctor / lawyer" but the user is themselves a specialist doing knowledge checks, with responsibility remaining with them.
(5) Analysis of historically or ethically taboo subjects: scholarly study of war-crimes histories, ideologies, or ethical gray zones that default LLMs refuse to discuss.
Legal and Ethical Risks (Enterprise Adoption Not Advised)
Abliterated models can produce content the base model would refuse, including:
- Illegal-drug synthesis procedures
- Cyberattack code (exploits, malware)
- Dangerous-materials / explosives synthesis
- Child-exploitation content (a serious crime)
- Hate speech and discriminatory content
- Impersonation and fraud text
From an enterprise perspective:
(1) Legal liability: responsibility for output sits entirely with the user. Alibaba / the Qwen team and the abliteration authors are not accountable. If your company outputs harmful content from an abliterated model, civil or criminal liability can fall on your company.
(2) EU AI Act compliance (effective August 2026): as detailed in our Local LLM June 2026 Update, GPAI obligations kick in and systemic-risk models (> 10^25 FLOPs) get the full obligation set. Building products on abliterated models likely triggers additional safety-mitigation obligations.
(3) Japan's PPC / METI guidelines: as covered in our Kimi K2.7-Code column, Japan's PPC and METI AI guidelines press hard on safety and transparency — the opposite direction of abliterated models.
(4) Qwen license interpretation: Qwen 3.6's open-weights license permits derivatives and redistribution but also explicitly prohibits certain harmful use cases. If abliteration is read as "aiding harmful use," that's a license-violation risk.
Bottom line: enterprise / production adoption is strongly not advised. Restrict use to research, personal exploration, or narrowly legitimate legal use cases.
Technical Limits and Quality Degradation
Abliteration is not free:
(1) Quality degradation: removing the refusal direction subtly affects instruction-following and reasoning. Aggressive variants degrade more.
(2) Unintended behavior shifts: it's not only refusal that changes — personality, tone, and helpfulness can shift too. Unpredictable in production.
(3) Residual refusals: "0/465" is measured against one specific test set, so the model may still refuse some real-world prompts.
(4) MoE-specific quirks: EGA + router suppression are not equally effective across all experts. Certain tasks produce unexpected outputs.
(5) Security: abliterated models are also easier targets for subsequent malicious fine-tuning. Third-party quantized copies can, in principle, carry backdoors.
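Point (3) can be quantified. Observing zero refusals in 465 prompts does not mean the true refusal rate is zero. The sketch below computes the one-sided 95% upper bound for a zero-event result, assuming the prompts behave like independent draws from the same distribution as real usage, which a curated test set may not satisfy.

```python
def zero_event_upper_bound(n_trials, confidence=0.95):
    """Upper bound on the event rate when 0 events are seen in n_trials.

    Solves (1 - p) ** n_trials = 1 - confidence for p.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / n_trials)

n = 465  # prompts in the reported test set
exact = zero_event_upper_bound(n)
rule_of_three = 3 / n  # common shortcut for the same 95% bound
print(f"exact: {exact:.4%}, rule of three: {rule_of_three:.4%}")
```

Under those assumptions, a refusal rate of roughly 0.6% (about one prompt in 155) is still compatible with the reported result.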
Community Ecosystem
The abliteration scene is very active on Hugging Face.
Notable abliteration authors:
- huihui-ai — the longest-running abliteration specialists, with multiple generations of Qwen / Llama / Mistral abliterated releases
- HauhauCS — champions of MoE-specific techniques (EGA / router suppression)
- prithivMLmods — popular HF community member releasing uncensored derivatives across many models
- wangzhang, mradermacher, bartowski — quantization and uncensored providers
Distribution channels:
- Hugging Face Hub — originals plus quantizations
- Ollama — one-command CLI runs
- LM Studio — GUI model management
- llama.cpp / GGUF — usable across many frontends
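Since third-party quantizations travel through these channels, it is worth confirming that a downloaded file matches the digest its publisher lists. A minimal sketch, assuming the publisher exposes a SHA-256 digest for the file; the helper names are illustrative.

```python
import hashlib

def sha256_of_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so multi-gigabyte files never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published_digest(path, expected_hex):
    """Compare the local digest with the one listed by the publisher."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

A matching digest only shows that the file is the one the uploader published. It says nothing about whether the uploader's weights are trustworthy, so it complements the choice of source rather than replacing it.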
Positioning — A Side of the Open-Weights Reality
The existence of abliterated / uncensored models is a structural fact about open-weights LLMs:
1. Once weights are open, safety alignment can be stripped after the fact
2. Abliteration reaches MoE — a technique originally for dense models has been adapted
3. Legal and ethical responsibility is fully on the user — original authors (Alibaba) are not involved
4. A visible gap between industry safety discourse and open-weights reality: while regulators tighten closed-model safety, as Anthropic's Claude Fable 5 export-control episode shows, the open-weights side continues to develop abliteration
This column is not a recommendation to use abliterated models. It documents an existing phenomenon in the industry. Whether use is appropriate depends on jurisdiction, use case, and organizational policy.
Related services from us — AI consulting, software development, and OpenClaw setup. For help designing safe enterprise use of open-weights LLMs, alignment verification, or compliance strategy, get in touch.
Bottom Line
Qwen3.6-35B-A3B-Uncensored / Abliterated is a family of community derivatives of Alibaba's Qwen 3.6-35B MoE with refusal behavior removed via a three-stage recipe: LoRA + Expert-Granular Abliteration + MoE router suppression. Variants from HauhauCS, huihui-ai, wangzhang, and prithivMLmods are distributed via Hugging Face and Ollama with quantization options that fit even a single RTX 5090. HauhauCS reports 0 refusals across 465 prompts while claiming 100% preservation of base capability.
Three enduring impacts:
1. Abliteration reaches the MoE era — techniques originally for dense models have been extended with EGA and router suppression
2. The safety reality of open-weights LLMs — safety alignment applied by the original publisher can be stripped after release
3. Legitimate and illegitimate uses coexist — research, adult creative writing, and jailbreak-resistance testing are defensible, but enterprise deployment carries meaningful legal risk
Caveats: enterprise / commercial adoption is not advised. Open concerns include compliance with the EU AI Act and Japanese PPC / METI guidance, Qwen license interpretation risk, quality degradation, potential backdoors in third-party quantizations, and unintended behavior shifts.
References
Hugging Face (main variants):
- HauhauCS/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive
- huihui-ai/Huihui-Qwen3.6-35B-A3B-abliterated
- wangzhang/Qwen3.6-35B-A3B-abliterated
- prithivMLmods/Qwen3.6-35B-A3B-Uncensored-Aggressive
- (Previous gen) HauhauCS/Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive
Ollama:
- fredrezones55/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive
- huihui_ai/Qwen3.6-abliterated
Coverage:
- HackerNoon — Qwen3.6-35B-A3B Uncensored: A 35B MoE Model With 262K Context
Related columns:
- Local LLM June 2026 Update
- Kimi K2.7-Code
- Ornith-1.0
- Claude Fable 5 returns
- Claude Sonnet 5 release