Alibaba on Wednesday released Qwen3.6-27B, a dense 27-billion-parameter open-weight model under Apache 2.0 that tops its own 397B-parameter predecessor on every major agentic coding benchmark. Built on a hybrid Gated DeltaNet and Gated Attention layout across 64 layers, the model scores 77.2 on SWE-bench Verified, 59.3 on Terminal-Bench 2.0, and 1487 on Alibaba's internal QwenWebBench. Weights went live on Hugging Face and ModelScope, with a 4-bit quantized version running in roughly 17 gigabytes of memory.

Key Takeaways

AI-generated summary, reviewed by an editor. More on our AI guidelines.

The numbers Alibaba is pushing

Qwen3.6-27B ships with benchmark gains that read like a correction rather than an increment. SWE-bench Verified rises to 77.2 from the 75.0 posted by Qwen3.5-27B, and past the 76.2 posted by the much larger Qwen3.5-397B-A17B Mixture-of-Experts model. Terminal-Bench 2.0 reaches 59.3, matching Claude 4.5 Opus exactly under the same three-hour timeout harness. SkillsBench Avg5 jumps to 48.2 from the previous generation's 27.2, a 77 percent relative gain.

The Claude comparison is the one Alibaba wants you to notice. On SWE-bench Verified, Claude 4.5 Opus still leads at 80.9. On Terminal-Bench 2.0, the 27B dense pulls even. On GPQA Diamond it lands at 87.8, a hair ahead of Opus at 87.0. The gap between a closed frontier model and a downloadable 55-gigabyte file has narrowed to a handful of points across the benchmarks that matter for coding agents. That is the tell.

Why a 27B dense beats a 397B MoE

The architecture is doing most of the work. Each block starts with three Gated DeltaNet sublayers and caps off with one Gated Attention layer. Sixteen blocks, stacked, give the 64 layers. DeltaNet is a linear-attention design: it scales O(n) with sequence length, not O(n²), which matters for long repositories, where quadratic attention chokes on memory. The standard attention layers use an asymmetric grouped-query setup, 24 query heads against 4 key/value heads. Smaller KV cache, lower VRAM at serve time.
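The layer math above can be sketched in a few lines. This is a toy layout plan, not Qwen's actual module names, and the KV-cache figure is the simple head-count ratio, ignoring head dimensions and dtype:

```python
# Toy layer plan for the hybrid stack: 16 blocks, each holding
# 3 Gated DeltaNet sublayers followed by 1 Gated Attention layer.
# Names are illustrative, not Qwen's real identifiers.
N_BLOCKS = 16
BLOCK_PATTERN = ["gated_deltanet"] * 3 + ["gated_attention"]

layers = [kind for _ in range(N_BLOCKS) for kind in BLOCK_PATTERN]
print(len(layers))                      # 64 layers total
print(layers.count("gated_attention"))  # 16 quadratic-attention layers

# KV-cache effect of the 24-query / 4-key/value head asymmetry:
# the cache stores only K and V heads, so relative to a symmetric
# 24-head layout it shrinks by 24 / 4 = 6x per attention layer.
kv_cache_shrink = 24 / 4
print(kv_cache_shrink)  # 6.0
```

Only a quarter of the depth pays the O(n²) attention cost; the rest runs linear in sequence length.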

And then there is Multi-Token Prediction. Qwen trained the model with multi-step MTP, enabling speculative decoding at inference. Candidate tokens get verified in parallel. Throughput climbs without quality loss. A dense 27B model running this architecture on a single high-end GPU can serve an agentic coding workload that previously required routing across a 397-billion-parameter MoE cluster. Alibaba also adds a "preserve_thinking" flag that keeps reasoning traces from prior turns in context, cutting redundant chain-of-thought generation inside multi-step agent loops.
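The speculative-decoding loop that MTP enables can be sketched with a toy greedy verifier. This is the generic acceptance scheme, not Qwen's implementation, and `target_next` stands in for a target-model forward pass that, in practice, scores all draft positions in one batched call:

```python
def verify_draft(target_next, context, draft):
    """Greedy speculative decoding: accept draft tokens while they match
    what the target model would emit, then take the target's correction.
    target_next(ctx) simulates one target forward pass per position; the
    real win is that all positions are verified in a single parallel pass."""
    accepted, ctx = [], list(context)
    for tok in draft:
        expected = target_next(ctx)
        if tok != expected:
            accepted.append(expected)  # target overrides the first mismatch
            break
        accepted.append(tok)
        ctx.append(tok)
    else:
        accepted.append(target_next(ctx))  # all matched: one bonus token
    return accepted

# Toy target that always continues an integer sequence by +1.
target = lambda ctx: ctx[-1] + 1
print(verify_draft(target, [0], [1, 2, 9]))  # [1, 2, 3]
```

Every accepted draft token is a decoding step the target model did not have to run serially, which is where the throughput climb comes from.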

Running it locally

The practical story lives downstream of the benchmarks. Simon Willison ran the Unsloth Q4_K_M GGUF build (16.8 gigabytes) on llama-server and clocked 25.57 tokens per second on a personal machine, generating 4,444 tokens in under three minutes. Quant footprints vary a lot. Q3 builds come in under 15 gigs. Q6 pushes past 25. Pick your trade-off. On a 32-gigabyte MacBook Pro the 4-bit build fits with room to spare, no thrashing, no swap.
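The footprint and throughput figures check out with simple arithmetic. The bits-per-weight value below is a rough effective average for a Q4_K_M mix, an assumption rather than an official number:

```python
PARAMS = 27e9  # dense parameter count

def quant_gb(bits_per_weight):
    """Rough quantized file size: params * bits / 8, in decimal gigabytes.
    Real GGUF files mix quant levels per tensor and carry metadata, so
    actual sizes run somewhat above this floor."""
    return PARAMS * bits_per_weight / 8 / 1e9

# Q4_K_M averages roughly 4.8 bits per weight (approximate):
print(round(quant_gb(4.8), 1))  # 16.2 -- consistent with the 16.8 GB build

# Willison's run: 4,444 tokens at 25.57 tokens per second.
print(round(4444 / 25.57))  # 174 seconds, just under three minutes
```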

Local inference still has honest limits. Developer reports on Qwen3.6-35B-A3B, the sibling MoE released six days earlier, describe the model handling utility functions and small refactors well, then getting shakier on long-range repository state. The 27B dense sits in the same neighborhood. For agentic loops demanding sustained context across a messy codebase, the frontier cloud models still hold the edge.

Alibaba's three-week release sprint

This is the fourth Qwen3.6 release in three weeks. Qwen3.6-Plus dropped April 2, followed by Qwen3.6-35B-A3B on April 16, the closed Qwen3.6-Max-Preview on April 20, and now the dense 27B on April 22. The cadence tells you something. Alibaba looks emboldened, and it is not betting on one architecture. The company is shipping one of each, MoE and dense, open weights and closed API, and letting developers pick.

That matters against the broader backdrop. Earlier this month Alibaba released three closed-source models in three days, a signal the company is walling off the top tier for paid APIs. The 27B dense cuts the other way. Apache 2.0, commercial use permitted, no royalty. For teams building coding agents that can't route source code through a US vendor, the math just changed.

The benchmarks will face independent replication. The open weights are already on disk.

Frequently Asked Questions

What is Qwen3.6-27B?

Qwen3.6-27B is Alibaba's dense 27-billion-parameter open-weight coding model, released April 22, 2026, under an Apache 2.0 license. It uses a hybrid Gated DeltaNet plus Gated Attention architecture across 64 layers, supports text, image, and video inputs, and ships with both thinking and non-thinking modes in a single checkpoint. Weights are available on Hugging Face and ModelScope for self-hosting.

How does Qwen3.6-27B compare to Claude 4.5 Opus?

Qwen3.6-27B matches Claude 4.5 Opus at 59.3 on Terminal-Bench 2.0 and edges ahead on GPQA Diamond at 87.8 versus 87.0. Claude still leads on SWE-bench Verified (80.9 to 77.2) and SWE-bench Pro (57.1 to 53.5). The gap between Alibaba's downloadable open model and Anthropic's closed frontier runs only a handful of points across benchmarks that matter for coding agents.

Can Qwen3.6-27B run on consumer hardware?

Yes. The Unsloth Q4_K_M GGUF build weighs 16.8 gigabytes and fits comfortably in a 32-gigabyte MacBook Pro's unified memory. Simon Willison tested it on llama-server and measured 25.57 tokens per second generating 4,444 tokens in under three minutes. Q3 and Q6 builds trade size for quality in the 15-to-30 gigabyte range.

What is thinking preservation and why does it matter?

Thinking preservation is a Qwen3.6 feature that keeps the model's chain-of-thought reasoning traces from earlier conversation turns in context. Most language models discard prior reasoning and re-derive it each turn. For multi-step agent workflows, preserving those traces cuts redundant generation, lowers token consumption, and improves KV cache utilization. It is enabled via the preserve_thinking flag in the API.
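Assuming an OpenAI-compatible chat endpoint, enabling the flag might look like the request body below. The article confirms only the flag name; the field placement and model identifier are assumptions:

```python
import json

# Hypothetical request body -- only the preserve_thinking flag name
# comes from Qwen's release notes; everything else is illustrative.
payload = {
    "model": "qwen3.6-27b",
    "messages": [
        {"role": "user", "content": "Refactor utils.py to remove duplication."}
    ],
    "preserve_thinking": True,  # keep prior-turn reasoning traces in context
}
print(json.dumps(payload, indent=2))
```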

How does the 27B dense compare to the Qwen3.6-35B-A3B MoE?

The dense 27B released April 22 outperforms the MoE 35B-A3B (released April 16) on SWE-bench Verified (77.2 vs 73.4), SWE-bench Pro (53.5 vs 49.5), Terminal-Bench 2.0 (59.3 vs 51.5), SkillsBench (48.2 vs 28.7), and QwenWebBench (1487 vs 1397). The MoE activates just 3 billion parameters per forward pass; the dense activates all 27 billion.


AI News

New Delhi

Freelance correspondent reporting on the India-U.S.-Europe AI corridor and how AI models, capital, and policy decisions move across borders. Covers enterprise adoption, supply chains, and AI infrastructure deployment. Based in New Delhi.