Alibaba Releases Qwen3 with Native Hybrid Reasoning

Summary

Alibaba released Qwen3, a family of eight open-weight models — six dense ranging from 0.6B to 32B, and two mixture-of-experts models at 30B-A3B and 235B-A22B — all under Apache 2.0. The family introduced a native hybrid thinking mode that allows models to switch between extended chain-of-thought reasoning and fast non-reasoning responses at inference time, the first open-source model family to offer this capability. The flagship 235B-A22B model was pretrained on 36 trillion tokens.

What Happened

On April 28, 2025, the Qwen team released the full Qwen3 family simultaneously across Hugging Face. The eight models covered a wide deployment spectrum: the 0.6B and 1.7B dense models target edge devices, while the 235B-A22B MoE was designed for high-capability server deployment activating 22 billion parameters per forward pass.

The defining architectural feature was the hybrid thinking switch: every Qwen3 model could operate in "thinking" mode — generating an extended scratchpad before answering — or "non-thinking" mode for direct fast responses. The switch was controllable at inference time via a system prompt or API parameter, without requiring separate model checkpoints. This contrasted with contemporaneous approaches that required separate "reasoning" and "non-reasoning" model variants.

The 235B-A22B flagship was trained on 36 trillion tokens, making it one of the most data-intensive open-weight pretrains to date. All models used Apache 2.0 licensing with no restrictions on commercial use, distillation, or redistribution. The technical report published in May detailed the training curriculum, including multiple stages that progressively incorporated long-context data, code, and mathematical reasoning material.

Why It Matters

Qwen3 collapsed the distinction between reasoning and non-reasoning model tiers into a single deployable artifact. Until this release, accessing extended chain-of-thought reasoning required either a dedicated reasoning model or a separate API endpoint — both often with higher latency and cost. By baking the mode switch into a unified model, Qwen3 reduced deployment complexity for developers who needed both fast responses and deep reasoning from the same system.

The release also demonstrated that Apache 2.0 open-weight models could remain competitive with frontier closed APIs on reasoning benchmarks. For the open-vs-closed debate, Qwen3 represented further evidence that capability differentiation — once considered a durable moat for proprietary labs — was eroding rapidly as open training runs scaled.

§ How to read the metadata

Landmark: Fundamentally alters the trajectory; 2–5 per year.
Major: Meaningfully shifts the landscape; 2–4 per month.
Notable: Worth documenting; significance can be upgraded later.
Confidence: High = primary sources corroborate. Medium = credible secondary only. Low = provisional. Disputed = credible sources disagree.
Contestation: Uncontested = no formal challenge. Contested = at least one challenge open. Superseded = replaced by a later entry. Unresolved = dispute still open.

References

Qwen3: Think Deeper, Act Faster (Mon Apr 28 2025 00:00:00 GMT+0000 (Coordinated Universal Time)) official

Qwen3 Technical Report (Wed May 14 2025 00:00:00 GMT+0000 (Coordinated Universal Time)) primary document

Qwen3-235B-A22B Model Card (Mon Apr 28 2025 00:00:00 GMT+0000 (Coordinated Universal Time)) primary document

Summary

What Happened

Why It Matters

References

See also