Qwen-3.5 - Yudong's Blog

Qwen-3.5 Explained: Hybrid Linear Attention Arrives, Splitting Attention from O(N²) into O(N) + 1/4 O(N²)

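Part six of the Qwen paper series. Qwen-3.5, released on 2026-02-16, is the third original leap in the evolution of Qwen's attention design. This post breaks down four key points: (1) Gated DeltaNet linear attention, which fuses four components (delta rule, exponential gating, Causal Conv1D, and L2-normalized Q/K); (2) a 3:1 hybrid ratio (3 GDN layers for every 1 Full Attention layer, losing only 3% in performance while raising decode throughput by 8.6-19×); (3) extreme Sparse MoE (397B total / 17B active parameters = 4.3% activation rate, halved again relative to Qwen-3); (4) native multimodal training, a 262K native context, and 201 languages. This is the largest jump in attention architecture among open-source LLMs in 2026.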
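
To make the arithmetic in the summary concrete, here is a minimal Python sketch. It assumes a toy cost model (a GDN layer costs N, a Full Attention layer costs N²) and assumes the Full Attention layer sits at every fourth position; the helper names and the 8-layer depth are hypothetical and chosen only for illustration, not taken from the Qwen-3.5 release.

```python
def hybrid_layer_pattern(num_layers: int, period: int = 4) -> list[str]:
    """Return a 3:1 pattern: every `period`-th layer is full attention.

    Assumption: the full-attention layer is the last one in each group of 4.
    """
    return ["full" if (i + 1) % period == 0 else "gdn" for i in range(num_layers)]


def relative_attention_cost(seq_len: int, pattern: list[str]) -> float:
    """Toy cost of the hybrid stack relative to an all-full-attention stack.

    Assumption: a 'gdn' layer costs N and a 'full' layer costs N^2,
    ignoring constants, hidden size, and everything outside attention.
    """
    n = seq_len
    hybrid = sum(n if kind == "gdn" else n * n for kind in pattern)
    dense = len(pattern) * n * n
    return hybrid / dense


# Hypothetical depth of 8 layers, just to show two repetitions of the pattern.
pattern = hybrid_layer_pattern(8)

# At the 262K native context, the linear term is negligible next to N^2,
# so the ratio approaches the 1/4 in the title.
ratio_at_262k = relative_attention_cost(262_144, pattern)

# Sparse MoE activation rate from the summary: 17B active out of 397B total.
activation_rate = 17 / 397

print(pattern)
print(f"relative attention cost at 262K: {ratio_at_262k:.4f}")
print(f"MoE activation rate: {activation_rate:.1%}")
```

Under this toy model the quadratic term falls to roughly a quarter of the dense stack at long context, which is where the "O(N) + 1/4 O(N²)" shorthand comes from. The measured 8.6-19× decode speedup depends on factors the sketch ignores, such as KV-cache size and memory bandwidth.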

2026-05-28

AI Research & Engineering: RecSys, Search, NLP, Generative AI and Beyond

Tag Qwen-3.5
