DeepSeek Native Sparse Attention

NSA explained: Compression + Selection + Sliding Window, a coarse-to-fine hierarchical sparse attention (DeepSeek series, part 13)

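A detailed look at NSA (Native Sparse Attention, arXiv:2502.11089), an ACL 2025 Best Paper. NSA is a three-branch sparse attention: Compression (coarse-grained block compression), Selection (fine-grained attention over the Top-K blocks), and Sliding Window (local window), combined by learned gating. Its hardware-aligned, natively trainable design speeds up decoding on 64K sequences by 11.6×, while scoring slightly better than dense full attention on long-context benchmarks. NSA is the core architecture with which V3.2 / V4 extend the context to a million tokens.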
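To make the three-branch structure concrete, here is a minimal NumPy sketch for a single query vector. It is an illustration under stated assumptions, not the paper's kernel: mean pooling stands in for the learned block compressor, the gates are fixed numbers passed in by the caller rather than values predicted from the input, and the names `attend`, `nsa_single_query`, `block`, `top_k`, and `window` are hypothetical.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attend(q, K, V):
    # Scaled dot-product attention of one query against rows of K and V.
    probs = softmax(K @ q / np.sqrt(q.shape[0]))
    return probs @ V, probs

def nsa_single_query(q, K, V, block=4, top_k=2, window=4,
                     gates=(1 / 3, 1 / 3, 1 / 3)):
    t, d = K.shape
    n_blocks = t // block  # a trailing partial block is ignored in this sketch

    # Branch 1, Compression: one coarse key/value per block.
    # ASSUMPTION: mean pooling replaces the learned compressor.
    Kc = K[:n_blocks * block].reshape(n_blocks, block, d).mean(axis=1)
    Vc = V[:n_blocks * block].reshape(n_blocks, block, d).mean(axis=1)
    out_cmp, block_probs = attend(q, Kc, Vc)

    # Branch 2, Selection: reuse the coarse scores to pick the Top-K blocks,
    # then attend over the original tokens inside those blocks.
    top = np.sort(np.argsort(block_probs)[-top_k:])
    idx = np.concatenate([np.arange(b * block, (b + 1) * block) for b in top])
    out_sel, _ = attend(q, K[idx], V[idx])

    # Branch 3, Sliding Window: attend over the most recent tokens only.
    out_win, _ = attend(q, K[-window:], V[-window:])

    # Gated combination of the three branch outputs.
    # ASSUMPTION: fixed gate values instead of learned, input-dependent ones.
    g_cmp, g_sel, g_win = gates
    return g_cmp * out_cmp + g_sel * out_sel + g_win * out_win, top
```

A quick sanity check on the design: with `block=1`, `top_k` and `window` both equal to the sequence length, and gates that sum to 1, every branch sees all tokens, so the result should match dense attention from `attend(q, K, V)`. The savings come from the opposite regime, where the selection and window branches each touch only a small, fixed number of tokens regardless of context length.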

2026-04-18

AI Research & Engineering: RecSys, Search, NLP, Generative AI and Beyond

