AI Research & Engineering: RecSys, Search, NLP, Generative AI and Beyond

Tag: DeepSeek Native Sparse Attention

NSA Explained: Compression + Selection + Sliding Window, Coarse-to-Fine Hierarchical Sparse Attention (DeepSeek Series, Part 13)

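NSA (Native Sparse Attention, arXiv:2502.11089) explained: ACL 2025 Best Paper. NSA is a three-branch sparse attention: Compression (coarse-grained block compression), Selection (fine-grained attention over the Top-K blocks) and Sliding Window (a local window), combined through learned gating. Its hardware-aligned, natively trainable design gives an 11.6× decoding speedup on 64K-token sequences, and on long-context benchmarks it actually scores slightly better than dense full attention. NSA is the core architecture that lets V3.2 / V4 extend the context window to a million tokens.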
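To make the three branches and the gate concrete, here is a minimal pure-Python sketch of one decoding step for a single query and a single head. It is an illustration under simplifying assumptions, not the paper's kernel. Mean pooling stands in for the learned compression MLP. Compression and selection use the same non-overlapping block size. All three branches share one set of keys/values, whereas the paper gives each branch its own. Gate logits are passed in directly instead of being produced from the query by an MLP. The fixed blocks the paper always selects and its GQA-shared selection are omitted. The function `nsa_decode_step` and its parameters (`block`, `top_n`, `window`, `gate_logits`) are hypothetical names. The output follows the gated sum o = g_cmp·o_cmp + g_slc·o_slc + g_win·o_win with sigmoid gates.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(q, keys, values):
    """Scaled dot-product attention for one query; returns (output, weights)."""
    scale = 1.0 / math.sqrt(len(q))
    weights = softmax([dot(q, k) * scale for k in keys])
    dim = len(values[0])
    out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return out, weights


def mean_pool(vecs):
    """Stand-in for NSA's learned compression MLP (assumption): average the block."""
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def nsa_decode_step(q, K, V, block=4, top_n=2, window=4, gate_logits=(0.0, 0.0, 0.0)):
    """Hypothetical sketch of one NSA-style decoding step (single query, single head).

    K and V hold the keys/values of all past tokens, so no causal mask is needed.
    Returns the gated output and the token indices read by the selection branch.
    """
    T = len(K)
    blocks = [range(s, min(s + block, T)) for s in range(0, T, block)]

    # 1) Compression: one pooled key/value per block, attended coarsely.
    ck = [mean_pool([K[i] for i in b]) for b in blocks]
    cv = [mean_pool([V[i] for i in b]) for b in blocks]
    o_cmp, block_scores = attend(q, ck, cv)

    # 2) Selection: reuse the compression weights as block importance,
    #    keep the Top-n blocks, and attend over their raw tokens.
    top = sorted(range(len(blocks)), key=lambda j: block_scores[j], reverse=True)[:top_n]
    sel_idx = sorted(i for j in top for i in blocks[j])
    o_slc, _ = attend(q, [K[i] for i in sel_idx], [V[i] for i in sel_idx])

    # 3) Sliding window: full attention over the most recent `window` tokens.
    win_idx = range(max(0, T - window), T)
    o_win, _ = attend(q, [K[i] for i in win_idx], [V[i] for i in win_idx])

    # 4) Gating: one sigmoid gate in (0, 1) per branch, then a weighted sum.
    g_cmp, g_slc, g_win = (1.0 / (1.0 + math.exp(-g)) for g in gate_logits)
    out = [g_cmp * a + g_slc * b + g_win * c for a, b, c in zip(o_cmp, o_slc, o_win)]
    return out, sel_idx


if __name__ == "__main__":
    random.seed(0)
    d, T = 8, 16
    K = [[random.gauss(0, 1) for _ in range(d)] for _ in range(T)]
    V = [[random.gauss(0, 1) for _ in range(d)] for _ in range(T)]
    q = [random.gauss(0, 1) for _ in range(d)]
    out, sel = nsa_decode_step(q, K, V, block=4, top_n=2, window=4)
    print(len(out), len(sel))  # 8 8: output dim d, and top_n * block selected tokens
```

Reusing the compression branch's attention weights as block-importance scores is what keeps selection cheap in this sketch: blocks are ranked from one pooled key each, and only the Top-n blocks are read at full resolution. The gates are independent sigmoids rather than a softmax, so the branches are not forced to compete for a fixed budget.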


© 2026 Yudong's Blog. Powered by WordPress.
