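NSA (Native Sparse Attention, arXiv:2502.11089) explained: ACL 2025 Best Paper. NSA is a three-branch sparse attention mechanism: Compression (coarse-grained block compression), Selection (fine-grained attention over the Top-K blocks), and Sliding Window (local window), with the three branch outputs combined by learned gating. Its hardware-aligned, natively trainable design yields a reported 11.6× decoding speedup on 64K-token sequences, and on long-context benchmarks it comes out slightly ahead of dense full attention rather than behind it. NSA is the core architecture that lets V3.2 / V4 extend the context window to a million tokens.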
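To make the three branches concrete, here is a minimal single-head, single-query sketch of one decoding step in plain Python. It is an illustration under stated assumptions, not the paper's implementation: blocks are compressed by mean pooling (NSA uses a learned compression), the compression branch's attention weights are reused as block importance scores for Top-K selection, and the gates are passed in as fixed numbers (NSA learns them from the input). The function names `attend`, `mean_pool`, and `nsa_decode_step` are hypothetical.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]

def attend(q, keys, values):
    """Scaled dot-product attention for one query; returns (output, weights)."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    weights = softmax(scores)
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return out, weights

def mean_pool(vectors):
    # Assumption: mean pooling stands in for NSA's learned block compression.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def nsa_decode_step(q, keys, values, gates, block=4, top_k=2, window=4):
    """One decoding step; returns (gated output, indices of selected blocks)."""
    n = len(keys)
    blocks = [(s, min(s + block, n)) for s in range(0, n, block)]

    # Branch 1: Compression. Attend over one pooled key/value per block.
    ck = [mean_pool(keys[a:b]) for a, b in blocks]
    cv = [mean_pool(values[a:b]) for a, b in blocks]
    out_cmp, block_weights = attend(q, ck, cv)

    # Branch 2: Selection. Keep the top_k blocks by compressed attention
    # weight, then attend over the raw tokens inside those blocks.
    ranked = sorted(range(len(blocks)), key=lambda i: block_weights[i], reverse=True)
    chosen = sorted(ranked[:top_k])
    idx = [t for i in chosen for t in range(*blocks[i])]
    out_sel, _ = attend(q, [keys[t] for t in idx], [values[t] for t in idx])

    # Branch 3: Sliding window. Attend over the most recent tokens only.
    out_win, _ = attend(q, keys[-window:], values[-window:])

    # Gating. Fixed scalars here; in NSA the gates are learned.
    g_cmp, g_sel, g_win = gates
    out = [g_cmp * a + g_sel * b + g_win * c
           for a, b, c in zip(out_cmp, out_sel, out_win)]
    return out, chosen
```

In this sketch a query touches roughly `n / block` compressed entries, `top_k * block` selected tokens, and `window` local tokens, instead of all `n` cached tokens. That reduction in memory traffic is the kind of saving the decoding speedup relies on.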
© 2026 Yudong‘s Blog — Powered by WordPress