Tag DeepSeek-V2

DeepSeek-V2 Explained: Low-Rank Latent + Decoupled RoPE, Redefining the Economics of LLM Attention (DeepSeek Series, Part 6)

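A detailed look at DeepSeek-V2 (arXiv:2405.04434). The paper introduces MLA (Multi-head Latent Attention), which combines low-rank latent compression, decoupled RoPE, and matrix absorption to cut the KV cache to 1.76% of what standard MHA would need. Its MoE architecture, with 236B total parameters and 21B activated per token, reaches 5.76× the generation throughput of the 67B dense model.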
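The 1.76% figure can be checked with a short back-of-the-envelope calculation. The sketch below is illustrative only: the function names are hypothetical, and the dimensions (128 heads, head dimension 128, latent dimension 512, decoupled RoPE dimension 64) are assumed from the configuration reported in the DeepSeek-V2 paper, not stated in the summary above. It counts cached elements per token per layer, so layer count and dtype cancel out of the ratio.

```python
# Per-token, per-layer KV cache size in elements (not bytes).
# Dimensions are assumptions taken from the DeepSeek-V2 paper's reported config.

def mha_cache_elems(n_heads: int, head_dim: int) -> int:
    """Standard MHA caches a full key and a full value for every head."""
    return 2 * n_heads * head_dim


def mla_cache_elems(latent_dim: int, rope_dim: int) -> int:
    """MLA caches one compressed KV latent plus one shared decoupled RoPE key."""
    return latent_dim + rope_dim


N_HEADS = 128      # assumed number of attention heads
HEAD_DIM = 128     # assumed per-head dimension
LATENT_DIM = 512   # assumed KV compression (latent) dimension
ROPE_DIM = 64      # assumed decoupled RoPE key dimension

mha = mha_cache_elems(N_HEADS, HEAD_DIM)
mla = mla_cache_elems(LATENT_DIM, ROPE_DIM)
ratio = mla / mha

print(f"MHA: {mha} elements, MLA: {mla} elements, ratio: {ratio:.2%}")
```

Under these assumptions MHA stores 32768 elements per token per layer and MLA stores 576, which gives the 1.76% ratio quoted above.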

2026-02-26


AI Research & Engineering: RecSys, Search, NLP, Generative AI and Beyond
