AI Research & Engineering: RecSys, Search, NLP, Generative AI and Beyond

Tag: DeepSeek-V2

DeepSeek-V2 Explained: Low-Rank Latent + Decoupled RoPE, Redefining the Economics of Large-Model Attention (DeepSeek Series, Part 6)

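A detailed look at DeepSeek-V2 (arXiv:2405.04434). The paper introduces MLA (Multi-head Latent Attention), which combines low-rank latent compression, decoupled RoPE, and matrix absorption to cut the per-token KV cache to 1.76% of what MHA needs. The MoE model has 236B total parameters with 21B activated per token, and reaches 5.76× the generation throughput of the dense DeepSeek 67B model.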
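
The 1.76% figure can be checked with a few lines of arithmetic. The sketch below assumes the hyperparameters reported in the DeepSeek-V2 paper (128 heads of dimension 128, a 512-dimensional KV latent, and a 64-dimensional decoupled RoPE key); these values are not stated on this page, and the function names are hypothetical, chosen only for illustration.

```python
# Per-token, per-layer KV cache size, counted in cached elements.
# All hyperparameters are assumptions taken from the DeepSeek-V2 paper.

def kv_cache_elems_mha(n_heads: int, head_dim: int) -> int:
    """Standard MHA caches one key and one value vector per head."""
    return 2 * n_heads * head_dim


def kv_cache_elems_mla(latent_dim: int, rope_dim: int) -> int:
    """MLA caches one shared compressed KV latent plus one shared RoPE key."""
    return latent_dim + rope_dim


N_HEADS = 128      # attention heads (assumed)
HEAD_DIM = 128     # per-head dimension (assumed)
LATENT_DIM = 512   # compressed KV latent dimension (assumed)
ROPE_DIM = 64      # decoupled RoPE key dimension (assumed)

mha = kv_cache_elems_mha(N_HEADS, HEAD_DIM)    # 2 * 128 * 128 = 32768
mla = kv_cache_elems_mla(LATENT_DIM, ROPE_DIM) # 512 + 64 = 576
ratio = mla / mha

print(f"MHA: {mha} elements, MLA: {mla} elements, ratio: {ratio:.2%}")
```

The layer count cancels in the ratio, so the comparison holds for the full model as long as both variants use the same number of layers and the same cache precision.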


© 2026 Yudong‘s Blog — Powered by WordPress
