AI Research & Engineering: RecSys, Search, NLP, Generative AI and Beyond

Tag: DeepSeekMoE

DeepSeekMoE Explained: The Foundational Work Behind the Two-Pillar Design of Fine-grained Experts and Shared Experts (DeepSeek Series, Part 2)

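When reposting this article, please credit the source: https://yudonglee.me/deepseekmoe-explained/ | Author: yudonglee

This is Part 2 of the DeepSeek paper series. It walks through DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models (arXiv:2401.06066), published by DeepSeek in January 2024. The paper is the architectural starting point for all of DeepSeek's later MoE models (V2 / V3 / V4 / Coder-V2) and proposes two complementary designs: (1) Fine-grained Expert Segmentation, which splits the large experts into mN smaller experts so that the space of possible routing combinations grows explosively…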

© 2026 Yudong's Blog
