DeepSeekMoE - Yudong's Blog

Auxiliary-Loss-Free Load Balancing Explained: Replacing the Balance Loss with a Bias to Eliminate Hidden Gradient Pollution in MoE Training (DeepSeek Series, Part 9)

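Auxiliary-Loss-Free (arXiv:2408.15664) explained: it replaces the traditional auxiliary balance loss with an expert-wise bias and a rule-based update, which removes the interference gradients that pollute main-task training. Balance and specialization are decoupled: the bias handles balance, the affinity handles specialization. Fully adopted in V3 training.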
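To make the bias-versus-affinity split concrete, here is a minimal pure-Python sketch. It assumes a sign-based update rule (raise the bias of underloaded experts, lower it for overloaded ones) and renormalization of gate weights over the selected experts. The function names, the `rate` parameter, and the toy numbers are all hypothetical and are not taken from this summary.

```python
def route_top_k(affinity, bias, k):
    """Select k experts by (affinity + bias); weight them by raw affinity only.

    The bias influences WHICH experts are chosen, but never the gate weights,
    so it adds no gradient term to the main training loss.
    Assumes all affinities are positive (e.g. sigmoid or softmax outputs).
    """
    scores = [a + b for a, b in zip(affinity, bias)]
    chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    total = sum(affinity[i] for i in chosen)
    gates = {i: affinity[i] / total for i in chosen}
    return chosen, gates


def update_bias(bias, load, rate):
    """Rule-based update (assumed form): nudge each bias toward balanced load.

    load[i] is the number of tokens routed to expert i in the last batch.
    """
    mean_load = sum(load) / len(load)

    def sign(x):
        return (x > 0) - (x < 0)

    return [b + rate * sign(mean_load - c) for b, c in zip(bias, load)]


if __name__ == "__main__":
    affinity = [0.6, 0.3, 0.1]          # toy token-to-expert affinities
    bias = [0.0, 0.0, 0.0]
    print(route_top_k(affinity, bias, k=1))
    # Expert 0 received every token in this toy batch, so its bias drops.
    bias = update_bias(bias, load=[3, 0, 0], rate=0.1)
    print(bias)
```

Because the update is a rule applied outside backpropagation, no balance term enters the loss; the only gradients the router sees come from the main task.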

2026-03-22 1

ESFT Explained: Updating Only Task-Relevant Experts to Cut MoE Fine-Tuning Cost by 90% (DeepSeek Series, Part 8)

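ESFT (Expert-Specialized Fine-Tuning, arXiv:2407.01906) explained: MoE models activate experts sparsely on downstream tasks, so ESFT fine-tunes only the few experts relevant to the task. With 5-25% of parameters trainable, it matches Full FT performance and clearly outperforms LoRA.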
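The expert-selection idea can be sketched in a few lines of pure Python. This sketch assumes relevance is measured as each expert's share of routing decisions on a sample of task data, and that experts are kept in descending relevance order until a cumulative threshold is covered. The function names, the `threshold` parameter, and the toy routing trace are hypothetical.

```python
from collections import Counter


def expert_relevance(routed_experts, num_experts):
    """Share of routing decisions each expert receives on task data.

    routed_experts: one list of selected expert ids per token.
    """
    counts = Counter(e for token in routed_experts for e in token)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no routing decisions recorded")
    return [counts.get(e, 0) / total for e in range(num_experts)]


def select_experts(relevance, threshold):
    """Keep the most relevant experts until their cumulative share
    reaches the threshold; every other expert stays frozen."""
    order = sorted(range(len(relevance)), key=lambda e: relevance[e], reverse=True)
    chosen, covered = [], 0.0
    for e in order:
        if covered >= threshold:
            break
        chosen.append(e)
        covered += relevance[e]
    return sorted(chosen)


if __name__ == "__main__":
    # Toy trace: 4 tokens, top-2 routing over 4 experts.
    trace = [[0, 1], [0, 1], [0, 2], [0, 1]]
    relevance = expert_relevance(trace, num_experts=4)
    trainable = select_experts(relevance, threshold=0.8)
    print(relevance, trainable)
```

In a real training loop, the returned ids would decide which expert parameters receive gradient updates; all remaining experts and the other modules would be left frozen.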

2026-03-15 1

DeepSeekMoE Explained: The Foundational Work Behind the Two-Pillar Design of Fine-grained Experts and Shared Experts (DeepSeek Series, Part 2)

Please credit the source when reposting this article: https://yudonglee.me/deepseekm… Continue Reading →

2026-01-24 1

AI Research & Engineering: RecSys, Search, NLP, Generative AI and Beyond

Tag DeepSeekMoE
