DeepSeek-R1 - Yudong's Blog

DeepSeek-R1 Explained: From GRPO to Emergent Long-CoT, a New Paradigm for Open-Source Reasoning (DeepSeek Series, Part 12)

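A detailed walkthrough of DeepSeek-R1 (arXiv:2501.12948): (1) R1-Zero uses pure RL on top of V3-Base to train reasoning ability, the first empirical demonstration that this is feasible without SFT, and it exhibits metacognitive behaviors such as the "Aha Moment"; (2) R1 uses a four-stage pipeline (cold-start SFT → reasoning RL → general SFT → all-scenario RL) to produce a production-quality model that matches OpenAI o1 on AIME 2024 / MATH-500 / Codeforces. The Distill series, spanning 1.5B to 70B parameters, was released alongside it. Together these set off the "DeepSeek Moment" of 2025-01-27.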

2026-04-15

AI Research & Engineering: RecSys, Search, NLP, Generative AI and Beyond

Tag DeepSeek-R1
