AI Research & Engineering: RecSys, Search, NLP, Generative AI and Beyond

Tag: DeepSeek-GRM

DeepSeek-GRM Explained: From Scalar to Generative, a Paradigm Shift in Reward Modeling (DeepSeek Series, Part 16)

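A walkthrough of DeepSeek-GRM (arXiv:2504.02495), key reward modeling groundwork ahead of V4. The paper proposes SPCT (Self-Principled Critique Tuning), a pointwise GRM architecture, and Meta RM voting, which together let the reward model itself scale at inference time. It scores 89.6 on RewardBench (with K=32 inference-time samples), surpassing the GPT-4o judge and the Claude-3.5-Sonnet judge. The post also briefly covers other V4 prelude work, including Prover-V2, R1-0528, and OCR.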
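To make the idea of inference-time scaling through voting concrete, here is a minimal sketch. It assumes each of the K sampled critiques has already been parsed into one integer score per candidate response, and that a meta RM has assigned one quality score to each sample. The function name `vote`, the parameter `k_meta`, and the toy numbers are hypothetical illustrations, not the paper's actual code or data.

```python
from typing import List, Optional, Sequence


def vote(
    samples: Sequence[Sequence[int]],
    meta_scores: Optional[Sequence[float]] = None,
    k_meta: Optional[int] = None,
) -> List[int]:
    """Aggregate pointwise scores from several sampled critiques.

    samples[i][j] is the score that sampled critique i gives to response j.
    If meta_scores and k_meta are both given, only the k_meta samples with
    the highest meta score are kept before summing (an assumed form of
    meta-RM-guided voting). Otherwise every sample is counted.
    """
    kept = list(range(len(samples)))
    if meta_scores is not None and k_meta is not None:
        # Rank samples by meta score, best first, and keep the top k_meta.
        kept = sorted(kept, key=lambda i: meta_scores[i], reverse=True)[:k_meta]
    num_responses = len(samples[0])
    # Sum the per-response scores over the samples that were kept.
    return [sum(samples[i][j] for i in kept) for j in range(num_responses)]


# Toy data: 3 sampled critiques scoring 2 candidate responses.
toy_samples = [[7, 4], [6, 5], [2, 9]]
toy_meta = [0.9, 0.8, 0.1]  # hypothetical meta RM scores, one per sample

plain = vote(toy_samples)
guided = vote(toy_samples, meta_scores=toy_meta, k_meta=2)
print(plain, guided)
```

In this toy case plain voting favors the second response, while filtering out the sample with the lowest meta score flips the preference to the first. That is the intuition behind using a meta RM to discard low-quality critiques before aggregating.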


© 2026 Yudong's Blog | Powered by WordPress
