Speech LLM - Yudong's Blog

Speech LLM Survey: From Whisper to Moshi / Mini-Omni / Qwen2-Audio (a top-level overview of speech AI converging on the LLM paradigm)

Speech LLM Explained: a top-level survey of speech AI converging on the LLM paradigm, covering Qwen2-Audio, VALL-E, Moshi, GPT-4o, and beyond

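This article is the top-level survey of the "Speech Technology Deep Dive" series. It covers a taxonomy of the three Speech LLM paradigms (Speech-In / Speech-Out / End-to-End Speech-to-Speech), a comparison of core models such as Qwen2-Audio, VALL-E, Moshi, and GPT-4o, the open-source vs. closed-source ecosystem, a full 2022-2025 timeline, convergence trends, and predictions for the future. It is the 14th article in the series and closes out the current phase.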

2025-12-16

AI Research & Engineering: RecSys, Search, NLP, Generative AI and Beyond

Tag Speech LLM
