Category ASR

Automatic Speech Recognition

Streaming ASR in Practice: Chunked Attention, KV Cache, and Look-ahead Fully Explained (Streaming Speech Recognition Architecture and Source Code Walkthrough)

Streaming ASR Explained: turn an offline ASR model into a voice agent — chunked attention, KV cache, look-ahead, dynamic chunk training

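This article takes streaming ASR apart from an engineering perspective: algorithmic latency vs. compute latency, the three natural enemies of streaming, Chunked Attention and Dynamic Chunk Training, KV Cache, Causal Conv, making Whisper streaming, RNN-T's natively streaming design, the industrial VAD + Endpoint architecture, and end-to-end speech LLMs such as Moshi and GPT-4o Realtime. A companion to the CTC, Whisper, RNN-T, Conformer, and SSL series.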

2025-08-17 1

The Wav2Vec 2.0 / HuBERT / WavLM Trilogy: The Evolution of Self-Supervised Speech Pre-Training (Self-Supervised Speech Pre-Training Explained)

The Self-Supervised Speech Trilogy: Wav2Vec 2.0, HuBERT, WavLM — how BERT-style pretraining came to speech

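This article is a detailed breakdown of self-supervised learning (SSL) for speech pre-training: from Wav2Vec 2.0's contrastive learning plus quantization and HuBERT's k-means pseudo-labels plus masked prediction, to WavLM's utterance mixing and gated relative position bias. It includes PyTorch fine-tuning code and a SUPERB performance table, and cross-links to the Whisper / Conformer / RNN-T series.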

2025-03-15 1

Conformer Explained: How the Convolution-augmented Transformer Came to Dominate the ASR Backbone (Architecture and Source Code Walkthrough)

Conformer Explained: convolution-augmented Transformer — the de-facto encoder for modern speech recognition

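This article is a detailed technical breakdown of Conformer: from the limitations of a pure Transformer on ASR and the design philosophy behind the Macaron-style dual FFN plus Convolution Module, to a complete PyTorch implementation, the three official S/M/L configurations, and the evolution of the Squeezeformer / Zipformer variants. A companion to the CTC, Whisper, and RNN-T series.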

2024-12-21 1

Whisper Explained: An In-Depth Look at a New Paradigm for End-to-End Speech Recognition

Whisper Explained: OpenAI's end-to-end speech recognition model — architecture, algorithm, and source code deep dive

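This article is a detailed technical breakdown of Whisper: from the overall architecture, audio preprocessing, and the multitask training paradigm, to a section-by-section close reading of the PyTorch source and a performance / ecosystem comparison, with SVG schematic diagrams, parameter tables, and runnable code.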

2024-11-16 1

RNN-Transducer Explained: Another Path to End-to-End ASR Beyond CTC (RNN-T Algorithm and Source Code Walkthrough)

RNN-Transducer Explained: algorithm, lattice, and source code of the de-facto streaming ASR loss

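This article is a detailed technical breakdown of the RNN-Transducer: from background and motivation, the three-network architecture, the T×(U+1) alignment lattice, and the forward-backward loss derivation, to a close reading of the PyTorch source, the evolution of modern variants, and the practical engineering pitfalls of industrial deployment. A companion to the CTC series and Whisper Explained.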

2023-08-13 2

CTC Algorithm Explained Part 3: CTC Demo by Speech Recognition (CTC Algorithm Explained: Speech Recognition in Practice)

CTC Algorithm Explained PART 3 - Speech Recognition Demo

Please credit the source when reposting this article: https://yudonglee.me/ctc-expla…

2020-04-10 1

CTC Algorithm Explained Part 2: Decoding the Network (CTC Algorithm Explained: Decoding)

CTC Algorithm Explained PART 2 - Decoding the Network

Please credit the source when reposting this article: https://yudonglee.me/ctc-expla…

2019-03-31 26

CTC Algorithm Explained Part 1: Training the Network (CTC Algorithm Explained: Training)

CTC Algorithm Explained PART 1 - Training the Network

Please credit the source when reposting this article: https://yudonglee.me/ctc-expla…

2018-07-20 81

© 2026 Yudong's Blog
