Commit ab381e3
stable-timestamps: per-segment VAD decoding for subtitle-quality timestamps
Replace the concatenate-decode-remap pipeline with per-segment VAD decoding,
matching how stable-ts/faster-whisper works. Each VAD speech segment is
decoded independently, and its timestamps are offset by the segment's original
start time, so no mapping table or interpolation is needed.
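The offset step can be illustrated with a minimal sketch. This is not the code from this commit: `SpeechSegment`, `Word`, `DecodeFn`, and `decode_per_segment` are hypothetical names introduced for illustration, and the decoder is passed in as a callback so the example stays self-contained. It assumes the decoder returns word times relative to the start of the segment it was given.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for illustration only; not whisper.cpp's actual structs.
// All times are integer ticks on one shared clock (the unit is an assumption).
struct SpeechSegment {
    int64_t start; // segment start in the original audio
    int64_t end;   // segment end in the original audio
};

struct Word {
    std::string text;
    int64_t t0; // word start
    int64_t t1; // word end
};

// Stand-in for the real decoder: returns words timed relative to the segment start.
using DecodeFn = std::function<std::vector<Word>(const SpeechSegment &)>;

// Decode each VAD segment on its own, then shift its word times by the
// segment's original start. No mapping table or interpolation is involved.
std::vector<Word> decode_per_segment(const std::vector<SpeechSegment> & segments,
                                     const DecodeFn & decode) {
    std::vector<Word> out;
    for (const auto & seg : segments) {
        assert(seg.end >= seg.start); // sanity check on the VAD output
        for (Word w : decode(seg)) {
            w.t0 += seg.start; // segment-relative -> original timeline
            w.t1 += seg.start;
            out.push_back(std::move(w));
        }
    }
    return out;
}
```

Because each segment carries its own start time, a word's position in the original audio is a single addition, which is why the remapping helpers listed under "Code removed" are no longer needed.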
Results on 5-min synthetic audio (46 utterances, 7x 20s pauses):
- pct_words_overlap: 0.89% (vs 5.7% stable-ts, 22.6% previous v2)
- n_words_overlap: 5 (vs 22 stable-ts, 144 previous v2)
- Wall time: 22.8s (vs 43.2s stable-ts; 1.9x faster via Metal)
Code removed:
- whisper_vad() concatenation + mapping table building
- vad_time_mapping struct, vad_mapping_table, has_vad_segments from state
- map_processed_to_original_time() in whisper.cpp
- whisper_stable_map_processed_to_original() in whisper-stable.cpp
- mapping params from whisper_stable_snap_segments()
Code added:
- whisper_full_vad_segments(): ~70-line per-segment decode loop
- whisper_full_parallel() with VAD delegates to whisper_full()
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent 9234e47 commit ab381e3
File tree
43 files changed (+70373, -930)
- examples/cli
- plans/stable-timestamps-v2
- notes
- out
- src