@OrelSokolov
Contributor

Fix --no-timestamps flag behavior

Summary

This PR fixes the --no-timestamps flag so that it affects only output formatting, not transcription quality. Previously, the flag altered the decoding process, which produced different, lower-quality transcription text.

Problem

When the --no-timestamps flag is used:

  • ❌ Transcription text differed from the same audio without the flag
  • ❌ Lower transcription quality
  • ❌ Model would sometimes loop/repeat phrases infinitely
  • ❌ Flag modified the decoding process by:
    • Adding <|notimestamps|> token to the prompt
    • Suppressing all timestamp tokens during decoding

Solution

The fix ensures --no-timestamps only controls output formatting:

  • ✅ Model always uses timestamp logic during decoding (for better quality)
  • ✅ Transcription text is identical regardless of the flag
  • ✅ Added repetition detection to prevent infinite loops
  • ✅ Improved segment handling to prevent early termination
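The separation described above can be illustrated with a minimal sketch in which the flag is consulted only when segments are printed, never during decoding. This is not the code from this PR: `Segment`, `format_time`, and `render_segments` are hypothetical names, and the sketch assumes segment times are stored in 10 ms units.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical segment type; t0/t1 are assumed to be in 10 ms units.
struct Segment {
    int64_t t0;
    int64_t t1;
    std::string text;
};

// Format a time given in 10 ms units as HH:MM:SS.mmm.
static std::string format_time(int64_t t) {
    int64_t msec = t * 10;
    const int64_t hr = msec / (1000 * 60 * 60);
    msec -= hr * (1000 * 60 * 60);
    const int64_t min = msec / (1000 * 60);
    msec -= min * (1000 * 60);
    const int64_t sec = msec / 1000;
    msec -= sec * 1000;

    char buf[32];
    std::snprintf(buf, sizeof(buf), "%02d:%02d:%02d.%03d",
                  (int) hr, (int) min, (int) sec, (int) msec);
    return std::string(buf);
}

// The flag only decides whether the timestamp prefix is printed.
// The segment text is the same in both cases.
static std::string render_segments(const std::vector<Segment> & segs, bool no_timestamps) {
    std::string out;
    for (const auto & s : segs) {
        if (!no_timestamps) {
            out += "[" + format_time(s.t0) + " --> " + format_time(s.t1) + "] ";
        }
        out += s.text;
        out += "\n";
    }
    return out;
}
```

With this shape, the decoder produces the same segments for both invocations, and the two outputs differ only by the bracketed prefix.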

Changes

Core Fixes (src/whisper.cpp)

  1. Removed <|notimestamps|> token injection - The decoder no longer adds this token to the prompt, allowing proper timestamp-based segmentation
  2. Removed timestamp token suppression - Timestamp tokens are no longer suppressed from logits, enabling the model to segment properly
  3. Added repetition detection - Detects and prevents infinite loops where the model repeats the same phrase
  4. Improved error handling - Better buffer allocation error messages
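As an illustration of item 3, one simple way to detect a loop is to check whether the tail of the decoded token sequence consists of the same short run repeated several times in a row. The sketch below is an assumption about how such a check could look, not the implementation in this PR; `tail_repeats`, `is_looping`, and their parameters are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Returns true if the last `period` tokens appear `min_repeats` times
// back to back at the end of `tokens`.
static bool tail_repeats(const std::vector<int> & tokens, size_t period, size_t min_repeats) {
    if (period == 0 || min_repeats < 2) {
        return false;
    }
    const size_t need = period * min_repeats;
    if (tokens.size() < need) {
        return false;
    }
    const size_t start = tokens.size() - need;
    // Every token in the window must equal the token one period earlier.
    for (size_t i = start + period; i < tokens.size(); ++i) {
        if (tokens[i] != tokens[i - period]) {
            return false;
        }
    }
    return true;
}

// Tries every phrase length from 1 up to `max_period` tokens.
static bool is_looping(const std::vector<int> & tokens, size_t max_period, size_t min_repeats) {
    for (size_t p = 1; p <= max_period; ++p) {
        if (tail_repeats(tokens, p, min_repeats)) {
            return true;
        }
    }
    return false;
}
```

A decoder loop could call `is_looping` after each sampled token and end the current segment when it returns true. The thresholds are tuning choices: too small a `min_repeats` risks cutting off legitimately repeated speech.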

Tests (tests/)

  • Added test-no-timestamps.cpp - Automated test that verifies transcription quality is identical with/without the flag
  • Added TEST_NO_TIMESTAMPS.md - Test documentation
  • Updated tests/CMakeLists.txt - Test integration

Documentation

  • Added NO_TIMESTAMPS_FIX.md - Detailed explanation of the problem and solution

Testing

Automated Test

cd build
ctest -R test-no-timestamps -V

Test verifies that:

  • Transcription with timestamps enabled produces text: "And so my fellow Americans..."
  • Transcription with --no-timestamps produces identical text
  • ✅ Test passes (9.87s)
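The comparison at the heart of such a test can be sketched as follows. The source of test-no-timestamps.cpp is not shown here, so this is only an assumed shape: `normalize_ws` and `same_transcript` are hypothetical helpers that ignore whitespace differences (for example, line breaks between segments) before comparing the two transcripts.

```cpp
#include <cctype>
#include <string>

// Trim leading/trailing whitespace and collapse internal runs to one space.
static std::string normalize_ws(const std::string & s) {
    std::string out;
    bool pending_space = false;
    for (unsigned char c : s) {
        if (std::isspace(c)) {
            // Remember a separator only once some text has been emitted.
            pending_space = !out.empty();
        } else {
            if (pending_space) {
                out += ' ';
            }
            pending_space = false;
            out += static_cast<char>(c);
        }
    }
    return out;
}

// True when both transcripts contain the same words in the same order.
static bool same_transcript(const std::string & with_ts, const std::string & without_ts) {
    return normalize_ws(with_ts) == normalize_ws(without_ts);
}
```

In a test, the two arguments would be the concatenated segment texts from a run with timestamps and a run with --no-timestamps on the same audio.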

Manual Testing

# Both commands now produce identical transcription text:
./whisper-cli -m model.bin -f audio.wav                    # With timestamps in output
./whisper-cli -m model.bin -f audio.wav --no-timestamps    # Without timestamps in output

Backward Compatibility

Fully backward compatible

  • All existing tests pass
  • CLI interface unchanged
  • API unchanged
  • Only behavioral change: transcription text with --no-timestamps now matches the text produced without the flag

Impact

  • Users: Better transcription quality when using --no-timestamps
  • Developers: Clear separation between output formatting and decoding logic
  • Maintenance: Automated test prevents regression

Checklist

  • Code follows project style guidelines
  • Self-review completed
  • Comments added for complex logic
  • Documentation updated
  • Tests added and passing
  • No new warnings introduced
  • Backward compatible

@OrelSokolov changed the title from "Fix --no-timestamps flag behavior" to "Critical bug: Fix --no-timestamps flag behavior" on Nov 2, 2025