Add word-level timestamps (#43)

* Add word-level timestamps
* Fix alignment between the segments and the lists of words
* Fix truncated words list when the replacement character is decoded
* Check for empty text_tokens
* Add usage example in the readme
* Update ctranslate2 to 3.9
* Skip empty segment
* Set typing for the new methods
2023-03-15 15:02:28 +01:00
parent b41fd05948
commit 8bd013ea99
4 changed files with 314 additions and 8 deletions
--- a/README.md
+++ b/README.md
@@ -99,6 +99,16 @@ for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
 ```

+#### Word-level timestamps
+
+```python
+segments, _ = model.transcribe("audio.mp3", word_timestamps=True)
+
+for segment in segments:
+    for word in segment.words:
+        print("[%.2fs -> %.2fs] %s" % (word.start, word.end, word.word))
+```
+
 See more model and transcription options in the [`WhisperModel`](https://github.com/guillaumekln/faster-whisper/blob/master/faster_whisper/transcribe.py) class implementation.

 ## Comparing performance against other implementations
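The README example added in this commit reads only `start`, `end` and `word` from each entry of `segment.words`. You can exercise the same loop without loading a model by passing it stand-in objects. The sketch below assumes that structure. `Word`, `Segment` and `format_word_timestamps` are hypothetical names defined here for illustration; they are not faster-whisper's own classes. The guard for a missing word list and the whitespace stripping are also assumptions, added in case a segment carries no words or a word's text has a leading space.

```python
from collections import namedtuple
from typing import Iterable, List

# Hypothetical stand-ins for the objects yielded by model.transcribe();
# only the attributes read by the README loop are modelled here.
Word = namedtuple("Word", ["start", "end", "word"])
Segment = namedtuple("Segment", ["start", "end", "text", "words"])


def format_word_timestamps(segments: Iterable[Segment]) -> List[str]:
    """Return one "[start -> end] word" line per word, like the README loop."""
    lines = []
    for segment in segments:
        # Assumption: treat a missing word list (e.g. None) as empty.
        for word in segment.words or []:
            # Assumption: strip surrounding whitespace from the word text.
            lines.append(
                "[%.2fs -> %.2fs] %s" % (word.start, word.end, word.word.strip())
            )
    return lines


if __name__ == "__main__":
    fake_segments = [
        Segment(0.0, 1.2, " Hello world.",
                [Word(0.0, 0.48, " Hello"), Word(0.48, 1.2, " world.")]),
        Segment(1.2, 1.2, "", None),
    ]
    for line in format_word_timestamps(fake_segments):
        print(line)
```

The function collects lines rather than printing them, so the output can be checked or written to a file. In principle, the real `segments` returned by `model.transcribe(..., word_timestamps=True)` could be passed in directly, provided its word objects expose the same attributes.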