Add word-level timestamps (#43)

* Add word-level timestamps

* Fix alignment between the segments and the lists of words

* Fix truncated words list when the replacement character is decoded

* Check for empty text_tokens

* Add usage example in the readme

* Update ctranslate2 to 3.9

* Skip empty segment

* Set typing for the new methods
Guillaume Klein
2023-03-15 15:02:28 +01:00
committed by GitHub
parent b41fd05948
commit 8bd013ea99
4 changed files with 314 additions and 8 deletions
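
The commit message mentions a usage example for word-level timestamps. A minimal sketch of such usage follows, assuming the commit belongs to faster-whisper, that `transcribe()` accepts `word_timestamps=True`, and that each segment then exposes a `words` list whose items have `start`, `end`, and `word` attributes. `format_word_timestamps` and `FakeModel` are hypothetical names introduced only for illustration; `FakeModel` stands in for the real model so the sketch runs without the library or an audio file.

```python
from types import SimpleNamespace  # only used by the stand-in model below


def format_word_timestamps(model, audio_path):
    """Return one "[start -> end] word" line per transcribed word."""
    # word_timestamps=True is assumed to attach a list of words to each segment.
    segments, _info = model.transcribe(audio_path, word_timestamps=True)
    lines = []
    for segment in segments:
        # Tolerate segments without words (assumed possible when timestamps are off).
        for word in segment.words or []:
            lines.append("[%.2fs -> %.2fs] %s" % (word.start, word.end, word.word))
    return lines


class FakeModel:
    """Hypothetical stand-in mimicking the assumed transcribe() return shape."""

    def transcribe(self, audio_path, word_timestamps=False):
        words = [
            SimpleNamespace(start=0.0, end=0.42, word=" Hello"),
            SimpleNamespace(start=0.42, end=0.9, word=" world"),
        ]
        segment = SimpleNamespace(
            start=0.0,
            end=0.9,
            text=" Hello world",
            words=words if word_timestamps else None,
        )
        # Segments are returned as an iterator, followed by an info object.
        return iter([segment]), SimpleNamespace(language="en")


if __name__ == "__main__":
    for line in format_word_timestamps(FakeModel(), "audio.mp3"):
        print(line)
```

With the real library installed, a `faster_whisper.WhisperModel` instance would presumably be passed in place of `FakeModel()`; the helper only relies on the `transcribe()` call and the attributes named above.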

@@ -1,3 +1,3 @@
 av==10.*
-ctranslate2>=3.8,<4
+ctranslate2>=3.9,<4
 tokenizers==0.13.*