Commit Graph

3 Commits

Author SHA1 Message Date
Guillaume Klein
d82be59d5f Fix unset attribute when using English-only models 2023-03-17 18:33:16 +01:00
Guillaume Klein
8bd013ea99 Add word-level timestamps (#43)
* Add word-level timestamps

* Fix alignment between the segments and the lists of words

* Fix truncated words list when the replacement character is decoded

* Check for empty text_tokens

* Add usage example in the readme

* Update ctranslate2 to 3.9

* Skip empty segment

* Set typing for the new methods
2023-03-15 15:02:28 +01:00
Guillaume Klein
c52adaca90 Create a helper class Tokenizer 2023-03-09 12:53:49 +01:00