Jordi Mas
1195359984
Filter out non_speech_tokens in suppressed tokens (#898)
* Filter out non_speech_tokens in suppressed tokens
2024-07-05 14:43:11 +07:00
Oscaarjs
3084409633
Add V3 Support (#578)
* Add V3 Support
* update conversion example
---------
Co-authored-by: oscaarjs <oscar.johansson@conversy.se>
2023-11-24 23:16:12 +01:00
Guillaume Klein
727ab81f31
Improve error message for invalid task and language parameters (#466)
2023-09-12 10:02:23 +02:00
Guillaume Klein
a5d03e55fa
Prevent out of range error in method split_tokens_on_unicode (#111)
2023-04-04 10:51:14 +02:00
Guillaume Klein
9fa1989073
Revert "Prevent out of range error in method split_tokens_on_unicode"
This reverts commit 36160c1e7e.
2023-04-04 10:25:41 +02:00
Guillaume Klein
36160c1e7e
Prevent out of range error in method split_tokens_on_unicode
2023-04-04 10:17:56 +02:00
Guillaume Klein
39fddba886
Suppress some special tokens when the default set is not used
2023-03-30 12:42:29 +02:00
Guillaume Klein
d82be59d5f
Fix unset attribute when using English-only models
2023-03-17 18:33:16 +01:00
Guillaume Klein
8bd013ea99
Add word-level timestamps (#43)
* Add word-level timestamps
* Fix alignment between the segments and the lists of words
* Fix truncated words list when the replacement character is decoded
* Check for empty text_tokens
* Add usage example in the readme
* Update ctranslate2 to 3.9
* Skip empty segment
* Set typing for the new methods
2023-03-15 15:02:28 +01:00
Guillaume Klein
c52adaca90
Create a helper class Tokenizer
2023-03-09 12:53:49 +01:00