Guillaume Klein
e06511f96b
Rename AudioInfo to TranscriptionInfo (#174)
2023-04-24 16:29:17 +02:00
Anthony
338a725ff8
Fix where the tokens are reset (#175)
2023-04-24 16:28:47 +02:00
Amar Sood
f893113759
Align segment structure with openai/whisper (#154)
* Align segment structure with openai/whisper
* Update code to apply requested changes
* Move increment below the segment filtering
---------
Co-authored-by: Guillaume Klein <guillaumekln@users.noreply.github.com>
2023-04-24 15:04:42 +02:00
FlippFuzz
2b51a97e61
Add transcription_options to AudioInfo (#170)
* Add transcription_options to AudioInfo
It would be useful to include the transcription_options in AudioInfo.
My application changes only a few settings and leaves the rest at their defaults.
However, I would like to record all settings (including those I did not specify) so that the audio can be transcribed again identically in the future if need be.
* Make TranscriptionOptions appear before AudioInfo
* Remove unnecessary whitespace
2023-04-24 15:02:19 +02:00
Jordi Mas
358d373691
Allow specifying local_files_only to prevent checking the Internet every time (#166)
2023-04-20 14:26:06 +02:00
Ewald Enzinger
2b53dee6b6
Expose download location in WhisperModel constructor (#126)
This increases compatibility with OpenAI Whisper's whisper.load_model() and is useful for downstream integrations.
2023-04-08 10:02:36 +02:00
Guillaume Klein
e9a082dcf2
Keep segment timestamps aligned with words timestamps after VAD (#119)
2023-04-06 11:54:40 +02:00
Guillaume Klein
051b3350e5
Add some info and debug logs (#113)
2023-04-05 16:57:59 +02:00
Guillaume Klein
19698c95f8
Support VAD filter (#95)
* Support VAD filter
* Generalize function collect_samples
* Define AudioSegment class
* Only pass prompt and prefix to the first chunk
* Add dict argument vad_parameters
* Fix isort format
* Rename method
* Update README
* Add shortcut when the chunk offset is 0
* Reword readme
* Fix end property
* Concatenate the speech chunks
* Cleanup diff
* Increase default speech pad
* Update README
* Increase default speech pad
2023-04-03 17:22:48 +02:00
palladium123
b4c1c57781
Added retrieval mechanism (avg_log_prob/no_speech_prob) (#103)
* Added retrieval mechanism
Added retrieval mechanism to retrieve avg_log_prob and no_speech_prob from the Transcribe method.
* Update transcribe.py
* Update transcribe.py
* Initial commit
2023-04-03 16:56:35 +02:00
Guillaume Klein
1a968a4323
Pass prefix only to the first window
2023-04-01 09:27:20 +02:00
Guillaume Klein
d03383f902
Simplify reuse of the encoder output
2023-03-30 15:58:27 +02:00
Guillaume Klein
39fddba886
Suppress some special tokens when the default set is not used
2023-03-30 12:42:29 +02:00
Guillaume Klein
0224400584
Add large-v1 model
2023-03-28 14:36:10 +02:00
Guillaume Klein
e2705d11c9
Raise an explicit error message if the model size is invalid
2023-03-26 16:30:00 +02:00
Jordi Mas
f8d2fb169f
Fix variable name reference (#77)
2023-03-25 10:00:59 +01:00
Guillaume Klein
de7682a2f0
Automatically download converted models from the Hugging Face Hub (#70)
* Automatically download converted models from the Hugging Face Hub
* Remove unused import
* Remove unneeded requirements in dev mode
* Remove extra index URL when pip install in CI
* Allow downloading to a specific directory
* Update docstring
* Add argument to disable the progress bars
* Fix typo in docstring
2023-03-24 10:55:55 +01:00
Guillaume Klein
523ae2180f
Run the encoder only once for each 30-second window (#73)
2023-03-24 10:53:49 +01:00
Guillaume Klein
52264f2277
Fix typing for device_index argument
2023-03-22 13:51:12 +01:00
Guillaume Klein
0ab8db2b37
Remove debug prints
2023-03-18 09:48:02 +01:00
Guillaume Klein
a70aac18ae
Remove unused import
2023-03-18 09:47:02 +01:00
Guillaume Klein
cce6b53e45
Fix incorrect attribute access
2023-03-16 10:32:36 +01:00
Guillaume Klein
2007adf0b5
Fix typing of words attribute
2023-03-15 17:49:07 +01:00
Guillaume Klein
ae9898f0d8
Include duration in AudioInfo structure
2023-03-15 15:30:29 +01:00
Guillaume Klein
eafb2c79a3
Add more typing annotations
2023-03-15 15:22:53 +01:00
Guillaume Klein
8bd013ea99
Add word-level timestamps (#43)
* Add word-level timestamps
* Fix alignment between the segments and the lists of words
* Fix truncated words list when the replacement character is decoded
* Check for empty text_tokens
* Add usage example in the readme
* Update ctranslate2 to 3.9
* Skip empty segment
* Set typing for the new methods
2023-03-15 15:02:28 +01:00
Guillaume Klein
3301dd9273
Make get_input a free function
2023-03-09 12:54:41 +01:00
Guillaume Klein
c52adaca90
Create a helper class Tokenizer
2023-03-09 12:53:49 +01:00
Guillaume Klein
f0a21ea916
Use a dict to represent intermediate segments
2023-03-09 11:53:55 +01:00
Guillaume Klein
6a84df400f
Fix all_tokens handling
See 38f2f4d99d
2023-03-09 10:02:25 +01:00
Guillaume Klein
4176da0d68
Rename offset to seek to match the OpenAI implementation
2023-03-09 09:58:58 +01:00
Guillaume Klein
6b16b8a69c
Pad the audio instead of the spectrogram
See 919a713499
2023-03-08 10:50:46 +01:00
Guillaume Klein
01ef12a6a0
Do not ignore last segment ending with one timestamp
See eab8d920ed
2023-03-07 10:05:04 +01:00
Guillaume Klein
469244a57d
Update CTranslate2 to 3.8.0
2023-03-06 16:21:48 +01:00
Guillaume Klein
4a18adc382
Load the tokenizer from the model directory if it exists
2023-03-01 15:47:16 +01:00
Guillaume Klein
873992623c
Accept the audio waveform as an input to transcribe() (#21)
2023-02-28 19:01:31 +01:00
Guillaume Klein
a4f1cc8f11
Add prefix parameter
2023-02-27 12:09:40 +01:00
Guillaume Klein
528aa3e784
Make threshold parameters optional
2023-02-27 11:32:03 +01:00
Guillaume Klein
f0add58bdc
Add typing to constructor and transcribe method
2023-02-27 11:22:02 +01:00
Guillaume Klein
ef71be09ed
Update CTranslate2 to 3.7.0
2023-02-23 11:18:58 +01:00
Guillaume Klein
d91365e321
Minor code simplification
2023-02-22 11:02:11 +01:00
Guillaume Klein
4b8237da1b
Strip the leading space before computing the compression ratio
2023-02-22 10:28:04 +01:00
Guillaume Klein
e47e00910a
Add length_penalty parameter and correctly compute the avg log prob
2023-02-22 10:27:38 +01:00
Guillaume Klein
f5c9f15c2c
Check that the language code is valid
2023-02-21 12:10:54 +01:00
Guillaume Klein
e2094b6474
Reduce the maximum length when the prompt is longer than 448/2
2023-02-17 14:37:24 +01:00
Guillaume Klein
123d9a5704
Support English-only models
2023-02-16 17:02:40 +01:00
Guillaume Klein
cbbe633082
Add num_workers parameter
2023-02-14 09:34:05 +01:00
Guillaume Klein
c86353d323
Add task parameter
2023-02-13 21:26:25 +01:00
Guillaume Klein
f56dfc6491
Add without_timestamps parameter
2023-02-13 21:22:05 +01:00
Guillaume Klein
3dc44f7bb5
Raise a more explicit error message for English-only models
2023-02-13 18:26:45 +01:00