
60 Commits

Author SHA1 Message Date
FlippFuzz
5d8f3e2d90 Implement VadOptions (#198)
* Implement VadOptions

* Fix line too long

./faster_whisper/transcribe.py:226:101: E501 line too long (111 > 100 characters)

* Reformatted files with black

* black .\faster_whisper\vad.py
* black .\faster_whisper\transcribe.py

* Fix import order with isort

* isort .\faster_whisper\vad.py
* isort .\faster_whisper\transcribe.py

* Made recommended changes

Recommended in https://github.com/guillaumekln/faster-whisper/pull/198

* Fix typing of vad_options argument

---------

Co-authored-by: Guillaume Klein <guillaumekln@users.noreply.github.com>
2023-05-09 12:47:02 +02:00
Guillaume Klein
89a4c7f1f0 Update docstring to clarify download_root and output_dir 2023-04-26 17:37:51 +02:00
Guillaume Klein
6f9d68dd6b Fix typing of local_files_only 2023-04-26 17:36:24 +02:00
Jordi Mas
68df3214ba Use cache_dir instead of local_dir (#182)
* Use cache_dir instead of local_dir

* Fix unit test

* Use cache_dir and preserve local_dir parameter

* Remove blank line at the end

* Disable unit test

* Implement download_root suggestion

* Use cache_dir=download_root
2023-04-26 16:35:18 +02:00
Guillaume Klein
8340e04dc6 Assign words to the speech chunk with the greatest coverage (#180) 2023-04-25 15:54:31 +02:00
Guillaume Klein
e06511f96b Rename AudioInfo to TranscriptionInfo (#174) 2023-04-24 16:29:17 +02:00
Anthony
338a725ff8 fix where the tokens are reset (#175) 2023-04-24 16:28:47 +02:00
Amar Sood
f893113759 Align segment structure with openai/whisper (#154)
* Align segment structure with openai/whisper

* Update code to apply requested changes

* Move increment below the segment filtering

---------

Co-authored-by: Guillaume Klein <guillaumekln@users.noreply.github.com>
2023-04-24 15:04:42 +02:00
FlippFuzz
2b51a97e61 Add transcription_options to AudioInfo (#170)
* Add transcription_options to AudioInfo

It would be great if we could include the transcription_options in AudioInfo.

My application only changes a few options and leaves the rest at their defaults.
However, I would like to record all settings (including those I did not specify) so that the audio can be transcribed again identically in the future if needed.

* Make TranscriptionOptions appear before AudioInfo

* Remove unnecessary whitespace
2023-04-24 15:02:19 +02:00
Jordi Mas
358d373691 Allow specifying local_files_only to prevent checking the Internet every time (#166) 2023-04-20 14:26:06 +02:00
Ewald Enzinger
2b53dee6b6 Expose download location in WhisperModel constructor (#126)
This increases compatibility with OpenAI Whisper's whisper.load_model() and is useful for downstream integrations.
2023-04-08 10:02:36 +02:00
Guillaume Klein
e9a082dcf2 Keep segment timestamps aligned with words timestamps after VAD (#119) 2023-04-06 11:54:40 +02:00
Guillaume Klein
051b3350e5 Add some info and debug logs (#113) 2023-04-05 16:57:59 +02:00
Guillaume Klein
19698c95f8 Support VAD filter (#95)
* Support VAD filter

* Generalize function collect_samples

* Define AudioSegment class

* Only pass prompt and prefix to the first chunk

* Add dict argument vad_parameters

* Fix isort format

* Rename method

* Update README

* Add shortcut when the chunk offset is 0

* Reword readme

* Fix end property

* Concatenate the speech chunks

* Cleanup diff

* Increase default speech pad

* Update README

* Increase default speech pad
2023-04-03 17:22:48 +02:00
palladium123
b4c1c57781 Added retrieval mechanism (avg_log_prob/no_speech_prob) (#103)
* Added retrieval mechanism

Added a mechanism to retrieve avg_log_prob and no_speech_prob from the transcribe method.

* Update transcribe.py

* Update transcribe.py

* Initial commit
2023-04-03 16:56:35 +02:00
Guillaume Klein
1a968a4323 Pass prefix only to the first window 2023-04-01 09:27:20 +02:00
Guillaume Klein
d03383f902 Simplify reuse of the encoder output 2023-03-30 15:58:27 +02:00
Guillaume Klein
39fddba886 Suppress some special tokens when the default set is not used 2023-03-30 12:42:29 +02:00
Guillaume Klein
0224400584 Add large-v1 model 2023-03-28 14:36:10 +02:00
Guillaume Klein
e2705d11c9 Raise an explicit error message if the model size is invalid 2023-03-26 16:30:00 +02:00
Jordi Mas
f8d2fb169f Fix variable name reference (#77) 2023-03-25 10:00:59 +01:00
Guillaume Klein
de7682a2f0 Automatically download converted models from the Hugging Face Hub (#70)
* Automatically download converted models from the Hugging Face Hub

* Remove unused import

* Remove unneeded requirements in dev mode

* Remove extra index URL when pip install in CI

* Allow downloading to a specific directory

* Update docstring

* Add argument to disable the progress bars

* Fix typo in docstring
2023-03-24 10:55:55 +01:00
Guillaume Klein
523ae2180f Run the encoder only once for each 30-second window (#73) 2023-03-24 10:53:49 +01:00
Guillaume Klein
52264f2277 Fix typing for device_index argument 2023-03-22 13:51:12 +01:00
Guillaume Klein
0ab8db2b37 Remove debug prints 2023-03-18 09:48:02 +01:00
Guillaume Klein
a70aac18ae Remove unused import 2023-03-18 09:47:02 +01:00
Guillaume Klein
cce6b53e45 Fix incorrect attribute access 2023-03-16 10:32:36 +01:00
Guillaume Klein
2007adf0b5 Fix typing of words attribute 2023-03-15 17:49:07 +01:00
Guillaume Klein
ae9898f0d8 Include duration in AudioInfo structure 2023-03-15 15:30:29 +01:00
Guillaume Klein
eafb2c79a3 Add more typing annotations 2023-03-15 15:22:53 +01:00
Guillaume Klein
8bd013ea99 Add word-level timestamps (#43)
* Add word-level timestamps

* Fix alignment between the segments and the lists of words

* Fix truncated words list when the replacement character is decoded

* Check for empty text_tokens

* Add usage example in the readme

* Update ctranslate2 to 3.9

* Skip empty segment

* Set typing for the new methods
2023-03-15 15:02:28 +01:00
Guillaume Klein
3301dd9273 Make get_input a free function 2023-03-09 12:54:41 +01:00
Guillaume Klein
c52adaca90 Create a helper class Tokenizer 2023-03-09 12:53:49 +01:00
Guillaume Klein
f0a21ea916 Use a dict to represent intermediate segments 2023-03-09 11:53:55 +01:00
Guillaume Klein
6a84df400f Fix all_tokens handling
See 38f2f4d99d
2023-03-09 10:02:25 +01:00
Guillaume Klein
4176da0d68 Rename offset to seek to match the OpenAI implementation 2023-03-09 09:58:58 +01:00
Guillaume Klein
6b16b8a69c Pad the audio instead of the spectrogram
See 919a713499
2023-03-08 10:50:46 +01:00
Guillaume Klein
01ef12a6a0 Do not ignore last segment ending with one timestamp
See eab8d920ed
2023-03-07 10:05:04 +01:00
Guillaume Klein
469244a57d Update CTranslate2 to 3.8.0 2023-03-06 16:21:48 +01:00
Guillaume Klein
4a18adc382 Load the tokenizer from the model directory if it exists 2023-03-01 15:47:16 +01:00
Guillaume Klein
873992623c Accept the audio waveform as an input to transcribe() (#21) 2023-02-28 19:01:31 +01:00
Guillaume Klein
a4f1cc8f11 Add prefix parameter 2023-02-27 12:09:40 +01:00
Guillaume Klein
528aa3e784 Make threshold parameters optional 2023-02-27 11:32:03 +01:00
Guillaume Klein
f0add58bdc Add typing to constructor and transcribe method 2023-02-27 11:22:02 +01:00
Guillaume Klein
ef71be09ed Update CTranslate2 to 3.7.0 2023-02-23 11:18:58 +01:00
Guillaume Klein
d91365e321 Minor code simplification 2023-02-22 11:02:11 +01:00
Guillaume Klein
4b8237da1b Strip the leading space before computing the compression ratio 2023-02-22 10:28:04 +01:00
Guillaume Klein
e47e00910a Add length_penalty parameter and correctly compute the avg log prob 2023-02-22 10:27:38 +01:00
Guillaume Klein
f5c9f15c2c Check that the language code is valid 2023-02-21 12:10:54 +01:00
Guillaume Klein
e2094b6474 Reduce the maximum length when the prompt is longer than 448/2 2023-02-17 14:37:24 +01:00