faster-whisper

Author	SHA1	Message	Date
FlippFuzz	5d8f3e2d90	Implement VadOptions (#198) * Implement VadOptions * Fix line too long ./faster_whisper/transcribe.py:226:101: E501 line too long (111 > 100 characters) * Reformatted files with black * black .\faster_whisper\vad.py * black .\faster_whisper\transcribe.py * Fix import order with isort * isort .\faster_whisper\vad.py * isort .\faster_whisper\transcribe.py * Made recommended changes Recommended in https://github.com/guillaumekln/faster-whisper/pull/198 * Fix typing of vad_options argument --------- Co-authored-by: Guillaume Klein <guillaumekln@users.noreply.github.com>	2023-05-09 12:47:02 +02:00
Guillaume Klein	89a4c7f1f0	Update docstring to clarify download_root and output_dir	2023-04-26 17:37:51 +02:00
Guillaume Klein	6f9d68dd6b	Fix typing of local_files_only	2023-04-26 17:36:24 +02:00
Jordi Mas	68df3214ba	Use cache_dir instead of local_dir (#182) * Use cache_dir instead of local_dir * Fix unit test * Use cache_dir and preserve local_dir parameter * Remove blank line at the end * Disable ut * Implement download_root suggestion * Use cache_dir=download_root	2023-04-26 16:35:18 +02:00
Guillaume Klein	8340e04dc6	Assign words to the speech chunk with the greatest coverage (#180)	2023-04-25 15:54:31 +02:00
Guillaume Klein	e06511f96b	Rename AudioInfo to TranscriptionInfo (#174)	2023-04-24 16:29:17 +02:00
Anthony	338a725ff8	fix where the tokens are reset (#175)	2023-04-24 16:28:47 +02:00
Amar Sood	f893113759	Align segment structure with openai/whisper (#154) * Align segment structure with openai/whisper * Update code to apply requested changes * Move increment below the segment filtering --------- Co-authored-by: Guillaume Klein <guillaumekln@users.noreply.github.com>	2023-04-24 15:04:42 +02:00
FlippFuzz	2b51a97e61	Add transcription_options to AudioInfo (#170) * Add transcription_options to AudioInfo It would be great if we can include the transcription_options in AudioInfo. My application is only making a few changes but leaving the rest as default. However, I would like to record down all settings (including those that I did not specify) so that the audio can be transcribed again identically in future if need be. * Make TranscriptionOptions appear before AudioInfo * Remove unnecessary whitespace	2023-04-24 15:02:19 +02:00
Jordi Mas	358d373691	Allow specifying local_files_only to prevent checking the Internet everytime (#166)	2023-04-20 14:26:06 +02:00
Ewald Enzinger	2b53dee6b6	Expose download location in WhisperModel constructor (#126) This increases compatibility with OpenAI Whisper's whisper.load_model() and is useful for downstream integrations	2023-04-08 10:02:36 +02:00
Guillaume Klein	e9a082dcf2	Keep segment timestamps aligned with words timestamps after VAD (#119)	2023-04-06 11:54:40 +02:00
Guillaume Klein	051b3350e5	Add some info and debug logs (#113)	2023-04-05 16:57:59 +02:00
Guillaume Klein	19698c95f8	Support VAD filter (#95) * Support VAD filter * Generalize function collect_samples * Define AudioSegment class * Only pass prompt and prefix to the first chunk * Add dict argument vad_parameters * Fix isort format * Rename method * Update README * Add shortcut when the chunk offset is 0 * Reword readme * Fix end property * Concatenate the speech chunks * Cleanup diff * Increase default speech pad * Update README * Increase default speech pad	2023-04-03 17:22:48 +02:00
palladium123	b4c1c57781	Added retrieval mechanism (avg_log_prob/no_speech_prob) (#103) * Added retrieval mechanism Added retrieval mechanism to retrieve avg_log_prob and no_speech_prob from the Transcribe method. * Update transcribe.py * Update transcribe.py * Initial commit	2023-04-03 16:56:35 +02:00
Guillaume Klein	1a968a4323	Pass prefix only to the first window	2023-04-01 09:27:20 +02:00
Guillaume Klein	d03383f902	Simplify reuse of the encoder output	2023-03-30 15:58:27 +02:00
Guillaume Klein	39fddba886	Suppress some special tokens when the default set is not used	2023-03-30 12:42:29 +02:00
Guillaume Klein	0224400584	Add large-v1 model	2023-03-28 14:36:10 +02:00
Guillaume Klein	e2705d11c9	Raise an explicit error message if the model size is invalid	2023-03-26 16:30:00 +02:00
Jordi Mas	f8d2fb169f	Fix variable name reference (#77)	2023-03-25 10:00:59 +01:00
Guillaume Klein	de7682a2f0	Automatically download converted models from the Hugging Face Hub (#70) * Automatically download converted models from the Hugging Face Hub * Remove unused import * Remove non needed requirements in dev mode * Remove extra index URL when pip install in CI * Allow downloading to a specific directory * Update docstring * Add argument to disable the progess bars * Fix typo in docstring	2023-03-24 10:55:55 +01:00
Guillaume Klein	523ae2180f	Run the encoder only once for each 30-second window (#73)	2023-03-24 10:53:49 +01:00
Guillaume Klein	52264f2277	Fix typing for device_index argument	2023-03-22 13:51:12 +01:00
Guillaume Klein	0ab8db2b37	Remove debug prints	2023-03-18 09:48:02 +01:00
Guillaume Klein	a70aac18ae	Remove unused import	2023-03-18 09:47:02 +01:00
Guillaume Klein	cce6b53e45	Fix incorrect attribute access	2023-03-16 10:32:36 +01:00
Guillaume Klein	2007adf0b5	Fix typing of words attribute	2023-03-15 17:49:07 +01:00
Guillaume Klein	ae9898f0d8	Include duration in AudioInfo structure	2023-03-15 15:30:29 +01:00
Guillaume Klein	eafb2c79a3	Add more typing annotations	2023-03-15 15:22:53 +01:00
Guillaume Klein	8bd013ea99	Add word-level timestamps (#43) * Add word-level timestamps * Fix alignment between the segments and the lists of words * Fix truncated words list when the replacement character is decoded * Check for empty text_tokens * Add usage example in the readme * Update ctranslate2 to 3.9 * Skip empty segment * Set typing for the new methods	2023-03-15 15:02:28 +01:00
Guillaume Klein	3301dd9273	Make get_input a free function	2023-03-09 12:54:41 +01:00
Guillaume Klein	c52adaca90	Create a helper class Tokenizer	2023-03-09 12:53:49 +01:00
Guillaume Klein	f0a21ea916	Use a dict to represent intermediate segments	2023-03-09 11:53:55 +01:00
Guillaume Klein	6a84df400f	Fix all_tokens handling See `38f2f4d99d`	2023-03-09 10:02:25 +01:00
Guillaume Klein	4176da0d68	Rename offset to seek to match the OpenAI implementation	2023-03-09 09:58:58 +01:00
Guillaume Klein	6b16b8a69c	Pad the audio instead of the spectrogram See `919a713499`	2023-03-08 10:50:46 +01:00
Guillaume Klein	01ef12a6a0	Do not ignore last segment ending with one timestamp See `eab8d920ed`	2023-03-07 10:05:04 +01:00
Guillaume Klein	469244a57d	Update CTranslate2 to 3.8.0	2023-03-06 16:21:48 +01:00
Guillaume Klein	4a18adc382	Load the tokenizer from the model directory if it exists	2023-03-01 15:47:16 +01:00
Guillaume Klein	873992623c	Accept the audio waveform as an input to transcribe() (#21)	2023-02-28 19:01:31 +01:00
Guillaume Klein	a4f1cc8f11	Add prefix parameter	2023-02-27 12:09:40 +01:00
Guillaume Klein	528aa3e784	Make threshold parameters optional	2023-02-27 11:32:03 +01:00
Guillaume Klein	f0add58bdc	Add typing to constructor and transcribe method	2023-02-27 11:22:02 +01:00
Guillaume Klein	ef71be09ed	Update CTranslate2 to 3.7.0	2023-02-23 11:18:58 +01:00
Guillaume Klein	d91365e321	Minor code simplification	2023-02-22 11:02:11 +01:00
Guillaume Klein	4b8237da1b	Strip the leading space before computing the compression ratio	2023-02-22 10:28:04 +01:00
Guillaume Klein	e47e00910a	Add length_penalty parameter and correctly compute the avg log prob	2023-02-22 10:27:38 +01:00
Guillaume Klein	f5c9f15c2c	Check that the language code is valid	2023-02-21 12:10:54 +01:00
Guillaume Klein	e2094b6474	Reduce the maximum length when the prompt is longer than 448/2	2023-02-17 14:37:24 +01:00

60 Commits