faster-whisper

Author	SHA1	Message	Date
Guillaume Klein	1e6eb967c9	Add "large" alias for "large-v2" model (#453)	2023-09-04 11:54:42 +02:00
Guillaume Klein	f0ff12965a	Expose generation parameter no_repeat_ngram_size (#449)	2023-09-01 17:31:30 +02:00
MinorJinx	e87fbf8a49	Added audio duration after VAD to TranscriptionInfo object (#445) * Added VAD removed audio duration to TranscriptionInfo object Along with the duration of the original audio, this commit adds the seconds of audio removed by the VAD to the returned info obj * Changing naming for duration_after_vad Instead of the property returning the audio duration removed, it now returns the final duration after the vad. If vad_filter is False or if it doesn't remove any audio, the original duration is returned.	2023-08-31 17:19:48 +02:00
Aisu Wata	1562b02345	added repetition_penalty to TranscriptionOptions (#403) Co-authored-by: Aisu Wata <aisu.wata0@gmail.com>	2023-08-06 10:08:24 +02:00
Purfview	1ce16652ee	Adds DEBUG log message for prompt_reset_on_temperature (#399) Produce DEBUG log message if prompt_reset_on_temperature threshold is met.	2023-08-04 09:06:17 +02:00
Purfview	857be6f621	Rename clear_previous_text_on_temperature argument (#398) `prompt_reset_on_temperature` is more clear what it does.	2023-08-03 18:44:37 +02:00
KH	1a1eb1a027	Add clear_previous_text_on_temperature parameter (#397) * Add clear_previous_text_on_temperature parameter * Add a description	2023-08-03 15:40:58 +02:00
Guillaume Klein	0f55c436fe	Invalidate the cached encoder output when no_speech threshold is met (#376)	2023-07-24 10:57:15 +02:00
KH	e786e26f75	Return result with best log prob when all temperature fallbacks failed (#356) * Resolve Inference Selection Bug * Refactor for better readability * Filter out results with compression_ratio * Refactor to avoid variable repetition * Fix incorrect index and perform minor refactoring * Remove final_temperature variable	2023-07-20 16:13:11 +02:00
KH	687db319e0	Remove duplicate code (#359)	2023-07-18 16:03:01 +02:00
Guillaume Klein	0e051a5b77	Prepend prefix tokens with the initial timestamp token (#358)	2023-07-18 15:22:39 +02:00
Hoon	3b4a6aa1c2	Improve timestamp heuristics (#336) * Improve timestamp heuristics * Chore	2023-07-05 15:16:53 +02:00
zh-plus	c7cb2aa8d4	Add support for using whisper models from Huggingface by specifying the model id. (#334) * Add support for downloading CTranslate-converted models from Huggingface. * Update utils.py to pass Flake8. * Update utils.py to pass black. * Remove redundant usage instructions. * Apply suggestions from code review Co-authored-by: Guillaume Klein <guillaumekln@users.noreply.github.com> --------- Co-authored-by: Guillaume Klein <guillaumekln@users.noreply.github.com>	2023-07-03 17:40:10 +02:00
Guillaume Klein	c0d93d0829	Avoid computing higher temperatures on no_speech segments (#225) Port commit `e334ff141d`	2023-07-03 10:20:36 +02:00
Guillaume Klein	19c294f978	Squash long words at window and sentence boundaries (#226) Port commit `255887f219`	2023-07-03 10:20:20 +02:00
FlippFuzz	fee52c9229	Allow users to input an Iterable of token ids into initial_prompt (#306) * Allow users to input an Iterable of token ids into initial_prompt * Need to check for String first because string is also an Iterable	2023-06-21 14:46:20 +02:00
Guillaume Klein	723cb97483	Fix occasional IndexError on empty segments (#227)	2023-05-24 12:55:04 +02:00
Ozan Caglayan	91f948b0d6	transcribe: return all language probabilities if requested (#210) * transcribe: return all language probabilities if requested If return_all_language_probs is True, TranscriptionInfo structure will have a list of tuples reflecting all language probabilities as returned by the model. * transcribe: fix docstring * transcribe: remove return_all_lang_probs parameter	2023-05-09 14:53:47 +02:00
FlippFuzz	5d8f3e2d90	Implement VadOptions (#198) * Implement VadOptions * Fix line too long ./faster_whisper/transcribe.py:226:101: E501 line too long (111 > 100 characters) * Reformatted files with black * black .\faster_whisper\vad.py * black .\faster_whisper\transcribe.py * Fix import order with isort * isort .\faster_whisper\vad.py * isort .\faster_whisper\transcribe.py * Made recommended changes Recommended in https://github.com/guillaumekln/faster-whisper/pull/198 * Fix typing of vad_options argument --------- Co-authored-by: Guillaume Klein <guillaumekln@users.noreply.github.com>	2023-05-09 12:47:02 +02:00
Guillaume Klein	89a4c7f1f0	Update docstring to clarify download_root and output_dir	2023-04-26 17:37:51 +02:00
Guillaume Klein	6f9d68dd6b	Fix typing of local_files_only	2023-04-26 17:36:24 +02:00
Jordi Mas	68df3214ba	Use cache_dir instead of local_dir (#182) * Use cache_dir instead of local_dir * Fix unit test * Use cache_dir and preserve local_dir parameter * Remove blank line at the end * Disable ut * Implement download_root suggestion * Use cache_dir=download_root	2023-04-26 16:35:18 +02:00
Guillaume Klein	8340e04dc6	Assign words to the speech chunk with the greatest coverage (#180)	2023-04-25 15:54:31 +02:00
Guillaume Klein	e06511f96b	Rename AudioInfo to TranscriptionInfo (#174)	2023-04-24 16:29:17 +02:00
Anthony	338a725ff8	fix where the tokens are reset (#175)	2023-04-24 16:28:47 +02:00
Amar Sood	f893113759	Align segment structure with openai/whisper (#154) * Align segment structure with openai/whisper * Update code to apply requested changes * Move increment below the segment filtering --------- Co-authored-by: Guillaume Klein <guillaumekln@users.noreply.github.com>	2023-04-24 15:04:42 +02:00
FlippFuzz	2b51a97e61	Add transcription_options to AudioInfo (#170) * Add transcription_options to AudioInfo It would be great if we can include the transcription_options in AudioInfo. My application is only making a few changes but leaving the rest as default. However, I would like to record down all settings (including those that I did not specify) so that the audio can be transcribed again identically in future if need be. * Make TranscriptionOptions appear before AudioInfo * Remove unnecessary whitespace	2023-04-24 15:02:19 +02:00
Jordi Mas	358d373691	Allow specifying local_files_only to prevent checking the Internet every time (#166)	2023-04-20 14:26:06 +02:00
Ewald Enzinger	2b53dee6b6	Expose download location in WhisperModel constructor (#126) This increases compatibility with OpenAI Whisper's whisper.load_model() and is useful for downstream integrations	2023-04-08 10:02:36 +02:00
Guillaume Klein	e9a082dcf2	Keep segment timestamps aligned with words timestamps after VAD (#119)	2023-04-06 11:54:40 +02:00
Guillaume Klein	051b3350e5	Add some info and debug logs (#113)	2023-04-05 16:57:59 +02:00
Guillaume Klein	19698c95f8	Support VAD filter (#95) * Support VAD filter * Generalize function collect_samples * Define AudioSegment class * Only pass prompt and prefix to the first chunk * Add dict argument vad_parameters * Fix isort format * Rename method * Update README * Add shortcut when the chunk offset is 0 * Reword readme * Fix end property * Concatenate the speech chunks * Cleanup diff * Increase default speech pad * Update README * Increase default speech pad	2023-04-03 17:22:48 +02:00
palladium123	b4c1c57781	Added retrieval mechanism (avg_log_prob/no_speech_prob) (#103) * Added retrieval mechanism Added retrieval mechanism to retrieve avg_log_prob and no_speech_prob from the Transcribe method. * Update transcribe.py * Update transcribe.py * Initial commit	2023-04-03 16:56:35 +02:00
Guillaume Klein	1a968a4323	Pass prefix only to the first window	2023-04-01 09:27:20 +02:00
Guillaume Klein	d03383f902	Simplify reuse of the encoder output	2023-03-30 15:58:27 +02:00
Guillaume Klein	39fddba886	Suppress some special tokens when the default set is not used	2023-03-30 12:42:29 +02:00
Guillaume Klein	0224400584	Add large-v1 model	2023-03-28 14:36:10 +02:00
Guillaume Klein	e2705d11c9	Raise an explicit error message if the model size is invalid	2023-03-26 16:30:00 +02:00
Jordi Mas	f8d2fb169f	Fix variable name reference (#77)	2023-03-25 10:00:59 +01:00
Guillaume Klein	de7682a2f0	Automatically download converted models from the Hugging Face Hub (#70) * Automatically download converted models from the Hugging Face Hub * Remove unused import * Remove non needed requirements in dev mode * Remove extra index URL when pip install in CI * Allow downloading to a specific directory * Update docstring * Add argument to disable the progress bars * Fix typo in docstring	2023-03-24 10:55:55 +01:00
Guillaume Klein	523ae2180f	Run the encoder only once for each 30-second window (#73)	2023-03-24 10:53:49 +01:00
Guillaume Klein	52264f2277	Fix typing for device_index argument	2023-03-22 13:51:12 +01:00
Guillaume Klein	0ab8db2b37	Remove debug prints	2023-03-18 09:48:02 +01:00
Guillaume Klein	a70aac18ae	Remove unused import	2023-03-18 09:47:02 +01:00
Guillaume Klein	cce6b53e45	Fix incorrect attribute access	2023-03-16 10:32:36 +01:00
Guillaume Klein	2007adf0b5	Fix typing of words attribute	2023-03-15 17:49:07 +01:00
Guillaume Klein	ae9898f0d8	Include duration in AudioInfo structure	2023-03-15 15:30:29 +01:00
Guillaume Klein	eafb2c79a3	Add more typing annotations	2023-03-15 15:22:53 +01:00
Guillaume Klein	8bd013ea99	Add word-level timestamps (#43) * Add word-level timestamps * Fix alignment between the segments and the lists of words * Fix truncated words list when the replacement character is decoded * Check for empty text_tokens * Add usage example in the readme * Update ctranslate2 to 3.9 * Skip empty segment * Set typing for the new methods	2023-03-15 15:02:28 +01:00
Guillaume Klein	3301dd9273	Make get_input a free function	2023-03-09 12:54:41 +01:00

78 Commits