Guillaume Klein
0f55c436fe
Invalidate the cached encoder output when no_speech threshold is met (#376)
2023-07-24 10:57:15 +02:00
KH
e786e26f75
Return result with best log prob when all temperature fallbacks failed (#356)
* Resolve Inference Selection Bug
* Refactor for better readability
* Filter out results with compression_ratio
* Refactor to avoid variable repetition
* Fix incorrect index and perform minor refactoring
* Remove final_temperature variable
2023-07-20 16:13:11 +02:00
KH
687db319e0
Remove duplicate code (#359)
2023-07-18 16:03:01 +02:00
Guillaume Klein
171d90dd1f
Bump version to 0.7.0
2023-07-18 15:23:47 +02:00
Guillaume Klein
0e051a5b77
Prepend prefix tokens with the initial timestamp token (#358)
2023-07-18 15:22:39 +02:00
Hoon
3b4a6aa1c2
Improve timestamp heuristics (#336)
* Improve timestamp heuristics
* Chore
2023-07-05 15:16:53 +02:00
zh-plus
c7cb2aa8d4
Add support for using whisper models from Huggingface by specifying the model id. (#334)
* Add support for downloading CTranslate2-converted models from Huggingface.
* Update utils.py to pass Flake8.
* Update utils.py to pass black.
* Remove redundant usage instructions.
* Apply suggestions from code review
---------
Co-authored-by: Guillaume Klein <guillaumekln@users.noreply.github.com>
2023-07-03 17:40:10 +02:00
Guillaume Klein
c0d93d0829
Avoid computing higher temperatures on no_speech segments (#225)
Port commit e334ff141d
2023-07-03 10:20:36 +02:00
Guillaume Klein
19c294f978
Squash long words at window and sentence boundaries (#226)
Port commit 255887f219
2023-07-03 10:20:20 +02:00
FlippFuzz
fee52c9229
Allow users to input an Iterable of token ids into initial_prompt (#306)
* Allow users to input an Iterable of token ids into initial_prompt
* Need to check for String first because string is also an Iterable
2023-06-21 14:46:20 +02:00
Guillaume Klein
efc4f61d85
Do not specify the vocabulary file extension in the download pattern (#311)
2023-06-20 10:53:11 +02:00
kh
ad58ba26ab
Fix typo (#304)
https://github.com/snakers4/silero-vad/discussions/319#discussion-5081706
2023-06-16 07:37:45 +02:00
Guillaume Klein
2a00621564
Bump version to 0.6.0
2023-05-24 16:15:01 +02:00
Guillaume Klein
cf7c021573
Export __version__ at the module level (#258)
2023-05-24 15:50:37 +02:00
Guillaume Klein
4db549b800
Make get_speech_timestamps backward compatible with the previous usage (#259)
2023-05-24 15:49:36 +02:00
Guillaume Klein
723cb97483
Fix occasional IndexError on empty segments (#227)
2023-05-24 12:55:04 +02:00
Guillaume Klein
6a2da9a95c
Also catch client-side network exceptions when synchronizing models (#228)
2023-05-11 15:07:15 +02:00
Guillaume Klein
2d7c984bfc
Reformat function download_model for clarity
2023-05-11 14:47:22 +02:00
David Axelrod
53d247b0bb
retry model download locally if huggingface throws an http error. (#215)
* retry model download locally if huggingface throws an http error.
* appease the linter
* key error fix
* use non internal lib error
---------
Co-authored-by: Guillaume Klein <guillaumekln@users.noreply.github.com>
2023-05-09 17:20:22 +02:00
Ozan Caglayan
91f948b0d6
transcribe: return all language probabilities if requested (#210)
* transcribe: return all language probabilities if requested
If return_all_language_probs is True, TranscriptionInfo structure
will have a list of tuples reflecting all language probabilities
as returned by the model.
* transcribe: fix docstring
* transcribe: remove return_all_lang_probs parameter
2023-05-09 14:53:47 +02:00
FlippFuzz
5d8f3e2d90
Implement VadOptions (#198)
* Implement VadOptions
* Fix line too long
./faster_whisper/transcribe.py:226:101: E501 line too long (111 > 100 characters)
* Reformatted files with black
* black .\faster_whisper\vad.py
* black .\faster_whisper\transcribe.py
* Fix import order with isort
* isort .\faster_whisper\vad.py
* isort .\faster_whisper\transcribe.py
* Made recommended changes
Recommended in https://github.com/guillaumekln/faster-whisper/pull/198
* Fix typing of vad_options argument
---------
Co-authored-by: Guillaume Klein <guillaumekln@users.noreply.github.com>
2023-05-09 12:47:02 +02:00
Guillaume Klein
89a4c7f1f0
Update docstring to clarify download_root and output_dir
2023-04-26 17:37:51 +02:00
Guillaume Klein
6f9d68dd6b
Fix typing of local_files_only
2023-04-26 17:36:24 +02:00
Jordi Mas
68df3214ba
Use cache_dir instead of local_dir (#182)
* Use cache_dir instead of local_dir
* Fix unit test
* Use cache_dir and preserve local_dir parameter
* Remove blank line at the end
* Disable unit test
* Implement download_root suggestion
* Use cache_dir=download_root
2023-04-26 16:35:18 +02:00
Guillaume Klein
8340e04dc6
Assign words to the speech chunk with the greatest coverage (#180)
2023-04-25 15:54:31 +02:00
Guillaume Klein
8cf5d5a4b3
Increase the default value of speech_pad_ms to 400 ms (#179)
2023-04-25 15:54:22 +02:00
Guillaume Klein
e06511f96b
Rename AudioInfo to TranscriptionInfo (#174)
2023-04-24 16:29:17 +02:00
Anthony
338a725ff8
fix where the tokens are reset (#175)
2023-04-24 16:28:47 +02:00
Amar Sood
f893113759
Align segment structure with openai/whisper (#154)
* Align segment structure with openai/whisper
* Update code to apply requested changes
* Move increment below the segment filtering
---------
Co-authored-by: Guillaume Klein <guillaumekln@users.noreply.github.com>
2023-04-24 15:04:42 +02:00
FlippFuzz
2b51a97e61
Add transcription_options to AudioInfo (#170)
* Add transcription_options to AudioInfo
It would be great if we could include the transcription_options in AudioInfo.
My application only changes a few settings and leaves the rest as default.
However, I would like to record all settings (including those that I did not specify) so that the audio can be transcribed again identically in the future if need be.
* Make TranscriptionOptions appear before AudioInfo
* Remove unnecessary whitespace
2023-04-24 15:02:19 +02:00
Jordi Mas
358d373691
Allow specifying local_files_only to prevent checking the Internet every time (#166)
2023-04-20 14:26:06 +02:00
Ewald Enzinger
2b53dee6b6
Expose download location in WhisperModel constructor (#126)
This increases compatibility with OpenAI Whisper's whisper.load_model() and is useful for downstream integrations
2023-04-08 10:02:36 +02:00
Guillaume Klein
e9a082dcf2
Keep segment timestamps aligned with words timestamps after VAD (#119)
2023-04-06 11:54:40 +02:00
Guillaume Klein
051b3350e5
Add some info and debug logs (#113)
2023-04-05 16:57:59 +02:00
Guillaume Klein
a5d03e55fa
Prevent out of range error in method split_tokens_on_unicode (#111)
2023-04-04 10:51:14 +02:00
Guillaume Klein
9fa1989073
Revert "Prevent out of range error in method split_tokens_on_unicode"
This reverts commit 36160c1e7e.
2023-04-04 10:25:41 +02:00
Guillaume Klein
36160c1e7e
Prevent out of range error in method split_tokens_on_unicode
2023-04-04 10:17:56 +02:00
Guillaume Klein
2f266eb844
Fix VAD index error when a predicted timestamp is too large (#107)
2023-04-03 19:34:54 +02:00
Guillaume Klein
19698c95f8
Support VAD filter (#95)
* Support VAD filter
* Generalize function collect_samples
* Define AudioSegment class
* Only pass prompt and prefix to the first chunk
* Add dict argument vad_parameters
* Fix isort format
* Rename method
* Update README
* Add shortcut when the chunk offset is 0
* Reword readme
* Fix end property
* Concatenate the speech chunks
* Cleanup diff
* Increase default speech pad
* Update README
* Increase default speech pad
2023-04-03 17:22:48 +02:00
palladium123
b4c1c57781
Added retrieval mechanism (avg_log_prob/no_speech_prob) (#103)
* Added retrieval mechanism
Added retrieval mechanism to retrieve avg_log_prob and no_speech_prob from the Transcribe method.
* Update transcribe.py
* Update transcribe.py
* Initial commit
2023-04-03 16:56:35 +02:00
Guillaume Klein
f20bb258de
Support separating the left and right audio channels (#97)
2023-04-03 11:22:43 +02:00
Guillaume Klein
1a968a4323
Pass prefix only to the first window
2023-04-01 09:27:20 +02:00
Guillaume Klein
d03383f902
Simplify reuse of the encoder output
2023-03-30 15:58:27 +02:00
Guillaume Klein
39fddba886
Suppress some special tokens when the default set is not used
2023-03-30 12:42:29 +02:00
Guillaume Klein
eda840f8ff
Always disable the progress bar specific to snapshot_download
2023-03-29 12:11:24 +02:00
Guillaume Klein
0224400584
Add large-v1 model
2023-03-28 14:36:10 +02:00
Guillaume Klein
8246479fda
Ignore the invalid audio frames (#82)
2023-03-27 10:19:22 +02:00
Guillaume Klein
e2705d11c9
Raise an explicit error message if the model size is invalid
2023-03-26 16:30:00 +02:00
Jordi Mas
f8d2fb169f
Fix variable name reference (#77)
2023-03-25 10:00:59 +01:00
Guillaume Klein
a10732c74a
Only download the required model files
2023-03-24 17:59:11 +01:00