100 Commits

Author SHA1 Message Date
Guillaume Klein
32dc625f11 Update README.md 2023-04-25 15:47:38 +02:00
Guillaume Klein
e06511f96b Rename AudioInfo to TranscriptionInfo (#174) 2023-04-24 16:29:17 +02:00
Anthony
338a725ff8 fix where the tokens are reset (#175) 2023-04-24 16:28:47 +02:00
Amar Sood
f893113759 Align segment structure with openai/whisper (#154)
* Align segment structure with openai/whisper

* Update code to apply requested changes

* Move increment below the segment filtering

---------

Co-authored-by: Guillaume Klein <guillaumekln@users.noreply.github.com>
2023-04-24 15:04:42 +02:00
FlippFuzz
2b51a97e61 Add transcription_options to AudioInfo (#170)
* Add transcription_options to AudioInfo

It would be great if we can include the transcription_options in AudioInfo.

My application is only making a few changes but leaving the rest as default.
However, I would like to record down all settings (including those that I did not specify) so that the audio can be transcribed again identically in future if need be.

* Make TranscriptionOptions appear before AudioInfo

* Remove unnecessary whitespace
2023-04-24 15:02:19 +02:00
Jordi Mas
358d373691 Allow specifying local_files_only to prevent checking the Internet every time (#166) 2023-04-20 14:26:06 +02:00
Guillaume Klein
3adcc12d0f Clarify that the returned segments value is a generator (#144)
* Clarify that the returned segments value is a generator

* Update README.md
2023-04-13 09:50:53 +02:00
Ewald Enzinger
2b53dee6b6 Expose download location in WhisperModel constructor (#126)
This increases compatibility with OpenAI Whisper's whisper.load_model() and is useful for downstream integrations
2023-04-08 10:02:36 +02:00
Bekir Bakar
06d24056e9 Configure ignore for more files. (#122) 2023-04-06 19:13:09 +02:00
Guillaume Klein
e9a082dcf2 Keep segment timestamps aligned with words timestamps after VAD (#119) 2023-04-06 11:54:40 +02:00
Guillaume Klein
051b3350e5 Add some info and debug logs (#113) 2023-04-05 16:57:59 +02:00
Guillaume Klein
746f2698db Bump version to 0.4.1 2023-04-04 12:16:23 +02:00
Guillaume Klein
a5d03e55fa Prevent out of range error in method split_tokens_on_unicode (#111) 2023-04-04 10:51:14 +02:00
Guillaume Klein
9fa1989073 Revert "Prevent out of range error in method split_tokens_on_unicode"
This reverts commit 36160c1e7e.
2023-04-04 10:25:41 +02:00
Guillaume Klein
36160c1e7e Prevent out of range error in method split_tokens_on_unicode 2023-04-04 10:17:56 +02:00
Guillaume Klein
2f266eb844 Fix VAD index error when a predicted timestamp is too large (#107) 2023-04-03 19:34:54 +02:00
Guillaume Klein
8c36ac1be8 Bump version to 0.4.0 2023-04-03 17:24:49 +02:00
Guillaume Klein
19698c95f8 Support VAD filter (#95)
* Support VAD filter

* Generalize function collect_samples

* Define AudioSegment class

* Only pass prompt and prefix to the first chunk

* Add dict argument vad_parameters

* Fix isort format

* Rename method

* Update README

* Add shortcut when the chunk offset is 0

* Reword readme

* Fix end property

* Concatenate the speech chunks

* Cleanup diff

* Increase default speech pad

* Update README

* Increase default speech pad
2023-04-03 17:22:48 +02:00
palladium123
b4c1c57781 Added retrieval mechanism (avg_log_prob/no_speech_prob) (#103)
* Added retrieval mechanism 

Added retrieval mechanism to retrieve avg_log_prob and no_speech_prob from the Transcribe method.

* Update transcribe.py

* Update transcribe.py

* Initial commit
2023-04-03 16:56:35 +02:00
Guillaume Klein
f20bb258de Support separating the left and right audio channels (#97) 2023-04-03 11:22:43 +02:00
Guillaume Klein
1a968a4323 Pass prefix only to the first window 2023-04-01 09:27:20 +02:00
Guillaume Klein
def70d8496 Update headings in the Usage section 2023-03-31 18:54:55 +02:00
mayeaux
7301df7f8b Update README.md (#101)
* Update README.md

* Update README.md

Co-authored-by: Guillaume Klein <guillaumekln@users.noreply.github.com>

* Update README.md

Co-authored-by: Guillaume Klein <guillaumekln@users.noreply.github.com>

---------

Co-authored-by: Guillaume Klein <guillaumekln@users.noreply.github.com>
2023-03-31 17:06:44 +02:00
Guillaume Klein
d03383f902 Simplify reuse of the encoder output 2023-03-30 15:58:27 +02:00
Guillaume Klein
39fddba886 Suppress some special tokens when the default set is not used 2023-03-30 12:42:29 +02:00
Guillaume Klein
eda840f8ff Always disable the progress bar specific to snapshot_download 2023-03-29 12:11:24 +02:00
Guillaume Klein
0224400584 Add large-v1 model 2023-03-28 14:36:10 +02:00
Guillaume Klein
8246479fda Ignore the invalid audio frames (#82) 2023-03-27 10:19:22 +02:00
Guillaume Klein
e2705d11c9 Raise an explicit error message if the model size is invalid 2023-03-26 16:30:00 +02:00
Jordi Mas
f8d2fb169f Fix variable name reference (#77) 2023-03-25 10:00:59 +01:00
Guillaume Klein
a10732c74a Only download the required model files 2023-03-24 17:59:11 +01:00
Guillaume Klein
7808eddf06 Bump version to 0.3.0 2023-03-24 10:56:42 +01:00
Guillaume Klein
de7682a2f0 Automatically download converted models from the Hugging Face Hub (#70)
* Automatically download converted models from the Hugging Face Hub

* Remove unused import

* Remove non needed requirements in dev mode

* Remove extra index URL when pip install in CI

* Allow downloading to a specific directory

* Update docstring

* Add argument to disable the progress bars

* Fix typo in docstring
2023-03-24 10:55:55 +01:00
Guillaume Klein
523ae2180f Run the encoder only once for each 30-second window (#73) 2023-03-24 10:53:49 +01:00
Guillaume Klein
2b7be47041 Update README.md 2023-03-24 09:15:05 +01:00
Guillaume Klein
3f02c53610 Add .gitignore file 2023-03-23 20:52:46 +01:00
Guillaume Klein
e663186a4b Add some badges at the top of the README 2023-03-23 20:33:19 +01:00
Guillaume Klein
e44a8c7ba0 Update the README following the PyPI release 2023-03-22 21:07:27 +01:00
Guillaume Klein
33f41d84e3 Add job to push a package for each new Git tag 2023-03-22 21:01:53 +01:00
Guillaume Klein
c910ec0293 Bump version to 0.2.0 2023-03-22 20:54:07 +01:00
Guillaume Klein
e9dfe23eaa Complete the package metadata 2023-03-22 20:53:51 +01:00
Guillaume Klein
66efd02bd0 Run some automatic tests with GitHub Actions (#68) 2023-03-22 20:50:03 +01:00
Guillaume Klein
52264f2277 Fix typing for device_index argument 2023-03-22 13:51:12 +01:00
Guillaume Klein
c27c010f96 Ignore Unicode errors in input file metadata 2023-03-21 17:13:37 +01:00
Guillaume Klein
0ab8db2b37 Remove debug prints 2023-03-18 09:48:02 +01:00
Guillaume Klein
a70aac18ae Remove unused import 2023-03-18 09:47:02 +01:00
Guillaume Klein
d82be59d5f Fix unset attribute when using English-only models 2023-03-17 18:33:16 +01:00
Guillaume Klein
58f4447964 Update benchmark results with latest openai/whisper and faster-whisper 2023-03-17 16:44:07 +01:00
Guillaume Klein
cce6b53e45 Fix incorrect attribute access 2023-03-16 10:32:36 +01:00
Guillaume Klein
2007adf0b5 Fix typing of words attribute 2023-03-15 17:49:07 +01:00