Commit Graph

88 Commits

Author SHA1 Message Date
Jong Wook Kim
924e1f8e06 Try installing triton only if linux & x86_64 (#1051) 2023-03-07 11:31:40 -08:00
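This commit gates the triton dependency on the platform where its wheels are published. A minimal sketch of such a conditional requirement for setup.py (the helper name and version pin are assumptions, not the repository's actual code):

```python
import platform
import sys


def triton_requirement():
    """Return the Triton requirement only on Linux x86_64, where its
    wheels exist; return nothing elsewhere. Hypothetical helper
    sketching the platform gate this commit describes (pin assumed)."""
    if sys.platform == "linux" and platform.machine() == "x86_64":
        return ["triton>=2.0.0"]
    return []
```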
Jong Wook Kim
4b0d5e58d0 Update setup.py 2023-03-07 04:47:46 -08:00
Jong Wook Kim
8180fde939 Release 20230306 2023-03-06 18:53:04 -08:00
Local State
c6e4e5efb3 remove auxiliary audio extension (#1021)
Co-authored-by: Jong Wook Kim <jongwook@openai.com>
2023-03-06 17:48:14 -08:00
Jong Wook Kim
b80bcf610d apply formatting with black (#1038)
* applying black (with the default 88-column limit)

* add flake8

* add isort

* fix isort
2023-03-06 15:50:37 -08:00
Jong Wook Kim
500d0fe966 word-level timestamps in transcribe() (#869)
* word-level timestamps in `transcribe()`

* moving to `timing.py`

* numba implementation for dtw, replacing dtw-python

* triton implementation for dtw

* add test for dtw implementations

* triton implementation of median_filter

* a simple word-level timestamps test

* add scipy as dev dependency

* installs an older version of Triton if CUDA < 11.4

* fix broken merge

* loosen nvcc version match regex

* find_alignment() function

* miscellaneous improvements

* skip median filtering when the input is too small

* Expose punctuation options in cli and transcribe() (#973)

* fix merge error

* fix merge error 2

* annotating that word_timestamps is experimental

---------

Co-authored-by: ryanheise <ryan@ryanheise.com>
2023-03-06 14:00:49 -08:00
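The DTW at the heart of the word-level timestamps above can be sketched in plain NumPy; the commit itself ships numba and Triton kernels for speed, so this toy version only illustrates the recurrence and backtrace:

```python
import numpy as np


def dtw_path(cost: np.ndarray):
    """Dynamic time warping over a cost matrix (e.g. text tokens x audio
    frames): fill the cumulative-cost table, then backtrace the cheapest
    monotonic alignment. Plain-NumPy sketch of the idea only."""
    N, M = cost.shape
    D = np.full((N + 1, M + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, N + 1):
        for j in range(1, M + 1):
            D[i, j] = cost[i - 1, j - 1] + min(
                D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]
            )
    # backtrace from the bottom-right corner
    i, j = N, M
    path = [(i - 1, j - 1)]
    while i > 1 or j > 1:
        steps = [
            (D[i - 1, j - 1], i - 1, j - 1),
            (D[i - 1, j], i - 1, j),
            (D[i, j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
        path.append((i - 1, j - 1))
    return path[::-1]
```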
Jong Wook Kim
eab8d920ed Decoding improvements (#1033)
* suppress task tokens (transcribe/translate)

* not ignoring the last segment ending with one timestamp
2023-03-06 11:32:32 -08:00
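Suppressing the task tokens amounts to masking their logits before sampling; a NumPy sketch of that kind of logit filter (the token ids here are illustrative, not Whisper's actual vocabulary):

```python
import numpy as np


def suppress_tokens(logits: np.ndarray, token_ids):
    """Set the logits of the given token ids to -inf so the decoder can
    never sample them (e.g. the transcribe/translate task tokens).
    Sketch of the filter idea, not the repository's implementation."""
    logits = logits.copy()
    logits[..., list(token_ids)] = -np.inf
    return logits
```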
Roman Vasilenko
3e1780fd37 Update README.md (#894)
Fixed a few typos and made general improvements for clarity.

Co-authored-by: Jong Wook Kim <jongwook@openai.com>
2023-03-03 16:41:59 -08:00
Andrey Chernykh
7858aa9c08 Fix infinite loop caused by incorrect timestamp tokens prediction (#914)
* Fix infinite loop caused by incorrect timestamp tokens prediction

https://github.com/openai/whisper/discussions/810

* Update decoding.py

---------

Co-authored-by: Jong Wook Kim <jongwook@openai.com>
2023-02-01 15:46:51 -08:00
Jong Wook Kim
5c1a8c10e7 clarify that 3.11 is not supported 2023-01-27 00:01:49 -08:00
Jong Wook Kim
4e635c6644 Update README.md about Python 3.8+ requirement 2023-01-24 14:45:56 -08:00
Jong Wook Kim
a6b36ede1f drop python 3.7 support (#889) 2023-01-24 14:05:57 -08:00
Jong Wook Kim
55f690af79 Release 20230124 2023-01-24 11:11:08 -08:00
Jong Wook Kim
7f1ef223ab handle printing even if sys.stdout.buffer is not available (#887) 2023-01-24 10:12:04 -08:00
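Some consoles (e.g. certain IDE shells) replace sys.stdout with an object that has no .buffer attribute; a sketch of the fallback this commit describes (helper name is hypothetical):

```python
import sys


def safe_print(text: str) -> None:
    """Write UTF-8 bytes to the raw stdout buffer when it exists, and
    fall back to plain print() when it does not. Sketch of the idea."""
    buffer = getattr(sys.stdout, "buffer", None)
    if buffer is not None:
        buffer.write(text.encode("utf-8"))
        buffer.flush()
    else:
        print(text, end="")
```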
Niels Mayer
f5bfe004ec Add TSV formatted output in transcript, using integer start/end times in milliseconds. (#228)
* Add CSV format output in transcript, containing lines of characters formatted like: <startTime-in-integer-milliseconds>, <endTime-in-integer-milliseconds>, <transcript-including-commas>

* for easier reading by spreadsheets importing CSV, the third column of the CSV file is delimited by quotes, and any quote characters that might be in the transcript (which would interfere with parsing the third column as a string) are converted to "''".

* fix syntax error

* docstring edit

Co-authored-by: Jong Wook Kim <jongwook@openai.com>
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2023-01-22 00:27:17 -08:00
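A sketch of formatting one transcript segment as the TSV row the final commit describes, with integer millisecond start/end times (the helper name is hypothetical; tabs in the text are replaced so the columns stay parseable):

```python
def write_tsv_row(start: float, end: float, text: str) -> str:
    """Format one segment as a tab-separated line:
    <start-ms>\t<end-ms>\t<text>. Sketch of the output format only."""
    return "\t".join(
        [
            str(round(1000 * start)),
            str(round(1000 * end)),
            text.strip().replace("\t", " "),
        ]
    )
```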
Aaryan YVS
da600abd2b Added --output_format option (#333)
* Added --output option

The --output option helps select which output files will be generated.

Also corrected the logic that wrongly showed a progress bar when verbose was set to False

* Changed output_files variable

* Changed back the tqdm verbose

* refactor output format handling

Co-authored-by: Jong Wook Kim <jongwook@openai.com>
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2023-01-21 23:58:38 -08:00
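The option resolution behind --output_format can be sketched as a small dispatch: "all" fans out to every writer, any other value selects one. The helper name and the exact format list are assumptions for illustration:

```python
def select_formats(output_format, available=("txt", "vtt", "srt", "tsv", "json")):
    """Resolve an --output_format value into the list of writers to run.
    Hypothetical helper mirroring the CLI semantics described above."""
    if output_format == "all":
        return list(available)
    return [output_format]
```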
zer0-x
9f7aba6099 Handle XDG_CACHE_HOME properly for download_root (#864)
Co-authored-by: Jong Wook Kim <jongwook@openai.com>
2023-01-21 01:09:39 -08:00
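Handling XDG_CACHE_HOME for the default download location can be sketched like this (helper name is hypothetical; the XDG Base Directory spec says to fall back to ~/.cache when the variable is unset):

```python
import os


def default_download_root() -> str:
    """Pick the model cache directory, honoring $XDG_CACHE_HOME when set
    and falling back to ~/.cache otherwise. Sketch of the behavior."""
    cache_home = os.getenv("XDG_CACHE_HOME")
    if not cache_home:
        cache_home = os.path.join(os.path.expanduser("~"), ".cache")
    return os.path.join(cache_home, "whisper")
```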
Jong Wook Kim
12e1089462 use stdout for printing transcription progress (#867) 2023-01-20 00:54:05 -08:00
Markus Hennerbichler
ea1c266709 Fix bug where mm is mistakenly replaced with hmm in e.g. 20mm (#659)
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2023-01-18 10:41:11 -08:00
Jong Wook Kim
8135a7c31c verbose outputs from pytest 2023-01-18 10:30:18 -08:00
Jong Wook Kim
9d646db9d8 print '?' if a letter can't be encoded using the system default encoding (#859) 2023-01-17 23:28:36 -08:00
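Printing '?' for unencodable characters is what the errors="replace" mode of str.encode does; a sketch of the fallback (helper name is hypothetical):

```python
import sys


def printable(text, encoding=None):
    """Replace characters the console encoding cannot represent with '?'
    so printing never raises UnicodeEncodeError. Sketch of the idea."""
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "utf-8"
    return text.encode(encoding, errors="replace").decode(encoding)
```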
Jong Wook Kim
37a4f1be6d Release 20230117 2023-01-17 16:08:28 -08:00
Romain Beaumont
b9f9b433ae Add github action to automatically push to pypi on Release x.y.z commit (#681)
* Add github action to automatically push to pypi on Release x.y.z commit

* some housekeeping for pypi upload

* add version.py

Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2023-01-17 15:50:26 -08:00
Umar Farooqi
f0083e7eb2 Use ndimage.median_filter instead of signal.medfilt (#812)
For a 30s-long audio file with no silence, ndimage.median_filter took 7s whereas signal.medfilt took 30s.

Co-authored-by: Umar Farooqi <umar@paystash.com>
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2023-01-17 14:43:05 -08:00
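For 1-D data with zero padding, the two scipy filters should agree, which is what makes the swap above safe; a small check of that equivalence (assuming mode="constant" with cval=0 matches medfilt's zero padding, as the scipy docs describe):

```python
import numpy as np
from scipy import ndimage, signal

# The commit swaps signal.medfilt for ndimage.median_filter for speed;
# with zero padding the outputs should be identical on 1-D input.
x = np.random.default_rng(0).random(3000).astype(np.float32)
slow = signal.medfilt(x, kernel_size=7)
fast = ndimage.median_filter(x, size=7, mode="constant", cval=0.0)
```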
Jong Wook Kim
a84191faae rename GitHub workflow 2023-01-17 13:54:40 -08:00
Jong Wook Kim
b1d213c0c7 allow test_transcribe to run on CPU when CUDA is not available 2023-01-17 13:43:36 -08:00
Jong Wook Kim
493dfffa37 add github action to run pytest 2023-01-17 13:38:33 -08:00
Mikko Vedru
0f39c89d92 Update README.md (#804) 2023-01-16 23:46:42 -08:00
Markus Hennerbichler
6df3ea1fb5 Support batch-dimension in log_mel_spectrogram (#839) 2023-01-16 23:46:15 -08:00
adamreis
70861c7ce3 Fix tiny transcribe() docstring typo (#857)
s/successfully/successively, which I believe was the intent.
2023-01-16 22:42:01 -08:00
Jong Wook Kim
f82bc59f5e torch.concatenate -> torch.cat for compatibility 2023-01-10 10:53:18 -08:00
Jong Wook Kim
28769fcfe5 word-level timestamps in Multilingual_ASR notebook 2022-12-31 10:03:42 -07:00
Jong Wook Kim
53807677fe MultiHeadAttention to return qk as well 2022-12-30 01:53:57 -07:00
Jong Wook Kim
9323b2526c Revert "saving the qk matrix in the attention module for convenience"
This reverts commit 68e44bd83c.
2022-12-29 23:53:31 -07:00
Jong Wook Kim
68e44bd83c saving the qk matrix in the attention module for convenience 2022-12-29 23:02:52 -07:00
Jong Wook Kim
0b5dcfdef7 large-v2 figure and arxiv url update 2022-12-09 00:12:39 -05:00
altryne
b9265e5796 Update Hebrew language code to he per IANA registry (#401)
* Update Hebrew language code to he per IANA registry

Per the [IANA registry](https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry), `iw` was deprecated as the code for Hebrew in 1989; the preferred code is `he`.

The correct subtag: 
```
%%
Type: language
Subtag: he
Description: Hebrew
Added: 2005-10-16
Suppress-Script: Hebr
%%
``` 
And the deprecation
```
%%
Type: language
Subtag: iw
Description: Hebrew
Added: 2005-10-16
Deprecated: 1989-01-01
Preferred-Value: he
Suppress-Script: Hebr
%%
```

* Update hebrew ISO code to he

Per discussion, it's ok to make this change without backwards compatibility
2022-12-07 13:45:31 -05:00
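The commit replaces `iw` outright (no backward compatibility, per the note above). Had compatibility been wanted, a normalization shim could have looked like this (hypothetical code, not what was merged):

```python
# Map deprecated IANA subtags to their Preferred-Value, per the registry
# entries quoted above.
DEPRECATED_LANGUAGE_CODES = {"iw": "he"}


def normalize_language(code: str) -> str:
    """Accept a deprecated language subtag but return the preferred one."""
    code = code.lower()
    return DEPRECATED_LANGUAGE_CODES.get(code, code)
```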
Paul Harter
fd8f80c8b8 Explicitly closing model file after reading it (#630) 2022-12-06 12:07:19 -05:00
Jong Wook Kim
4179ed2475 add large-v2 model
- The "large-v2" model is trained for more epochs with regularization and shows improved performance compared to the previous large.
- It has the same architecture as the original large model.
- When `load_model("large")` is called, the "large-v2" model will be loaded.
- We will soon update the paper regarding this new model.
2022-12-05 11:07:14 -05:00
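The note that `load_model("large")` now loads "large-v2" is an alias lookup; a sketch of that resolution step (table and helper name are hypothetical):

```python
# Requesting "large" resolves to the newer "large-v2" checkpoint, as the
# commit message states; other names pass through unchanged.
MODEL_ALIASES = {"large": "large-v2"}


def resolve_model(name: str) -> str:
    """Map a requested model name to the checkpoint actually loaded."""
    return MODEL_ALIASES.get(name, name)
```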
jumon
ec1b34bb90 fix compression ratio function (#561) 2022-12-04 17:27:42 -06:00
Jong Wook Kim
eff383b27b invoking __call__ instead of forward() 2022-11-16 04:18:50 -08:00
Jong Wook Kim
02aa851a49 fix to return only the text token ids 2022-11-15 16:25:11 -08:00
jumon
76148a56c5 suppress generating non-timestamp tokens at the beginning (#532) 2022-11-15 11:44:36 -08:00
Vicki Anand
9f70a352f9 Fix attention caching to make it actually work (#370) 2022-10-19 16:44:03 -07:00
Sumana Harihareswara
7f3e408e09 Add package metadata to setup.py (#315)
Add project summary, license, etc. for display with
"pip show" and similar Python package distribution tools.
2022-10-17 13:51:16 -07:00
Michael Monashev
f680570016 Fix bug (#305)
Fix bug: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)
2022-10-17 11:38:20 -07:00
Jong Wook Kim
d18e9ea5dd transcribe() on English-only model won't complain when language="en" is not given 2022-10-09 02:40:12 -07:00
David Marx
82725cea9c infer download_root from XDG_CACHE_HOME if avail (#257) 2022-10-09 02:14:03 -07:00
eudoxos
35713c66e0 Add --threads option to transcribe (#278)
* Add --threads option to transcribe

By default, Torch on CPU uses number_of_cores/2 threads. This option allows
overriding that default.

* Update transcribe.py

Co-authored-by: Jong Wook Kim <ilikekjw@gmail.com>
2022-10-09 02:11:15 -07:00
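The --threads semantics above can be sketched as: 0 keeps the framework default (roughly half the cores on CPU), a positive value overrides it. In the real code the chosen value would be passed to torch.set_num_threads; the helper here is hypothetical:

```python
import os


def effective_threads(requested: int) -> int:
    """Resolve a --threads option: a positive value wins; 0 keeps the
    default of about half the available cores. Illustrative sketch."""
    if requested > 0:
        return requested
    return max(1, (os.cpu_count() or 2) // 2)
```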
Corentin Jemine
9e653bd0ea Fixed CoW RuntimeError in DecodingTask.run() (#240) 2022-10-04 08:49:31 -07:00