whisper

Author	SHA1	Message	Date
Jong Wook Kim	839639a223	Use tiktoken (#1044) * use tiktoken==0.3.0 * formatting * tuple should be safer * Update whisper/tokenizer.py Co-authored-by: Ruhollah Majdoddin <r.majdodin@gmail.com> * use tiktoken 0.3.1 * reflecting suggestions * cleanup * bypassing load_tiktoken_bpe to avoid blobfile dep --------- Co-authored-by: Ruhollah Majdoddin <r.majdodin@gmail.com>	2023-03-13 02:34:16 -07:00
Jong Wook Kim	ad3250a846	Release 20230308	2023-03-08 15:48:57 -08:00
Jong Wook Kim	c4b50c0824	kwargs in decode() for convenience (#1061) * kwargs in decode() for convenience * formatting fix	2023-03-08 15:46:38 -08:00
Jong Wook Kim	38f2f4d99d	fix all_tokens handling that caused more repetitions and discrepancy in JSON (#1060)	2023-03-08 15:34:07 -08:00
Jong Wook Kim	26807ec6d3	Release 20230307	2023-03-07 20:36:29 -08:00
Jong Wook Kim	919a713499	attempt to fix the repetition/hallucination issue identified in #1046 (#1052) * attempt to fix the repetition/hallucination issue identified in #1046 * zero-pad the audio instead of spectrogram * formatting fix * delete debug print	2023-03-07 20:08:45 -08:00
Jong Wook Kim	8180fde939	Release 20230306	2023-03-06 18:53:04 -08:00
Local State	c6e4e5efb3	remove auxiliary audio extension (#1021) Co-authored-by: Jong Wook Kim <jongwook@openai.com>	2023-03-06 17:48:14 -08:00
Jong Wook Kim	b80bcf610d	apply formatting with `black` (#1038) * applying black (with the default 88-column limit) * add flake8 * add isort * fix isort	2023-03-06 15:50:37 -08:00
Jong Wook Kim	500d0fe966	word-level timestamps in `transcribe()` (#869) * word-level timestamps in `transcribe()` * moving to `timing.py` * numba implementation for dtw, replacing dtw-python * triton implementation for dtw * add test for dtw implementations * triton implementation of median_filter * a simple word-level timestamps test * add scipy as dev dependency * installs an older version of Triton if CUDA < 11.4 * fix broken merge * loosen nvcc version match regex * find_alignment() function * miscellaneous improvements * skip median filtering when the input is too small * Expose punctuation options in cli and transcribe() (#973) * fix merge error * fix merge error 2 * annotating that word_timestamps is experimental --------- Co-authored-by: ryanheise <ryan@ryanheise.com>	2023-03-06 14:00:49 -08:00
Jong Wook Kim	eab8d920ed	Decoding improvements (#1033) * suppress task tokens (transcribe/translate) * not ignoring the last segment ending with one timestamp	2023-03-06 11:32:32 -08:00
Andrey Chernykh	7858aa9c08	Fix infinite loop caused by incorrect timestamp tokens prediction (#914) * Fix infinite loop caused by incorrect timestamp tokens prediction https://github.com/openai/whisper/discussions/810 * Update decoding.py --------- Co-authored-by: Jong Wook Kim <jongwook@openai.com>	2023-02-01 15:46:51 -08:00
Jong Wook Kim	a6b36ede1f	drop python 3.7 support (#889)	2023-01-24 14:05:57 -08:00
Jong Wook Kim	55f690af79	Release 20230124	2023-01-24 11:11:08 -08:00
Jong Wook Kim	7f1ef223ab	handle printing even if sys.stdout.buffer is not available (#887)	2023-01-24 10:12:04 -08:00
Niels Mayer	f5bfe004ec	Add TSV formatted output in transcript, using integer start/end times in milliseconds. (#228) * Add CSV format output in transcript, containing lines of characters formatted like: <startTime-in-integer-milliseconds>, <endTime-in-integer-milliseconds>, <transcript-including-commas> * for easier reading by spreadsheets importing CSV, the third column of the CSV file is delimited by quotes, and any quote characters that might be in the transcript (which would interfere with parsing the third column as a string) are converted to "''". * fix syntax error * docstring edit Co-authored-by: Jong Wook Kim <jongwook@openai.com> Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>	2023-01-22 00:27:17 -08:00
Aaryan YVS	da600abd2b	Added --output_format option (#333) * Added --output option --output option will help select the output files that will be generated. Corrected the logic, which wrongly shows progress bar when verbose is set to False * Changed output_files variable * Changed back the tqdm verbose * refactor output format handling Co-authored-by: Jong Wook Kim <jongwook@openai.com> Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>	2023-01-21 23:58:38 -08:00
zer0-x	9f7aba6099	Handle XDG_CACHE_HOME properly for download_root (#864) Co-authored-by: Jong Wook Kim <jongwook@openai.com>	2023-01-21 01:09:39 -08:00
Jong Wook Kim	12e1089462	use stdout for printing transcription progress (#867)	2023-01-20 00:54:05 -08:00
Markus Hennerbichler	ea1c266709	Fix bug where mm is mistakenly replaced with hmm in e.g. 20mm (#659) Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>	2023-01-18 10:41:11 -08:00
Jong Wook Kim	9d646db9d8	print '?' if a letter can't be encoded using the system default encoding (#859)	2023-01-17 23:28:36 -08:00
Romain Beaumont	b9f9b433ae	Add github action to automatically push to pypi on Release x.y.z commit (#681) * Add github action to automatically push to pypi on Release x.y.z commit * some housekeeping for pypi upload * add version.py Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>	2023-01-17 15:50:26 -08:00
Markus Hennerbichler	6df3ea1fb5	Support batch-dimension in log_mel_spectrogram (#839)	2023-01-16 23:46:15 -08:00
adamreis	70861c7ce3	Fix tiny transcribe() docstring typo (#857) s/successfully/successively, which I believe was the intent.	2023-01-16 22:42:01 -08:00
Jong Wook Kim	53807677fe	MultiHeadAttention to return qk as well	2022-12-30 01:53:57 -07:00
Jong Wook Kim	9323b2526c	Revert "saving the qk matrix in the attention module for convenience" This reverts commit `68e44bd83c`.	2022-12-29 23:53:31 -07:00
Jong Wook Kim	68e44bd83c	saving the qk matrix in the attention module for convenience	2022-12-29 23:02:52 -07:00
altryne	b9265e5796	Update Hebrew language code to he per IANA registry (#401) * Update Hebrew language code to he per IANA registry Per [IANA registry](https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry), `iw` was deprecated as the code for Hebrew in 1989 and the preferred code is `he` The correct subtag: ``` %% Type: language Subtag: he Description: Hebrew Added: 2005-10-16 Suppress-Script: Hebr %% ``` And the deprecation ``` %% Type: language Subtag: iw Description: Hebrew Added: 2005-10-16 Deprecated: 1989-01-01 Preferred-Value: he Suppress-Script: Hebr %% ``` * Update hebrew ISO code to he Per discussion, it's ok to make this change without backwards compatibility	2022-12-07 13:45:31 -05:00
Paul Harter	fd8f80c8b8	Explicitly closing model file after reading it (#630)	2022-12-06 12:07:19 -05:00
Jong Wook Kim	4179ed2475	add large-v2 model - The "large-v2" model is trained for more epochs with regularization and shows improved performance compared to the previous large. - It has the same architecture as the original large model. - When `load_model("large")` is called, the "large-v2" model will be loaded. - We will soon update the paper regarding this new model.	2022-12-05 11:07:14 -05:00
jumon	ec1b34bb90	fix compression ratio function (#561)	2022-12-04 17:27:42 -06:00
Jong Wook Kim	eff383b27b	invoking __call__ instead of forward()	2022-11-16 04:18:50 -08:00
Jong Wook Kim	02aa851a49	fix to return only the text token ids	2022-11-15 16:25:11 -08:00
jumon	76148a56c5	suppress generating non-timestamp tokens at the beginning (#532)	2022-11-15 11:44:36 -08:00
Vicki Anand	9f70a352f9	Fix attention caching to make it actually work (#370)	2022-10-19 16:44:03 -07:00
Michael Monashev	f680570016	Fix bug (#305) Fix bug: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)	2022-10-17 11:38:20 -07:00
Jong Wook Kim	d18e9ea5dd	transcribe() on English-only model won't complain when language="en" is not given	2022-10-09 02:40:12 -07:00
David Marx	82725cea9c	infer download_root from XDG_CACHE_HOME if avail (#257)	2022-10-09 02:14:03 -07:00
eudoxos	35713c66e0	Add --threads option to transcribe (#278) * Add --threads option to transcribe Torch on CPU uses by default number_of_cores/2. This option allows to override this default. * Update transcribe.py Co-authored-by: Jong Wook Kim <ilikekjw@gmail.com>	2022-10-09 02:11:15 -07:00
Corentin Jemine	9e653bd0ea	Fixed CoW RuntimeError in DecodingTask.run() (#240)	2022-10-04 08:49:31 -07:00
Tom Stuart	02b74308ff	Fix timestamps and strip extraneous whitespace in WebVTT output (#219) * Use two-digit hours in WebVTT timestamps Per the WebVTT specification [0]: > A WebVTT timestamp consists of the following components, in the given > order: > > 1. Optionally (required if hours is non-zero): > 1. Two or more ASCII digits, representing the hours as a base ten > integer. > 2. A U+003A COLON character (:) YouTube won’t accept timestamps containing single-digit hours. [0] https://www.w3.org/TR/webvtt1/#webvtt-timestamp * Strip segment text in WebVTT output We already do this for plain text and SubRip output, so we should do it for WebVTT too.	2022-10-03 14:51:07 -07:00
Jibin Mathew	0b1ba3d46e	Add model_dir to arguments (#202) * Add model_dir to arguments * minor formatting change Co-authored-by: Jong Wook Kim <jongwook@openai.com>	2022-09-30 14:45:51 -07:00
Caleb McQuillin	60132ade70	Use , character instead of . for SRT output. (#197) The SRT format uses the decimal comma character as the fractional separator rather than the decimal point character. Adjust format_timestamp and write_srt to specify the separator character. See https://en.wikipedia.org/wiki/SubRip#:~:text=the%20fractional%20separator%20used%20is%20the%20comma%2C%20since%20the%20program%20was%20written%20in%20france.	2022-09-29 20:44:12 -07:00
Jong Wook Kim	7cb4cc21bf	allowing nonzero initial temperature	2022-09-29 18:05:12 -07:00
sawadata	deafef05f3	Update audio.py (#178) add '-nostdin' argument	2022-09-29 12:34:04 -07:00
Vicki Anand	2b0c2971af	Don't update duration if last timestamp is same as begin (#191)	2022-09-29 12:27:48 -07:00
Jong Wook Kim	62fe7f1009	patience definition to match the paper	2022-09-27 19:00:41 -07:00
Nick Konovalchuk	b4308c4782	fix: transcribe verbosity (#140)	2022-09-26 11:46:21 -07:00
Michael Goin	9c8183a179	Use PyTorch as logits transpose for ONNX support (#141)	2022-09-26 10:54:26 -07:00
VulumeCode	2037b65f3f	Context prompt (#128) Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>	2022-09-26 05:22:33 -07:00
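
Several commits in this log concern how output timestamps are written: SRT uses a comma as the fractional separator (#197), WebVTT hours must have at least two digits when present (#219), and the TSV output uses integer start/end times in milliseconds (#228). The following is a minimal sketch of those three conventions, not the repository's code; the signature of `format_timestamp` and the helper `tsv_row` are assumptions made for illustration.

```python
def format_timestamp(
    seconds: float,
    always_include_hours: bool = False,  # assumed parameter name
    decimal_marker: str = ".",           # "," for SRT, "." for WebVTT
) -> str:
    """Format a non-negative time in seconds as [HH:]MM:SS<marker>mmm."""
    if seconds < 0:
        raise ValueError("timestamp must be non-negative")
    milliseconds = round(seconds * 1000.0)

    # Split the total into hours, minutes, seconds, and leftover milliseconds.
    hours, milliseconds = divmod(milliseconds, 3_600_000)
    minutes, milliseconds = divmod(milliseconds, 60_000)
    secs, milliseconds = divmod(milliseconds, 1_000)

    # Hours are zero-padded to two digits whenever they are emitted.
    hours_part = f"{hours:02d}:" if always_include_hours or hours > 0 else ""
    return f"{hours_part}{minutes:02d}:{secs:02d}{decimal_marker}{milliseconds:03d}"


def tsv_row(start: float, end: float, text: str) -> str:
    """Hypothetical helper: one TSV line with integer millisecond times."""
    return f"{round(1000 * start)}\t{round(1000 * end)}\t{text.strip()}"


# SRT style: hours always present, comma before the milliseconds.
srt_time = format_timestamp(3661.5, always_include_hours=True, decimal_marker=",")
# WebVTT style: hours omitted when zero, two digits otherwise.
vtt_short = format_timestamp(62.25)
vtt_long = format_timestamp(3600.0)
```

Keeping the separator as a parameter lets one function serve both subtitle formats, which matches the approach commit 60132ade70 describes for `format_timestamp` and `write_srt`.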

65 Commits