whisper

Author	SHA1	Message	Date
heimoshuiyu	67fac2a4ce	Check multiple prompts	2023-04-18 19:08:43 +08:00
heimoshuiyu	6fc4e6f230	fix id start	2023-04-18 15:39:14 +08:00
heimoshuiyu	55756284ac	Ignore repeated prompt	2023-04-18 12:15:21 +08:00
Jong Wook Kim	c09a7ae299	Update decoding.py (#1219)	2023-04-11 15:13:13 -07:00
Fernando O. Gallego	b0022b3283	Update decoding.py (#1155) * Update decoding.py Following the suggestions of @Jeronymous in https://github.com/openai/whisper/pull/914 and https://github.com/openai/whisper/discussions/924, it solves the problem of endless loop. * Removed blank line and whitespaces in empty lines. * Suggested changes according to the linter --------- Co-authored-by: Jong Wook Kim <jongwook@openai.com>	2023-04-11 15:06:03 -07:00
Arseniy Bushyn	76c901ab8d	Update README.md to reference tiktoken (#1105) Co-authored-by: Jong Wook Kim <jongwook@openai.com>	2023-04-10 17:39:17 -07:00
ryanheise	43940fc978	Implement max line width and max line count, and make word highlighting optional (#1184) * Add highlight_words, max_line_width, max_line_count * Refactor subtitle generator --------- Co-authored-by: Jong Wook Kim <jongwook@openai.com>	2023-04-10 17:28:35 -07:00
ryanheise	255887f219	Squash long words at window and sentence boundaries. (#1114) * Squash long words at window and sentence boundaries. * Formatting requirements. * Fix squashing logic to point to correct words. --------- Co-authored-by: Jong Wook Kim <jongwook@openai.com>	2023-04-10 17:23:53 -07:00
K.B.Dharun Krishna	a151816b6b	python-publish.yml: bump actions version to fix node warning (#1211)	2023-04-10 13:54:09 -07:00
Jong Wook Kim	b5851c6c40	Update tokenizer.py (#1163)	2023-03-29 13:12:36 -07:00
Jong Wook Kim	6dea21fd7f	Release 20230314	2023-03-15 00:39:19 -07:00
Jong Wook Kim	79c43e4859	abort find_alignment on empty input (#1090)	2023-03-14 12:47:58 -07:00
Guillaume Klein	5f9ac653b7	Fix truncated words list when the replacement character is decoded (#1089)	2023-03-14 09:32:41 -07:00
Akash Mahajan	ba88b8e1b3	fix github language stats getting dominated by jupyter notebook (#1076) Co-authored-by: Akash Mahajan <akash.mahajan@microsoft.com> Co-authored-by: Jong Wook Kim <jongwook@openai.com>	2023-03-14 00:07:09 -07:00
Guillaume Klein	671ac5a4ce	Fix alignment between the segments and the list of words (#1087) * Fix alignment between the segments and the list of words * Ensure the word index does not overflow	2023-03-13 16:34:09 -07:00
Jong Wook Kim	839639a223	Use tiktoken (#1044) * use tiktoken==0.3.0 * formatting * tuple should be safer * Update whisper/tokenizer.py Co-authored-by: Ruhollah Majdoddin <r.majdodin@gmail.com> * use tiktoken 0.3.1 * reflecting suggestions * cleanup * bypassing load_tiktoken_bpe to avoid blobfile dep --------- Co-authored-by: Ruhollah Majdoddin <r.majdodin@gmail.com>	2023-03-13 02:34:16 -07:00
Jong Wook Kim	ad3250a846	Release 20230308	2023-03-08 15:48:57 -08:00
Jong Wook Kim	c4b50c0824	kwargs in decode() for convenience (#1061) * kwargs in decode() for convenience * formatting fix	2023-03-08 15:46:38 -08:00
Jong Wook Kim	38f2f4d99d	fix all_tokens handling that caused more repetitions and discrepancy in JSON (#1060)	2023-03-08 15:34:07 -08:00
Jong Wook Kim	aac47c9834	fix typo	2023-03-07 20:43:49 -08:00
Jong Wook Kim	26807ec6d3	Release 20230307	2023-03-07 20:36:29 -08:00
Jong Wook Kim	919a713499	attempt to fix the repetition/hallucination issue identified in #1046 (#1052) * attempt to fix the repetition/hallucination issue identified in #1046 * zero-pad the audio instead of spectrogram * formatting fix * delete debug print	2023-03-07 20:08:45 -08:00
Jong Wook Kim	38e990d853	Use triton==2.0.0 (#1053)	2023-03-07 16:56:31 -08:00
Jong Wook Kim	924e1f8e06	Try installing triton only if linux & x86_64 (#1051)	2023-03-07 11:31:40 -08:00
Jong Wook Kim	4b0d5e58d0	Update setup.py	2023-03-07 04:47:46 -08:00
Jong Wook Kim	8180fde939	Release 20230306	2023-03-06 18:53:04 -08:00
Local State	c6e4e5efb3	remove auxiliary audio extension (#1021) Co-authored-by: Jong Wook Kim <jongwook@openai.com>	2023-03-06 17:48:14 -08:00
Jong Wook Kim	b80bcf610d	apply formatting with `black` (#1038) * applying black (with the default 88-column limit) * add flake8 * add isort * fix isort	2023-03-06 15:50:37 -08:00
Jong Wook Kim	500d0fe966	word-level timestamps in `transcribe()` (#869) * word-level timestamps in `transcribe()` * moving to `timing.py` * numba implementation for dtw, replacing dtw-python * triton implementation for dtw * add test for dtw implementations * triton implementation of median_filter * a simple word-level timestamps test * add scipy as dev dependency * installs an older version of Triton if CUDA < 11.4 * fix broken merge * loosen nvcc version match regex * find_alignment() function * miscellaneous improvements * skip median filtering when the input is too small * Expose punctuation options in cli and transcribe() (#973) * fix merge error * fix merge error 2 * annotating that word_timestamps is experimental --------- Co-authored-by: ryanheise <ryan@ryanheise.com>	2023-03-06 14:00:49 -08:00
Jong Wook Kim	eab8d920ed	Decoding improvements (#1033) * suppress task tokens (transcribe/translate) * not ignoring the last segment ending with one timestamp	2023-03-06 11:32:32 -08:00
Roman Vasilenko	3e1780fd37	Update README.md (#894) Fixed a few typos and made general improvements for clarity. Co-authored-by: Jong Wook Kim <jongwook@openai.com>	2023-03-03 16:41:59 -08:00
Andrey Chernykh	7858aa9c08	Fix infinite loop caused by incorrect timestamp tokens prediction (#914) * Fix infinite loop caused by incorrect timestamp tokens prediction https://github.com/openai/whisper/discussions/810 * Update decoding.py --------- Co-authored-by: Jong Wook Kim <jongwook@openai.com>	2023-02-01 15:46:51 -08:00
Jong Wook Kim	5c1a8c10e7	clarify that 3.11 is not supported	2023-01-27 00:01:49 -08:00
Jong Wook Kim	4e635c6644	Update README.md about Python 3.8+ requirement	2023-01-24 14:45:56 -08:00
Jong Wook Kim	a6b36ede1f	drop python 3.7 support (#889)	2023-01-24 14:05:57 -08:00
Jong Wook Kim	55f690af79	Release 20230124	2023-01-24 11:11:08 -08:00
Jong Wook Kim	7f1ef223ab	handle printing even if sys.stdout.buffer is not available (#887)	2023-01-24 10:12:04 -08:00
Niels Mayer	f5bfe004ec	Add TSV formatted output in transcript, using integer start/end times in milliseconds. (#228) * Add CSV format output in transcript, containing lines of characters formatted like: <startTime-in-integer-milliseconds>, <endTime-in-integer-milliseconds>, <transcript-including-commas> * for easier reading by spreadsheets importing CSV, the third column of the CSV file is delimited by quotes, and any quote characters that might be in the transcript (which would interfere with parsing the third column as a string) are converted to "''". * fix syntax error * docstring edit Co-authored-by: Jong Wook Kim <jongwook@openai.com> Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>	2023-01-22 00:27:17 -08:00
Aaryan YVS	da600abd2b	Added --output_format option (#333) * Added --output option --output option will help select the output files that will be generated. Corrected the logic, which wrongly shows progress bar when verbose is set to False * Changed output_files variable * Changed back the tqdm verbose * refactor output format handling Co-authored-by: Jong Wook Kim <jongwook@openai.com> Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>	2023-01-21 23:58:38 -08:00
zer0-x	9f7aba6099	Handle XDG_CACHE_HOME properly for download_root (#864) Co-authored-by: Jong Wook Kim <jongwook@openai.com>	2023-01-21 01:09:39 -08:00
Jong Wook Kim	12e1089462	use stdout for printing transcription progress (#867)	2023-01-20 00:54:05 -08:00
Markus Hennerbichler	ea1c266709	Fix bug where mm is mistakenly replaced with hmm in e.g. 20mm (#659) Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>	2023-01-18 10:41:11 -08:00
Jong Wook Kim	8135a7c31c	verbose outputs from pytest	2023-01-18 10:30:18 -08:00
Jong Wook Kim	9d646db9d8	print '?' if a letter can't be encoded using the system default encoding (#859)	2023-01-17 23:28:36 -08:00
Jong Wook Kim	37a4f1be6d	Release 20230117	2023-01-17 16:08:28 -08:00
Romain Beaumont	b9f9b433ae	Add github action to automatically push to pypi on Release x.y.z commit (#681) * Add github action to automatically push to pypi on Release x.y.z commit * some housekeeping for pypi upload * add version.py Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>	2023-01-17 15:50:26 -08:00
Umar Farooqi	f0083e7eb2	Use ndimage.median_filter instead of signal.medfilter (#812) For a 30s long audio file which didn't have any silence, ndimage.median_filter took 7s where signa.medfilter took 30s. Co-authored-by: Umar Farooqi <umar@paystash.com> Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>	2023-01-17 14:43:05 -08:00
Jong Wook Kim	a84191faae	rename GitHub workflow	2023-01-17 13:54:40 -08:00
Jong Wook Kim	b1d213c0c7	allow test_transcribe to run on CPU when CUDA is not available	2023-01-17 13:43:36 -08:00
Jong Wook Kim	493dfffa37	add github action to run pytest	2023-01-17 13:38:33 -08:00

111 Commits