Commit Graph

111 Commits

Author SHA1 Message Date
67fac2a4ce Check multiple prompts 2023-04-18 19:08:43 +08:00
6fc4e6f230 fix id start 2023-04-18 15:39:14 +08:00
55756284ac Ignore repeated prompt 2023-04-18 12:15:21 +08:00
Jong Wook Kim
c09a7ae299 Update decoding.py (#1219) 2023-04-11 15:13:13 -07:00
Fernando O. Gallego
b0022b3283 Update decoding.py (#1155)
* Update decoding.py

Following the suggestions of @Jeronymous in https://github.com/openai/whisper/pull/914 and https://github.com/openai/whisper/discussions/924, it solves the problem of endless loop.

* Removed blank line and whitespaces in empty lines.

* Suggested changes according to the linter

---------

Co-authored-by: Jong Wook Kim <jongwook@openai.com>
2023-04-11 15:06:03 -07:00
Arseniy Bushyn
76c901ab8d Update README.md to reference tiktoken (#1105)
Co-authored-by: Jong Wook Kim <jongwook@openai.com>
2023-04-10 17:39:17 -07:00
ryanheise
43940fc978 Implement max line width and max line count, and make word highlighting optional (#1184)
* Add highlight_words, max_line_width, max_line_count

* Refactor subtitle generator

---------

Co-authored-by: Jong Wook Kim <jongwook@openai.com>
2023-04-10 17:28:35 -07:00
ryanheise
255887f219 Squash long words at window and sentence boundaries. (#1114)
* Squash long words at window and sentence boundaries.

* Formatting requirements.

* Fix squashing logic to point to correct words.

---------

Co-authored-by: Jong Wook Kim <jongwook@openai.com>
2023-04-10 17:23:53 -07:00
K.B.Dharun Krishna
a151816b6b python-publish.yml: bump actions version to fix node warning (#1211) 2023-04-10 13:54:09 -07:00
Jong Wook Kim
b5851c6c40 Update tokenizer.py (#1163) 2023-03-29 13:12:36 -07:00
Jong Wook Kim
6dea21fd7f Release 20230314 2023-03-15 00:39:19 -07:00
Jong Wook Kim
79c43e4859 abort find_alignment on empty input (#1090) 2023-03-14 12:47:58 -07:00
Guillaume Klein
5f9ac653b7 Fix truncated words list when the replacement character is decoded (#1089) 2023-03-14 09:32:41 -07:00
Akash Mahajan
ba88b8e1b3 fix github language stats getting dominated by jupyter notebook (#1076)
Co-authored-by: Akash Mahajan <akash.mahajan@microsoft.com>
Co-authored-by: Jong Wook Kim <jongwook@openai.com>
2023-03-14 00:07:09 -07:00
Guillaume Klein
671ac5a4ce Fix alignment between the segments and the list of words (#1087)
* Fix alignment between the segments and the list of words

* Ensure the word index does not overflow
2023-03-13 16:34:09 -07:00
Jong Wook Kim
839639a223 Use tiktoken (#1044)
* use tiktoken==0.3.0

* formatting

* tuple should be safer

* Update whisper/tokenizer.py

Co-authored-by: Ruhollah Majdoddin <r.majdodin@gmail.com>

* use tiktoken 0.3.1

* reflecting suggestions

* cleanup

* bypassing load_tiktoken_bpe to avoid blobfile dep

---------

Co-authored-by: Ruhollah Majdoddin <r.majdodin@gmail.com>
2023-03-13 02:34:16 -07:00
Jong Wook Kim
ad3250a846 Release 20230308 2023-03-08 15:48:57 -08:00
Jong Wook Kim
c4b50c0824 kwargs in decode() for convenience (#1061)
* kwargs in decode() for convenience

* formatting fix
2023-03-08 15:46:38 -08:00
Jong Wook Kim
38f2f4d99d fix all_tokens handling that caused more repetitions and discrepancy in JSON (#1060) 2023-03-08 15:34:07 -08:00
Jong Wook Kim
aac47c9834 fix typo 2023-03-07 20:43:49 -08:00
Jong Wook Kim
26807ec6d3 Release 20230307 2023-03-07 20:36:29 -08:00
Jong Wook Kim
919a713499 attempt to fix the repetition/hallucination issue identified in #1046 (#1052)
* attempt to fix the repetition/hallucination issue identified in #1046

* zero-pad the audio instead of spectrogram

* formatting fix

* delete debug print
2023-03-07 20:08:45 -08:00
Jong Wook Kim
38e990d853 Use triton==2.0.0 (#1053) 2023-03-07 16:56:31 -08:00
Jong Wook Kim
924e1f8e06 Try installing triton only if linux & x86_64 (#1051) 2023-03-07 11:31:40 -08:00
Jong Wook Kim
4b0d5e58d0 Update setup.py 2023-03-07 04:47:46 -08:00
Jong Wook Kim
8180fde939 Release 20230306 2023-03-06 18:53:04 -08:00
Local State
c6e4e5efb3 remove auxiliary audio extension (#1021)
Co-authored-by: Jong Wook Kim <jongwook@openai.com>
2023-03-06 17:48:14 -08:00
Jong Wook Kim
b80bcf610d apply formatting with black (#1038)
* applying black (with the default 88-column limit)

* add flake8

* add isort

* fix isort
2023-03-06 15:50:37 -08:00
Jong Wook Kim
500d0fe966 word-level timestamps in transcribe() (#869)
* word-level timestamps in `transcribe()`

* moving to `timing.py`

* numba implementation for dtw, replacing dtw-python

* triton implementation for dtw

* add test for dtw implementations

* triton implementation of median_filter

* a simple word-level timestamps test

* add scipy as dev dependency

* installs an older version of Triton if CUDA < 11.4

* fix broken merge

* loosen nvcc version match regex

* find_alignment() function

* miscellaneous improvements

* skip median filtering when the input is too small

* Expose punctuation options in cli and transcribe() (#973)

* fix merge error

* fix merge error 2

* annotating that word_timestamps is experimental

---------

Co-authored-by: ryanheise <ryan@ryanheise.com>
2023-03-06 14:00:49 -08:00
Jong Wook Kim
eab8d920ed Decoding improvements (#1033)
* suppress task tokens (transcribe/translate)

* not ignoring the last segment ending with one timestamp
2023-03-06 11:32:32 -08:00
Roman Vasilenko
3e1780fd37 Update README.md (#894)
Fixed a few typos and made general improvements for clarity.

Co-authored-by: Jong Wook Kim <jongwook@openai.com>
2023-03-03 16:41:59 -08:00
Andrey Chernykh
7858aa9c08 Fix infinite loop caused by incorrect timestamp tokens prediction (#914)
* Fix infinite loop caused by incorrect timestamp tokens prediction

https://github.com/openai/whisper/discussions/810

* Update decoding.py

---------

Co-authored-by: Jong Wook Kim <jongwook@openai.com>
2023-02-01 15:46:51 -08:00
Jong Wook Kim
5c1a8c10e7 clarify that 3.11 is not supported 2023-01-27 00:01:49 -08:00
Jong Wook Kim
4e635c6644 Update README.md about Python 3.8+ requirement 2023-01-24 14:45:56 -08:00
Jong Wook Kim
a6b36ede1f drop python 3.7 support (#889) 2023-01-24 14:05:57 -08:00
Jong Wook Kim
55f690af79 Release 20230124 2023-01-24 11:11:08 -08:00
Jong Wook Kim
7f1ef223ab handle printing even if sys.stdout.buffer is not available (#887) 2023-01-24 10:12:04 -08:00
Niels Mayer
f5bfe004ec Add TSV formatted output in transcript, using integer start/end times in milliseconds. (#228)
* Add CSV format output in transcript, containing lines of characters formatted like: <startTime-in-integer-milliseconds>, <endTime-in-integer-milliseconds>, <transcript-including-commas>

* for easier reading by spreadsheets importing CSV, the third

column of the CSV file is delimited by quotes, and any quote
characters that might be in the transcript (which would interfere with
parsing the third column as a string) are converted to "''".

* fix syntax error

* docstring edit

Co-authored-by: Jong Wook Kim <jongwook@openai.com>
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2023-01-22 00:27:17 -08:00
Aaryan YVS
da600abd2b Added --output_format option (#333)
* Added --output option

--output option will help select the output files that will be generated.

Corrected the logic, which wrongly shows progress bar when verbose is set to False

* Changed output_files variable

* Changed back the tqdm verbose

* refactor output format handling

Co-authored-by: Jong Wook Kim <jongwook@openai.com>
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2023-01-21 23:58:38 -08:00
zer0-x
9f7aba6099 Handle XDG_CACHE_HOME properly for download_root (#864)
Co-authored-by: Jong Wook Kim <jongwook@openai.com>
2023-01-21 01:09:39 -08:00
Jong Wook Kim
12e1089462 use stdout for printing transcription progress (#867) 2023-01-20 00:54:05 -08:00
Markus Hennerbichler
ea1c266709 Fix bug where mm is mistakenly replaced with hmm in e.g. 20mm (#659)
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2023-01-18 10:41:11 -08:00
Jong Wook Kim
8135a7c31c verbose outputs from pytest 2023-01-18 10:30:18 -08:00
Jong Wook Kim
9d646db9d8 print '?' if a letter can't be encoded using the system default encoding (#859) 2023-01-17 23:28:36 -08:00
Jong Wook Kim
37a4f1be6d Release 20230117 2023-01-17 16:08:28 -08:00
Romain Beaumont
b9f9b433ae Add github action to automatically push to pypi on Release x.y.z commit (#681)
* Add github action to automatically push to pypi on Release x.y.z commit

* some housekeeping for pypi upload

* add version.py

Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2023-01-17 15:50:26 -08:00
Umar Farooqi
f0083e7eb2 Use ndimage.median_filter instead of signal.medfilter (#812)
For a 30s long audio file which didn't have any silence, ndimage.median_filter took 7s where signa.medfilter took 30s.

Co-authored-by: Umar Farooqi <umar@paystash.com>
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2023-01-17 14:43:05 -08:00
Jong Wook Kim
a84191faae rename GitHub workflow 2023-01-17 13:54:40 -08:00
Jong Wook Kim
b1d213c0c7 allow test_transcribe to run on CPU when CUDA is not available 2023-01-17 13:43:36 -08:00
Jong Wook Kim
493dfffa37 add github action to run pytest 2023-01-17 13:38:33 -08:00