65 Commits

Author SHA1 Message Date
Jong Wook Kim
839639a223 Use tiktoken (#1044)
* use tiktoken==0.3.0

* formatting

* tuple should be safer

* Update whisper/tokenizer.py

Co-authored-by: Ruhollah Majdoddin <r.majdodin@gmail.com>

* use tiktoken 0.3.1

* reflecting suggestions

* cleanup

* bypassing load_tiktoken_bpe to avoid blobfile dep

---------

Co-authored-by: Ruhollah Majdoddin <r.majdodin@gmail.com>
2023-03-13 02:34:16 -07:00
Jong Wook Kim
ad3250a846 Release 20230308 2023-03-08 15:48:57 -08:00
Jong Wook Kim
c4b50c0824 kwargs in decode() for convenience (#1061)
* kwargs in decode() for convenience

* formatting fix
2023-03-08 15:46:38 -08:00
Jong Wook Kim
38f2f4d99d fix all_tokens handling that caused more repetitions and discrepancy in JSON (#1060) 2023-03-08 15:34:07 -08:00
Jong Wook Kim
26807ec6d3 Release 20230307 2023-03-07 20:36:29 -08:00
Jong Wook Kim
919a713499 attempt to fix the repetition/hallucination issue identified in #1046 (#1052)
* attempt to fix the repetition/hallucination issue identified in #1046

* zero-pad the audio instead of spectrogram

* formatting fix

* delete debug print
2023-03-07 20:08:45 -08:00
Jong Wook Kim
8180fde939 Release 20230306 2023-03-06 18:53:04 -08:00
Local State
c6e4e5efb3 remove auxiliary audio extension (#1021)
Co-authored-by: Jong Wook Kim <jongwook@openai.com>
2023-03-06 17:48:14 -08:00
Jong Wook Kim
b80bcf610d apply formatting with black (#1038)
* applying black (with the default 88-column limit)

* add flake8

* add isort

* fix isort
2023-03-06 15:50:37 -08:00
Jong Wook Kim
500d0fe966 word-level timestamps in transcribe() (#869)
* word-level timestamps in `transcribe()`

* moving to `timing.py`

* numba implementation for dtw, replacing dtw-python

* triton implementation for dtw

* add test for dtw implementations

* triton implementation of median_filter

* a simple word-level timestamps test

* add scipy as dev dependency

* installs an older version of Triton if CUDA < 11.4

* fix broken merge

* loosen nvcc version match regex

* find_alignment() function

* miscellaneous improvements

* skip median filtering when the input is too small

* Expose punctuation options in cli and transcribe() (#973)

* fix merge error

* fix merge error 2

* annotating that word_timestamps is experimental

---------

Co-authored-by: ryanheise <ryan@ryanheise.com>
2023-03-06 14:00:49 -08:00
Jong Wook Kim
eab8d920ed Decoding improvements (#1033)
* suppress task tokens (transcribe/translate)

* not ignoring the last segment ending with one timestamp
2023-03-06 11:32:32 -08:00
Andrey Chernykh
7858aa9c08 Fix infinite loop caused by incorrect timestamp tokens prediction (#914)
* Fix infinite loop caused by incorrect timestamp tokens prediction

https://github.com/openai/whisper/discussions/810

* Update decoding.py

---------

Co-authored-by: Jong Wook Kim <jongwook@openai.com>
2023-02-01 15:46:51 -08:00
Jong Wook Kim
a6b36ede1f drop python 3.7 support (#889) 2023-01-24 14:05:57 -08:00
Jong Wook Kim
55f690af79 Release 20230124 2023-01-24 11:11:08 -08:00
Jong Wook Kim
7f1ef223ab handle printing even if sys.stdout.buffer is not available (#887) 2023-01-24 10:12:04 -08:00
Niels Mayer
f5bfe004ec Add TSV formatted output in transcript, using integer start/end times in milliseconds. (#228)
* Add CSV format output in transcript, containing lines of characters formatted like: <startTime-in-integer-milliseconds>, <endTime-in-integer-milliseconds>, <transcript-including-commas>

* for easier reading by spreadsheets importing CSV, the third
column of the CSV file is delimited by quotes, and any quote
characters that might be in the transcript (which would interfere with
parsing the third column as a string) are converted to "''".

* fix syntax error

* docstring edit

Co-authored-by: Jong Wook Kim <jongwook@openai.com>
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2023-01-22 00:27:17 -08:00
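A minimal sketch of the TSV output described in the commit above (integer start/end times in milliseconds, then the text). The function name `write_tsv`, the header row, and the segment dictionaries with `start`, `end`, and `text` keys are assumptions made for illustration, not taken from the commit itself.

```python
import io


def write_tsv(segments, file):
    """Write one tab-separated row per segment, with times as integer ms."""
    # Header row (assumed for illustration).
    print("start", "end", "text", sep="\t", file=file)
    for segment in segments:
        start_ms = round(1000 * segment["start"])
        end_ms = round(1000 * segment["end"])
        # Tabs inside the text would break the column layout, so replace them.
        text = segment["text"].strip().replace("\t", " ")
        print(start_ms, end_ms, text, sep="\t", file=file)


buffer = io.StringIO()
write_tsv([{"start": 0.0, "end": 2.5, "text": " Hello, world "}], buffer)
```

Because the separator is a tab, commas in the transcript need no quoting, which is the difficulty the earlier CSV variant in this commit had to work around.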
Aaryan YVS
da600abd2b Added --output_format option (#333)
* Added --output option

The --output option selects which output files are generated.

Corrected the logic that wrongly showed the progress bar when verbose is set to False

* Changed output_files variable

* Changed back the tqdm verbose

* refactor output format handling

Co-authored-by: Jong Wook Kim <jongwook@openai.com>
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2023-01-21 23:58:38 -08:00
zer0-x
9f7aba6099 Handle XDG_CACHE_HOME properly for download_root (#864)
Co-authored-by: Jong Wook Kim <jongwook@openai.com>
2023-01-21 01:09:39 -08:00
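A minimal sketch of resolving a download directory from `XDG_CACHE_HOME`, falling back to `~/.cache` when the variable is unset or empty. The helper name `default_download_root` and the `whisper` subdirectory are assumptions for illustration; the commit above does not spell out the exact code.

```python
import os


def default_download_root(app_name="whisper"):
    """Return <cache dir>/<app_name>, honouring XDG_CACHE_HOME when it is set."""
    fallback = os.path.join(os.path.expanduser("~"), ".cache")
    # An empty XDG_CACHE_HOME is treated the same as an unset one.
    cache_home = os.environ.get("XDG_CACHE_HOME") or fallback
    return os.path.join(cache_home, app_name)
```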
Jong Wook Kim
12e1089462 use stdout for printing transcription progress (#867) 2023-01-20 00:54:05 -08:00
Markus Hennerbichler
ea1c266709 Fix bug where mm is mistakenly replaced with hmm in e.g. 20mm (#659)
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2023-01-18 10:41:11 -08:00
Jong Wook Kim
9d646db9d8 print '?' if a letter can't be encoded using the system default encoding (#859) 2023-01-17 23:28:36 -08:00
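One way to get the behaviour named in the commit above is to round-trip the text through the target encoding with `errors="replace"`, which substitutes `?` for every character that cannot be encoded. This is a sketch under that assumption; the helper name `make_safe` and its `encoding` parameter are illustrative.

```python
import sys


def make_safe(text, encoding=None):
    """Replace characters that the given encoding cannot represent with '?'."""
    encoding = encoding or sys.getdefaultencoding()
    return text.encode(encoding, errors="replace").decode(encoding)


print(make_safe("café ☕", "ascii"))  # prints "caf? ?"
```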
Romain Beaumont
b9f9b433ae Add github action to automatically push to pypi on Release x.y.z commit (#681)
* Add github action to automatically push to pypi on Release x.y.z commit

* some housekeeping for pypi upload

* add version.py

Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2023-01-17 15:50:26 -08:00
Markus Hennerbichler
6df3ea1fb5 Support batch-dimension in log_mel_spectrogram (#839) 2023-01-16 23:46:15 -08:00
adamreis
70861c7ce3 Fix tiny transcribe() docstring typo (#857)
s/successfully/successively, which I believe was the intent.
2023-01-16 22:42:01 -08:00
Jong Wook Kim
53807677fe MultiHeadAttention to return qk as well 2022-12-30 01:53:57 -07:00
Jong Wook Kim
9323b2526c Revert "saving the qk matrix in the attention module for convenience"
This reverts commit 68e44bd83c.
2022-12-29 23:53:31 -07:00
Jong Wook Kim
68e44bd83c saving the qk matrix in the attention module for convenience 2022-12-29 23:02:52 -07:00
altryne
b9265e5796 Update Hebrew language code to he per IANA registry (#401)
* Update Hebrew language code to he per IANA registry

Per [IANA registry](https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry), `iw` was deprecated as the code for Hebrew in 1989 and the preferred code is `he`

The correct subtag:
```
%%
Type: language
Subtag: he
Description: Hebrew
Added: 2005-10-16
Suppress-Script: Hebr
%%
```
And the deprecation:
```
%%
Type: language
Subtag: iw
Description: Hebrew
Added: 2005-10-16
Deprecated: 1989-01-01
Preferred-Value: he
Suppress-Script: Hebr
%%
```

* Update Hebrew ISO code to he

Per discussion, it's ok to make this change without backwards compatibility
2022-12-07 13:45:31 -05:00
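The commit above deliberately drops `iw` without a compatibility layer. For callers that still hold the deprecated code, a small normalization step along these lines could be applied on their side. The table and the function name `normalize_language_code` are hypothetical and not part of the change; the only mapping included is the one the registry entry quoted above gives as `Preferred-Value`.

```python
# Deprecated subtag -> preferred subtag, per the registry entry quoted above.
DEPRECATED_LANGUAGE_CODES = {"iw": "he"}


def normalize_language_code(code):
    """Map a deprecated language code to its preferred value, if one is known."""
    code = code.lower()
    return DEPRECATED_LANGUAGE_CODES.get(code, code)
```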
Paul Harter
fd8f80c8b8 Explicitly closing model file after reading it (#630) 2022-12-06 12:07:19 -05:00
Jong Wook Kim
4179ed2475 add large-v2 model
- The "large-v2" model is trained for more epochs with regularization and shows improved performance compared to the previous large.
- It has the same architecture as the original large model.
- When `load_model("large")` is called, the "large-v2" model will be loaded.
- We will soon update the paper regarding this new model.
2022-12-05 11:07:14 -05:00
jumon
ec1b34bb90 fix compression ratio function (#561) 2022-12-04 17:27:42 -06:00
Jong Wook Kim
eff383b27b invoking __call__ instead of forward() 2022-11-16 04:18:50 -08:00
Jong Wook Kim
02aa851a49 fix to return only the text token ids 2022-11-15 16:25:11 -08:00
jumon
76148a56c5 suppress generating non-timestamp tokens at the beginning (#532) 2022-11-15 11:44:36 -08:00
Vicki Anand
9f70a352f9 Fix attention caching to make it actually work (#370) 2022-10-19 16:44:03 -07:00
Michael Monashev
f680570016 Fix bug (#305)
Fix bug: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)
2022-10-17 11:38:20 -07:00
Jong Wook Kim
d18e9ea5dd transcribe() on English-only model won't complain when language="en" is not given 2022-10-09 02:40:12 -07:00
David Marx
82725cea9c infer download_root from XDG_CACHE_HOME if avail (#257) 2022-10-09 02:14:03 -07:00
eudoxos
35713c66e0 Add --threads option to transcribe (#278)
* Add --threads option to transcribe

By default, Torch on CPU uses number_of_cores/2 threads. This option
allows overriding that default.

* Update transcribe.py

Co-authored-by: Jong Wook Kim <ilikekjw@gmail.com>
2022-10-09 02:11:15 -07:00
Corentin Jemine
9e653bd0ea Fixed CoW RuntimeError in DecodingTask.run() (#240) 2022-10-04 08:49:31 -07:00
Tom Stuart
02b74308ff Fix timestamps and strip extraneous whitespace in WebVTT output (#219)
* Use two-digit hours in WebVTT timestamps

Per the WebVTT specification [0]:

> A WebVTT timestamp consists of the following components, in the given
> order:
>
> 1. Optionally (required if hours is non-zero):
>   1. Two or more ASCII digits, representing the hours as a base ten
>      integer.
>   2. A U+003A COLON character (:)

YouTube won’t accept timestamps containing single-digit hours.

[0] https://www.w3.org/TR/webvtt1/#webvtt-timestamp

* Strip segment text in WebVTT output

We already do this for plain text and SubRip output, so we should do it
for WebVTT too.
2022-10-03 14:51:07 -07:00
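A minimal sketch of the two WebVTT fixes described in the commit above: hours are written with at least two digits when they are non-zero (and omitted otherwise, as the quoted specification allows), and the cue text is stripped. The function names `format_vtt_timestamp` and `format_vtt_cue` are assumptions for illustration.

```python
def format_vtt_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp, omitting hours when they are zero."""
    milliseconds = round(seconds * 1000.0)
    hours, milliseconds = divmod(milliseconds, 3_600_000)
    minutes, milliseconds = divmod(milliseconds, 60_000)
    secs, milliseconds = divmod(milliseconds, 1_000)
    # Two or more digits for hours, never a single digit.
    hours_prefix = f"{hours:02d}:" if hours > 0 else ""
    return f"{hours_prefix}{minutes:02d}:{secs:02d}.{milliseconds:03d}"


def format_vtt_cue(start: float, end: float, text: str) -> str:
    """Build one cue: a timing line followed by the stripped segment text."""
    timing = f"{format_vtt_timestamp(start)} --> {format_vtt_timestamp(end)}"
    return f"{timing}\n{text.strip()}\n"
```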
Jibin Mathew
0b1ba3d46e Add model_dir to arguments (#202)
* Add model_dir to arguments

* minor formatting change

Co-authored-by: Jong Wook Kim <jongwook@openai.com>
2022-09-30 14:45:51 -07:00
Caleb McQuillin
60132ade70 Use , character instead of . for SRT output. (#197)
The SRT format uses the decimal comma character as the fractional separator rather than the decimal point character. Adjust format_timestamp and write_srt to specify the separator character.

See https://en.wikipedia.org/wiki/SubRip#:~:text=the%20fractional%20separator%20used%20is%20the%20comma%2C%20since%20the%20program%20was%20written%20in%20france.
2022-09-29 20:44:12 -07:00
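A sketch of a timestamp formatter with a configurable fractional separator, as the commit above describes for `format_timestamp`. The parameter name `decimal_marker` and the fixed `HH:MM:SS` layout are assumptions for illustration; SRT output would pass `","` while other formats keep `"."`.

```python
def format_timestamp(seconds: float, decimal_marker: str = ".") -> str:
    """Format seconds as HH:MM:SS<marker>mmm with a caller-chosen separator."""
    assert seconds >= 0, "non-negative timestamp expected"
    milliseconds = round(seconds * 1000.0)
    hours, milliseconds = divmod(milliseconds, 3_600_000)
    minutes, milliseconds = divmod(milliseconds, 60_000)
    secs, milliseconds = divmod(milliseconds, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_marker}{milliseconds:03d}"


print(format_timestamp(3661.5, decimal_marker=","))  # prints "01:01:01,500"
```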
Jong Wook Kim
7cb4cc21bf allowing nonzero initial temperature 2022-09-29 18:05:12 -07:00
sawadata
deafef05f3 Update audio.py (#178)
add '-nostdin' argument
2022-09-29 12:34:04 -07:00
Vicki Anand
2b0c2971af Don't update duration if last timestamp is same as begin (#191) 2022-09-29 12:27:48 -07:00
Jong Wook Kim
62fe7f1009 patience definition to match the paper 2022-09-27 19:00:41 -07:00
Nick Konovalchuk
b4308c4782 fix: transcribe verbosity (#140) 2022-09-26 11:46:21 -07:00
Michael Goin
9c8183a179 Use PyTorch as logits transpose for ONNX support (#141) 2022-09-26 10:54:26 -07:00
VulumeCode
2037b65f3f Context prompt (#128)
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2022-09-26 05:22:33 -07:00