Commit Graph

88 Commits

Author SHA1 Message Date
Jong Wook Kim
924e1f8e06 Try installing triton only if linux & x86_64 (#1051) 2023-03-07 11:31:40 -08:00
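This commit gates the triton dependency on the platform where its wheels are published. A minimal sketch of such a conditional requirement for setup.py (the helper name and version pin are assumptions, not the repository's actual code):

```python
import platform
import sys


def triton_requirement():
    """Return the Triton requirement only on Linux x86_64, where its
    wheels exist; return nothing elsewhere. Hypothetical helper
    sketching the platform gate this commit describes (pin assumed)."""
    if sys.platform == "linux" and platform.machine() == "x86_64":
        return ["triton>=2.0.0"]
    return []
```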
Jong Wook Kim
4b0d5e58d0 Update setup.py 2023-03-07 04:47:46 -08:00
Jong Wook Kim
8180fde939 Release 20230306 2023-03-06 18:53:04 -08:00
Local State
c6e4e5efb3 remove auxiliary audio extension (#1021)
Co-authored-by: Jong Wook Kim <jongwook@openai.com>
2023-03-06 17:48:14 -08:00
Jong Wook Kim
b80bcf610d apply formatting with black (#1038)
* applying black (with the default 88-column limit)

* add flake8

* add isort

* fix isort
2023-03-06 15:50:37 -08:00
Jong Wook Kim
500d0fe966 word-level timestamps in transcribe() (#869)
* word-level timestamps in `transcribe()`

* moving to `timing.py`

* numba implementation for dtw, replacing dtw-python

* triton implementation for dtw

* add test for dtw implementations

* triton implementation of median_filter

* a simple word-level timestamps test

* add scipy as dev dependency

* installs an older version of Triton if CUDA < 11.4

* fix broken merge

* loosen nvcc version match regex

* find_alignment() function

* miscellaneous improvements

* skip median filtering when the input is too small

* Expose punctuation options in cli and transcribe() (#973)

* fix merge error

* fix merge error 2

* annotating that word_timestamps is experimental

---------

Co-authored-by: ryanheise <ryan@ryanheise.com>
2023-03-06 14:00:49 -08:00
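The DTW at the heart of the word-level timestamps above can be sketched in plain NumPy; the commit itself ships numba and Triton kernels for speed, so this toy version only illustrates the recurrence and backtrace:

```python
import numpy as np


def dtw_path(cost: np.ndarray):
    """Dynamic time warping over a cost matrix (e.g. text tokens x audio
    frames): fill the cumulative-cost table, then backtrace the cheapest
    monotonic alignment. Plain-NumPy sketch of the idea only."""
    N, M = cost.shape
    D = np.full((N + 1, M + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, N + 1):
        for j in range(1, M + 1):
            D[i, j] = cost[i - 1, j - 1] + min(
                D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]
            )
    # backtrace from the bottom-right corner
    i, j = N, M
    path = [(i - 1, j - 1)]
    while i > 1 or j > 1:
        steps = [
            (D[i - 1, j - 1], i - 1, j - 1),
            (D[i - 1, j], i - 1, j),
            (D[i, j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
        path.append((i - 1, j - 1))
    return path[::-1]
```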
Jong Wook Kim
eab8d920ed Decoding improvements (#1033)
* suppress task tokens (transcribe/translate)

* not ignoring the last segment ending with one timestamp
2023-03-06 11:32:32 -08:00
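Suppressing the task tokens amounts to masking their logits before sampling; a NumPy sketch of that kind of logit filter (the token ids here are illustrative, not Whisper's actual vocabulary):

```python
import numpy as np


def suppress_tokens(logits: np.ndarray, token_ids):
    """Set the logits of the given token ids to -inf so the decoder can
    never sample them (e.g. the transcribe/translate task tokens).
    Sketch of the filter idea, not the repository's implementation."""
    logits = logits.copy()
    logits[..., list(token_ids)] = -np.inf
    return logits
```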
Roman Vasilenko
3e1780fd37 Update README.md (#894)
Fixed a few typos and made general improvements for clarity.

Co-authored-by: Jong Wook Kim <jongwook@openai.com>
2023-03-03 16:41:59 -08:00
Andrey Chernykh
7858aa9c08 Fix infinite loop caused by incorrect timestamp tokens prediction (#914)
* Fix infinite loop caused by incorrect timestamp tokens prediction

https://github.com/openai/whisper/discussions/810

* Update decoding.py

---------

Co-authored-by: Jong Wook Kim <jongwook@openai.com>
2023-02-01 15:46:51 -08:00
Jong Wook Kim
5c1a8c10e7 clarify that 3.11 is not supported 2023-01-27 00:01:49 -08:00
Jong Wook Kim
4e635c6644 Update README.md about Python 3.8+ requirement 2023-01-24 14:45:56 -08:00
Jong Wook Kim
a6b36ede1f drop python 3.7 support (#889) 2023-01-24 14:05:57 -08:00
Jong Wook Kim
55f690af79 Release 20230124 2023-01-24 11:11:08 -08:00
Jong Wook Kim
7f1ef223ab handle printing even if sys.stdout.buffer is not available (#887) 2023-01-24 10:12:04 -08:00
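Some consoles (e.g. certain IDE shells) replace sys.stdout with an object that has no .buffer attribute; a sketch of the fallback this commit describes (helper name is hypothetical):

```python
import sys


def safe_print(text: str) -> None:
    """Write UTF-8 bytes to the raw stdout buffer when it exists, and
    fall back to plain print() when it does not. Sketch of the idea."""
    buffer = getattr(sys.stdout, "buffer", None)
    if buffer is not None:
        buffer.write(text.encode("utf-8"))
        buffer.flush()
    else:
        print(text, end="")
```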
Niels Mayer
f5bfe004ec Add TSV formatted output in transcript, using integer start/end times in milliseconds. (#228)
* Add CSV format output in transcript, containing lines of characters formatted like: <startTime-in-integer-milliseconds>, <endTime-in-integer-milliseconds>, <transcript-including-commas>

* for easier reading by spreadsheets importing CSV, the third column of the CSV file is delimited by quotes, and any quote characters that might be in the transcript (which would interfere with parsing the third column as a string) are converted to "''".

* fix syntax error

* docstring edit

Co-authored-by: Jong Wook Kim <jongwook@openai.com>
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2023-01-22 00:27:17 -08:00
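A sketch of formatting one transcript segment as the TSV row the final commit describes, with integer millisecond start/end times (the helper name is hypothetical; tabs in the text are replaced so the columns stay parseable):

```python
def write_tsv_row(start: float, end: float, text: str) -> str:
    """Format one segment as a tab-separated line:
    <start-ms>\t<end-ms>\t<text>. Sketch of the output format only."""
    return "\t".join(
        [
            str(round(1000 * start)),
            str(round(1000 * end)),
            text.strip().replace("\t", " "),
        ]
    )
```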
Aaryan YVS
da600abd2b Added --output_format option (#333)
* Added --output option

The --output option helps select which output files will be generated.

Also corrected the logic that wrongly showed a progress bar when verbose was set to False

* Changed output_files variable

* Changed back the tqdm verbose

* refactor output format handling

Co-authored-by: Jong Wook Kim <jongwook@openai.com>
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2023-01-21 23:58:38 -08:00
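The option resolution behind --output_format can be sketched as a small dispatch: "all" fans out to every writer, any other value selects one. The helper name and the exact format list are assumptions for illustration:

```python
def select_formats(output_format, available=("txt", "vtt", "srt", "tsv", "json")):
    """Resolve an --output_format value into the list of writers to run.
    Hypothetical helper mirroring the CLI semantics described above."""
    if output_format == "all":
        return list(available)
    return [output_format]
```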
zer0-x
9f7aba6099 Handle XDG_CACHE_HOME properly for download_root (#864)
Co-authored-by: Jong Wook Kim <jongwook@openai.com>
2023-01-21 01:09:39 -08:00
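Handling XDG_CACHE_HOME for the default download location can be sketched like this (helper name is hypothetical; the XDG Base Directory spec says to fall back to ~/.cache when the variable is unset):

```python
import os


def default_download_root() -> str:
    """Pick the model cache directory, honoring $XDG_CACHE_HOME when set
    and falling back to ~/.cache otherwise. Sketch of the behavior."""
    cache_home = os.getenv("XDG_CACHE_HOME")
    if not cache_home:
        cache_home = os.path.join(os.path.expanduser("~"), ".cache")
    return os.path.join(cache_home, "whisper")
```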
Jong Wook Kim
12e1089462 use stdout for printing transcription progress (#867) 2023-01-20 00:54:05 -08:00
Markus Hennerbichler
ea1c266709 Fix bug where mm is mistakenly replaced with hmm in e.g. 20mm (#659)
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2023-01-18 10:41:11 -08:00
Jong Wook Kim
8135a7c31c verbose outputs from pytest 2023-01-18 10:30:18 -08:00
Jong Wook Kim
9d646db9d8 print '?' if a letter can't be encoded using the system default encoding (#859) 2023-01-17 23:28:36 -08:00
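Printing '?' for unencodable characters is what the errors="replace" mode of str.encode does; a sketch of the fallback (helper name is hypothetical):

```python
import sys


def printable(text, encoding=None):
    """Replace characters the console encoding cannot represent with '?'
    so printing never raises UnicodeEncodeError. Sketch of the idea."""
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "utf-8"
    return text.encode(encoding, errors="replace").decode(encoding)
```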
Jong Wook Kim
37a4f1be6d Release 20230117 2023-01-17 16:08:28 -08:00
Romain Beaumont
b9f9b433ae Add github action to automatically push to pypi on Release x.y.z commit (#681)
* Add github action to automatically push to pypi on Release x.y.z commit

* some housekeeping for pypi upload

* add version.py

Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2023-01-17 15:50:26 -08:00
Umar Farooqi
f0083e7eb2 Use ndimage.median_filter instead of signal.medfilt (#812)
For a 30s-long audio file with no silence, ndimage.median_filter took 7s whereas signal.medfilt took 30s.

Co-authored-by: Umar Farooqi <umar@paystash.com>
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2023-01-17 14:43:05 -08:00
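For 1-D data with zero padding, the two scipy filters should agree, which is what makes the swap above safe; a small check of that equivalence (assuming mode="constant" with cval=0 matches medfilt's zero padding, as the scipy docs describe):

```python
import numpy as np
from scipy import ndimage, signal

# The commit swaps signal.medfilt for ndimage.median_filter for speed;
# with zero padding the outputs should be identical on 1-D input.
x = np.random.default_rng(0).random(3000).astype(np.float32)
slow = signal.medfilt(x, kernel_size=7)
fast = ndimage.median_filter(x, size=7, mode="constant", cval=0.0)
```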
Jong Wook Kim
a84191faae rename GitHub workflow 2023-01-17 13:54:40 -08:00
Jong Wook Kim
b1d213c0c7 allow test_transcribe to run on CPU when CUDA is not available 2023-01-17 13:43:36 -08:00
Jong Wook Kim
493dfffa37 add github action to run pytest 2023-01-17 13:38:33 -08:00
Mikko Vedru
0f39c89d92 Update README.md (#804) 2023-01-16 23:46:42 -08:00
Markus Hennerbichler
6df3ea1fb5 Support batch-dimension in log_mel_spectrogram (#839) 2023-01-16 23:46:15 -08:00
adamreis
70861c7ce3 Fix tiny transcribe() docstring typo (#857)
s/successfully/successively, which I believe was the intent.
2023-01-16 22:42:01 -08:00
Jong Wook Kim
f82bc59f5e torch.concatenate -> torch.cat for compatibility 2023-01-10 10:53:18 -08:00
Jong Wook Kim
28769fcfe5 word-level timestamps in Multilingual_ASR notebook 2022-12-31 10:03:42 -07:00
Jong Wook Kim
53807677fe MultiHeadAttention to return qk as well 2022-12-30 01:53:57 -07:00
Jong Wook Kim
9323b2526c Revert "saving the qk matrix in the attention module for convenience"
This reverts commit 68e44bd83c.
2022-12-29 23:53:31 -07:00
Jong Wook Kim
68e44bd83c saving the qk matrix in the attention module for convenience 2022-12-29 23:02:52 -07:00
Jong Wook Kim
0b5dcfdef7 large-v2 figure and arxiv url update 2022-12-09 00:12:39 -05:00
altryne
b9265e5796 Update Hebrew language code to he per IANA registry (#401)
* Update Hebrew language code to he per IANA registry

Per the [IANA registry](https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry), `iw` was deprecated as the code for Hebrew in 1989; the preferred code is `he`.

The correct subtag: 
```
%%
Type: language
Subtag: he
Description: Hebrew
Added: 2005-10-16
Suppress-Script: Hebr
%%
``` 
And the deprecation
```
%%
Type: language
Subtag: iw
Description: Hebrew
Added: 2005-10-16
Deprecated: 1989-01-01
Preferred-Value: he
Suppress-Script: Hebr
%%
```

* Update hebrew ISO code to he

Per discussion, it's ok to make this change without backwards compatibility
2022-12-07 13:45:31 -05:00
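The commit replaces `iw` outright (no backward compatibility, per the note above). Had compatibility been wanted, a normalization shim could have looked like this (hypothetical code, not what was merged):

```python
# Map deprecated IANA subtags to their Preferred-Value, per the registry
# entries quoted above.
DEPRECATED_LANGUAGE_CODES = {"iw": "he"}


def normalize_language(code: str) -> str:
    """Accept a deprecated language subtag but return the preferred one."""
    code = code.lower()
    return DEPRECATED_LANGUAGE_CODES.get(code, code)
```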
Paul Harter
fd8f80c8b8 Explicitly closing model file after reading it (#630) 2022-12-06 12:07:19 -05:00
Jong Wook Kim
4179ed2475 add large-v2 model
- The "large-v2" model is trained for more epochs with regularization and shows improved performance compared to the previous large.
- It has the same architecture as the original large model.
- When `load_model("large")` is called, the "large-v2" model will be loaded.
- We will soon update the paper regarding this new model.
2022-12-05 11:07:14 -05:00
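The note that `load_model("large")` now loads "large-v2" is an alias lookup; a sketch of that resolution step (table and helper name are hypothetical):

```python
# Requesting "large" resolves to the newer "large-v2" checkpoint, as the
# commit message states; other names pass through unchanged.
MODEL_ALIASES = {"large": "large-v2"}


def resolve_model(name: str) -> str:
    """Map a requested model name to the checkpoint actually loaded."""
    return MODEL_ALIASES.get(name, name)
```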
jumon
ec1b34bb90 fix compression ratio function (#561) 2022-12-04 17:27:42 -06:00
Jong Wook Kim
eff383b27b invoking __call__ instead of forward() 2022-11-16 04:18:50 -08:00
Jong Wook Kim
02aa851a49 fix to return only the text token ids 2022-11-15 16:25:11 -08:00
jumon
76148a56c5 suppress generating non-timestamp tokens at the beginning (#532) 2022-11-15 11:44:36 -08:00
Vicki Anand
9f70a352f9 Fix attention caching to make it actually work (#370) 2022-10-19 16:44:03 -07:00
Sumana Harihareswara
7f3e408e09 Add package metadata to setup.py (#315)
Add project summary, license, etc. for display with
"pip show" and similar Python package distribution tools.
2022-10-17 13:51:16 -07:00
Michael Monashev
f680570016 Fix bug (#305)
Fix bug: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)
2022-10-17 11:38:20 -07:00
Jong Wook Kim
d18e9ea5dd transcribe() on English-only model won't complain when language="en" is not given 2022-10-09 02:40:12 -07:00
David Marx
82725cea9c infer download_root from XDG_CACHE_HOME if avail (#257) 2022-10-09 02:14:03 -07:00
eudoxos
35713c66e0 Add --threads option to transcribe (#278)
* Add --threads option to transcribe

By default, Torch on CPU uses number_of_cores/2 threads. This option allows
overriding that default.

* Update transcribe.py

Co-authored-by: Jong Wook Kim <ilikekjw@gmail.com>
2022-10-09 02:11:15 -07:00
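The --threads semantics above can be sketched as: 0 keeps the framework default (roughly half the cores on CPU), a positive value overrides it. In the real code the chosen value would be passed to torch.set_num_threads; the helper here is hypothetical:

```python
import os


def effective_threads(requested: int) -> int:
    """Resolve a --threads option: a positive value wins; 0 keeps the
    default of about half the available cores. Illustrative sketch."""
    if requested > 0:
        return requested
    return max(1, (os.cpu_count() or 2) // 2)
```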
Corentin Jemine
9e653bd0ea Fixed CoW RuntimeError in DecodingTask.run() (#240) 2022-10-04 08:49:31 -07:00