This is comparison between whisper.cpp and faster-whisper. The faster-whisper readme has some benchmarks on the readme but wanted to test it myself. For whisper, I just ran manually. For faster-whisper, wrote this small script
- https://github.com/guillaumekln/faster-whisper/
- https://github.com/ggerganov/whisper.cpp
- quanitization w faster-whisper: https://opennmt.net/CTranslate2/quantization.html
- quantization w whisper.cpp: https://github.com/ggerganov/whisper.cpp#quantization
- ./main -bs 5 -p 2 -f steve2.wav -m models/ggml-small.en.bin
- Total 8 CPU threads on my 12 core machine
- -bs 2 : actually performs better about 10s faster.
- Audio length: ~2m
- Beam size: openai/whisper#1314
- Changing threads/processor count had minimal effects.
load time = 210.93 ms fallbacks = 0 p / 0 h mel time = 534.97 ms sample time = 4601.12 ms / 773 runs ( 5.95 ms per run) encode time = 24789.57 ms / 3 runs ( 8263.19 ms per run) decode time = 15115.56 ms / 761 runs ( 19.86 ms per run) total time = 46346.90 ms
load time = 581.65 ms fallbacks = 0 p / 0 h mel time = 582.50 ms sample time = 6143.45 ms / 659 runs ( 9.32 ms per run) encode time = 68596.05 ms / 2 runs (34298.02 ms per run) decode time = 37984.16 ms / 651 runs ( 58.35 ms per run) total time = 130021.68 ms
load time = 1128.55 ms fallbacks = 0 p / 0 h mel time = 531.48 ms sample time = 10956.46 ms / 667 runs ( 16.43 ms per run) encode time = 137022.25 ms / 2 runs (68511.12 ms per run) decode time = 76914.08 ms / 659 runs ( 116.71 ms per run) total time = 253307.20 ms
fallbacks = 0 p / 0 h mel time = 554.00 ms sample time = 5255.80 ms / 774 runs ( 6.79 ms per run) encode time = 4691.07 ms / 3 runs ( 1563.69 ms per run) decode time = 16272.99 ms / 762 runs ( 21.36 ms per run) total time = 29835.26 ms
load time = 5175.25 ms fallbacks = 0 p / 0 h mel time = 535.35 ms sample time = 7234.65 ms / 659 runs ( 10.98 ms per run) encode time = 10368.82 ms / 2 runs ( 5184.41 ms per run) decode time = 42721.30 ms / 651 runs ( 65.62 ms per run) total time = 73852.74 ms
load time = 2050.85 ms fallbacks = 0 p / 0 h mel time = 572.59 ms sample time = 11943.09 ms / 667 runs ( 17.91 ms per run) encode time = 16994.45 ms / 2 runs ( 8497.23 ms per run) decode time = 78852.77 ms / 659 runs ( 119.66 ms per run) total time = 123859.55 ms
- ~2min audio
- See script
- small.en-cpu-int8 : 14.0
- medium-cpu-int8 : 43.1
- large-v1-cpu-int8 : 76.0
- large-v2-cpu-int8 : 75.4
- small.en-cuda-int8 : 5.3
- small.en-cuda-float16 : 2.5
- small.en-cuda-int8_float16 : 2.8
- medium-cuda-int8 : 7.7
- medium-cuda-float16 : 6.1
- medium-cuda-int8_float16 : 6.2
- large-v1-cuda-int8 : 12.9
- large-v1-cuda-float16 : 9.8
- large-v1-cuda-int8_float16 : 10.6
- large-v2-cuda-int8 : 12.5
- large-v2-cuda-float16 : 9.2
- large-v2-cuda-int8_float16 : 10.6