How to Convert Whisper from HF's Transformer format into Ctranslate2 format (needed for FasterWhisper)
faster-whisper is a Python package for running OpenAI's Whisper model efficiently. It allows you to transcribe (and translate) speech with lower memory requirements and lower latency. However, this package only supports CTranslate2's model format; it cannot use Hugging Face transformers models directly. You need to manually convert these models from transformers (PyTorch) format into CTranslate2 format. This way, you can use any of the fine-tuned Whisper models available on the Hugging Face Hub.
To be able to convert the models from HF's transformers into CTranslate2, you need the following packages:
- transformers
- ctranslate2
That's all we need :) You can easily install them using pip as follows (note that transformers also needs PyTorch to load the model during conversion, so make sure torch is installed too):
pip install transformers ctranslate2
It's generally recommended to create a Python virtual environment before installing these packages to prevent dependency conflicts. You can do that as follows:
# Create a virtual environment named myvenv
python -m venv myvenv
# Activate this venv
source myvenv/bin/activate
# Now the venv is activated, install the packages
pip install transformers ctranslate2
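Installing ctranslate2 also places the converter script on your PATH, so you can sanity-check the setup before converting anything (this just prints the converter's usage and options):
# Verify the converter CLI is available
ct2-transformers-converter --help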
Now we can easily convert a model from transformers into CTranslate2. There are three steps to convert the model:
- Load the model, in transformers format, into memory
- Convert it into CTranslate2 format
- Save the converted model in CTranslate2 format for later usage
- [Optional] Copy the tokenizer configuration into the model directory for easier packaging
This can be done as follows. Assuming the transformers model is in a directory named whisper-large-v2, we want to save the converted model into a directory named whisper-large-v2-ct2, and we want to copy the tokenizer configuration tokenizer_config.json into it:
ct2-transformers-converter \
--model whisper-large-v2 \
--output_dir whisper-large-v2-ct2 \
--copy_files tokenizer_config.json \
--quantization float16
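If you prefer to run the conversion from Python rather than the CLI, CTranslate2 also exposes the converter as a class. Below is a minimal sketch mirroring the command above, assuming a recent ctranslate2 release that provides TransformersConverter; the directories are the same as in the CLI example. Note that both the CLI's --model flag and the class accept a Hugging Face Hub model ID as well as a local directory, so you can point either one at a fine-tuned checkpoint on the Hub directly.
import ctranslate2

# Load the transformers model, convert it to CTranslate2 format,
# and save it (with the tokenizer config copied alongside)
converter = ctranslate2.converters.TransformersConverter(
    "whisper-large-v2",                    # local directory or Hub model ID
    copy_files=["tokenizer_config.json"],
)
converter.convert("whisper-large-v2-ct2", quantization="float16")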
Now we can easily use this model in faster-whisper (installable with pip install faster-whisper) as follows:
from faster_whisper import WhisperModel
model_path = "whisper-large-v2-ct2"
# Load model on GPU with FP16
model = WhisperModel(model_path, device="cuda", compute_type="float16")
# Transcribe a wav file (task="translate" makes Whisper output an English translation)
segments, info = model.transcribe("83.wav", beam_size=1, language='ar', task="translate")
print("Detected language '%s' with probability %f" % (info.language, info.language_probability))
# Print transcript with timestamps
for segment in segments:
print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))