@Siss3l
Created May 3, 2024 17:15
Multilingual, multitask NVIDIA models for speech-to-text (STT) and speech translation
import torch
from nemo.collections.asr.models import EncDecMultiTaskModel

asr_model = EncDecMultiTaskModel.from_pretrained("nvidia/canary-1b")
# Long audio: limit self-attention to 128 encoder frames of left/right context
asr_model.encoder.change_attention_model("rel_pos_local_attn", [128, 128])
asr_model.encoder.change_subsampling_conv_chunking_factor(1)  # 1 = auto-select chunking
asr_model.to(torch.device("cuda:0" if torch.cuda.is_available() else "cpu"))
print(asr_model.transcribe(audio=["long_audio.wav"], batch_size=16))
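The `change_attention_model("rel_pos_local_attn", [128, 128])` call replaces full self-attention in the encoder with local attention, so each encoder frame only looks at a fixed window of neighbours instead of the whole recording. The pure-Python sketch below (no NeMo needed) illustrates that idea: which query/key frame pairs a `[left, right]` window keeps, and why the number of pairs grows linearly with audio length instead of quadratically. It is an illustration, not NeMo's implementation. The helper names are hypothetical, and the 80 ms per encoder frame figure is an assumption (10 ms feature stride with FastConformer's 8x subsampling).

```python
# Illustrative sketch (not NeMo's code): what a local-attention window of
# att_context_size=[left, right] keeps, and how much work it saves.

FRAME_SEC = 0.08  # assumption: 10 ms feature stride x 8x subsampling


def local_attention_mask(num_frames: int, left: int, right: int) -> list[list[bool]]:
    """mask[i][j] is True when query frame i may attend to key frame j."""
    return [
        [i - left <= j <= i + right for j in range(num_frames)]
        for i in range(num_frames)
    ]


def attended_pairs(num_frames: int, left: int, right: int) -> int:
    """Count allowed (query, key) pairs in O(num_frames) without building the mask.

    Full attention would keep num_frames ** 2 pairs.
    """
    total = 0
    for i in range(num_frames):
        lo = max(0, i - left)
        hi = min(num_frames - 1, i + right)
        total += hi - lo + 1
    return total


def context_seconds(left: int, right: int, frame_sec: float = FRAME_SEC) -> tuple[float, float]:
    """Convert a [left, right] context measured in frames into seconds of audio."""
    return left * frame_sec, right * frame_sec


if __name__ == "__main__":
    # Tiny example: 6 frames, one frame of context on each side.
    for row in local_attention_mask(6, left=1, right=1):
        print("".join("x" if allowed else "." for allowed in row))

    # About one hour of audio at the assumed 80 ms per frame.
    n = 45_000
    print("full attention pairs: ", n * n)
    print("local attention pairs:", attended_pairs(n, 128, 128))
    print("context (s):", context_seconds(128, 128))
```

Under the 80 ms assumption, `[128, 128]` gives roughly 10 seconds of context on each side of every frame, and per-frame cost stays bounded by 257 keys no matter how long the file is. That bounded window is why local attention (together with subsampling-conv chunking) lets a long recording fit in GPU memory.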