Pipeline Configuration
In the simplest use case, you can deploy an ASR pipeline to be used with the StreamingRecognize API call (refer to riva/proto/riva_asr.proto) without any language model, as follows:
riva-build speech_recognition \
/servicemaker-dev/<rmir_filename>:<encryption_key> \
/servicemaker-dev/<riva_filename>:<encryption_key> \
--name=<pipeline_name> \
--wfst_tokenizer_model=<wfst_tokenizer_model> \
--wfst_verbalizer_model=<wfst_verbalizer_model> \
--decoder_type=greedy
where:
- <rmir_filename> is the Riva RMIR file that is generated
- <riva_filename> is the name of the riva file to use as input
- <encryption_key> is the key used to encrypt the file. The encryption key for the pre-trained Riva models uploaded on NGC is tlt_encode.
- <name>, <acoustic_model_name>, and <featurizer_name> are optional user-defined names for the components in the model repository.
- <wfst_tokenizer_model> is the name of the WFST tokenizer model file to use for inverse text normalization of ASR transcripts. Refer to inverse-text-normalization for more details.
- <wfst_verbalizer_model> is the name of the WFST verbalizer model file to use for inverse text normalization of ASR transcripts. Refer to inverse-text-normalization for more details.
- decoder_type is the type of decoder to use. Valid values are flashlight, os2s, greedy, and pass_through. We recommend using flashlight for all CTC models. Refer to Decoder Hyper-Parameters for more details.
Upon successful completion of this command, a file named <rmir_filename> is created in the /servicemaker-dev/ folder. Since no language model is specified, the Riva greedy decoder is used to predict the transcript based on the output of the acoustic model. If your .riva archives are encrypted, you need to include :<encryption_key> at the end of the RMIR filename and the riva filename; otherwise, the key suffix is unnecessary.
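For example, assuming an unencrypted .riva archive (the filenames below are hypothetical placeholders, not files shipped with Riva), the key suffix can simply be dropped from both positions:

```shell
# Hypothetical build from an UNENCRYPTED .riva archive:
# no :<encryption_key> suffix on either the output RMIR or the input riva file.
riva-build speech_recognition \
    /servicemaker-dev/asr_pipeline.rmir \
    /servicemaker-dev/conformer_acoustic.riva \
    --name=asr-pipeline-greedy \
    --decoder_type=greedy
```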
For embedded platforms, a batch size of 1 is recommended, as it achieves the lowest memory footprint. To use a batch size of 1, refer to the riva-build-optional-parameters section and set the various min_batch_size, max_batch_size, opt_batch_size, and max_execution_batch_size parameters to 1 when executing the riva-build command.
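As a sketch (placeholder filenames; the batch-size flags are the ones named above), an embedded-friendly build with every batch-size parameter pinned to 1 might look like:

```shell
# Hypothetical embedded-platform build: all batch-size knobs set to 1
# to minimize the memory footprint on the device.
riva-build speech_recognition \
    /servicemaker-dev/asr_pipeline.rmir:tlt_encode \
    /servicemaker-dev/conformer_acoustic.riva:tlt_encode \
    --decoder_type=greedy \
    --min_batch_size=1 \
    --max_batch_size=1 \
    --opt_batch_size=1 \
    --max_execution_batch_size=1
```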
The following summary lists the riva-build commands used by the Quick Start scripts to generate RMIR files for the different models and modes, along with their limitations:
Conformer en-US streaming, flashlight decoder:
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-en-US-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.16 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--decoder_type=flashlight \
--flashlight_decoder.asr_model_delay=-1 \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_decoding_vocab_file> \
--flashlight_decoder.lm_weight=0.8 \
--flashlight_decoder.word_insertion_score=1.0 \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.beam_threshold=20. \
--flashlight_decoder.num_tokenization=1 \
--profane_words_file=<txt_profane_words_file> \
--language_code=en-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--speech_hints_model=<far_speech_hints_file>
Conformer en-US streaming, greedy decoder:
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-en-US-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.16 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=en-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--speech_hints_model=<far_speech_hints_file>
Conformer en-US streaming throughput, flashlight decoder:
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-en-US-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--decoder_type=flashlight \
--flashlight_decoder.asr_model_delay=-1 \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_decoding_vocab_file> \
--flashlight_decoder.lm_weight=0.8 \
--flashlight_decoder.word_insertion_score=1.0 \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.beam_threshold=20. \
--flashlight_decoder.num_tokenization=1 \
--profane_words_file=<txt_profane_words_file> \
--language_code=en-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--speech_hints_model=<far_speech_hints_file>
Conformer en-US streaming throughput, greedy decoder:
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-en-US-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=en-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--speech_hints_model=<far_speech_hints_file>
Conformer en-US offline, flashlight decoder:
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=conformer-en-US-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--max_batch_size=16 \
--featurizer.max_batch_size=512 \
--featurizer.max_execution_batch_size=512 \
--decoder_type=flashlight \
--flashlight_decoder.asr_model_delay=-1 \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_decoding_vocab_file> \
--flashlight_decoder.lm_weight=0.8 \
--flashlight_decoder.word_insertion_score=1.0 \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.beam_threshold=20. \
--flashlight_decoder.num_tokenization=1 \
--profane_words_file=<txt_profane_words_file> \
--language_code=en-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--speech_hints_model=<far_speech_hints_file>
Conformer en-US offline, greedy decoder:
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=conformer-en-US-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--max_batch_size=16 \
--featurizer.max_batch_size=512 \
--featurizer.max_execution_batch_size=512 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=en-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--speech_hints_model=<far_speech_hints_file>
Conformer en-GB streaming, flashlight decoder:
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-en-GB-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.16 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<vocab_file> \
--flashlight_decoder.lm_weight=0.2 \
--flashlight_decoder.word_insertion_score=0.2 \
--flashlight_decoder.beam_threshold=20. \
--language_code=en-GB \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
Conformer en-GB streaming, greedy decoder:
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-en-GB-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.16 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=en-GB \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
Conformer en-GB streaming throughput, flashlight decoder:
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-en-GB-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<vocab_file> \
--flashlight_decoder.lm_weight=0.2 \
--flashlight_decoder.word_insertion_score=0.2 \
--flashlight_decoder.beam_threshold=20. \
--language_code=en-GB \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
Conformer en-GB streaming throughput, greedy decoder:
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-en-GB-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=en-GB \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
Conformer en-GB offline, flashlight decoder:
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=conformer-en-GB-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--max_batch_size=16 \
--featurizer.max_batch_size=512 \
--featurizer.max_execution_batch_size=512 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<vocab_file> \
--flashlight_decoder.lm_weight=0.2 \
--flashlight_decoder.word_insertion_score=0.2 \
--flashlight_decoder.beam_threshold=20. \
--language_code=en-GB \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
Conformer en-GB offline, greedy decoder:
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=conformer-en-GB-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--max_batch_size=16 \
--featurizer.max_batch_size=512 \
--featurizer.max_execution_batch_size=512 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=en-GB \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
Conformer es-US streaming, flashlight decoder:
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-es-US-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.16 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_file> \
--flashlight_decoder.lm_weight=0.2 \
--flashlight_decoder.word_insertion_score=0.2 \
--flashlight_decoder.beam_threshold=20. \
--language_code=es-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
Conformer es-US streaming, greedy decoder:
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-es-US-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.16 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=es-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
Conformer es-US streaming throughput, flashlight decoder:
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-es-US-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_file> \
--flashlight_decoder.lm_weight=0.2 \
--flashlight_decoder.word_insertion_score=0.2 \
--flashlight_decoder.beam_threshold=20. \
--language_code=es-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
Conformer es-US streaming throughput, greedy decoder:
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-es-US-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=es-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
Conformer es-US offline, flashlight decoder:
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=conformer-es-US-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--max_batch_size=16 \
--featurizer.max_batch_size=512 \
--featurizer.max_execution_batch_size=512 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_file> \
--flashlight_decoder.lm_weight=0.2 \
--flashlight_decoder.word_insertion_score=0.2 \
--flashlight_decoder.beam_threshold=20. \
--language_code=es-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
Conformer es-US offline, greedy decoder:
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=conformer-es-US-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--max_batch_size=16 \
--featurizer.max_batch_size=512 \
--featurizer.max_execution_batch_size=512 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=es-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
Conformer es-ES streaming, flashlight decoder:
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-es-ES-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.16 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_file> \
--flashlight_decoder.lm_weight=0.2 \
--flashlight_decoder.word_insertion_score=0.2 \
--flashlight_decoder.beam_threshold=20. \
--flashlight_decoder.beam_size=32 \
--language_code=es-ES \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
Conformer es-ES streaming, greedy decoder:
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-es-ES-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.16 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=es-ES \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
Conformer es-ES streaming throughput, flashlight decoder:
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-es-ES-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_file> \
--flashlight_decoder.lm_weight=0.2 \
--flashlight_decoder.word_insertion_score=0.2 \
--flashlight_decoder.beam_threshold=20. \
--flashlight_decoder.beam_size=32 \
--language_code=es-ES \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
Conformer es-ES streaming throughput, greedy decoder:
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-es-ES-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=es-ES \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
Conformer es-ES offline, flashlight decoder:
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=conformer-es-ES-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--max_batch_size=16 \
--featurizer.max_batch_size=512 \
--featurizer.max_execution_batch_size=512 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_file> \
--flashlight_decoder.lm_weight=0.2 \
--flashlight_decoder.word_insertion_score=0.2 \
--flashlight_decoder.beam_threshold=20. \
--flashlight_decoder.beam_size=32 \
--language_code=es-ES \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
Conformer es-ES offline, greedy decoder:
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=conformer-es-ES-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--max_batch_size=16 \
--featurizer.max_batch_size=512 \
--featurizer.max_execution_batch_size=512 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=es-ES \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
Conformer de-DE streaming, flashlight decoder:
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-de-DE-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.16 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_file> \
--flashlight_decoder.lm_weight=0.2 \
--flashlight_decoder.word_insertion_score=0.2 \
--flashlight_decoder.beam_threshold=20. \
--language_code=de-DE \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
Conformer de-DE streaming, greedy decoder:
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-de-DE-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.16 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=de-DE \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
Conformer de-DE streaming throughput, flashlight decoder:
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-de-DE-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_file> \
--flashlight_decoder.lm_weight=0.2 \
--flashlight_decoder.word_insertion_score=0.2 \
--flashlight_decoder.beam_threshold=20. \
--language_code=de-DE \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
Conformer de-DE streaming throughput, greedy decoder:
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-de-DE-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=de-DE \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
Conformer de-DE offline, flashlight decoder:
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=conformer-de-DE-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--max_batch_size=16 \
--featurizer.max_batch_size=512 \
--featurizer.max_execution_batch_size=512 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_file> \
--flashlight_decoder.lm_weight=0.2 \
--flashlight_decoder.word_insertion_score=0.2 \
--flashlight_decoder.beam_threshold=20. \
--language_code=de-DE \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
Conformer de-DE offline, greedy decoder:
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=conformer-de-DE-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--max_batch_size=16 \
--featurizer.max_batch_size=512 \
--featurizer.max_execution_batch_size=512 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=de-DE \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
Conformer ru-RU streaming, flashlight decoder:
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-ru-RU-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.16 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_file> \
--flashlight_decoder.lm_weight=0.8 \
--flashlight_decoder.word_insertion_score=0.75 \
--flashlight_decoder.beam_threshold=20. \
--language_code=ru-RU \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
Conformer ru-RU streaming, greedy decoder:
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-ru-RU-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.16 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=ru-RU \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
Conformer ru-RU streaming throughput, flashlight decoder:
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-ru-RU-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_file> \
--flashlight_decoder.lm_weight=0.8 \
--flashlight_decoder.word_insertion_score=0.75 \
--flashlight_decoder.beam_threshold=20. \
--language_code=ru-RU \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
Conformer ru-RU streaming throughput, greedy decoder:
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-ru-RU-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=ru-RU \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
Conformer ru-RU offline, flashlight decoder:
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=conformer-ru-RU-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--max_batch_size=16 \
--featurizer.max_batch_size=512 \
--featurizer.max_execution_batch_size=512 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_file> \
--flashlight_decoder.lm_weight=0.8 \
--flashlight_decoder.word_insertion_score=0.75 \
--flashlight_decoder.beam_threshold=20. \
--language_code=ru-RU \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
Conformer ru-RU offline, greedy decoder:
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=conformer-ru-RU-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--max_batch_size=16 \
--featurizer.max_batch_size=512 \
--featurizer.max_execution_batch_size=512 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=ru-RU \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
Conformer zh-CN streaming, flashlight decoder:
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-zh-CN-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.16 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_file> \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.beam_size_token=32 \
--flashlight_decoder.beam_threshold=30. \
--flashlight_decoder.lm_weight=0.7 \
--flashlight_decoder.word_insertion_score=0.5 \
--language_code=zh-CN \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
Conformer zh-CN streaming, greedy decoder:
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-zh-CN-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.16 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=zh-CN \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
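Note that `--chunk_size`, `--left_padding_size`, and `--right_padding_size` are given in seconds, while `--ms_per_timestep` is in milliseconds, so the acoustic model emits `chunk_size / ms_per_timestep` output timesteps per streaming chunk and sees a window of `left_padding + chunk + right_padding` audio per inference. A quick sanity check of the low-latency streaming values used above (a sketch, with times converted to milliseconds):

```shell
# Values from the low-latency streaming builds above, in milliseconds.
ms_per_timestep=40
chunk_ms=160            # --chunk_size=0.16
pad_ms=1920             # --left_padding_size / --right_padding_size = 1.92

# Acoustic-model output timesteps per chunk, and the total audio window
# (left padding + chunk + right padding) fed to the model per inference.
frames_per_chunk=$(( chunk_ms / ms_per_timestep ))
window_ms=$(( pad_ms + chunk_ms + pad_ms ))

echo "frames_per_chunk=$frames_per_chunk window_ms=$window_ms"
```

The same arithmetic explains the throughput (0.8 s chunks) and offline (4.8 s chunks) variants: larger chunks amortize the padding over more timesteps at the cost of latency.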
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-zh-CN-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_file> \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.beam_size_token=32 \
--flashlight_decoder.beam_threshold=30. \
--flashlight_decoder.lm_weight=0.7 \
--flashlight_decoder.word_insertion_score=0.5 \
--language_code=zh-CN \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-zh-CN-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=zh-CN \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=conformer-zh-CN-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--max_batch_size=16 \
--featurizer.max_batch_size=512 \
--featurizer.max_execution_batch_size=512 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_file> \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.beam_size_token=32 \
--flashlight_decoder.beam_threshold=30. \
--flashlight_decoder.lm_weight=0.7 \
--flashlight_decoder.word_insertion_score=0.5 \
--language_code=zh-CN \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=conformer-zh-CN-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--max_batch_size=16 \
--featurizer.max_batch_size=512 \
--featurizer.max_execution_batch_size=512 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=zh-CN \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-hi-IN-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.16 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_file> \
--flashlight_decoder.lm_weight=0.2 \
--flashlight_decoder.word_insertion_score=0.2 \
--flashlight_decoder.beam_threshold=20. \
--language_code=hi-IN
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-hi-IN-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.16 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=hi-IN
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-hi-IN-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_file> \
--flashlight_decoder.lm_weight=0.2 \
--flashlight_decoder.word_insertion_score=0.2 \
--flashlight_decoder.beam_threshold=20. \
--language_code=hi-IN
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-hi-IN-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=hi-IN
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=conformer-hi-IN-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--max_batch_size=16 \
--featurizer.max_batch_size=512 \
--featurizer.max_execution_batch_size=512 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_file> \
--flashlight_decoder.lm_weight=0.2 \
--flashlight_decoder.word_insertion_score=0.2 \
--flashlight_decoder.beam_threshold=20. \
--language_code=hi-IN
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=conformer-hi-IN-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--max_batch_size=16 \
--featurizer.max_batch_size=512 \
--featurizer.max_execution_batch_size=512 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=hi-IN
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-ja-JP-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.16 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--append_space_to_transcripts=False \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--flashlight_decoder.use_lexicon_free_decoding=True \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.beam_size_token=32 \
--flashlight_decoder.beam_threshold=30. \
--flashlight_decoder.lm_weight=0.5 \
--flashlight_decoder.word_insertion_score=0.2 \
--flashlight_decoder.blank_token=_ \
    --flashlight_decoder.sil_token="▁" \
--language_code=ja-JP
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-ja-JP-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.16 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--append_space_to_transcripts=False \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=ja-JP
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-ja-JP-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--append_space_to_transcripts=False \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--flashlight_decoder.use_lexicon_free_decoding=True \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.beam_size_token=32 \
--flashlight_decoder.beam_threshold=30. \
--flashlight_decoder.lm_weight=0.5 \
--flashlight_decoder.word_insertion_score=0.2 \
--flashlight_decoder.blank_token=_ \
    --flashlight_decoder.sil_token="▁" \
--language_code=ja-JP
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-ja-JP-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--append_space_to_transcripts=False \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=ja-JP
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=conformer-ja-JP-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--max_batch_size=16 \
--featurizer.max_batch_size=512 \
--featurizer.max_execution_batch_size=512 \
--append_space_to_transcripts=False \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--flashlight_decoder.use_lexicon_free_decoding=True \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.beam_size_token=32 \
--flashlight_decoder.beam_threshold=30. \
--flashlight_decoder.lm_weight=0.5 \
--flashlight_decoder.word_insertion_score=0.2 \
--flashlight_decoder.blank_token=_ \
    --flashlight_decoder.sil_token="▁" \
--language_code=ja-JP
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=conformer-ja-JP-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--max_batch_size=16 \
--featurizer.max_batch_size=512 \
--featurizer.max_execution_batch_size=512 \
--append_space_to_transcripts=False \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=ja-JP
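In the ja-JP commands, `--flashlight_decoder.sil_token` is the sentencepiece word-boundary mark U+2581 (`▁`), which is visually easy to confuse with the ASCII underscore used for `--flashlight_decoder.blank_token` and easy to lose in copy/paste. A small sketch that makes the distinction explicit:

```shell
# blank_token is the ASCII underscore; sil_token is the sentencepiece
# word-boundary mark U+2581 ("lower one eighth block"), a different codepoint.
blank_token="_"
sil_token="▁"
printf 'blank=%s sil=%s\n' "$blank_token" "$sil_token"
```

Both values must match the symbols in the model's tokenizer vocabulary, so they should be taken as-is rather than retyped.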
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-ar-AR-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.16 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_file> \
--flashlight_decoder.lm_weight=0.7 \
--flashlight_decoder.word_insertion_score=0.75 \
--flashlight_decoder.beam_threshold=20. \
--language_code=ar-AR \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-ar-AR-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.16 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=ar-AR \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-ar-AR-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_file> \
--flashlight_decoder.lm_weight=0.7 \
--flashlight_decoder.word_insertion_score=0.75 \
--flashlight_decoder.beam_threshold=20. \
--language_code=ar-AR \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-ar-AR-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=ar-AR \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=conformer-ar-AR-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--max_batch_size=16 \
--featurizer.max_batch_size=512 \
--featurizer.max_execution_batch_size=512 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_file> \
--flashlight_decoder.lm_weight=0.7 \
--flashlight_decoder.word_insertion_score=0.75 \
--flashlight_decoder.beam_threshold=20. \
--language_code=ar-AR \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=conformer-ar-AR-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--max_batch_size=16 \
--featurizer.max_batch_size=512 \
--featurizer.max_execution_batch_size=512 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=ar-AR \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-it-IT-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.16 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_file> \
--flashlight_decoder.lm_weight=0.2 \
--flashlight_decoder.word_insertion_score=0.2 \
--flashlight_decoder.beam_threshold=20. \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.num_tokenization=1 \
--language_code=it-IT
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-it-IT-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.16 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=it-IT
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-it-IT-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_file> \
--flashlight_decoder.lm_weight=0.2 \
--flashlight_decoder.word_insertion_score=0.2 \
--flashlight_decoder.beam_threshold=20. \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.num_tokenization=1 \
--language_code=it-IT
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-it-IT-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=it-IT
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=conformer-it-IT-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--max_batch_size=16 \
--featurizer.max_batch_size=512 \
--featurizer.max_execution_batch_size=512 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_file> \
--flashlight_decoder.lm_weight=0.2 \
--flashlight_decoder.word_insertion_score=0.2 \
--flashlight_decoder.beam_threshold=20. \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.num_tokenization=1 \
--language_code=it-IT
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=conformer-it-IT-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--max_batch_size=16 \
--featurizer.max_batch_size=512 \
--featurizer.max_execution_batch_size=512 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=it-IT
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-ko-KR-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.16 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_file> \
--flashlight_decoder.lm_weight=0.2 \
--flashlight_decoder.word_insertion_score=0.2 \
--flashlight_decoder.beam_threshold=20. \
--flashlight_decoder.beam_size=32 \
--language_code=ko-KR
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-ko-KR-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.16 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=ko-KR
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-ko-KR-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_file> \
--flashlight_decoder.lm_weight=0.2 \
--flashlight_decoder.word_insertion_score=0.2 \
--flashlight_decoder.beam_threshold=20. \
--flashlight_decoder.beam_size=32 \
--language_code=ko-KR
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-ko-KR-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=ko-KR
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=conformer-ko-KR-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--max_batch_size=16 \
--featurizer.max_batch_size=512 \
--featurizer.max_execution_batch_size=512 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_file> \
--flashlight_decoder.lm_weight=0.2 \
--flashlight_decoder.word_insertion_score=0.2 \
--flashlight_decoder.beam_threshold=20. \
--flashlight_decoder.beam_size=32 \
--language_code=ko-KR
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=conformer-ko-KR-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--max_batch_size=16 \
--featurizer.max_batch_size=512 \
--featurizer.max_execution_batch_size=512 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=ko-KR
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-pt-BR-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.16 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_file> \
--flashlight_decoder.lm_weight=0.2 \
--flashlight_decoder.word_insertion_score=0.2 \
--flashlight_decoder.beam_threshold=20. \
--language_code=pt-BR
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-pt-BR-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.16 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=pt-BR
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-pt-BR-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_file> \
--flashlight_decoder.lm_weight=0.2 \
--flashlight_decoder.word_insertion_score=0.2 \
--flashlight_decoder.beam_threshold=20. \
--language_code=pt-BR
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-pt-BR-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=pt-BR
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=conformer-pt-BR-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--max_batch_size=16 \
--featurizer.max_batch_size=512 \
--featurizer.max_execution_batch_size=512 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_file> \
--flashlight_decoder.lm_weight=0.2 \
--flashlight_decoder.word_insertion_score=0.2 \
--flashlight_decoder.beam_threshold=20. \
--language_code=pt-BR
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=conformer-pt-BR-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--max_batch_size=16 \
--featurizer.max_batch_size=512 \
--featurizer.max_execution_batch_size=512 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=pt-BR
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-fr-FR-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.16 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_file> \
--flashlight_decoder.lm_weight=0.7 \
--flashlight_decoder.word_insertion_score=0.75 \
--flashlight_decoder.beam_threshold=20. \
--language_code=fr-FR \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-fr-FR-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.16 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=fr-FR \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
# fr-FR Conformer, streaming throughput-optimized, flashlight decoder with n-gram language model
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-fr-FR-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_file> \
--flashlight_decoder.lm_weight=0.7 \
--flashlight_decoder.word_insertion_score=0.75 \
--flashlight_decoder.beam_threshold=20. \
--language_code=fr-FR \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
# fr-FR Conformer, streaming throughput-optimized, greedy decoder
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-fr-FR-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=fr-FR \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
# fr-FR Conformer, offline, flashlight decoder with n-gram language model
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=conformer-fr-FR-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--max_batch_size=16 \
--featurizer.max_batch_size=512 \
--featurizer.max_execution_batch_size=512 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_file> \
--flashlight_decoder.lm_weight=0.7 \
--flashlight_decoder.word_insertion_score=0.75 \
--flashlight_decoder.beam_threshold=20. \
--language_code=fr-FR \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
# fr-FR Conformer, offline, greedy decoder
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=conformer-fr-FR-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--max_batch_size=16 \
--featurizer.max_batch_size=512 \
--featurizer.max_execution_batch_size=512 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=fr-FR \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
# nl-NL Conformer, streaming, flashlight decoder with n-gram language model
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-nl-NL-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.16 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_file> \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.beam_size_token=32 \
--flashlight_decoder.beam_threshold=30. \
--flashlight_decoder.lm_weight=0.7 \
--flashlight_decoder.word_insertion_score=0.75 \
--language_code=nl-NL \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
# nl-NL Conformer, streaming, greedy decoder
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-nl-NL-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.16 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=nl-NL \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
# nl-NL Conformer, streaming throughput-optimized, flashlight decoder with n-gram language model
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-nl-NL-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_file> \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.beam_size_token=32 \
--flashlight_decoder.beam_threshold=30. \
--flashlight_decoder.lm_weight=0.7 \
--flashlight_decoder.word_insertion_score=0.75 \
--language_code=nl-NL \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
# nl-NL Conformer, streaming throughput-optimized, greedy decoder
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-nl-NL-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=nl-NL \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
# nl-NL Conformer, offline, flashlight decoder with n-gram language model
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=conformer-nl-NL-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--max_batch_size=16 \
--featurizer.max_batch_size=512 \
--featurizer.max_execution_batch_size=512 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_file> \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.beam_size_token=32 \
--flashlight_decoder.beam_threshold=30. \
--flashlight_decoder.lm_weight=0.7 \
--flashlight_decoder.word_insertion_score=0.75 \
--language_code=nl-NL \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
# nl-NL Conformer, offline, greedy decoder
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=conformer-nl-NL-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--max_batch_size=16 \
--featurizer.max_batch_size=512 \
--featurizer.max_execution_batch_size=512 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=nl-NL \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
# nl-BE Conformer, streaming, flashlight decoder with n-gram language model
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-nl-BE-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.16 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_file> \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.beam_size_token=32 \
--flashlight_decoder.beam_threshold=30. \
--flashlight_decoder.lm_weight=0.7 \
--flashlight_decoder.word_insertion_score=0.75 \
--language_code=nl-BE \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
# nl-BE Conformer, streaming, greedy decoder
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-nl-BE-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.16 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=nl-BE \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
# nl-BE Conformer, streaming throughput-optimized, flashlight decoder with n-gram language model
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-nl-BE-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_file> \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.beam_size_token=32 \
--flashlight_decoder.beam_threshold=30. \
--flashlight_decoder.lm_weight=0.7 \
--flashlight_decoder.word_insertion_score=0.75 \
--language_code=nl-BE \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
# nl-BE Conformer, streaming throughput-optimized, greedy decoder
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-nl-BE-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=nl-BE \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
# nl-BE Conformer, offline, flashlight decoder with n-gram language model
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=conformer-nl-BE-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--max_batch_size=16 \
--featurizer.max_batch_size=512 \
--featurizer.max_execution_batch_size=512 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_file> \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.beam_size_token=32 \
--flashlight_decoder.beam_threshold=30. \
--flashlight_decoder.lm_weight=0.7 \
--flashlight_decoder.word_insertion_score=0.75 \
--language_code=nl-BE \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
# nl-BE Conformer, offline, greedy decoder
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=conformer-nl-BE-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--max_batch_size=16 \
--featurizer.max_batch_size=512 \
--featurizer.max_execution_batch_size=512 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=nl-BE \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
# en-US Conformer-XL, streaming, flashlight decoder with n-gram language model
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-xl-en-US-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.use_trt_fp32 \
--max_batch_size=4 \
--nn.max_batch_size=4 \
--nn.opt_batch_size=4 \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.16 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--decoder_type=flashlight \
--flashlight_decoder.asr_model_delay=-1 \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_decoding_vocab_file> \
--flashlight_decoder.lm_weight=0.8 \
--flashlight_decoder.word_insertion_score=1.0 \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.beam_threshold=20. \
--flashlight_decoder.num_tokenization=1 \
--profane_words_file=<txt_profane_words_file> \
--language_code=en-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--speech_hints_model=<far_speech_hints_file>
# en-US Conformer-XL, streaming, greedy decoder
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-xl-en-US-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.use_trt_fp32 \
--max_batch_size=4 \
--nn.max_batch_size=4 \
--nn.opt_batch_size=4 \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.16 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=en-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--speech_hints_model=<far_speech_hints_file>
# en-US Conformer-XL, streaming throughput-optimized, flashlight decoder with n-gram language model
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-xl-en-US-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.use_trt_fp32 \
--max_batch_size=4 \
--nn.max_batch_size=4 \
--nn.opt_batch_size=4 \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--decoder_type=flashlight \
--flashlight_decoder.asr_model_delay=-1 \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_decoding_vocab_file> \
--flashlight_decoder.lm_weight=0.8 \
--flashlight_decoder.word_insertion_score=1.0 \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.beam_threshold=20. \
--flashlight_decoder.num_tokenization=1 \
--profane_words_file=<txt_profane_words_file> \
--language_code=en-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--speech_hints_model=<far_speech_hints_file>
# en-US Conformer-XL, streaming throughput-optimized, greedy decoder
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-xl-en-US-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.use_trt_fp32 \
--max_batch_size=4 \
--nn.max_batch_size=4 \
--nn.opt_batch_size=4 \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=en-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--speech_hints_model=<far_speech_hints_file>
# en-US Conformer-XL, offline, flashlight decoder with n-gram language model
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=conformer-xl-en-US-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.use_trt_fp32 \
--max_batch_size=4 \
--nn.max_batch_size=4 \
--nn.opt_batch_size=4 \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--featurizer.max_batch_size=256 \
--featurizer.max_execution_batch_size=256 \
--decoder_type=flashlight \
--flashlight_decoder.asr_model_delay=-1 \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_decoding_vocab_file> \
--flashlight_decoder.lm_weight=0.8 \
--flashlight_decoder.word_insertion_score=1.0 \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.beam_threshold=20. \
--flashlight_decoder.num_tokenization=1 \
--profane_words_file=<txt_profane_words_file> \
--language_code=en-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--speech_hints_model=<far_speech_hints_file>
# en-US Conformer-XL, offline, greedy decoder
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=conformer-xl-en-US-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.use_trt_fp32 \
--max_batch_size=4 \
--nn.max_batch_size=4 \
--nn.opt_batch_size=4 \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--featurizer.max_batch_size=256 \
--featurizer.max_execution_batch_size=256 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=en-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--speech_hints_model=<far_speech_hints_file>
# de-DE unified Conformer, streaming, flashlight decoder with n-gram language model
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-unified-de-DE-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--unified_acoustic_model \
--chunk_size=0.16 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_file> \
--flashlight_decoder.lm_weight=0.2 \
--flashlight_decoder.word_insertion_score=0.2 \
--flashlight_decoder.beam_threshold=20. \
--language_code=de-DE \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
# de-DE unified Conformer, streaming, greedy decoder
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-unified-de-DE-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--unified_acoustic_model \
--chunk_size=0.16 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=de-DE \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
# de-DE unified Conformer, streaming throughput-optimized, flashlight decoder with n-gram language model
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-unified-de-DE-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--unified_acoustic_model \
--chunk_size=0.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_file> \
--flashlight_decoder.lm_weight=0.2 \
--flashlight_decoder.word_insertion_score=0.2 \
--flashlight_decoder.beam_threshold=20. \
--language_code=de-DE \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
# de-DE unified Conformer, streaming throughput-optimized, greedy decoder
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-unified-de-DE-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--unified_acoustic_model \
--chunk_size=0.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=de-DE \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
# de-DE unified Conformer, offline, flashlight decoder with n-gram language model
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=conformer-unified-de-DE-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--unified_acoustic_model \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--max_batch_size=16 \
--featurizer.max_batch_size=512 \
--featurizer.max_execution_batch_size=512 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_file> \
--flashlight_decoder.lm_weight=0.2 \
--flashlight_decoder.word_insertion_score=0.2 \
--flashlight_decoder.beam_threshold=20. \
--language_code=de-DE \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
# de-DE unified Conformer, offline, greedy decoder
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=conformer-unified-de-DE-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--unified_acoustic_model \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--max_batch_size=16 \
--featurizer.max_batch_size=512 \
--featurizer.max_execution_batch_size=512 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=de-DE \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
# zh-CN unified Conformer, streaming, lexicon-free flashlight decoder with n-gram language model
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-unified-zh-CN-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--unified_acoustic_model \
--chunk_size=0.16 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--flashlight_decoder.use_lexicon_free_decoding=True \
--flashlight_decoder.beam_size=16 \
--flashlight_decoder.beam_size_token=16 \
--flashlight_decoder.lm_weight=0.7 \
--flashlight_decoder.word_insertion_score=0.5 \
--flashlight_decoder.beam_threshold=10. \
--flashlight_decoder.blank_token=_ \
--flashlight_decoder.sil_token="▁" \
--append_space_to_transcripts=False \
--language_code=zh-CN \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
# zh-CN unified Conformer, streaming, greedy decoder
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-unified-zh-CN-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--unified_acoustic_model \
--chunk_size=0.16 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=zh-CN \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
# zh-CN unified Conformer, streaming throughput-optimized, lexicon-free flashlight decoder with n-gram language model
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-unified-zh-CN-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--unified_acoustic_model \
--chunk_size=0.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--flashlight_decoder.use_lexicon_free_decoding=True \
--flashlight_decoder.beam_size=16 \
--flashlight_decoder.beam_size_token=16 \
--flashlight_decoder.lm_weight=0.7 \
--flashlight_decoder.word_insertion_score=0.5 \
--flashlight_decoder.beam_threshold=10. \
--flashlight_decoder.blank_token=_ \
--flashlight_decoder.sil_token="▁" \
--append_space_to_transcripts=False \
--language_code=zh-CN \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
# zh-CN unified Conformer, streaming throughput-optimized, greedy decoder
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-unified-zh-CN-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--unified_acoustic_model \
--chunk_size=0.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=zh-CN \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
# zh-CN unified Conformer, offline, lexicon-free flashlight decoder with n-gram language model
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=conformer-unified-zh-CN-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--unified_acoustic_model \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--max_batch_size=16 \
--featurizer.max_batch_size=512 \
--featurizer.max_execution_batch_size=512 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--flashlight_decoder.use_lexicon_free_decoding=True \
--flashlight_decoder.beam_size=16 \
--flashlight_decoder.beam_size_token=16 \
--flashlight_decoder.lm_weight=0.7 \
--flashlight_decoder.word_insertion_score=0.5 \
--flashlight_decoder.beam_threshold=10. \
--flashlight_decoder.blank_token=_ \
--flashlight_decoder.sil_token="▁" \
--append_space_to_transcripts=False \
--language_code=zh-CN \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
# zh-CN unified Conformer, offline, greedy decoder
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=conformer-unified-zh-CN-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--unified_acoustic_model \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--max_batch_size=16 \
--featurizer.max_batch_size=512 \
--featurizer.max_execution_batch_size=512 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=zh-CN \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
# ja-JP unified Conformer, streaming, lexicon-free flashlight decoder with n-gram language model
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-unified-ja-JP-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--unified_acoustic_model \
--chunk_size=0.16 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--append_space_to_transcripts=False \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--flashlight_decoder.use_lexicon_free_decoding=True \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.beam_size_token=32 \
--flashlight_decoder.beam_threshold=30. \
--flashlight_decoder.lm_weight=0.5 \
--flashlight_decoder.word_insertion_score=0.2 \
--flashlight_decoder.blank_token=_ \
--flashlight_decoder.sil_token="▁" \
--language_code=ja-JP
# ja-JP unified Conformer, streaming, greedy decoder
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-unified-ja-JP-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--unified_acoustic_model \
--chunk_size=0.16 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--append_space_to_transcripts=False \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=ja-JP
# ja-JP unified Conformer, streaming throughput-optimized, lexicon-free flashlight decoder with n-gram language model
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-unified-ja-JP-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--unified_acoustic_model \
--chunk_size=0.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--append_space_to_transcripts=False \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--flashlight_decoder.use_lexicon_free_decoding=True \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.beam_size_token=32 \
--flashlight_decoder.beam_threshold=30. \
--flashlight_decoder.lm_weight=0.5 \
--flashlight_decoder.word_insertion_score=0.2 \
--flashlight_decoder.blank_token=_ \
--flashlight_decoder.sil_token=" " \
--language_code=ja-JP
# Conformer unified ja-JP, streaming (high throughput), greedy decoder
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-unified-ja-JP-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--unified_acoustic_model \
--chunk_size=0.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--append_space_to_transcripts=False \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=ja-JP
# Conformer unified ja-JP, offline, flashlight decoder
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=conformer-unified-ja-JP-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--unified_acoustic_model \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--max_batch_size=16 \
--featurizer.max_batch_size=512 \
--featurizer.max_execution_batch_size=512 \
--append_space_to_transcripts=False \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--flashlight_decoder.use_lexicon_free_decoding=True \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.beam_size_token=32 \
--flashlight_decoder.beam_threshold=30. \
--flashlight_decoder.lm_weight=0.5 \
--flashlight_decoder.word_insertion_score=0.2 \
--flashlight_decoder.blank_token=_ \
--flashlight_decoder.sil_token=" " \
--language_code=ja-JP
# Conformer unified ja-JP, offline, greedy decoder
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=conformer-unified-ja-JP-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--unified_acoustic_model \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--max_batch_size=16 \
--featurizer.max_batch_size=512 \
--featurizer.max_execution_batch_size=512 \
--append_space_to_transcripts=False \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=ja-JP
# Conformer multilingual code-switching es-en-US, streaming (low latency), flashlight decoder
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-ml-cs-es-en-US-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.16 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_file> \
--flashlight_decoder.lm_weight=0.2 \
--flashlight_decoder.word_insertion_score=0.2 \
--flashlight_decoder.beam_threshold=20. \
--language_code=es-en-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
# Conformer multilingual code-switching es-en-US, streaming (low latency), greedy decoder
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-ml-cs-es-en-US-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.16 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=es-en-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
# Conformer multilingual code-switching es-en-US, streaming (high throughput), flashlight decoder
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-ml-cs-es-en-US-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_file> \
--flashlight_decoder.lm_weight=0.2 \
--flashlight_decoder.word_insertion_score=0.2 \
--flashlight_decoder.beam_threshold=20. \
--language_code=es-en-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
# Conformer multilingual code-switching es-en-US, streaming (high throughput), greedy decoder
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-ml-cs-es-en-US-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=es-en-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
# Conformer multilingual code-switching es-en-US, offline, flashlight decoder
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=conformer-ml-cs-es-en-US-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--max_batch_size=16 \
--featurizer.max_batch_size=512 \
--featurizer.max_execution_batch_size=512 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_file> \
--flashlight_decoder.lm_weight=0.2 \
--flashlight_decoder.word_insertion_score=0.2 \
--flashlight_decoder.beam_threshold=20. \
--language_code=es-en-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
# Conformer multilingual code-switching es-en-US, offline, greedy decoder
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=conformer-ml-cs-es-en-US-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--max_batch_size=16 \
--featurizer.max_batch_size=512 \
--featurizer.max_execution_batch_size=512 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=es-en-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
# Conformer unified multilingual code-switching ja-en-JP, streaming (low latency), flashlight decoder
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-unified-ml-cs-ja-en-JP-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--unified_acoustic_model \
--chunk_size=0.16 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--append_space_to_transcripts=False \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--flashlight_decoder.use_lexicon_free_decoding=True \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.beam_size_token=32 \
--flashlight_decoder.beam_threshold=30. \
--flashlight_decoder.lm_weight=0.5 \
--flashlight_decoder.word_insertion_score=0.2 \
--flashlight_decoder.blank_token=_ \
--flashlight_decoder.sil_token=" " \
--language_code=ja-en-JP
# Conformer unified multilingual code-switching ja-en-JP, streaming (low latency), greedy decoder
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-unified-ml-cs-ja-en-JP-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--unified_acoustic_model \
--chunk_size=0.16 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--append_space_to_transcripts=False \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=ja-en-JP
# Conformer unified multilingual code-switching ja-en-JP, streaming (high throughput), flashlight decoder
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-unified-ml-cs-ja-en-JP-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--unified_acoustic_model \
--chunk_size=0.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--append_space_to_transcripts=False \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--flashlight_decoder.use_lexicon_free_decoding=True \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.beam_size_token=32 \
--flashlight_decoder.beam_threshold=30. \
--flashlight_decoder.lm_weight=0.5 \
--flashlight_decoder.word_insertion_score=0.2 \
--flashlight_decoder.blank_token=_ \
--flashlight_decoder.sil_token=" " \
--language_code=ja-en-JP
# Conformer unified multilingual code-switching ja-en-JP, streaming (high throughput), greedy decoder
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-unified-ml-cs-ja-en-JP-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--unified_acoustic_model \
--chunk_size=0.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--append_space_to_transcripts=False \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=ja-en-JP
# Conformer unified multilingual code-switching ja-en-JP, offline, flashlight decoder
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=conformer-unified-ml-cs-ja-en-JP-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--unified_acoustic_model \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--max_batch_size=16 \
--featurizer.max_batch_size=512 \
--featurizer.max_execution_batch_size=512 \
--append_space_to_transcripts=False \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--flashlight_decoder.use_lexicon_free_decoding=True \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.beam_size_token=32 \
--flashlight_decoder.beam_threshold=30. \
--flashlight_decoder.lm_weight=0.5 \
--flashlight_decoder.word_insertion_score=0.2 \
--flashlight_decoder.blank_token=_ \
--flashlight_decoder.sil_token=" " \
--language_code=ja-en-JP
# Conformer unified multilingual code-switching ja-en-JP, offline, greedy decoder
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=conformer-unified-ml-cs-ja-en-JP-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--unified_acoustic_model \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--max_batch_size=16 \
--featurizer.max_batch_size=512 \
--featurizer.max_execution_batch_size=512 \
--append_space_to_transcripts=False \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=ja-en-JP
# Offline speaker diarization pipeline (VAD + speaker embedding models)
riva-build diarizer \
<rmir_filename>:<key> \
<riva_vad_file>:<key> \
<riva_speaker_recognition_file>:<key> \
--diarizer_backend.offline \
--embedding_extractor_nn.max_batch_size=32 \
--embedding_extractor_nn.use_onnx_runtime \
--embedding_extractor_nn.optimization_graph_level=-1 \
--clustering_backend.max_batch_size=0 \
--chunk_size=300 \
--audio_sec_limit=4001 \
--diarizer_backend.language_code=generic
# Parakeet 0.6B en-US, streaming (low latency), flashlight decoder
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=parakeet-0.6b-en-US-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--endpointing.residue_blanks_at_start=-16 \
--chunk_size=0.16 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--decoder_type=flashlight \
--flashlight_decoder.asr_model_delay=-1 \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_decoding_vocab_file> \
--flashlight_decoder.lm_weight=0.8 \
--flashlight_decoder.word_insertion_score=1.0 \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.beam_threshold=20. \
--flashlight_decoder.num_tokenization=1 \
--profane_words_file=<txt_profane_words_file> \
--language_code=en-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--speech_hints_model=<far_speech_hints_file>
# Parakeet 0.6B en-US, streaming (low latency), greedy decoder
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=parakeet-0.6b-en-US-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--endpointing.residue_blanks_at_start=-16 \
--chunk_size=0.16 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=en-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--speech_hints_model=<far_speech_hints_file>
# Parakeet 0.6B en-US, streaming (high throughput), flashlight decoder
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=parakeet-0.6b-en-US-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--endpointing.residue_blanks_at_start=-16 \
--chunk_size=0.96 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--decoder_type=flashlight \
--flashlight_decoder.asr_model_delay=-1 \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_decoding_vocab_file> \
--flashlight_decoder.lm_weight=0.8 \
--flashlight_decoder.word_insertion_score=1.0 \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.beam_threshold=20. \
--flashlight_decoder.num_tokenization=1 \
--profane_words_file=<txt_profane_words_file> \
--language_code=en-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--speech_hints_model=<far_speech_hints_file>
# Parakeet 0.6B en-US, streaming (high throughput), greedy decoder
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=parakeet-0.6b-en-US-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--endpointing.residue_blanks_at_start=-16 \
--chunk_size=0.96 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=en-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--speech_hints_model=<far_speech_hints_file>
# Parakeet 0.6B en-US, offline, flashlight decoder
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=parakeet-0.6b-en-US-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--endpointing.residue_blanks_at_start=-16 \
--decoder_type=flashlight \
--flashlight_decoder.asr_model_delay=-1 \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_decoding_vocab_file> \
--flashlight_decoder.lm_weight=0.8 \
--flashlight_decoder.word_insertion_score=1.0 \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.beam_threshold=20. \
--flashlight_decoder.num_tokenization=1 \
--profane_words_file=<txt_profane_words_file> \
--language_code=en-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--speech_hints_model=<far_speech_hints_file>
# Parakeet 0.6B en-US, offline, greedy decoder
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=parakeet-0.6b-en-US-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--endpointing.residue_blanks_at_start=-16 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=en-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--speech_hints_model=<far_speech_hints_file>
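The Parakeet builds use `--ms_per_timestep=80`, so the low-latency (`--chunk_size=0.16`) and high-throughput (`--chunk_size=0.96`) variants differ mainly in how many new acoustic timesteps each streaming call consumes. A quick sketch of that framing arithmetic (flag values from the commands above; this is only the chunking math, not a performance claim):

```shell
# Each streaming call consumes chunk_size seconds of new audio; at
# --ms_per_timestep=80 (Parakeet) that is chunk_ms / 80 new timesteps per call.
for chunk_ms in 160 960; do   # 0.16 s (low latency) vs 0.96 s (throughput)
  echo "chunk ${chunk_ms} ms -> $(( chunk_ms / 80 )) new timesteps per call"
done
# -> chunk 160 ms -> 2 new timesteps per call
# -> chunk 960 ms -> 12 new timesteps per call
```

Smaller chunks mean more frequent inference calls per second of audio (and lower latency); larger chunks amortize the fixed per-call cost over more timesteps.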
# Parakeet 0.6B unified en-US, streaming (low latency), flashlight decoder
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=parakeet-0.6b-unified-en-US-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--endpointing.residue_blanks_at_start=-2 \
--nn.fp16_needs_obey_precision_pass \
--unified_acoustic_model \
--chunk_size=0.16 \
--left_padding_size=3.92 \
--right_padding_size=3.92 \
--decoder_type=flashlight \
--flashlight_decoder.asr_model_delay=-1 \
--decoding_language_model_binary=<bin_file> \
--decoding_lexicon=<txt_decoding_lexicon_file> \
--flashlight_decoder.lm_weight=0.1 \
--flashlight_decoder.word_insertion_score=1.0 \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.beam_threshold=20. \
--flashlight_decoder.num_tokenization=1 \
--profane_words_file=<txt_profane_words_file> \
--language_code=en-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--speech_hints_model=<far_speech_hints_file>
# Parakeet 0.6B unified en-US, streaming (low latency), greedy decoder
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=parakeet-0.6b-unified-en-US-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--endpointing.residue_blanks_at_start=-2 \
--nn.fp16_needs_obey_precision_pass \
--unified_acoustic_model \
--chunk_size=0.16 \
--left_padding_size=3.92 \
--right_padding_size=3.92 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=en-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--speech_hints_model=<far_speech_hints_file>
# Parakeet 0.6B unified en-US, streaming (high throughput), flashlight decoder
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=parakeet-0.6b-unified-en-US-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--endpointing.residue_blanks_at_start=-2 \
--nn.fp16_needs_obey_precision_pass \
--unified_acoustic_model \
--chunk_size=0.96 \
--left_padding_size=3.92 \
--right_padding_size=3.92 \
--decoder_type=flashlight \
--flashlight_decoder.asr_model_delay=-1 \
--decoding_language_model_binary=<bin_file> \
--decoding_lexicon=<txt_decoding_lexicon_file> \
--flashlight_decoder.lm_weight=0.1 \
--flashlight_decoder.word_insertion_score=1.0 \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.beam_threshold=20. \
--flashlight_decoder.num_tokenization=1 \
--profane_words_file=<txt_profane_words_file> \
--language_code=en-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--speech_hints_model=<far_speech_hints_file>
# Parakeet 0.6B unified en-US, streaming (high throughput), greedy decoder
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=parakeet-0.6b-unified-en-US-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--endpointing.residue_blanks_at_start=-2 \
--nn.fp16_needs_obey_precision_pass \
--unified_acoustic_model \
--chunk_size=0.96 \
--left_padding_size=3.92 \
--right_padding_size=3.92 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=en-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--speech_hints_model=<far_speech_hints_file>
# Parakeet 0.6B unified en-US, offline, flashlight decoder
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=parakeet-0.6b-unified-en-US-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--nn.fp16_needs_obey_precision_pass \
--unified_acoustic_model \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--featurizer.max_batch_size=256 \
--featurizer.max_execution_batch_size=256 \
--decoder_type=flashlight \
--flashlight_decoder.asr_model_delay=-1 \
--decoding_language_model_binary=<bin_file> \
--decoding_lexicon=<txt_decoding_lexicon_file> \
--flashlight_decoder.lm_weight=0.1 \
--flashlight_decoder.word_insertion_score=1.0 \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.beam_threshold=20. \
--flashlight_decoder.num_tokenization=1 \
--profane_words_file=<txt_profane_words_file> \
--language_code=en-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--speech_hints_model=<far_speech_hints_file>
# Parakeet 0.6B unified en-US, offline, greedy decoder
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=parakeet-0.6b-unified-en-US-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--nn.fp16_needs_obey_precision_pass \
--unified_acoustic_model \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--featurizer.max_batch_size=256 \
--featurizer.max_execution_batch_size=256 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=en-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--speech_hints_model=<far_speech_hints_file>
# Parakeet 0.6B unified multilingual code-switching es-en-US, streaming (low latency), flashlight decoder
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=parakeet-0.6b-unified-ml-cs-es-en-US-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--endpointing.residue_blanks_at_start=-16 \
--nn.fp16_needs_obey_precision_pass \
--unified_acoustic_model \
--chunk_size=0.32 \
--left_padding_size=3.92 \
--right_padding_size=3.92 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_lexicon=<txt_decoding_lexicon_file> \
--flashlight_decoder.lm_weight=0.8 \
--flashlight_decoder.word_insertion_score=1.0 \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.beam_threshold=20. \
--flashlight_decoder.num_tokenization=1 \
--language_code=es-en-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
# Parakeet 0.6B unified multilingual code-switching es-en-US, streaming (low latency), greedy decoder
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=parakeet-0.6b-unified-ml-cs-es-en-US-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--endpointing.residue_blanks_at_start=-16 \
--nn.fp16_needs_obey_precision_pass \
--unified_acoustic_model \
--chunk_size=0.32 \
--left_padding_size=3.92 \
--right_padding_size=3.92 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=es-en-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
# Parakeet 0.6B unified multilingual code-switching es-en-US, streaming (high throughput), flashlight decoder
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=parakeet-0.6b-unified-ml-cs-es-en-US-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--endpointing.residue_blanks_at_start=-16 \
--nn.fp16_needs_obey_precision_pass \
--unified_acoustic_model \
--chunk_size=0.96 \
--left_padding_size=3.92 \
--right_padding_size=3.92 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_lexicon=<txt_decoding_lexicon_file> \
--flashlight_decoder.lm_weight=0.8 \
--flashlight_decoder.word_insertion_score=1.0 \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.beam_threshold=20. \
--flashlight_decoder.num_tokenization=1 \
--language_code=es-en-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
# Parakeet 0.6B unified multilingual code-switching es-en-US, streaming (high throughput), greedy decoder
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=parakeet-0.6b-unified-ml-cs-es-en-US-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--endpointing.residue_blanks_at_start=-16 \
--nn.fp16_needs_obey_precision_pass \
--unified_acoustic_model \
--chunk_size=0.96 \
--left_padding_size=3.92 \
--right_padding_size=3.92 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=es-en-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
# Parakeet 0.6B unified multilingual code-switching es-en-US, offline, flashlight decoder
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=parakeet-0.6b-unified-ml-cs-es-en-US-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--endpointing.residue_blanks_at_start=-16 \
--nn.fp16_needs_obey_precision_pass \
--unified_acoustic_model \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--featurizer.max_batch_size=256 \
--featurizer.max_execution_batch_size=256 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_lexicon=<txt_decoding_lexicon_file> \
--flashlight_decoder.lm_weight=0.8 \
--flashlight_decoder.word_insertion_score=1.0 \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.beam_threshold=20. \
--flashlight_decoder.num_tokenization=1 \
--language_code=es-en-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
# Parakeet 0.6B unified multilingual code-switching es-en-US, offline, greedy decoder
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=parakeet-0.6b-unified-ml-cs-es-en-US-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--endpointing.residue_blanks_at_start=-16 \
--nn.fp16_needs_obey_precision_pass \
--unified_acoustic_model \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--featurizer.max_batch_size=256 \
--featurizer.max_execution_batch_size=256 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=es-en-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
# Parakeet 1.1B en-US, streaming (low latency), flashlight decoder
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=parakeet-1.1b-en-US-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--endpointing.residue_blanks_at_start=-2 \
--nn.fp16_needs_obey_precision_pass \
--chunk_size=0.16 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--decoder_type=flashlight \
--flashlight_decoder.asr_model_delay=-1 \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_decoding_vocab_file> \
--flashlight_decoder.lm_weight=0.8 \
--flashlight_decoder.word_insertion_score=1.0 \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.beam_threshold=20. \
--flashlight_decoder.num_tokenization=1 \
--profane_words_file=<txt_profane_words_file> \
--language_code=en-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--speech_hints_model=<far_speech_hints_file>
# Parakeet 1.1B en-US, streaming (low latency), greedy decoder
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=parakeet-1.1b-en-US-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--endpointing.residue_blanks_at_start=-2 \
--nn.fp16_needs_obey_precision_pass \
--chunk_size=0.16 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=en-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--speech_hints_model=<far_speech_hints_file>
# Parakeet 1.1B en-US, streaming (high throughput), flashlight decoder
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=parakeet-1.1b-en-US-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--endpointing.residue_blanks_at_start=-2 \
--nn.fp16_needs_obey_precision_pass \
--chunk_size=0.96 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--decoder_type=flashlight \
--flashlight_decoder.asr_model_delay=-1 \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_decoding_vocab_file> \
--flashlight_decoder.lm_weight=0.8 \
--flashlight_decoder.word_insertion_score=1.0 \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.beam_threshold=20. \
--flashlight_decoder.num_tokenization=1 \
--profane_words_file=<txt_profane_words_file> \
--language_code=en-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--speech_hints_model=<far_speech_hints_file>
# Parakeet 1.1B en-US, streaming (high throughput), greedy decoder
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=parakeet-1.1b-en-US-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--endpointing.residue_blanks_at_start=-2 \
--nn.fp16_needs_obey_precision_pass \
--chunk_size=0.96 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=en-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--speech_hints_model=<far_speech_hints_file>
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=parakeet-1.1b-en-US-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--nn.fp16_needs_obey_precision_pass \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--featurizer.max_batch_size=256 \
--featurizer.max_execution_batch_size=256 \
--decoder_type=flashlight \
--flashlight_decoder.asr_model_delay=-1 \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_decoding_vocab_file> \
--flashlight_decoder.lm_weight=0.8 \
--flashlight_decoder.word_insertion_score=1.0 \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.beam_threshold=20. \
--flashlight_decoder.num_tokenization=1 \
--profane_words_file=<txt_profane_words_file> \
--language_code=en-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--speech_hints_model=<far_speech_hints_file>
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=parakeet-1.1b-en-US-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--nn.fp16_needs_obey_precision_pass \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--featurizer.max_batch_size=256 \
--featurizer.max_execution_batch_size=256 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=en-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--speech_hints_model=<far_speech_hints_file>
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=parakeet-1.1b-unified-ml-cs-universal-multi-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--endpointing.residue_blanks_at_start=-2 \
--nn.fp16_needs_obey_precision_pass \
--unified_acoustic_model \
--chunk_size=0.32 \
--left_padding_size=4.64 \
--right_padding_size=4.64 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=en-US,en-GB,es-ES,ar-AR,es-US,pt-BR,fr-FR,de-DE,it-IT,ja-JP,ko-KR,ru-RU,hi-IN,he-IL,nb-NO,nl-NL,cs-CZ,da-DK,fr-CA,pl-PL,sv-SE,th-TH,tr-TR,pt-PT,nn-NO,multi
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=parakeet-1.1b-unified-ml-cs-universal-multi-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--endpointing.residue_blanks_at_start=-2 \
--nn.fp16_needs_obey_precision_pass \
--unified_acoustic_model \
--chunk_size=1.6 \
--left_padding_size=4.0 \
--right_padding_size=4.0 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=en-US,en-GB,es-ES,ar-AR,es-US,pt-BR,fr-FR,de-DE,it-IT,ja-JP,ko-KR,ru-RU,hi-IN,he-IL,nb-NO,nl-NL,cs-CZ,da-DK,fr-CA,pl-PL,sv-SE,th-TH,tr-TR,pt-PT,nn-NO,multi
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=parakeet-1.1b-unified-ml-cs-universal-multi-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--nn.fp16_needs_obey_precision_pass \
--unified_acoustic_model \
--chunk_size=4.8 \
--left_padding_size=3.2 \
--right_padding_size=3.2 \
--featurizer.max_batch_size=256 \
--featurizer.max_execution_batch_size=256 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=en-US,en-GB,es-ES,ar-AR,es-US,pt-BR,fr-FR,de-DE,it-IT,ja-JP,ko-KR,ru-RU,hi-IN,he-IL,nb-NO,nl-NL,cs-CZ,da-DK,fr-CA,pl-PL,sv-SE,th-TH,tr-TR,pt-PT,nn-NO,multi
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=parakeet-1.1b-unified-ml-cs-concat-multi-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--endpointing.residue_blanks_at_start=-2 \
--nn.fp16_needs_obey_precision_pass \
--unified_acoustic_model \
--chunk_size=0.32 \
--left_padding_size=4.64 \
--right_padding_size=4.64 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=en-US,en-GB,es-ES,ar-AR,es-US,pt-BR,fr-FR,de-DE,it-IT,ja-JP,ko-KR,ru-RU,hi-IN,he-IL,nb-NO,nl-NL,cs-CZ,da-DK,fr-CA,pl-PL,sv-SE,th-TH,tr-TR,pt-PT,nn-NO,multi
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=parakeet-1.1b-unified-ml-cs-concat-multi-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--endpointing.residue_blanks_at_start=-2 \
--nn.fp16_needs_obey_precision_pass \
--unified_acoustic_model \
--chunk_size=1.6 \
--left_padding_size=4.0 \
--right_padding_size=4.0 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=en-US,en-GB,es-ES,ar-AR,es-US,pt-BR,fr-FR,de-DE,it-IT,ja-JP,ko-KR,ru-RU,hi-IN,he-IL,nb-NO,nl-NL,cs-CZ,da-DK,fr-CA,pl-PL,sv-SE,th-TH,tr-TR,pt-PT,nn-NO,multi
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=parakeet-1.1b-unified-ml-cs-concat-multi-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--nn.fp16_needs_obey_precision_pass \
--unified_acoustic_model \
--chunk_size=4.8 \
--left_padding_size=3.2 \
--right_padding_size=3.2 \
--featurizer.max_batch_size=256 \
--featurizer.max_execution_batch_size=256 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=en-US,en-GB,es-ES,ar-AR,es-US,pt-BR,fr-FR,de-DE,it-IT,ja-JP,ko-KR,ru-RU,hi-IN,he-IL,nb-NO,nl-NL,cs-CZ,da-DK,fr-CA,pl-PL,sv-SE,th-TH,tr-TR,pt-PT,nn-NO,multi
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=parakeet-1.1b-unified-ml-cs-em-ea-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--endpointing.residue_blanks_at_start=-2 \
--nn.fp16_needs_obey_precision_pass \
--unified_acoustic_model \
--chunk_size=0.32 \
--left_padding_size=4.64 \
--right_padding_size=4.64 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_decoding_vocab_file> \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.beam_size_token=32 \
--flashlight_decoder.beam_threshold=20. \
--flashlight_decoder.lm_weight=0.33 \
--flashlight_decoder.word_insertion_score=0.01 \
--language_code=em-ea
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=parakeet-1.1b-unified-ml-cs-em-ea-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--endpointing.residue_blanks_at_start=-2 \
--nn.fp16_needs_obey_precision_pass \
--unified_acoustic_model \
--chunk_size=0.32 \
--left_padding_size=4.64 \
--right_padding_size=4.64 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=em-ea
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=parakeet-1.1b-unified-ml-cs-em-ea-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--endpointing.residue_blanks_at_start=-2 \
--nn.fp16_needs_obey_precision_pass \
--unified_acoustic_model \
--chunk_size=1.6 \
--left_padding_size=4.0 \
--right_padding_size=4.0 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_decoding_vocab_file> \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.beam_size_token=32 \
--flashlight_decoder.beam_threshold=20. \
--flashlight_decoder.lm_weight=0.33 \
--flashlight_decoder.word_insertion_score=0.01 \
--language_code=em-ea
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=parakeet-1.1b-unified-ml-cs-em-ea-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--endpointing.residue_blanks_at_start=-2 \
--nn.fp16_needs_obey_precision_pass \
--unified_acoustic_model \
--chunk_size=1.6 \
--left_padding_size=4.0 \
--right_padding_size=4.0 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=em-ea
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=parakeet-1.1b-unified-ml-cs-em-ea-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--nn.fp16_needs_obey_precision_pass \
--unified_acoustic_model \
--chunk_size=4.8 \
--left_padding_size=3.2 \
--right_padding_size=3.2 \
--featurizer.max_batch_size=256 \
--featurizer.max_execution_batch_size=256 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_decoding_vocab_file> \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.beam_size_token=32 \
--flashlight_decoder.beam_threshold=20. \
--flashlight_decoder.lm_weight=0.33 \
--flashlight_decoder.word_insertion_score=0.01 \
--language_code=em-ea
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=parakeet-1.1b-unified-ml-cs-em-ea-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--nn.fp16_needs_obey_precision_pass \
--unified_acoustic_model \
--chunk_size=4.8 \
--left_padding_size=3.2 \
--right_padding_size=3.2 \
--featurizer.max_batch_size=256 \
--featurizer.max_execution_batch_size=256 \
--decoder_type=greedy \
--greedy_decoder.asr_model_delay=-1 \
--language_code=em-ea
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=parakeet-rnnt-1.1b-en-US-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--nn.fp16_needs_obey_precision_pass \
--chunk_size=8.0 \
--left_padding_size=0 \
--right_padding_size=0 \
--featurizer.max_batch_size=256 \
--featurizer.max_execution_batch_size=256 \
--max_batch_size=128 \
--decoder_type=nemo \
--language_code=en-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--speech_hints_model=<far_speech_hints_file>
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=whisper-large-v3-multi-asr-offline \
--return_separate_utterances=True \
--chunk_size 30 \
--left_padding_size 0 \
--right_padding_size 0 \
--decoder_type trtllm \
--unified_acoustic_model \
--feature_extractor_type torch \
--featurizer.norm_per_feature false \
--max_batch_size 8 \
--featurizer.precalc_norm_params False \
--featurizer.max_batch_size=8 \
--featurizer.max_execution_batch_size=8 \
--language_code=en,zh,de,es,ru,ko,fr,ja,pt,tr,pl,ca,nl,ar,sv,it,id,hi,fi,vi,he,uk,el,ms,cs,ro,da,hu,ta,no,th,ur,hr,bg,lt,la,mi,ml,cy,sk,te,fa,lv,bn,sr,az,sl,kn,et,mk,br,eu,is,hy,ne,mn,bs,kk,sq,sw,gl,mr,pa,si,km,sn,yo,so,af,oc,ka,be,tg,sd,gu,am,yi,lo,uz,fo,ht,ps,tk,nn,mt,sa,lb,my,bo,tl,mg,as,tt,haw,ln,ha,ba,jw,su,yue,multi
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=distil-whisper-large-v3-en-US-asr-offline \
--return_separate_utterances=True \
--chunk_size 30 \
--left_padding_size 0 \
--right_padding_size 0 \
--decoder_type trtllm \
--unified_acoustic_model \
--feature_extractor_type torch \
--featurizer.norm_per_feature false \
--max_batch_size 8 \
--featurizer.precalc_norm_params False \
--featurizer.max_batch_size=8 \
--featurizer.max_execution_batch_size=8 \
--language_code=en-US
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=canary-1b-multi-asr-offline \
--return_separate_utterances=True \
--chunk_size 30 \
--left_padding_size 0 \
--right_padding_size 0 \
--decoder_type nemo \
--nemo_decoder.nemo_decoder_type canary \
--feature_extractor_type torch \
--torch_feature_type nemo \
--featurizer.norm_per_feature false \
--max_batch_size 8 \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_params False \
--featurizer.max_batch_size=128 \
--featurizer.max_execution_batch_size=128 \
--language_code=en-US,en-GB,es-ES,ar-AR,es-US,pt-BR,fr-FR,de-DE,it-IT,ja-JP,ko-KR,ru-RU,hi-IN
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=canary-0.6b-turbo-multi-asr-offline \
--return_separate_utterances=True \
--chunk_size 30 \
--left_padding_size 0 \
--right_padding_size 0 \
--decoder_type nemo \
--nemo_decoder.nemo_decoder_type canary \
--feature_extractor_type torch \
--torch_feature_type nemo \
--featurizer.norm_per_feature false \
--max_batch_size 8 \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_params False \
--featurizer.max_batch_size=128 \
--featurizer.max_execution_batch_size=128 \
--language_code=en-US,en-GB,es-ES,ar-AR,es-US,pt-BR,fr-FR,de-DE,it-IT,ja-JP,ko-KR,ru-RU,hi-IN
For details on the parameters passed to `riva-build` to customize the ASR pipeline, run:
riva-build <pipeline> -h
Streaming/Offline Recognition#
The Riva ASR pipeline can be configured for both streaming and offline recognition use cases. When using the `StreamingRecognize` API call (refer to riva/proto/riva_asr.proto), we recommend the following `riva-build` parameters for low-latency streaming recognition with the Conformer acoustic model:
riva-build speech_recognition \
/servicemaker-dev/<rmir_filename>:<encryption_key> \
/servicemaker-dev/<riva_filename>:<encryption_key> \
--name=<pipeline_name> \
--wfst_tokenizer_model=<wfst_tokenizer_model> \
--wfst_verbalizer_model=<wfst_verbalizer_model> \
--decoder_type=greedy \
--chunk_size=0.16 \
--padding_size=1.92 \
--ms_per_timestep=40 \
--nn.fp16_needs_obey_precision_pass \
--greedy_decoder.asr_model_delay=-1 \
--endpointing.residue_blanks_at_start=-2 \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False
For high-throughput streaming recognition with the `StreamingRecognize` API call, `chunk_size` and `padding_size` can be set as follows:
--chunk_size=0.8 \
--padding_size=1.6
Finally, to configure the ASR pipeline for offline recognition with the `Recognize` API call (refer to riva/proto/riva_asr.proto), we recommend the following settings for the Conformer acoustic model:
--offline \
--chunk_size=4.8 \
--padding_size=1.6
Note
When deploying the offline ASR model with `riva-deploy`, TensorRT warnings may appear in the log indicating that memory requirements for format conversion cannot be satisfied. These warnings should not affect functionality and can be ignored.
Language Models#
Riva ASR supports decoding with an n-gram language model. The n-gram language model can be provided in a few different ways:
A file in `.arpa` format.
A KenLM binary-format file.
For more information on building language models, refer to the training-language-models section.
ARPA-Format Language Model#
To configure the Riva ASR pipeline to use an n-gram language model stored in `arpa` format, replace
--decoder_type=greedy
with
--decoder_type=flashlight \
--decoding_language_model_arpa=<arpa_filename> \
--decoding_vocab=<decoder_vocab_file>
KenLM Binary Language Model#
To generate the Riva RMIR file when specifying the language model with a KenLM binary file, replace
--decoder_type=greedy
with
--decoder_type=flashlight \
--decoding_language_model_binary=<KENLM_binary_filename> \
--decoding_vocab=<decoder_vocab_file>
Decoder Hyper-Parameters#
The decoder language model hyper-parameters can also be specified from the `riva-build` command.
You can specify the Flashlight decoder hyper-parameters `beam_size`, `beam_size_token`, `beam_threshold`, `lm_weight`, and `word_insertion_score` as follows:
--decoder_type=flashlight \
--decoding_language_model_binary=<KENLM_binary_filename> \
--decoding_vocab=<decoder_vocab_file> \
--flashlight_decoder.beam_size=<beam_size> \
--flashlight_decoder.beam_size_token=<beam_size_token> \
--flashlight_decoder.beam_threshold=<beam_threshold> \
--flashlight_decoder.lm_weight=<lm_weight> \
--flashlight_decoder.word_insertion_score=<word_insertion_score>
where
`beam_size` is the maximum number of hypotheses the decoder holds after each step
`beam_size_token` is the maximum number of tokens the decoder considers at each step
`beam_threshold` is the threshold used to prune hypotheses
`lm_weight` is the weight given to the language model when scoring hypotheses
`word_insertion_score` is the word insertion score applied when scoring hypotheses
For advanced users, additional decoder hyper-parameters can also be specified. For a list of these parameters and their descriptions, refer to Riva-build Optional Parameters.
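As an illustration of how `beam_size` and `beam_threshold` interact, the sketch below prunes a set of scored hypotheses the way a generic beam-search decoder would. This is a simplified illustration, not the actual Flashlight implementation; all names are hypothetical.

```python
def prune_hypotheses(hyps, beam_size, beam_threshold):
    """Prune (transcript, score) pairs: drop hypotheses scoring more than
    beam_threshold below the best one, then keep only the top beam_size."""
    best = max(score for _, score in hyps)
    kept = [h for h in hyps if best - h[1] <= beam_threshold]
    kept.sort(key=lambda h: h[1], reverse=True)
    return kept[:beam_size]

hyps = [("a", -1.0), ("b", -5.0), ("c", -30.0), ("d", -2.0)]
# "c" is pruned by the threshold; "b" is pruned by the beam size.
print(prune_hypotheses(hyps, beam_size=2, beam_threshold=20.0))
# → [('a', -1.0), ('d', -2.0)]
```

With a larger `beam_threshold` or `beam_size`, more hypotheses survive each step, which improves search quality at the cost of latency.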
Flashlight Decoder Lexicon#
The Flashlight decoder used in Riva is a lexicon-based decoder and only emits words that are present in the decoder vocabulary file passed to the `riva-build` command. The decoder vocabulary files used to generate the ASR pipelines in the Quick Start scripts include words covering a wide range of domains and should provide accurate transcripts for most applications.
You can also build an ASR pipeline with your own decoder vocabulary file by using the `--decoding_vocab` parameter of the `riva-build` command. For example, you can start from the `riva-build` commands used to generate the ASR pipelines in the Pipeline Configuration section of the Quick Start scripts and provide your own lexicon decoder vocabulary file. Make sure that the words of interest appear in the decoder vocabulary file. Riva ServiceMaker automatically tokenizes the words in the decoder vocabulary file. The number of tokenizations for each word in the decoder vocabulary file can be controlled with the `--flashlight_decoder.num_tokenization` parameter.
(Advanced) Manually Adding Additional Tokenizations for Words in the Lexicon#
Additional tokenizations for words in the decoder vocabulary can also be added manually by performing the following steps:
The `riva-build` and `riva-deploy` commands from the previous section store the lexicon in the `/data/models/citrinet-1024-en-US-asr-streaming-ctc-decoder-cpu-streaming/1/lexicon.txt` file of the Triton model repository.
To add additional tokenizations to the lexicon, copy the lexicon file:
cp /data/models/citrinet-1024-en-US-asr-streaming-ctc-decoder-cpu-streaming/1/lexicon.txt decoding_lexicon.txt
and add the SentencePiece tokenizations for the word of interest. For example, you could add
manu ▁ma n u
manu ▁man n n ew
manu ▁man n ew
to the `decoding_lexicon.txt` file so that the word `manu` is generated in the transcript if the acoustic model predicts those tokens. You will need to ensure that the new lines follow the same indentation/whitespace pattern as the rest of the file, and that the tokens used are part of the tokenizer model. Once done, regenerate the model repository with the new decoding lexicon by passing `--decoding_lexicon=decoding_lexicon.txt` to `riva-build` instead of `--decoding_vocab=decoding_vocab.txt`.
Lexicon-Free Flashlight Decoder#
The Flashlight decoder can also be used without a lexicon. Lexicon-free decoding is performed with a character-based language model. To enable lexicon-free decoding with Flashlight, add `--flashlight_decoder.use_lexicon_free_decoding=True` to `riva-build` and specify a character-based language model via `--decoding_language_model_binary=<path/to/charlm>`.
OpenSeq2Seq Decoder#
Riva uses the OpenSeq2Seq decoder for beam search decoding with a language model. For example:
riva-build speech_recognition \
<rmir_filename>:<key> <riva_filename>:<key> \
--name=citrinet-1024-zh-CN-asr-streaming \
--ms_per_timestep=80 \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.16 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--decoder_type=os2s \
--os2s_decoder.language_model_alpha=0.5 \
--os2s_decoder.language_model_beta=1.0 \
--os2s_decoder.beam_search_width=128 \
--language_code=zh-CN
where
`--os2s_decoder.language_model_alpha` is the weight given to the language model during beam search.
`--os2s_decoder.language_model_beta` is the word insertion score.
`--os2s_decoder.beam_search_width` is the number of partial hypotheses retained at each step of the beam search.
All of these parameters affect performance: as their values increase, latency increases as well. Recommended ranges are listed below.
| Parameter | Minimum | Maximum |
|---|---|---|
| `beam_search_width` | 16 | 64 |
| `language_model_alpha` | 0.5 | 1.5 |
| `language_model_beta` | 1.0 | 3.0 |
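As a rough sketch of how `language_model_alpha` and `language_model_beta` enter the score of a hypothesis during beam search (the standard CTC shallow-fusion form; the exact OpenSeq2Seq implementation may differ):

```python
import math

def combined_score(ctc_logprob, lm_prob, n_words, alpha, beta):
    """Score of a hypothesis: acoustic CTC log-probability, plus the
    alpha-weighted LM log-probability, plus a per-word bonus of beta."""
    return ctc_logprob + alpha * math.log(lm_prob) + beta * n_words

# A higher alpha trusts the language model more; a higher beta favors
# longer transcripts (it offsets the LM's cost of inserting words).
print(round(combined_score(-12.0, 0.01, 3, alpha=0.5, beta=1.0), 3))
# → -11.303
```

Beam search keeps the `beam_search_width` partial hypotheses with the highest combined score at each step.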
Start/End of Utterance Detection#
Riva ASR uses an algorithm to detect the start and end of an utterance. This algorithm is used to reset the ASR decoder state and to trigger a call to the punctuation model. By default, the start of an utterance is flagged when 20% of the frames in a 300 ms window have non-blank characters, and the end of an utterance is flagged when 98% of the frames in an 800 ms window are blank characters. You can tune these values for your particular use case with the following `riva-build` parameters:
--endpointing.start_history=300 \
--endpointing.start_th=0.2 \
--endpointing.stop_history=800 \
--endpointing.stop_th=0.98
Additionally, start/end of utterance detection can be disabled by passing `--endpointing_type=none` to `riva-build`.
Note that in this case, the decoder state is reset only after the client has sent the complete audio signal. Similarly, the punctuation model is called only once.
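The windowed thresholds can be sketched as follows. This is an illustrative simplification of the server-side algorithm, operating on per-frame blank flags at `ms_per_timestep` resolution; the function name and event representation are hypothetical.

```python
def utterance_events(frame_is_blank, ms_per_timestep=40,
                     start_history=300, start_th=0.2,
                     stop_history=800, stop_th=0.98):
    """Return ('start'|'stop', frame_index) events from a stream of
    per-frame blank flags, using the windowed thresholds described above."""
    in_utterance = False
    events = []
    for i in range(len(frame_is_blank)):
        if not in_utterance:
            n = max(1, start_history // ms_per_timestep)
            window = frame_is_blank[max(0, i - n + 1): i + 1]
            non_blank = sum(1 for b in window if not b)
            if len(window) == n and non_blank / n >= start_th:
                in_utterance = True
                events.append(("start", i))
        else:
            n = max(1, stop_history // ms_per_timestep)
            window = frame_is_blank[max(0, i - n + 1): i + 1]
            if len(window) == n and sum(window) / n >= stop_th:
                in_utterance = False
                events.append(("stop", i))
    return events

# 10 blank frames, 5 non-blank frames, then 30 blank frames (40 ms each).
frames = [True] * 10 + [False] * 5 + [True] * 30
print(utterance_events(frames))
# → [('start', 11), ('stop', 34)]
```

Raising `start_th` or `stop_history` makes the detector more conservative about declaring utterance boundaries.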
Neural-Based Voice Activity Detection#
A neural-based Voice Activity Detection (VAD) algorithm can be used in Riva ASR. This can help filter out noise in the audio and reduce spurious words in the ASR transcripts. To use a neural VAD algorithm in the ASR pipeline, pass the following additional parameters to `riva-build`.
Silero VAD#
<silero_vad_riva_filename>:<encryption_key>
--vad_type=silero
--neural_vad_nn.optimization_graph_level=-1
--neural_vad.filter_speech_first false
--neural_vad.min_duration_on=0.2
--neural_vad.onset=0.85
--neural_vad.offset=0.6
where
`<silero_vad_riva_filename>` is the `.riva` Silero VAD model to use. For example, you can use the Silero VAD Riva model available on NGC.
`<encryption_key>` is the key used to encrypt the file. The encryption key for the pre-trained Riva models uploaded on NGC is `tlt_encode`.
MarbleNet VAD#
<marblenet_vad_riva_filename>:<encryption_key>
--vad_type=neural
--neural_vad_nn.optimization_graph_level=-1
where
`<marblenet_vad_riva_filename>` is the `.riva` MarbleNet VAD model to use. For example, you can use the MarbleNet VAD Riva model available on NGC.
`<encryption_key>` is the key used to encrypt the file. The encryption key for the pre-trained Riva models uploaded on NGC is `tlt_encode`.
Note that using a neural VAD component in the ASR pipeline will impact the latency and throughput of the deployed Riva ASR server.
Generating Multiple Transcript Hypotheses#
By default, the Riva ASR pipeline is configured to generate only the best transcript hypothesis for each utterance. Multiple transcript hypotheses can be generated by passing the `--max_supported_transcripts=N` parameter to the `riva-build` command, where `N` is the maximum number of hypotheses to generate. With these changes, the client application can retrieve the multiple hypotheses by setting the `max_alternatives` field of `RecognitionConfig` to a value greater than 1.
Impact of Chunk Size and Padding Size on Performance and Accuracy (Advanced)#
The `chunk_size` and `padding_size` parameters used to configure Riva ASR can have a significant impact on accuracy and performance. A brief explanation of these parameters can be found in the Riva-build Optional Parameters section. Riva provides preconfigured ASR pipelines with preset `chunk_size` and `padding_size` values: a low-latency streaming configuration, a high-throughput streaming configuration, and an offline configuration. These configurations should be suitable for most deployment scenarios. The `chunk_size` and `padding_size` values they use can be found in the tables in the Pipeline Configuration section.
The `chunk_size` parameter is the duration, in seconds, of the audio chunk processed by the Riva server for each streaming request. In streaming mode, Riva therefore returns one response for every `chunk_size` seconds of audio. A lower `chunk_size` reduces the user-perceived latency, since the transcript is updated more frequently.
The `padding_size` parameter is the duration, in seconds, of the padding prepended and appended to each chunk. For every new audio chunk it receives, the Riva acoustic model processes an input tensor corresponding to an audio duration of `2*(padding_size) + chunk_size`. Increasing `padding_size` or `chunk_size` generally helps transcription accuracy, since the acoustic model has access to more context. However, increasing `padding_size` decreases the maximum number of concurrent streams supported by Riva ASR, since it increases the size of the input tensor fed to the acoustic model for each new chunk.
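A quick sanity check of the `2*(padding_size) + chunk_size` relation, using the chunk and padding values of the preconfigured Conformer streaming/offline pipelines shown earlier:

```python
def am_input_seconds(chunk_size, padding_size):
    # Duration of audio the acoustic model processes for each new chunk.
    return 2 * padding_size + chunk_size

# (chunk_size, padding_size) values from the Conformer configurations above.
configs = {
    "low-latency streaming": (0.16, 1.92),
    "high-throughput streaming": (0.8, 1.6),
    "offline": (4.8, 1.6),
}
for name, (chunk, pad) in configs.items():
    print(f"{name}: {am_input_seconds(chunk, pad):.2f} s of audio per chunk")
```

Note that the low-latency and high-throughput configurations feed the acoustic model the same total duration per chunk; they differ in how often a response is produced.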
Sharing Acoustic and Feature Extractor Models Across Multiple ASR Pipelines (Advanced)#
The Riva ASR service can be configured so that multiple ASR pipelines share the same feature extractor and acoustic model, thereby reducing GPU memory usage. For example, this option can be used to deploy multiple ASR pipelines where each pipeline uses a different language model but shares the same acoustic model and feature extractor. This is achieved by specifying the `acoustic_model_name` and `featurizer_name` parameters in the `riva-build` command:
riva-build speech_recognition \
/servicemaker-dev/<rmir_filename>:<encryption_key> \
/servicemaker-dev/<riva_filename>:<encryption_key> \
--name=<pipeline_name> \
--acoustic_model_name=<acoustic_model_name> \
--featurizer_name=<featurizer_name> \
--wfst_tokenizer_model=<wfst_tokenizer_model> \
--wfst_verbalizer_model=<wfst_verbalizer_model> \
--decoder_type=greedy
where
`<acoustic_model_name>` is a user-defined name for the acoustic model component of the ASR pipeline
`<featurizer_name>` is a user-defined name for the feature extractor component of the ASR pipeline
If multiple ASR pipelines are built, each with a different `name` but with the same `acoustic_model_name` and `featurizer_name`, they will share the same acoustic and feature extractor models.
When running the `riva-deploy` command, the `-f` option must be passed to ensure that all ASR pipelines sharing the acoustic model and feature extractor are initialized correctly.
Note
`<acoustic_model_name>` and `<featurizer_name>` are global and can conflict across model pipelines. Only override them if you know which other models will be deployed and you want to share the feature extractor and/or acoustic model across different ASR pipelines. When specifying `<acoustic_model_name>`, ensure that there are no incompatibilities in the acoustic model weights or input shapes. Similarly, when specifying `<featurizer_name>`, ensure that all ASR pipelines with the same `<featurizer_name>` use the same feature extractor parameters.
Riva-build Optional Parameters#
For details on the parameters passed to `riva-build` to customize the ASR pipeline, issue:
riva-build speech_recognition -h
The following list describes all of the optional parameters currently recognized by `riva-build`:
usage: riva-build speech_recognition [-h] [-f] [-v]
[--language_code LANGUAGE_CODE]
[--instance_group_count INSTANCE_GROUP_COUNT]
[--kind KIND]
[--max_batch_size MAX_BATCH_SIZE]
[--max_queue_delay_microseconds MAX_QUEUE_DELAY_MICROSECONDS]
[--batching_type BATCHING_TYPE]
[--acoustic_model_name ACOUSTIC_MODEL_NAME]
[--featurizer_name FEATURIZER_NAME]
[--name NAME] [--streaming STREAMING]
[--offline] [--vad_type VAD_TYPE]
[--unified_acoustic_model]
[--endpointing_type ENDPOINTING_TYPE]
[--chunk_size CHUNK_SIZE]
[--padding_factor PADDING_FACTOR]
[--left_padding_size LEFT_PADDING_SIZE]
[--right_padding_size RIGHT_PADDING_SIZE]
[--padding_size PADDING_SIZE]
[--max_supported_transcripts MAX_SUPPORTED_TRANSCRIPTS]
[--ms_per_timestep MS_PER_TIMESTEP]
[--force_decoder_reset_after_ms FORCE_DECODER_RESET_AFTER_MS]
[--lattice_beam LATTICE_BEAM]
[--decoding_language_model_arpa DECODING_LANGUAGE_MODEL_ARPA]
[--decoding_language_model_binary DECODING_LANGUAGE_MODEL_BINARY]
[--decoding_language_model_fst DECODING_LANGUAGE_MODEL_FST]
[--decoding_language_model_words DECODING_LANGUAGE_MODEL_WORDS]
[--rescoring_language_model_arpa RESCORING_LANGUAGE_MODEL_ARPA]
[--decoding_language_model_carpa DECODING_LANGUAGE_MODEL_CARPA]
[--rescoring_language_model_carpa RESCORING_LANGUAGE_MODEL_CARPA]
[--decoding_lexicon DECODING_LEXICON]
[--decoding_vocab DECODING_VOCAB]
[--tokenizer_model TOKENIZER_MODEL]
[--decoder_type DECODER_TYPE]
[--stddev_floor STDDEV_FLOOR]
[--wfst_tokenizer_model WFST_TOKENIZER_MODEL]
[--wfst_verbalizer_model WFST_VERBALIZER_MODEL]
[--wfst_pre_process_model WFST_PRE_PROCESS_MODEL]
[--wfst_post_process_model WFST_POST_PROCESS_MODEL]
[--speech_hints_model SPEECH_HINTS_MODEL]
[--buffer_look_ahead BUFFER_LOOK_AHEAD]
[--buffer_context_history BUFFER_CONTEXT_HISTORY]
[--buffer_threshold BUFFER_THRESHOLD]
[--buffer_max_timeout_frames BUFFER_MAX_TIMEOUT_FRAMES]
[--profane_words_file PROFANE_WORDS_FILE]
[--append_space_to_transcripts APPEND_SPACE_TO_TRANSCRIPTS]
[--return_separate_utterances RETURN_SEPARATE_UTTERANCES]
[--mel_basis_file_path MEL_BASIS_FILE_PATH]
[--feature_extractor_type FEATURE_EXTRACTOR_TYPE]
[--torch_feature_type TORCH_FEATURE_TYPE]
[--torch_feature_device TORCH_FEATURE_DEVICE]
[--execution_environment_path EXECUTION_ENVIRONMENT_PATH]
[--share_flags SHARE_FLAGS]
[--featurizer.max_sequence_idle_microseconds FEATURIZER.MAX_SEQUENCE_IDLE_MICROSECONDS]
[--featurizer.max_batch_size FEATURIZER.MAX_BATCH_SIZE]
[--featurizer.min_batch_size FEATURIZER.MIN_BATCH_SIZE]
[--featurizer.opt_batch_size FEATURIZER.OPT_BATCH_SIZE]
[--featurizer.preferred_batch_size FEATURIZER.PREFERRED_BATCH_SIZE]
[--featurizer.batching_type FEATURIZER.BATCHING_TYPE]
[--featurizer.preserve_ordering FEATURIZER.PRESERVE_ORDERING]
[--featurizer.instance_group_count FEATURIZER.INSTANCE_GROUP_COUNT]
[--featurizer.max_queue_delay_microseconds FEATURIZER.MAX_QUEUE_DELAY_MICROSECONDS]
[--featurizer.optimization_graph_level FEATURIZER.OPTIMIZATION_GRAPH_LEVEL]
[--featurizer.max_execution_batch_size FEATURIZER.MAX_EXECUTION_BATCH_SIZE]
[--featurizer.gain FEATURIZER.GAIN]
[--featurizer.dither FEATURIZER.DITHER]
[--featurizer.use_utterance_norm_params FEATURIZER.USE_UTTERANCE_NORM_PARAMS]
[--featurizer.precalc_norm_time_steps FEATURIZER.PRECALC_NORM_TIME_STEPS]
[--featurizer.precalc_norm_params FEATURIZER.PRECALC_NORM_PARAMS]
[--featurizer.norm_per_feature FEATURIZER.NORM_PER_FEATURE]
[--featurizer.mean FEATURIZER.MEAN]
[--featurizer.stddev FEATURIZER.STDDEV]
[--featurizer.transpose FEATURIZER.TRANSPOSE]
[--featurizer.padding_size FEATURIZER.PADDING_SIZE]
[--featurizer.int64_features_length FEATURIZER.INT64_FEATURES_LENGTH]
[--nn.max_sequence_idle_microseconds NN.MAX_SEQUENCE_IDLE_MICROSECONDS]
[--nn.max_batch_size NN.MAX_BATCH_SIZE]
[--nn.min_batch_size NN.MIN_BATCH_SIZE]
[--nn.opt_batch_size NN.OPT_BATCH_SIZE]
[--nn.preferred_batch_size NN.PREFERRED_BATCH_SIZE]
[--nn.batching_type NN.BATCHING_TYPE]
[--nn.preserve_ordering NN.PRESERVE_ORDERING]
[--nn.instance_group_count NN.INSTANCE_GROUP_COUNT]
[--nn.max_queue_delay_microseconds NN.MAX_QUEUE_DELAY_MICROSECONDS]
[--nn.optimization_graph_level NN.OPTIMIZATION_GRAPH_LEVEL]
[--nn.trt_max_workspace_size NN.TRT_MAX_WORKSPACE_SIZE]
[--nn.use_onnx_runtime]
[--nn.use_torchscript]
[--nn.use_trt_fp32]
[--nn.fp16_needs_obey_precision_pass]
[--nn.am_len_input_use_int64 NN.AM_LEN_INPUT_USE_INT64]
[--nn.language_code NN.LANGUAGE_CODE]
[--nn.engine_dir NN.ENGINE_DIR]
[--nn.EXECUTION_ENV_PATH NN.EXECUTION_ENV_PATH]
[--endpointing.max_sequence_idle_microseconds ENDPOINTING.MAX_SEQUENCE_IDLE_MICROSECONDS]
[--endpointing.max_batch_size ENDPOINTING.MAX_BATCH_SIZE]
[--endpointing.min_batch_size ENDPOINTING.MIN_BATCH_SIZE]
[--endpointing.opt_batch_size ENDPOINTING.OPT_BATCH_SIZE]
[--endpointing.preferred_batch_size ENDPOINTING.PREFERRED_BATCH_SIZE]
[--endpointing.batching_type ENDPOINTING.BATCHING_TYPE]
[--endpointing.preserve_ordering ENDPOINTING.PRESERVE_ORDERING]
[--endpointing.instance_group_count ENDPOINTING.INSTANCE_GROUP_COUNT]
[--endpointing.max_queue_delay_microseconds ENDPOINTING.MAX_QUEUE_DELAY_MICROSECONDS]
[--endpointing.optimization_graph_level ENDPOINTING.OPTIMIZATION_GRAPH_LEVEL]
[--endpointing.ms_per_timestep ENDPOINTING.MS_PER_TIMESTEP]
[--endpointing.start_history ENDPOINTING.START_HISTORY]
[--endpointing.stop_history ENDPOINTING.STOP_HISTORY]
[--endpointing.stop_history_eou ENDPOINTING.STOP_HISTORY_EOU]
[--endpointing.start_th ENDPOINTING.START_TH]
[--endpointing.stop_th ENDPOINTING.STOP_TH]
[--endpointing.stop_th_eou ENDPOINTING.STOP_TH_EOU]
[--endpointing.residue_blanks_at_start ENDPOINTING.RESIDUE_BLANKS_AT_START]
[--endpointing.residue_blanks_at_end ENDPOINTING.RESIDUE_BLANKS_AT_END]
[--endpointing.vocab_file ENDPOINTING.VOCAB_FILE]
[--neural_vad.max_sequence_idle_microseconds NEURAL_VAD.MAX_SEQUENCE_IDLE_MICROSECONDS]
[--neural_vad.max_batch_size NEURAL_VAD.MAX_BATCH_SIZE]
[--neural_vad.min_batch_size NEURAL_VAD.MIN_BATCH_SIZE]
[--neural_vad.opt_batch_size NEURAL_VAD.OPT_BATCH_SIZE]
[--neural_vad.preferred_batch_size NEURAL_VAD.PREFERRED_BATCH_SIZE]
[--neural_vad.batching_type NEURAL_VAD.BATCHING_TYPE]
[--neural_vad.preserve_ordering NEURAL_VAD.PRESERVE_ORDERING]
[--neural_vad.instance_group_count NEURAL_VAD.INSTANCE_GROUP_COUNT]
[--neural_vad.max_queue_delay_microseconds NEURAL_VAD.MAX_QUEUE_DELAY_MICROSECONDS]
[--neural_vad.optimization_graph_level NEURAL_VAD.OPTIMIZATION_GRAPH_LEVEL]
[--neural_vad.load_model NEURAL_VAD.LOAD_MODEL]
[--neural_vad.batch_mode NEURAL_VAD.BATCH_MODE]
[--neural_vad.decoupled_mode NEURAL_VAD.DECOUPLED_MODE]
[--neural_vad.onset NEURAL_VAD.ONSET]
[--neural_vad.offset NEURAL_VAD.OFFSET]
[--neural_vad.pad_onset NEURAL_VAD.PAD_ONSET]
[--neural_vad.pad_offset NEURAL_VAD.PAD_OFFSET]
[--neural_vad.min_duration_on NEURAL_VAD.MIN_DURATION_ON]
[--neural_vad.min_duration_off NEURAL_VAD.MIN_DURATION_OFF]
[--neural_vad.filter_speech_first NEURAL_VAD.FILTER_SPEECH_FIRST]
[--neural_vad.features_mask_value NEURAL_VAD.FEATURES_MASK_VALUE]
[--neural_vad_nn.max_sequence_idle_microseconds NEURAL_VAD_NN.MAX_SEQUENCE_IDLE_MICROSECONDS]
[--neural_vad_nn.max_batch_size NEURAL_VAD_NN.MAX_BATCH_SIZE]
[--neural_vad_nn.min_batch_size NEURAL_VAD_NN.MIN_BATCH_SIZE]
[--neural_vad_nn.opt_batch_size NEURAL_VAD_NN.OPT_BATCH_SIZE]
[--neural_vad_nn.preferred_batch_size NEURAL_VAD_NN.PREFERRED_BATCH_SIZE]
[--neural_vad_nn.batching_type NEURAL_VAD_NN.BATCHING_TYPE]
[--neural_vad_nn.preserve_ordering NEURAL_VAD_NN.PRESERVE_ORDERING]
[--neural_vad_nn.instance_group_count NEURAL_VAD_NN.INSTANCE_GROUP_COUNT]
[--neural_vad_nn.max_queue_delay_microseconds NEURAL_VAD_NN.MAX_QUEUE_DELAY_MICROSECONDS]
[--neural_vad_nn.optimization_graph_level NEURAL_VAD_NN.OPTIMIZATION_GRAPH_LEVEL]
[--neural_vad_nn.trt_max_workspace_size NEURAL_VAD_NN.TRT_MAX_WORKSPACE_SIZE]
[--neural_vad_nn.use_onnx_runtime]
[--neural_vad_nn.use_torchscript]
[--neural_vad_nn.use_trt_fp32]
[--neural_vad_nn.fp16_needs_obey_precision_pass]
[--neural_vad_nn.onnx_path NEURAL_VAD_NN.ONNX_PATH]
[--neural_vad_nn.sample_rate NEURAL_VAD_NN.SAMPLE_RATE]
[--neural_vad_nn.min_seq_len NEURAL_VAD_NN.MIN_SEQ_LEN]
[--neural_vad_nn.opt_seq_len NEURAL_VAD_NN.OPT_SEQ_LEN]
[--neural_vad_nn.max_seq_len NEURAL_VAD_NN.MAX_SEQ_LEN]
[--flashlight_decoder.max_sequence_idle_microseconds FLASHLIGHT_DECODER.MAX_SEQUENCE_IDLE_MICROSECONDS]
[--flashlight_decoder.max_batch_size FLASHLIGHT_DECODER.MAX_BATCH_SIZE]
[--flashlight_decoder.min_batch_size FLASHLIGHT_DECODER.MIN_BATCH_SIZE]
[--flashlight_decoder.opt_batch_size FLASHLIGHT_DECODER.OPT_BATCH_SIZE]
[--flashlight_decoder.preferred_batch_size FLASHLIGHT_DECODER.PREFERRED_BATCH_SIZE]
[--flashlight_decoder.batching_type FLASHLIGHT_DECODER.BATCHING_TYPE]
[--flashlight_decoder.preserve_ordering FLASHLIGHT_DECODER.PRESERVE_ORDERING]
[--flashlight_decoder.instance_group_count FLASHLIGHT_DECODER.INSTANCE_GROUP_COUNT]
[--flashlight_decoder.max_queue_delay_microseconds FLASHLIGHT_DECODER.MAX_QUEUE_DELAY_MICROSECONDS]
[--flashlight_decoder.optimization_graph_level FLASHLIGHT_DECODER.OPTIMIZATION_GRAPH_LEVEL]
[--flashlight_decoder.max_execution_batch_size FLASHLIGHT_DECODER.MAX_EXECUTION_BATCH_SIZE]
[--flashlight_decoder.decoder_type FLASHLIGHT_DECODER.DECODER_TYPE]
[--flashlight_decoder.padding_size FLASHLIGHT_DECODER.PADDING_SIZE]
[--flashlight_decoder.max_supported_transcripts FLASHLIGHT_DECODER.MAX_SUPPORTED_TRANSCRIPTS]
[--flashlight_decoder.asr_model_delay FLASHLIGHT_DECODER.ASR_MODEL_DELAY]
[--flashlight_decoder.ms_per_timestep FLASHLIGHT_DECODER.MS_PER_TIMESTEP]
[--flashlight_decoder.vocab_file FLASHLIGHT_DECODER.VOCAB_FILE]
[--flashlight_decoder.decoder_num_worker_threads FLASHLIGHT_DECODER.DECODER_NUM_WORKER_THREADS]
[--flashlight_decoder.force_decoder_reset_after_ms FLASHLIGHT_DECODER.FORCE_DECODER_RESET_AFTER_MS]
[--flashlight_decoder.language_model_file FLASHLIGHT_DECODER.LANGUAGE_MODEL_FILE]
[--flashlight_decoder.lexicon_file FLASHLIGHT_DECODER.LEXICON_FILE]
[--flashlight_decoder.use_lexicon_free_decoding FLASHLIGHT_DECODER.USE_LEXICON_FREE_DECODING]
[--flashlight_decoder.beam_size FLASHLIGHT_DECODER.BEAM_SIZE]
[--flashlight_decoder.beam_size_token FLASHLIGHT_DECODER.BEAM_SIZE_TOKEN]
[--flashlight_decoder.beam_threshold FLASHLIGHT_DECODER.BEAM_THRESHOLD]
[--flashlight_decoder.lm_weight FLASHLIGHT_DECODER.LM_WEIGHT]
[--flashlight_decoder.blank_token FLASHLIGHT_DECODER.BLANK_TOKEN]
[--flashlight_decoder.sil_token FLASHLIGHT_DECODER.SIL_TOKEN]
[--flashlight_decoder.unk_token FLASHLIGHT_DECODER.UNK_TOKEN]
[--flashlight_decoder.set_default_index_to_unk_token FLASHLIGHT_DECODER.SET_DEFAULT_INDEX_TO_UNK_TOKEN]
[--flashlight_decoder.word_insertion_score FLASHLIGHT_DECODER.WORD_INSERTION_SCORE]
[--flashlight_decoder.forerunner_beam_size FLASHLIGHT_DECODER.FORERUNNER_BEAM_SIZE]
[--flashlight_decoder.forerunner_beam_size_token FLASHLIGHT_DECODER.FORERUNNER_BEAM_SIZE_TOKEN]
[--flashlight_decoder.forerunner_beam_threshold FLASHLIGHT_DECODER.FORERUNNER_BEAM_THRESHOLD]
[--flashlight_decoder.smearing_mode FLASHLIGHT_DECODER.SMEARING_MODE]
[--flashlight_decoder.forerunner_use_lm FLASHLIGHT_DECODER.FORERUNNER_USE_LM]
[--flashlight_decoder.num_tokenization FLASHLIGHT_DECODER.NUM_TOKENIZATION]
[--flashlight_decoder.unk_score FLASHLIGHT_DECODER.UNK_SCORE]
[--flashlight_decoder.log_add FLASHLIGHT_DECODER.LOG_ADD]
[--pass_through_decoder.max_sequence_idle_microseconds PASS_THROUGH_DECODER.MAX_SEQUENCE_IDLE_MICROSECONDS]
[--pass_through_decoder.max_batch_size PASS_THROUGH_DECODER.MAX_BATCH_SIZE]
[--pass_through_decoder.min_batch_size PASS_THROUGH_DECODER.MIN_BATCH_SIZE]
[--pass_through_decoder.opt_batch_size PASS_THROUGH_DECODER.OPT_BATCH_SIZE]
[--pass_through_decoder.preferred_batch_size PASS_THROUGH_DECODER.PREFERRED_BATCH_SIZE]
[--pass_through_decoder.batching_type PASS_THROUGH_DECODER.BATCHING_TYPE]
[--pass_through_decoder.preserve_ordering PASS_THROUGH_DECODER.PRESERVE_ORDERING]
[--pass_through_decoder.instance_group_count PASS_THROUGH_DECODER.INSTANCE_GROUP_COUNT]
[--pass_through_decoder.max_queue_delay_microseconds PASS_THROUGH_DECODER.MAX_QUEUE_DELAY_MICROSECONDS]
[--pass_through_decoder.optimization_graph_level PASS_THROUGH_DECODER.OPTIMIZATION_GRAPH_LEVEL]
[--pass_through_decoder.vocab_file PASS_THROUGH_DECODER.VOCAB_FILE]
[--pass_through_decoder.asr_model_delay PASS_THROUGH_DECODER.ASR_MODEL_DELAY]
[--nemo_decoder.max_sequence_idle_microseconds NEMO_DECODER.MAX_SEQUENCE_IDLE_MICROSECONDS]
[--nemo_decoder.max_batch_size NEMO_DECODER.MAX_BATCH_SIZE]
[--nemo_decoder.min_batch_size NEMO_DECODER.MIN_BATCH_SIZE]
[--nemo_decoder.opt_batch_size NEMO_DECODER.OPT_BATCH_SIZE]
[--nemo_decoder.preferred_batch_size NEMO_DECODER.PREFERRED_BATCH_SIZE]
[--nemo_decoder.batching_type NEMO_DECODER.BATCHING_TYPE]
[--nemo_decoder.preserve_ordering NEMO_DECODER.PRESERVE_ORDERING]
[--nemo_decoder.instance_group_count NEMO_DECODER.INSTANCE_GROUP_COUNT]
[--nemo_decoder.max_queue_delay_microseconds NEMO_DECODER.MAX_QUEUE_DELAY_MICROSECONDS]
[--nemo_decoder.optimization_graph_level NEMO_DECODER.OPTIMIZATION_GRAPH_LEVEL]
[--nemo_decoder.vocab_file NEMO_DECODER.VOCAB_FILE]
[--nemo_decoder.asr_model_delay NEMO_DECODER.ASR_MODEL_DELAY]
[--nemo_decoder.compute_dtype]
[--nemo_decoder.amp_dtype]
[--nemo_decoder.nemo_decoder_type NEMO_DECODER.NEMO_DECODER_TYPE]
[--nemo_decoder.use_stateful_decoding]
[--nemo_decoder.use_amp]
[--trtllm_decoder.max_sequence_idle_microseconds TRTLLM_DECODER.MAX_SEQUENCE_IDLE_MICROSECONDS]
[--trtllm_decoder.max_batch_size TRTLLM_DECODER.MAX_BATCH_SIZE]
[--trtllm_decoder.min_batch_size TRTLLM_DECODER.MIN_BATCH_SIZE]
[--trtllm_decoder.opt_batch_size TRTLLM_DECODER.OPT_BATCH_SIZE]
[--trtllm_decoder.preferred_batch_size TRTLLM_DECODER.PREFERRED_BATCH_SIZE]
[--trtllm_decoder.batching_type TRTLLM_DECODER.BATCHING_TYPE]
[--trtllm_decoder.preserve_ordering TRTLLM_DECODER.PRESERVE_ORDERING]
[--trtllm_decoder.instance_group_count TRTLLM_DECODER.INSTANCE_GROUP_COUNT]
[--trtllm_decoder.max_queue_delay_microseconds TRTLLM_DECODER.MAX_QUEUE_DELAY_MICROSECONDS]
[--trtllm_decoder.optimization_graph_level TRTLLM_DECODER.OPTIMIZATION_GRAPH_LEVEL]
[--trtllm_decoder.world_size TRTLLM_DECODER.WORLD_SIZE]
[--trtllm_decoder.quantize_dir TRTLLM_DECODER.QUANTIZE_DIR]
[--trtllm_decoder.dtype TRTLLM_DECODER.DTYPE]
[--trtllm_decoder.max_input_len TRTLLM_DECODER.MAX_INPUT_LEN]
[--trtllm_decoder.max_output_len TRTLLM_DECODER.MAX_OUTPUT_LEN]
[--trtllm_decoder.max_beam_width TRTLLM_DECODER.MAX_BEAM_WIDTH]
[--trtllm_decoder.use_gpt_attention_plugin TRTLLM_DECODER.USE_GPT_ATTENTION_PLUGIN]
[--trtllm_decoder.use_bert_attention_plugin TRTLLM_DECODER.USE_BERT_ATTENTION_PLUGIN]
[--trtllm_decoder.use_gemm_plugin TRTLLM_DECODER.USE_GEMM_PLUGIN]
[--trtllm_decoder.remove_input_padding TRTLLM_DECODER.REMOVE_INPUT_PADDING]
[--trtllm_decoder.enable_context_fmha TRTLLM_DECODER.ENABLE_CONTEXT_FMHA]
[--trtllm_decoder.use_weight_only TRTLLM_DECODER.USE_WEIGHT_ONLY]
[--trtllm_decoder.weight_only_precision TRTLLM_DECODER.WEIGHT_ONLY_PRECISION]
[--trtllm_decoder.int8_kv_cache TRTLLM_DECODER.INT8_KV_CACHE]
[--trtllm_decoder.debug_mode TRTLLM_DECODER.DEBUG_MODE]
[--trtllm_decoder.vocab_file TRTLLM_DECODER.VOCAB_FILE]
[--trtllm_decoder.asr_model_delay TRTLLM_DECODER.ASR_MODEL_DELAY]
[--greedy_decoder.max_sequence_idle_microseconds GREEDY_DECODER.MAX_SEQUENCE_IDLE_MICROSECONDS]
[--greedy_decoder.max_batch_size GREEDY_DECODER.MAX_BATCH_SIZE]
[--greedy_decoder.min_batch_size GREEDY_DECODER.MIN_BATCH_SIZE]
[--greedy_decoder.opt_batch_size GREEDY_DECODER.OPT_BATCH_SIZE]
[--greedy_decoder.preferred_batch_size GREEDY_DECODER.PREFERRED_BATCH_SIZE]
[--greedy_decoder.batching_type GREEDY_DECODER.BATCHING_TYPE]
[--greedy_decoder.preserve_ordering GREEDY_DECODER.PRESERVE_ORDERING]
[--greedy_decoder.instance_group_count GREEDY_DECODER.INSTANCE_GROUP_COUNT]
[--greedy_decoder.max_queue_delay_microseconds GREEDY_DECODER.MAX_QUEUE_DELAY_MICROSECONDS]
[--greedy_decoder.optimization_graph_level GREEDY_DECODER.OPTIMIZATION_GRAPH_LEVEL]
[--greedy_decoder.max_execution_batch_size GREEDY_DECODER.MAX_EXECUTION_BATCH_SIZE]
[--greedy_decoder.decoder_type GREEDY_DECODER.DECODER_TYPE]
[--greedy_decoder.padding_size GREEDY_DECODER.PADDING_SIZE]
[--greedy_decoder.max_supported_transcripts GREEDY_DECODER.MAX_SUPPORTED_TRANSCRIPTS]
[--greedy_decoder.asr_model_delay GREEDY_DECODER.ASR_MODEL_DELAY]
[--greedy_decoder.ms_per_timestep GREEDY_DECODER.MS_PER_TIMESTEP]
[--greedy_decoder.vocab_file GREEDY_DECODER.VOCAB_FILE]
[--greedy_decoder.decoder_num_worker_threads GREEDY_DECODER.DECODER_NUM_WORKER_THREADS]
[--greedy_decoder.force_decoder_reset_after_ms GREEDY_DECODER.FORCE_DECODER_RESET_AFTER_MS]
[--os2s_decoder.max_sequence_idle_microseconds OS2S_DECODER.MAX_SEQUENCE_IDLE_MICROSECONDS]
[--os2s_decoder.max_batch_size OS2S_DECODER.MAX_BATCH_SIZE]
[--os2s_decoder.min_batch_size OS2S_DECODER.MIN_BATCH_SIZE]
[--os2s_decoder.opt_batch_size OS2S_DECODER.OPT_BATCH_SIZE]
[--os2s_decoder.preferred_batch_size OS2S_DECODER.PREFERRED_BATCH_SIZE]
[--os2s_decoder.batching_type OS2S_DECODER.BATCHING_TYPE]
[--os2s_decoder.preserve_ordering OS2S_DECODER.PRESERVE_ORDERING]
[--os2s_decoder.instance_group_count OS2S_DECODER.INSTANCE_GROUP_COUNT]
[--os2s_decoder.max_queue_delay_microseconds OS2S_DECODER.MAX_QUEUE_DELAY_MICROSECONDS]
[--os2s_decoder.optimization_graph_level OS2S_DECODER.OPTIMIZATION_GRAPH_LEVEL]
[--os2s_decoder.max_execution_batch_size OS2S_DECODER.MAX_EXECUTION_BATCH_SIZE]
[--os2s_decoder.decoder_type OS2S_DECODER.DECODER_TYPE]
[--os2s_decoder.padding_size OS2S_DECODER.PADDING_SIZE]
[--os2s_decoder.max_supported_transcripts OS2S_DECODER.MAX_SUPPORTED_TRANSCRIPTS]
[--os2s_decoder.asr_model_delay OS2S_DECODER.ASR_MODEL_DELAY]
[--os2s_decoder.ms_per_timestep OS2S_DECODER.MS_PER_TIMESTEP]
[--os2s_decoder.vocab_file OS2S_DECODER.VOCAB_FILE]
[--os2s_decoder.decoder_num_worker_threads OS2S_DECODER.DECODER_NUM_WORKER_THREADS]
[--os2s_decoder.force_decoder_reset_after_ms OS2S_DECODER.FORCE_DECODER_RESET_AFTER_MS]
[--os2s_decoder.language_model_file OS2S_DECODER.LANGUAGE_MODEL_FILE]
[--os2s_decoder.beam_search_width OS2S_DECODER.BEAM_SEARCH_WIDTH]
[--os2s_decoder.language_model_alpha OS2S_DECODER.LANGUAGE_MODEL_ALPHA]
[--os2s_decoder.language_model_beta OS2S_DECODER.LANGUAGE_MODEL_BETA]
[--kaldi_decoder.max_sequence_idle_microseconds KALDI_DECODER.MAX_SEQUENCE_IDLE_MICROSECONDS]
[--kaldi_decoder.max_batch_size KALDI_DECODER.MAX_BATCH_SIZE]
[--kaldi_decoder.min_batch_size KALDI_DECODER.MIN_BATCH_SIZE]
[--kaldi_decoder.opt_batch_size KALDI_DECODER.OPT_BATCH_SIZE]
[--kaldi_decoder.preferred_batch_size KALDI_DECODER.PREFERRED_BATCH_SIZE]
[--kaldi_decoder.batching_type KALDI_DECODER.BATCHING_TYPE]
[--kaldi_decoder.preserve_ordering KALDI_DECODER.PRESERVE_ORDERING]
[--kaldi_decoder.instance_group_count KALDI_DECODER.INSTANCE_GROUP_COUNT]
[--kaldi_decoder.max_queue_delay_microseconds KALDI_DECODER.MAX_QUEUE_DELAY_MICROSECONDS]
[--kaldi_decoder.optimization_graph_level KALDI_DECODER.OPTIMIZATION_GRAPH_LEVEL]
[--kaldi_decoder.max_execution_batch_size KALDI_DECODER.MAX_EXECUTION_BATCH_SIZE]
[--kaldi_decoder.decoder_type KALDI_DECODER.DECODER_TYPE]
[--kaldi_decoder.padding_size KALDI_DECODER.PADDING_SIZE]
[--kaldi_decoder.max_supported_transcripts KALDI_DECODER.MAX_SUPPORTED_TRANSCRIPTS]
[--kaldi_decoder.asr_model_delay KALDI_DECODER.ASR_MODEL_DELAY]
[--kaldi_decoder.ms_per_timestep KALDI_DECODER.MS_PER_TIMESTEP]
[--kaldi_decoder.vocab_file KALDI_DECODER.VOCAB_FILE]
[--kaldi_decoder.decoder_num_worker_threads KALDI_DECODER.DECODER_NUM_WORKER_THREADS]
[--kaldi_decoder.force_decoder_reset_after_ms KALDI_DECODER.FORCE_DECODER_RESET_AFTER_MS]
[--kaldi_decoder.fst_filename KALDI_DECODER.FST_FILENAME]
[--kaldi_decoder.word_syms_filename KALDI_DECODER.WORD_SYMS_FILENAME]
[--kaldi_decoder.default_beam KALDI_DECODER.DEFAULT_BEAM]
[--kaldi_decoder.max_active KALDI_DECODER.MAX_ACTIVE]
[--kaldi_decoder.acoustic_scale KALDI_DECODER.ACOUSTIC_SCALE]
[--kaldi_decoder.decoder_num_copy_threads KALDI_DECODER.DECODER_NUM_COPY_THREADS]
[--kaldi_decoder.determinize_lattice KALDI_DECODER.DETERMINIZE_LATTICE]
[--rescorer.max_sequence_idle_microseconds RESCORER.MAX_SEQUENCE_IDLE_MICROSECONDS]
[--rescorer.max_batch_size RESCORER.MAX_BATCH_SIZE]
[--rescorer.min_batch_size RESCORER.MIN_BATCH_SIZE]
[--rescorer.opt_batch_size RESCORER.OPT_BATCH_SIZE]
[--rescorer.preferred_batch_size RESCORER.PREFERRED_BATCH_SIZE]
[--rescorer.batching_type RESCORER.BATCHING_TYPE]
[--rescorer.preserve_ordering RESCORER.PRESERVE_ORDERING]
[--rescorer.instance_group_count RESCORER.INSTANCE_GROUP_COUNT]
[--rescorer.max_queue_delay_microseconds RESCORER.MAX_QUEUE_DELAY_MICROSECONDS]
[--rescorer.optimization_graph_level RESCORER.OPTIMIZATION_GRAPH_LEVEL]
[--rescorer.max_supported_transcripts RESCORER.MAX_SUPPORTED_TRANSCRIPTS]
[--rescorer.score_lm_carpa_filename RESCORER.SCORE_LM_CARPA_FILENAME]
[--rescorer.decode_lm_carpa_filename RESCORER.DECODE_LM_CARPA_FILENAME]
[--rescorer.word_syms_filename RESCORER.WORD_SYMS_FILENAME]
[--rescorer.word_insertion_penalty RESCORER.WORD_INSERTION_PENALTY]
[--rescorer.num_worker_threads RESCORER.NUM_WORKER_THREADS]
[--rescorer.ms_per_timestep RESCORER.MS_PER_TIMESTEP]
[--rescorer.boundary_character_ids RESCORER.BOUNDARY_CHARACTER_IDS]
[--rescorer.vocab_file RESCORER.VOCAB_FILE]
[--lm_decoder_cpu.beam_search_width LM_DECODER_CPU.BEAM_SEARCH_WIDTH]
[--lm_decoder_cpu.decoder_type LM_DECODER_CPU.DECODER_TYPE]
[--lm_decoder_cpu.padding_size LM_DECODER_CPU.PADDING_SIZE]
[--lm_decoder_cpu.language_model_file LM_DECODER_CPU.LANGUAGE_MODEL_FILE]
[--lm_decoder_cpu.max_supported_transcripts LM_DECODER_CPU.MAX_SUPPORTED_TRANSCRIPTS]
[--lm_decoder_cpu.asr_model_delay LM_DECODER_CPU.ASR_MODEL_DELAY]
[--lm_decoder_cpu.language_model_alpha LM_DECODER_CPU.LANGUAGE_MODEL_ALPHA]
[--lm_decoder_cpu.language_model_beta LM_DECODER_CPU.LANGUAGE_MODEL_BETA]
[--lm_decoder_cpu.ms_per_timestep LM_DECODER_CPU.MS_PER_TIMESTEP]
[--lm_decoder_cpu.vocab_file LM_DECODER_CPU.VOCAB_FILE]
[--lm_decoder_cpu.lexicon_file LM_DECODER_CPU.LEXICON_FILE]
[--lm_decoder_cpu.beam_size LM_DECODER_CPU.BEAM_SIZE]
[--lm_decoder_cpu.beam_size_token LM_DECODER_CPU.BEAM_SIZE_TOKEN]
[--lm_decoder_cpu.beam_threshold LM_DECODER_CPU.BEAM_THRESHOLD]
[--lm_decoder_cpu.lm_weight LM_DECODER_CPU.LM_WEIGHT]
[--lm_decoder_cpu.word_insertion_score LM_DECODER_CPU.WORD_INSERTION_SCORE]
[--lm_decoder_cpu.forerunner_beam_size LM_DECODER_CPU.FORERUNNER_BEAM_SIZE]
[--lm_decoder_cpu.forerunner_beam_size_token LM_DECODER_CPU.FORERUNNER_BEAM_SIZE_TOKEN]
[--lm_decoder_cpu.forerunner_beam_threshold LM_DECODER_CPU.FORERUNNER_BEAM_THRESHOLD]
[--lm_decoder_cpu.smearing_mode LM_DECODER_CPU.SMEARING_MODE]
[--lm_decoder_cpu.forerunner_use_lm LM_DECODER_CPU.FORERUNNER_USE_LM]
[--asr_ensemble_backend.max_sequence_idle_microseconds ASR_ENSEMBLE_BACKEND.MAX_SEQUENCE_IDLE_MICROSECONDS]
[--asr_ensemble_backend.max_batch_size ASR_ENSEMBLE_BACKEND.MAX_BATCH_SIZE]
[--asr_ensemble_backend.min_batch_size ASR_ENSEMBLE_BACKEND.MIN_BATCH_SIZE]
[--asr_ensemble_backend.opt_batch_size ASR_ENSEMBLE_BACKEND.OPT_BATCH_SIZE]
[--asr_ensemble_backend.preferred_batch_size ASR_ENSEMBLE_BACKEND.PREFERRED_BATCH_SIZE]
[--asr_ensemble_backend.batching_type ASR_ENSEMBLE_BACKEND.BATCHING_TYPE]
[--asr_ensemble_backend.preserve_ordering ASR_ENSEMBLE_BACKEND.PRESERVE_ORDERING]
[--asr_ensemble_backend.instance_group_count ASR_ENSEMBLE_BACKEND.INSTANCE_GROUP_COUNT]
[--asr_ensemble_backend.max_queue_delay_microseconds ASR_ENSEMBLE_BACKEND.MAX_QUEUE_DELAY_MICROSECONDS]
[--asr_ensemble_backend.optimization_graph_level ASR_ENSEMBLE_BACKEND.OPTIMIZATION_GRAPH_LEVEL]
[--asr_ensemble_backend.language_code ASR_ENSEMBLE_BACKEND.LANGUAGE_CODE]
[--asr_ensemble_backend.streaming ASR_ENSEMBLE_BACKEND.STREAMING]
[--asr_ensemble_backend.offline]
[--asr_ensemble_backend.type]
output_path source_path [source_path ...]
Generate a Riva Model from a speech_recognition model trained with NVIDIA
NeMo.
positional arguments:
output_path Location to write compiled Riva pipeline
source_path Source file(s)
options:
-h, --help show this help message and exit
-f, --force Overwrite existing artifacts if they exist
-v, --verbose Verbose log outputs
--language_code LANGUAGE_CODE
Language of the model
--instance_group_count INSTANCE_GROUP_COUNT
How many instances in a group
--kind KIND Backend runs on CPU or GPU
--max_batch_size MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--max_queue_delay_microseconds MAX_QUEUE_DELAY_MICROSECONDS
Maximum amount of time to allow requests to queue to
form a batch in microseconds
--batching_type BATCHING_TYPE
--acoustic_model_name ACOUSTIC_MODEL_NAME
name of the acoustic model
--featurizer_name FEATURIZER_NAME
name of the feature extractor model
--name NAME name of the ASR pipeline, used to set the model names
in the Riva model repository
--streaming STREAMING
Execute model in streaming mode
--offline Marking the model to be used with offline API in Riva
--vad_type VAD_TYPE Type of pre-acoustic model VAD algorithm to use. Valid
entries are none, neural
--unified_acoustic_model
Marking the model as Unified Model (ASR+PnC combined)
--endpointing_type ENDPOINTING_TYPE
Type of post-acoustic model endpointing algorithm to
use. Valid entries are none, greedy_ctc
--chunk_size CHUNK_SIZE
Size of audio chunks to use during inference. If not
specified, default will be selected based on
online/offline setting
--padding_factor PADDING_FACTOR
Multiple on the chunk_size. Deprecated and will be
ignored
--left_padding_size LEFT_PADDING_SIZE
The duration in seconds of the backward looking
padding to prepend to the audio chunk. The acoustic
model input corresponds to a duration of
(left_padding_size + chunk_size + right_padding_size)
seconds
--right_padding_size RIGHT_PADDING_SIZE
The duration in seconds of the forward looking padding
to append to the audio chunk. The acoustic model input
corresponds to a duration of (left_padding_size +
chunk_size + right_padding_size) seconds
--padding_size PADDING_SIZE
padding_size
--max_supported_transcripts MAX_SUPPORTED_TRANSCRIPTS
The maximum number of hypothesized transcripts
generated per utterance
--ms_per_timestep MS_PER_TIMESTEP
The duration in milliseconds of one timestep of the
acoustic model output
--force_decoder_reset_after_ms FORCE_DECODER_RESET_AFTER_MS
Force decoder reset after this number of milliseconds
--lattice_beam LATTICE_BEAM
--decoding_language_model_arpa DECODING_LANGUAGE_MODEL_ARPA
Language model .arpa used during decoding
--decoding_language_model_binary DECODING_LANGUAGE_MODEL_BINARY
Language model .binary used during decoding
--decoding_language_model_fst DECODING_LANGUAGE_MODEL_FST
Language model fst used during decoding
--decoding_language_model_words DECODING_LANGUAGE_MODEL_WORDS
Language model words used during decoding
--rescoring_language_model_arpa RESCORING_LANGUAGE_MODEL_ARPA
Language model .arpa used during lattice rescoring
--decoding_language_model_carpa DECODING_LANGUAGE_MODEL_CARPA
Language model .carpa used during decoding
--rescoring_language_model_carpa RESCORING_LANGUAGE_MODEL_CARPA
Language model .carpa used during lattice rescoring
--decoding_lexicon DECODING_LEXICON
Lexicon to use when decoding
--decoding_vocab DECODING_VOCAB
File of unique words separated by white space. Only
used if decoding_lexicon not provided.
--tokenizer_model TOKENIZER_MODEL
Sentencepiece model to use for encoding. Only include
if generating lexicon from vocab.
--decoder_type DECODER_TYPE
Type of decoder to use. Valid entries are greedy,
os2s, flashlight, kaldi, trtllm, nemo or pass_through.
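As an illustration of the decoder choices above, the following sketch (file names and encryption key are placeholders) selects the Flashlight decoder together with an n-gram language model binary; use `--decoder_type=greedy` without the LM flag for a language-model-free build:

```shell
# Illustrative only: paths and the key are placeholders, not real artifacts.
# Builds an ASR pipeline using the Flashlight lexicon-based decoder with an
# n-gram language model binary supplied via --decoding_language_model_binary.
riva-build speech_recognition \
    /servicemaker-dev/asr.rmir:tlt_encode \
    /servicemaker-dev/model.riva:tlt_encode \
    --decoder_type=flashlight \
    --decoding_language_model_binary=/servicemaker-dev/lm.binary
```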
--stddev_floor STDDEV_FLOOR
Add this value to computed features standard
deviation. Higher values help reduce spurious
transcripts with low energy signals.
--wfst_tokenizer_model WFST_TOKENIZER_MODEL
Sparrowhawk model to use for tokenization and
classification, must be in .far format
--wfst_verbalizer_model WFST_VERBALIZER_MODEL
Sparrowhawk model to use for verbalizer, must be in
.far format.
--wfst_pre_process_model WFST_PRE_PROCESS_MODEL
Sparrowhawk model to use for pre process, must be in
.far format.
--wfst_post_process_model WFST_POST_PROCESS_MODEL
Sparrowhawk model to use for post process, must be in
.far format.
--speech_hints_model SPEECH_HINTS_MODEL
Speechhints class far file used to enable speechhints
--buffer_look_ahead BUFFER_LOOK_AHEAD
Last 'n' words of the final transcript to be treated
as look ahead
--buffer_context_history BUFFER_CONTEXT_HISTORY
Number of words from last previous response to be
maintained for extra context
--buffer_threshold BUFFER_THRESHOLD
Minimum number of words (including history and
look_ahead) in buffer required for applying PnC.
Buffering is disabled by default.
--buffer_max_timeout_frames BUFFER_MAX_TIMEOUT_FRAMES
Number of time frames after which PnC will be applied
to the buffer
--profane_words_file PROFANE_WORDS_FILE
File containing newline separated profane words to be
filtered out if requested by user
--append_space_to_transcripts APPEND_SPACE_TO_TRANSCRIPTS
Boolean that controls if a space should be added to
transcripts after end of utterance detection
--return_separate_utterances RETURN_SEPARATE_UTTERANCES
Boolean flag to return each utterance separately
instead of returning concatenated utterances
--mel_basis_file_path MEL_BASIS_FILE_PATH
Pre-calculated Mel basis file for the PyTorch feature
extractor
--feature_extractor_type FEATURE_EXTRACTOR_TYPE
Feature extractor type
--torch_feature_type TORCH_FEATURE_TYPE
Torch feature type ['whisper', 'nemo']
--torch_feature_device TORCH_FEATURE_DEVICE
Torch feature device ['cuda', 'cpu']
--execution_environment_path EXECUTION_ENVIRONMENT_PATH
Path to conda environment path for nemo runtime
--share_flags SHARE_FLAGS
Share batched start end flags and corr_ids with AM
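The WFST options above drive inverse text normalization of the transcripts. A minimal sketch (the `.far` file names are placeholders) enabling ITN on a greedy-decoder pipeline:

```shell
# Illustrative only: .far paths are placeholders. Enables WFST-based inverse
# text normalization (tokenizer + verbalizer grammars) on ASR transcripts.
riva-build speech_recognition \
    /servicemaker-dev/asr.rmir:tlt_encode \
    /servicemaker-dev/model.riva:tlt_encode \
    --decoder_type=greedy \
    --wfst_tokenizer_model=tokenize_and_classify.far \
    --wfst_verbalizer_model=verbalize.far
```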
featurizer:
--featurizer.max_sequence_idle_microseconds FEATURIZER.MAX_SEQUENCE_IDLE_MICROSECONDS
Global timeout, in microseconds
--featurizer.max_batch_size FEATURIZER.MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--featurizer.min_batch_size FEATURIZER.MIN_BATCH_SIZE
--featurizer.opt_batch_size FEATURIZER.OPT_BATCH_SIZE
--featurizer.preferred_batch_size FEATURIZER.PREFERRED_BATCH_SIZE
Preferred batch size, must be smaller than Max batch
size
--featurizer.batching_type FEATURIZER.BATCHING_TYPE
--featurizer.preserve_ordering FEATURIZER.PRESERVE_ORDERING
Preserve ordering
--featurizer.instance_group_count FEATURIZER.INSTANCE_GROUP_COUNT
How many instances in a group
--featurizer.max_queue_delay_microseconds FEATURIZER.MAX_QUEUE_DELAY_MICROSECONDS
Maximum amount of time to allow requests to queue to
form a batch in microseconds
--featurizer.optimization_graph_level FEATURIZER.OPTIMIZATION_GRAPH_LEVEL
The Graph optimization level to use in Triton model
configuration
--featurizer.max_execution_batch_size FEATURIZER.MAX_EXECUTION_BATCH_SIZE
Maximum Batch Size
--featurizer.gain FEATURIZER.GAIN
Adjust input signal with this gain multiplier prior to
feature extraction
--featurizer.dither FEATURIZER.DITHER
Augment signal with Gaussian noise with this gain to
prevent quantization artifacts
--featurizer.use_utterance_norm_params FEATURIZER.USE_UTTERANCE_NORM_PARAMS
Apply normalization at utterance level
--featurizer.precalc_norm_time_steps FEATURIZER.PRECALC_NORM_TIME_STEPS
Weight of the precomputed normalization parameters, in
timesteps. Setting to 0 will disable use of
precalculated normalization parameters.
--featurizer.precalc_norm_params FEATURIZER.PRECALC_NORM_PARAMS
Boolean that controls if precalculated Normalization
Parameters should be used
--featurizer.norm_per_feature FEATURIZER.NORM_PER_FEATURE
Normalize Per Feature
--featurizer.mean FEATURIZER.MEAN
Pre-computed mean values
--featurizer.stddev FEATURIZER.STDDEV
Pre-computed Std Dev Values
--featurizer.transpose FEATURIZER.TRANSPOSE
Take transpose of output features
--featurizer.padding_size FEATURIZER.PADDING_SIZE
padding_size
--featurizer.int64_features_length FEATURIZER.INT64_FEATURES_LENGTH
Use int64 for features length
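On embedded platforms a batch size of 1 gives the lowest memory footprint. A sketch (placeholder paths) pinning the featurizer batch-size parameters listed above to 1; the same pattern applies to the other component groups:

```shell
# Illustrative only. Forces batch size 1 in the featurizer for minimal memory
# use on embedded targets; repeat for nn.*, endpointing.*, and decoder groups.
riva-build speech_recognition \
    /servicemaker-dev/asr.rmir:tlt_encode \
    /servicemaker-dev/model.riva:tlt_encode \
    --max_batch_size=1 \
    --featurizer.max_batch_size=1 \
    --featurizer.min_batch_size=1 \
    --featurizer.opt_batch_size=1 \
    --featurizer.max_execution_batch_size=1
```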
nn:
--nn.max_sequence_idle_microseconds NN.MAX_SEQUENCE_IDLE_MICROSECONDS
Global timeout, in microseconds
--nn.max_batch_size NN.MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--nn.min_batch_size NN.MIN_BATCH_SIZE
--nn.opt_batch_size NN.OPT_BATCH_SIZE
--nn.preferred_batch_size NN.PREFERRED_BATCH_SIZE
Preferred batch size, must be smaller than Max batch
size
--nn.batching_type NN.BATCHING_TYPE
--nn.preserve_ordering NN.PRESERVE_ORDERING
Preserve ordering
--nn.instance_group_count NN.INSTANCE_GROUP_COUNT
How many instances in a group
--nn.max_queue_delay_microseconds NN.MAX_QUEUE_DELAY_MICROSECONDS
Maximum amount of time to allow requests to queue to
form a batch in microseconds
--nn.optimization_graph_level NN.OPTIMIZATION_GRAPH_LEVEL
The Graph optimization level to use in Triton model
configuration
--nn.trt_max_workspace_size NN.TRT_MAX_WORKSPACE_SIZE
Maximum workspace size (in MB) to use for model export
to TensorRT
--nn.use_onnx_runtime
Use ONNX runtime instead of TensorRT
--nn.use_torchscript Use TorchScript instead of TensorRT
--nn.use_trt_fp32 Use TensorRT engine with FP32 instead of FP16
--nn.fp16_needs_obey_precision_pass
Flag to explicitly mark layers as float when parsing
the ONNX network
--nn.am_len_input_use_int64 NN.AM_LEN_INPUT_USE_INT64
Use int64 for features length
--nn.language_code NN.LANGUAGE_CODE
Language code of the model
--nn.engine_dir NN.ENGINE_DIR
Absolute model directory path
--nn.EXECUTION_ENV_PATH NN.EXECUTION_ENV_PATH
Path to conda environment file for Python backend
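The `nn.*` runtime flags above select how the acoustic model is exported. A hedged sketch (placeholder paths) falling back to ONNX Runtime, which can help when a TensorRT engine fails to build for a given model:

```shell
# Illustrative only. Exports the acoustic model through ONNX Runtime instead
# of building a TensorRT engine.
riva-build speech_recognition \
    /servicemaker-dev/asr.rmir:tlt_encode \
    /servicemaker-dev/model.riva:tlt_encode \
    --nn.use_onnx_runtime
```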
endpointing:
--endpointing.max_sequence_idle_microseconds ENDPOINTING.MAX_SEQUENCE_IDLE_MICROSECONDS
Global timeout, in microseconds
--endpointing.max_batch_size ENDPOINTING.MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--endpointing.min_batch_size ENDPOINTING.MIN_BATCH_SIZE
--endpointing.opt_batch_size ENDPOINTING.OPT_BATCH_SIZE
--endpointing.preferred_batch_size ENDPOINTING.PREFERRED_BATCH_SIZE
Preferred batch size, must be smaller than Max batch
size
--endpointing.batching_type ENDPOINTING.BATCHING_TYPE
--endpointing.preserve_ordering ENDPOINTING.PRESERVE_ORDERING
Preserve ordering
--endpointing.instance_group_count ENDPOINTING.INSTANCE_GROUP_COUNT
How many instances in a group
--endpointing.max_queue_delay_microseconds ENDPOINTING.MAX_QUEUE_DELAY_MICROSECONDS
Maximum amount of time to allow requests to queue to
form a batch in microseconds
--endpointing.optimization_graph_level ENDPOINTING.OPTIMIZATION_GRAPH_LEVEL
The Graph optimization level to use in Triton model
configuration
--endpointing.ms_per_timestep ENDPOINTING.MS_PER_TIMESTEP
--endpointing.start_history ENDPOINTING.START_HISTORY
Size of the window, in milliseconds, to use to detect
start of utterance. If (start_th) of (start_history)
ms of the acoustic model output have non-blank tokens,
start of utterance is detected.
--endpointing.stop_history ENDPOINTING.STOP_HISTORY
Size of the window, in milliseconds, to use to detect
end of utterance. If (stop_th) of (stop_history) ms of
the acoustic model output have non-blank tokens, end
of utterance is detected and decoder will be reset.
--endpointing.stop_history_eou ENDPOINTING.STOP_HISTORY_EOU
Size of the window, in milliseconds, to trigger end of
utterance first pass. If (stop_th_eou) of
(stop_history_eou) ms of the acoustic model output
have non-blank tokens, a partial transcript with high
stability will be generated.
--endpointing.start_th ENDPOINTING.START_TH
Percentage threshold to use to detect start of
utterance. If (start_th) of (start_history) ms of the
acoustic model output have non-blank tokens, start of
utterance is detected.
--endpointing.stop_th ENDPOINTING.STOP_TH
Percentage threshold to use to detect end of
utterance. If (stop_th) of (stop_history) ms of the
acoustic model output have non-blank tokens, end of
utterance is detected.
--endpointing.stop_th_eou ENDPOINTING.STOP_TH_EOU
Percentage threshold to use to detect end of
utterance. If (stop_th_eou) of (stop_history_eou) ms
of the acoustic model output have non-blank tokens,
end of utterance for the first pass will be triggered.
--endpointing.residue_blanks_at_start ENDPOINTING.RESIDUE_BLANKS_AT_START
(Advanced) Number of time steps to ignore at the
beginning of the acoustic model output when trying to
detect start/end of speech
--endpointing.residue_blanks_at_end ENDPOINTING.RESIDUE_BLANKS_AT_END
(Advanced) Number of time steps to ignore at the end
of the acoustic model output when trying to detect
start/end of speech
--endpointing.vocab_file ENDPOINTING.VOCAB_FILE
Vocab file to be used with decoder
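The endpointing windows and thresholds above can be tuned per deployment. A sketch (placeholder paths; the threshold values are examples, not recommendations) widening the end-of-utterance window:

```shell
# Illustrative only: values are examples for tuning, not recommendations.
# Adjusts start/end-of-utterance detection windows (ms) and thresholds.
riva-build speech_recognition \
    /servicemaker-dev/asr.rmir:tlt_encode \
    /servicemaker-dev/model.riva:tlt_encode \
    --endpointing.start_history=200 \
    --endpointing.stop_history=800 \
    --endpointing.start_th=0.2 \
    --endpointing.stop_th=0.98
```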
neural_vad:
--neural_vad.max_sequence_idle_microseconds NEURAL_VAD.MAX_SEQUENCE_IDLE_MICROSECONDS
Global timeout, in microseconds
--neural_vad.max_batch_size NEURAL_VAD.MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--neural_vad.min_batch_size NEURAL_VAD.MIN_BATCH_SIZE
--neural_vad.opt_batch_size NEURAL_VAD.OPT_BATCH_SIZE
--neural_vad.preferred_batch_size NEURAL_VAD.PREFERRED_BATCH_SIZE
Preferred batch size, must be smaller than Max batch
size
--neural_vad.batching_type NEURAL_VAD.BATCHING_TYPE
--neural_vad.preserve_ordering NEURAL_VAD.PRESERVE_ORDERING
Preserve ordering
--neural_vad.instance_group_count NEURAL_VAD.INSTANCE_GROUP_COUNT
How many instances in a group
--neural_vad.max_queue_delay_microseconds NEURAL_VAD.MAX_QUEUE_DELAY_MICROSECONDS
Maximum amount of time to allow requests to queue to
form a batch in microseconds
--neural_vad.optimization_graph_level NEURAL_VAD.OPTIMIZATION_GRAPH_LEVEL
The Graph optimization level to use in Triton model
configuration
--neural_vad.load_model NEURAL_VAD.LOAD_MODEL
--neural_vad.batch_mode NEURAL_VAD.BATCH_MODE
Flag to enable batch inference mode
--neural_vad.decoupled_mode NEURAL_VAD.DECOUPLED_MODE
Flag to enable decoupled inference mode
--neural_vad.onset NEURAL_VAD.ONSET
Onset threshold for detecting the beginning and end of
speech.
--neural_vad.offset NEURAL_VAD.OFFSET
Offset threshold for detecting the end of speech.
--neural_vad.pad_onset NEURAL_VAD.PAD_ONSET
Duration to add before each speech segment.
--neural_vad.pad_offset NEURAL_VAD.PAD_OFFSET
Duration to add after each speech segment.
--neural_vad.min_duration_on NEURAL_VAD.MIN_DURATION_ON
Threshold for small non_speech deletion.
--neural_vad.min_duration_off NEURAL_VAD.MIN_DURATION_OFF
Threshold for short speech segment deletion.
--neural_vad.filter_speech_first NEURAL_VAD.FILTER_SPEECH_FIRST
Enable short speech segment deletion first.
--neural_vad.features_mask_value NEURAL_VAD.FEATURES_MASK_VALUE
Features value to use to mask the non-speech segments
neural_vad_nn:
--neural_vad_nn.max_sequence_idle_microseconds NEURAL_VAD_NN.MAX_SEQUENCE_IDLE_MICROSECONDS
Global timeout, in microseconds
--neural_vad_nn.max_batch_size NEURAL_VAD_NN.MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--neural_vad_nn.min_batch_size NEURAL_VAD_NN.MIN_BATCH_SIZE
--neural_vad_nn.opt_batch_size NEURAL_VAD_NN.OPT_BATCH_SIZE
--neural_vad_nn.preferred_batch_size NEURAL_VAD_NN.PREFERRED_BATCH_SIZE
Preferred batch size, must be smaller than Max batch
size
--neural_vad_nn.batching_type NEURAL_VAD_NN.BATCHING_TYPE
--neural_vad_nn.preserve_ordering NEURAL_VAD_NN.PRESERVE_ORDERING
Preserve ordering
--neural_vad_nn.instance_group_count NEURAL_VAD_NN.INSTANCE_GROUP_COUNT
How many instances in a group
--neural_vad_nn.max_queue_delay_microseconds NEURAL_VAD_NN.MAX_QUEUE_DELAY_MICROSECONDS
Maximum amount of time to allow requests to queue to
form a batch in microseconds
--neural_vad_nn.optimization_graph_level NEURAL_VAD_NN.OPTIMIZATION_GRAPH_LEVEL
The Graph optimization level to use in Triton model
configuration
--neural_vad_nn.trt_max_workspace_size NEURAL_VAD_NN.TRT_MAX_WORKSPACE_SIZE
Maximum workspace size (in MB) to use for model export
to TensorRT
--neural_vad_nn.use_onnx_runtime
Use ONNX runtime instead of TensorRT
--neural_vad_nn.use_torchscript
Use TorchScript instead of TensorRT
--neural_vad_nn.use_trt_fp32
Use TensorRT engine with FP32 instead of FP16
--neural_vad_nn.fp16_needs_obey_precision_pass
Flag to explicitly mark layers as float when parsing
the ONNX network
--neural_vad_nn.onnx_path NEURAL_VAD_NN.ONNX_PATH
--neural_vad_nn.sample_rate NEURAL_VAD_NN.SAMPLE_RATE
--neural_vad_nn.min_seq_len NEURAL_VAD_NN.MIN_SEQ_LEN
--neural_vad_nn.opt_seq_len NEURAL_VAD_NN.OPT_SEQ_LEN
--neural_vad_nn.max_seq_len NEURAL_VAD_NN.MAX_SEQ_LEN
flashlight_decoder:
--flashlight_decoder.max_sequence_idle_microseconds FLASHLIGHT_DECODER.MAX_SEQUENCE_IDLE_MICROSECONDS
Global timeout, in microseconds
--flashlight_decoder.max_batch_size FLASHLIGHT_DECODER.MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--flashlight_decoder.min_batch_size FLASHLIGHT_DECODER.MIN_BATCH_SIZE
--flashlight_decoder.opt_batch_size FLASHLIGHT_DECODER.OPT_BATCH_SIZE
--flashlight_decoder.preferred_batch_size FLASHLIGHT_DECODER.PREFERRED_BATCH_SIZE
Preferred batch size, must be smaller than Max batch
size
--flashlight_decoder.batching_type FLASHLIGHT_DECODER.BATCHING_TYPE
--flashlight_decoder.preserve_ordering FLASHLIGHT_DECODER.PRESERVE_ORDERING
Preserve ordering
--flashlight_decoder.instance_group_count FLASHLIGHT_DECODER.INSTANCE_GROUP_COUNT
How many instances in a group
--flashlight_decoder.max_queue_delay_microseconds FLASHLIGHT_DECODER.MAX_QUEUE_DELAY_MICROSECONDS
Maximum amount of time to allow requests to queue to
form a batch in microseconds
--flashlight_decoder.optimization_graph_level FLASHLIGHT_DECODER.OPTIMIZATION_GRAPH_LEVEL
The Graph optimization level to use in Triton model
configuration
--flashlight_decoder.max_execution_batch_size FLASHLIGHT_DECODER.MAX_EXECUTION_BATCH_SIZE
--flashlight_decoder.decoder_type FLASHLIGHT_DECODER.DECODER_TYPE
--flashlight_decoder.padding_size FLASHLIGHT_DECODER.PADDING_SIZE
padding_size
--flashlight_decoder.max_supported_transcripts FLASHLIGHT_DECODER.MAX_SUPPORTED_TRANSCRIPTS
--flashlight_decoder.asr_model_delay FLASHLIGHT_DECODER.ASR_MODEL_DELAY
(Advanced) Number of time steps by which the acoustic
model output should be shifted when computing
timestamps. For Riva Conformer-Large models, one time
step corresponds to 40ms while for Citrinet-1024
models, one time step corresponds to 80ms. Decreasing
the asr_model_delay parameter by 1 will cause all
timestamps to be increased by 40ms for Conformer-Large
models, and 80ms for Citrinet-1024 models. This
parameter must be tuned since the CTC-based models are
not guaranteed to predict correct alignment.
--flashlight_decoder.ms_per_timestep FLASHLIGHT_DECODER.MS_PER_TIMESTEP
--flashlight_decoder.vocab_file FLASHLIGHT_DECODER.VOCAB_FILE
Vocab file to be used with decoder
--flashlight_decoder.decoder_num_worker_threads FLASHLIGHT_DECODER.DECODER_NUM_WORKER_THREADS
Number of threads to use for CPU decoders. If < 1,
maximum hardware concurrency is used.
--flashlight_decoder.force_decoder_reset_after_ms FLASHLIGHT_DECODER.FORCE_DECODER_RESET_AFTER_MS
Force a decoder reset after this number of milliseconds
--flashlight_decoder.language_model_file FLASHLIGHT_DECODER.LANGUAGE_MODEL_FILE
Language model file in binary format to be used by
KenLM
--flashlight_decoder.lexicon_file FLASHLIGHT_DECODER.LEXICON_FILE
Lexicon file to be used with decoder
--flashlight_decoder.use_lexicon_free_decoding FLASHLIGHT_DECODER.USE_LEXICON_FREE_DECODING
Enables lexicon-free decoding
--flashlight_decoder.beam_size FLASHLIGHT_DECODER.BEAM_SIZE
Maximum number of hypotheses the decoder holds after
each step
--flashlight_decoder.beam_size_token FLASHLIGHT_DECODER.BEAM_SIZE_TOKEN
Maximum number of tokens the decoder considers at each
step
--flashlight_decoder.beam_threshold FLASHLIGHT_DECODER.BEAM_THRESHOLD
Threshold to prune hypotheses
--flashlight_decoder.lm_weight FLASHLIGHT_DECODER.LM_WEIGHT
Weight of language model
--flashlight_decoder.blank_token FLASHLIGHT_DECODER.BLANK_TOKEN
Blank token
--flashlight_decoder.sil_token FLASHLIGHT_DECODER.SIL_TOKEN
Silence token
--flashlight_decoder.unk_token FLASHLIGHT_DECODER.UNK_TOKEN
Unknown token
--flashlight_decoder.set_default_index_to_unk_token FLASHLIGHT_DECODER.SET_DEFAULT_INDEX_TO_UNK_TOKEN
Flag that controls whether the default index is set
to the index of the unk_token. If not, an error is
thrown when an invalid token is encountered in the
lexicon.
--flashlight_decoder.word_insertion_score FLASHLIGHT_DECODER.WORD_INSERTION_SCORE
Word insertion score
--flashlight_decoder.forerunner_beam_size FLASHLIGHT_DECODER.FORERUNNER_BEAM_SIZE
Maximum number of hypotheses the decoder holds after
each step, for forerunner transcript
--flashlight_decoder.forerunner_beam_size_token FLASHLIGHT_DECODER.FORERUNNER_BEAM_SIZE_TOKEN
Maximum number of tokens the decoder considers at each
step, for forerunner transcript
--flashlight_decoder.forerunner_beam_threshold FLASHLIGHT_DECODER.FORERUNNER_BEAM_THRESHOLD
Threshold to prune hypotheses, for forerunner
transcript
--flashlight_decoder.smearing_mode FLASHLIGHT_DECODER.SMEARING_MODE
Decoder smearing mode. Can be logadd, max or none
--flashlight_decoder.forerunner_use_lm FLASHLIGHT_DECODER.FORERUNNER_USE_LM
Boolean that controls whether the forerunner decoder
should use a language model
--flashlight_decoder.num_tokenization FLASHLIGHT_DECODER.NUM_TOKENIZATION
Number of tokenizations to generate for each word in
the lexicon
--flashlight_decoder.unk_score FLASHLIGHT_DECODER.UNK_SCORE
Coefficient for inserting unknown words in the
flashlight decoder. The higher it is, the more likely
it is to insert unknown words. See https://github.com/
flashlight/flashlight/blob/e16682fa32df30cbf675c8fe010
f929c61e3b833/flashlight/lib/text/decoder/LexiconDecod
er.h#L106
--flashlight_decoder.log_add FLASHLIGHT_DECODER.LOG_ADD
If true, when the same state is reached by two
separate paths in the decoder, add the paths' scores
with addition in loglikelihood space. Otherwise, just
pick the maximum likelihood score. See https://github.
com/flashlight/flashlight/blob/e16682fa32df30cbf675c8f
e010f929c61e3b833/flashlight/lib/text/decoder/Utils.h#
L105
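The `log_add` behavior described above can be sketched in a few lines. This is an illustrative snippet only, not the decoder's actual implementation; `merge_scores` is a hypothetical helper, not part of Riva or flashlight.

```python
import math

# Illustrates --flashlight_decoder.log_add: when two decoding paths
# reach the same state, combine their scores either by taking the
# max (log_add=False) or by summing in log-likelihood space
# (log_add=True).
def merge_scores(a: float, b: float, log_add: bool) -> float:
    if log_add:
        m = max(a, b)  # numerically stable log(exp(a) + exp(b))
        return m + math.log(math.exp(a - m) + math.exp(b - m))
    return max(a, b)

# Two equally likely paths: logadd gains log(2) over max.
merge_scores(-1.0, -1.0, log_add=False)  # -1.0
merge_scores(-1.0, -1.0, log_add=True)   # -1.0 + log(2) ≈ -0.31
```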
pass_through_decoder:
--pass_through_decoder.max_sequence_idle_microseconds PASS_THROUGH_DECODER.MAX_SEQUENCE_IDLE_MICROSECONDS
Global timeout, in microseconds
--pass_through_decoder.max_batch_size PASS_THROUGH_DECODER.MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--pass_through_decoder.min_batch_size PASS_THROUGH_DECODER.MIN_BATCH_SIZE
--pass_through_decoder.opt_batch_size PASS_THROUGH_DECODER.OPT_BATCH_SIZE
--pass_through_decoder.preferred_batch_size PASS_THROUGH_DECODER.PREFERRED_BATCH_SIZE
Preferred batch size, must be smaller than Max batch
size
--pass_through_decoder.batching_type PASS_THROUGH_DECODER.BATCHING_TYPE
--pass_through_decoder.preserve_ordering PASS_THROUGH_DECODER.PRESERVE_ORDERING
Preserve ordering
--pass_through_decoder.instance_group_count PASS_THROUGH_DECODER.INSTANCE_GROUP_COUNT
How many instances in a group
--pass_through_decoder.max_queue_delay_microseconds PASS_THROUGH_DECODER.MAX_QUEUE_DELAY_MICROSECONDS
Maximum amount of time to allow requests to queue to
form a batch in microseconds
--pass_through_decoder.optimization_graph_level PASS_THROUGH_DECODER.OPTIMIZATION_GRAPH_LEVEL
The Graph optimization level to use in Triton model
configuration
--pass_through_decoder.vocab_file PASS_THROUGH_DECODER.VOCAB_FILE
Vocab file to be used with decoder
--pass_through_decoder.asr_model_delay PASS_THROUGH_DECODER.ASR_MODEL_DELAY
(Advanced) Number of time steps by which the acoustic
model output should be shifted when computing
timestamps. For Riva Conformer-Large models, one time
step corresponds to 40ms while for Citrinet-1024
models, one time step corresponds to 80ms. Decreasing
the asr_model_delay parameter by 1 will cause all
timestamps to be increased by 40ms for Conformer-Large
models, and 80ms for Citrinet-1024 models. This
parameter must be tuned since the CTC-based models are
not guaranteed to predict correct alignment.
nemo_decoder:
--nemo_decoder.max_sequence_idle_microseconds NEMO_DECODER.MAX_SEQUENCE_IDLE_MICROSECONDS
Global timeout, in microseconds
--nemo_decoder.max_batch_size NEMO_DECODER.MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--nemo_decoder.min_batch_size NEMO_DECODER.MIN_BATCH_SIZE
--nemo_decoder.opt_batch_size NEMO_DECODER.OPT_BATCH_SIZE
--nemo_decoder.preferred_batch_size NEMO_DECODER.PREFERRED_BATCH_SIZE
Preferred batch size, must be smaller than Max batch
size
--nemo_decoder.batching_type NEMO_DECODER.BATCHING_TYPE
--nemo_decoder.preserve_ordering NEMO_DECODER.PRESERVE_ORDERING
Preserve ordering
--nemo_decoder.instance_group_count NEMO_DECODER.INSTANCE_GROUP_COUNT
How many instances in a group
--nemo_decoder.max_queue_delay_microseconds NEMO_DECODER.MAX_QUEUE_DELAY_MICROSECONDS
Maximum amount of time to allow requests to queue to
form a batch in microseconds
--nemo_decoder.optimization_graph_level NEMO_DECODER.OPTIMIZATION_GRAPH_LEVEL
The Graph optimization level to use in Triton model
configuration
--nemo_decoder.vocab_file NEMO_DECODER.VOCAB_FILE
Vocab file to be used with decoder
--nemo_decoder.asr_model_delay NEMO_DECODER.ASR_MODEL_DELAY
(Advanced) Number of time steps by which the acoustic
model output should be shifted when computing
timestamps. For Riva Conformer-Large models, one time
step corresponds to 40ms while for Citrinet-1024
models, one time step corresponds to 80ms. Decreasing
the asr_model_delay parameter by 1 will cause all
timestamps to be increased by 40ms for Conformer-Large
models, and 80ms for Citrinet-1024 models. This
parameter must be tuned since the CTC-based models are
not guaranteed to predict correct alignment.
--nemo_decoder.compute_dtype
Datatype to use for ASR model
--nemo_decoder.amp_dtype
Datatype to use for AMP
--nemo_decoder.nemo_decoder_type NEMO_DECODER.NEMO_DECODER_TYPE
Decoder to use for decoding
--nemo_decoder.use_stateful_decoding
Whether to pass states to next chunk
--nemo_decoder.use_amp
Whether to use AMP for inference
trtllm_decoder:
--trtllm_decoder.max_sequence_idle_microseconds TRTLLM_DECODER.MAX_SEQUENCE_IDLE_MICROSECONDS
Global timeout, in microseconds
--trtllm_decoder.max_batch_size TRTLLM_DECODER.MAX_BATCH_SIZE
Max batch size to use
--trtllm_decoder.min_batch_size TRTLLM_DECODER.MIN_BATCH_SIZE
--trtllm_decoder.opt_batch_size TRTLLM_DECODER.OPT_BATCH_SIZE
--trtllm_decoder.preferred_batch_size TRTLLM_DECODER.PREFERRED_BATCH_SIZE
Preferred batch size, must be smaller than Max batch
size
--trtllm_decoder.batching_type TRTLLM_DECODER.BATCHING_TYPE
--trtllm_decoder.preserve_ordering TRTLLM_DECODER.PRESERVE_ORDERING
Preserve ordering
--trtllm_decoder.instance_group_count TRTLLM_DECODER.INSTANCE_GROUP_COUNT
How many instances in a group
--trtllm_decoder.max_queue_delay_microseconds TRTLLM_DECODER.MAX_QUEUE_DELAY_MICROSECONDS
Maximum amount of time to allow requests to queue to
form a batch in microseconds
--trtllm_decoder.optimization_graph_level TRTLLM_DECODER.OPTIMIZATION_GRAPH_LEVEL
The Graph optimization level to use in Triton model
configuration
--trtllm_decoder.world_size TRTLLM_DECODER.WORLD_SIZE
--trtllm_decoder.quantize_dir TRTLLM_DECODER.QUANTIZE_DIR
--trtllm_decoder.dtype TRTLLM_DECODER.DTYPE
Valid choices are ["float16", "float32", "bfloat16"]
--trtllm_decoder.max_input_len TRTLLM_DECODER.MAX_INPUT_LEN
Max number of tokens in prompts
--trtllm_decoder.max_output_len TRTLLM_DECODER.MAX_OUTPUT_LEN
Max number of output tokens from the decoder
--trtllm_decoder.max_beam_width TRTLLM_DECODER.MAX_BEAM_WIDTH
Max beam width
--trtllm_decoder.use_gpt_attention_plugin TRTLLM_DECODER.USE_GPT_ATTENTION_PLUGIN
Activates the GPT attention plugin. You can specify
the plugin dtype or leave blank to use the model
dtype. Choices are ["float16", "float32", "bfloat16"]
--trtllm_decoder.use_bert_attention_plugin TRTLLM_DECODER.USE_BERT_ATTENTION_PLUGIN
Activates the BERT attention plugin. You can specify
the plugin dtype or leave blank to use the model
dtype. Choices are ["float16", "float32", "bfloat16"]
--trtllm_decoder.use_gemm_plugin TRTLLM_DECODER.USE_GEMM_PLUGIN
Activates the GEMM plugin. You can specify the plugin
dtype or leave blank to use the model dtype. Choices
are ["float16", "float32", "bfloat16"]
--trtllm_decoder.remove_input_padding TRTLLM_DECODER.REMOVE_INPUT_PADDING
Remove input padding
--trtllm_decoder.enable_context_fmha TRTLLM_DECODER.ENABLE_CONTEXT_FMHA
--trtllm_decoder.use_weight_only TRTLLM_DECODER.USE_WEIGHT_ONLY
Quantize weights for the various GEMMs to INT4/INT8.
See --weight_only_precision to set the precision
--trtllm_decoder.weight_only_precision TRTLLM_DECODER.WEIGHT_ONLY_PRECISION
Define the precision for the weights when using
weight-only quantization. You must also pass
--use_weight_only for this argument to have an impact.
--trtllm_decoder.int8_kv_cache TRTLLM_DECODER.INT8_KV_CACHE
By default, the model dtype is used for the KV cache.
int8_kv_cache chooses INT8 quantization for the KV
cache
--trtllm_decoder.debug_mode TRTLLM_DECODER.DEBUG_MODE
--trtllm_decoder.vocab_file TRTLLM_DECODER.VOCAB_FILE
Vocab file to be used with decoder
--trtllm_decoder.asr_model_delay TRTLLM_DECODER.ASR_MODEL_DELAY
(Advanced) Number of time steps by which the acoustic
model output should be shifted when computing
timestamps. For Riva Conformer-Large models, one time
step corresponds to 40ms while for Citrinet-1024
models, one time step corresponds to 80ms. Decreasing
the asr_model_delay parameter by 1 will cause all
timestamps to be increased by 40ms for Conformer-Large
models, and 80ms for Citrinet-1024 models. This
parameter must be tuned since the CTC-based models are
not guaranteed to predict correct alignment.
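As background for `--trtllm_decoder.use_weight_only` and `--trtllm_decoder.weight_only_precision`, the sketch below shows the general idea of symmetric INT8 weight-only quantization. It is illustrative only; TensorRT-LLM performs quantization inside its optimized GEMM kernels (typically per-channel), not like this, and `quantize_int8`/`dequantize` are hypothetical helpers.

```python
# Symmetric per-tensor INT8 weight-only quantization: weights are
# stored as int8 plus one float scale, and dequantized on the fly.
def quantize_int8(weights):
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

w = [0.5, -1.27, 0.01]
q, s = quantize_int8(w)      # q = [50, -127, 1]
w_hat = dequantize(q, s)     # close approximation of w
```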
greedy_decoder:
--greedy_decoder.max_sequence_idle_microseconds GREEDY_DECODER.MAX_SEQUENCE_IDLE_MICROSECONDS
Global timeout, in microseconds
--greedy_decoder.max_batch_size GREEDY_DECODER.MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--greedy_decoder.min_batch_size GREEDY_DECODER.MIN_BATCH_SIZE
--greedy_decoder.opt_batch_size GREEDY_DECODER.OPT_BATCH_SIZE
--greedy_decoder.preferred_batch_size GREEDY_DECODER.PREFERRED_BATCH_SIZE
Preferred batch size, must be smaller than Max batch
size
--greedy_decoder.batching_type GREEDY_DECODER.BATCHING_TYPE
--greedy_decoder.preserve_ordering GREEDY_DECODER.PRESERVE_ORDERING
Preserve ordering
--greedy_decoder.instance_group_count GREEDY_DECODER.INSTANCE_GROUP_COUNT
How many instances in a group
--greedy_decoder.max_queue_delay_microseconds GREEDY_DECODER.MAX_QUEUE_DELAY_MICROSECONDS
Maximum amount of time to allow requests to queue to
form a batch in microseconds
--greedy_decoder.optimization_graph_level GREEDY_DECODER.OPTIMIZATION_GRAPH_LEVEL
The Graph optimization level to use in Triton model
configuration
--greedy_decoder.max_execution_batch_size GREEDY_DECODER.MAX_EXECUTION_BATCH_SIZE
--greedy_decoder.decoder_type GREEDY_DECODER.DECODER_TYPE
--greedy_decoder.padding_size GREEDY_DECODER.PADDING_SIZE
padding_size
--greedy_decoder.max_supported_transcripts GREEDY_DECODER.MAX_SUPPORTED_TRANSCRIPTS
--greedy_decoder.asr_model_delay GREEDY_DECODER.ASR_MODEL_DELAY
(Advanced) Number of time steps by which the acoustic
model output should be shifted when computing
timestamps. For Riva Conformer-Large models, one time
step corresponds to 40ms while for Citrinet-1024
models, one time step corresponds to 80ms. Decreasing
the asr_model_delay parameter by 1 will cause all
timestamps to be increased by 40ms for Conformer-Large
models, and 80ms for Citrinet-1024 models. This
parameter must be tuned since the CTC-based models are
not guaranteed to predict correct alignment.
--greedy_decoder.ms_per_timestep GREEDY_DECODER.MS_PER_TIMESTEP
--greedy_decoder.vocab_file GREEDY_DECODER.VOCAB_FILE
Vocab file to be used with decoder
--greedy_decoder.decoder_num_worker_threads GREEDY_DECODER.DECODER_NUM_WORKER_THREADS
Number of threads to use for CPU decoders. If < 1,
maximum hardware concurrency is used.
--greedy_decoder.force_decoder_reset_after_ms GREEDY_DECODER.FORCE_DECODER_RESET_AFTER_MS
Force a decoder reset after this number of milliseconds
os2s_decoder:
--os2s_decoder.max_sequence_idle_microseconds OS2S_DECODER.MAX_SEQUENCE_IDLE_MICROSECONDS
Global timeout, in microseconds
--os2s_decoder.max_batch_size OS2S_DECODER.MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--os2s_decoder.min_batch_size OS2S_DECODER.MIN_BATCH_SIZE
--os2s_decoder.opt_batch_size OS2S_DECODER.OPT_BATCH_SIZE
--os2s_decoder.preferred_batch_size OS2S_DECODER.PREFERRED_BATCH_SIZE
Preferred batch size, must be smaller than Max batch
size
--os2s_decoder.batching_type OS2S_DECODER.BATCHING_TYPE
--os2s_decoder.preserve_ordering OS2S_DECODER.PRESERVE_ORDERING
Preserve ordering
--os2s_decoder.instance_group_count OS2S_DECODER.INSTANCE_GROUP_COUNT
How many instances in a group
--os2s_decoder.max_queue_delay_microseconds OS2S_DECODER.MAX_QUEUE_DELAY_MICROSECONDS
Maximum amount of time to allow requests to queue to
form a batch in microseconds
--os2s_decoder.optimization_graph_level OS2S_DECODER.OPTIMIZATION_GRAPH_LEVEL
The Graph optimization level to use in Triton model
configuration
--os2s_decoder.max_execution_batch_size OS2S_DECODER.MAX_EXECUTION_BATCH_SIZE
--os2s_decoder.decoder_type OS2S_DECODER.DECODER_TYPE
--os2s_decoder.padding_size OS2S_DECODER.PADDING_SIZE
padding_size
--os2s_decoder.max_supported_transcripts OS2S_DECODER.MAX_SUPPORTED_TRANSCRIPTS
--os2s_decoder.asr_model_delay OS2S_DECODER.ASR_MODEL_DELAY
(Advanced) Number of time steps by which the acoustic
model output should be shifted when computing
timestamps. For Riva Conformer-Large models, one time
step corresponds to 40ms while for Citrinet-1024
models, one time step corresponds to 80ms. Decreasing
the asr_model_delay parameter by 1 will cause all
timestamps to be increased by 40ms for Conformer-Large
models, and 80ms for Citrinet-1024 models. This
parameter must be tuned since the CTC-based models are
not guaranteed to predict correct alignment.
--os2s_decoder.ms_per_timestep OS2S_DECODER.MS_PER_TIMESTEP
--os2s_decoder.vocab_file OS2S_DECODER.VOCAB_FILE
Vocab file to be used with decoder
--os2s_decoder.decoder_num_worker_threads OS2S_DECODER.DECODER_NUM_WORKER_THREADS
Number of threads to use for CPU decoders. If < 1,
maximum hardware concurrency is used.
--os2s_decoder.force_decoder_reset_after_ms OS2S_DECODER.FORCE_DECODER_RESET_AFTER_MS
Force a decoder reset after this number of milliseconds
--os2s_decoder.language_model_file OS2S_DECODER.LANGUAGE_MODEL_FILE
Language model file in binary format to be used by
KenLM
--os2s_decoder.beam_search_width OS2S_DECODER.BEAM_SEARCH_WIDTH
Number of partial hypotheses to keep at each step of
the beam search
--os2s_decoder.language_model_alpha OS2S_DECODER.LANGUAGE_MODEL_ALPHA
Weight given to the language model during beam search
--os2s_decoder.language_model_beta OS2S_DECODER.LANGUAGE_MODEL_BETA
Word insertion score
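To make `language_model_alpha` and `language_model_beta` concrete, here is how shallow-fusion scoring typically combines the acoustic and language model during CTC beam search. The formula is the standard one for CTC beam search with an n-gram LM; `hypothesis_score` is a hypothetical helper, not the OS2S decoder's actual code.

```python
# Shallow-fusion score of a partial hypothesis during beam search.
def hypothesis_score(acoustic_logprob: float, lm_logprob: float,
                     word_count: int, alpha: float, beta: float) -> float:
    # alpha (language_model_alpha) weights the LM log-probability;
    # beta (language_model_beta) adds a per-word insertion bonus.
    return acoustic_logprob + alpha * lm_logprob + beta * word_count

# Higher alpha trusts the LM more; higher beta favors longer
# transcripts.
hypothesis_score(-12.0, -4.0, 3, alpha=0.5, beta=1.0)  # -11.0
```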
kaldi_decoder:
--kaldi_decoder.max_sequence_idle_microseconds KALDI_DECODER.MAX_SEQUENCE_IDLE_MICROSECONDS
Global timeout, in microseconds
--kaldi_decoder.max_batch_size KALDI_DECODER.MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--kaldi_decoder.min_batch_size KALDI_DECODER.MIN_BATCH_SIZE
--kaldi_decoder.opt_batch_size KALDI_DECODER.OPT_BATCH_SIZE
--kaldi_decoder.preferred_batch_size KALDI_DECODER.PREFERRED_BATCH_SIZE
Preferred batch size, must be smaller than Max batch
size
--kaldi_decoder.batching_type KALDI_DECODER.BATCHING_TYPE
--kaldi_decoder.preserve_ordering KALDI_DECODER.PRESERVE_ORDERING
Preserve ordering
--kaldi_decoder.instance_group_count KALDI_DECODER.INSTANCE_GROUP_COUNT
How many instances in a group
--kaldi_decoder.max_queue_delay_microseconds KALDI_DECODER.MAX_QUEUE_DELAY_MICROSECONDS
Maximum amount of time to allow requests to queue to
form a batch in microseconds
--kaldi_decoder.optimization_graph_level KALDI_DECODER.OPTIMIZATION_GRAPH_LEVEL
The Graph optimization level to use in Triton model
configuration
--kaldi_decoder.max_execution_batch_size KALDI_DECODER.MAX_EXECUTION_BATCH_SIZE
--kaldi_decoder.decoder_type KALDI_DECODER.DECODER_TYPE
--kaldi_decoder.padding_size KALDI_DECODER.PADDING_SIZE
padding_size
--kaldi_decoder.max_supported_transcripts KALDI_DECODER.MAX_SUPPORTED_TRANSCRIPTS
--kaldi_decoder.asr_model_delay KALDI_DECODER.ASR_MODEL_DELAY
(Advanced) Number of time steps by which the acoustic
model output should be shifted when computing
timestamps. For Riva Conformer-Large models, one time
step corresponds to 40ms while for Citrinet-1024
models, one time step corresponds to 80ms. Decreasing
the asr_model_delay parameter by 1 will cause all
timestamps to be increased by 40ms for Conformer-Large
models, and 80ms for Citrinet-1024 models. This
parameter must be tuned since the CTC-based models are
not guaranteed to predict correct alignment.
--kaldi_decoder.ms_per_timestep KALDI_DECODER.MS_PER_TIMESTEP
--kaldi_decoder.vocab_file KALDI_DECODER.VOCAB_FILE
Vocab file to be used with decoder
--kaldi_decoder.decoder_num_worker_threads KALDI_DECODER.DECODER_NUM_WORKER_THREADS
Number of threads to use for CPU decoders. If < 1,
maximum hardware concurrency is used.
--kaldi_decoder.force_decoder_reset_after_ms KALDI_DECODER.FORCE_DECODER_RESET_AFTER_MS
Force a decoder reset after this number of milliseconds
--kaldi_decoder.fst_filename KALDI_DECODER.FST_FILENAME
FST file to use during decoding
--kaldi_decoder.word_syms_filename KALDI_DECODER.WORD_SYMS_FILENAME
--kaldi_decoder.default_beam KALDI_DECODER.DEFAULT_BEAM
--kaldi_decoder.max_active KALDI_DECODER.MAX_ACTIVE
--kaldi_decoder.acoustic_scale KALDI_DECODER.ACOUSTIC_SCALE
--kaldi_decoder.decoder_num_copy_threads KALDI_DECODER.DECODER_NUM_COPY_THREADS
--kaldi_decoder.determinize_lattice KALDI_DECODER.DETERMINIZE_LATTICE
rescorer:
--rescorer.max_sequence_idle_microseconds RESCORER.MAX_SEQUENCE_IDLE_MICROSECONDS
Global timeout, in microseconds
--rescorer.max_batch_size RESCORER.MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--rescorer.min_batch_size RESCORER.MIN_BATCH_SIZE
--rescorer.opt_batch_size RESCORER.OPT_BATCH_SIZE
--rescorer.preferred_batch_size RESCORER.PREFERRED_BATCH_SIZE
Preferred batch size, must be smaller than Max batch
size
--rescorer.batching_type RESCORER.BATCHING_TYPE
--rescorer.preserve_ordering RESCORER.PRESERVE_ORDERING
Preserve ordering
--rescorer.instance_group_count RESCORER.INSTANCE_GROUP_COUNT
How many instances in a group
--rescorer.max_queue_delay_microseconds RESCORER.MAX_QUEUE_DELAY_MICROSECONDS
Maximum amount of time to allow requests to queue to
form a batch in microseconds
--rescorer.optimization_graph_level RESCORER.OPTIMIZATION_GRAPH_LEVEL
The Graph optimization level to use in Triton model
configuration
--rescorer.max_supported_transcripts RESCORER.MAX_SUPPORTED_TRANSCRIPTS
--rescorer.score_lm_carpa_filename RESCORER.SCORE_LM_CARPA_FILENAME
--rescorer.decode_lm_carpa_filename RESCORER.DECODE_LM_CARPA_FILENAME
--rescorer.word_syms_filename RESCORER.WORD_SYMS_FILENAME
--rescorer.word_insertion_penalty RESCORER.WORD_INSERTION_PENALTY
--rescorer.num_worker_threads RESCORER.NUM_WORKER_THREADS
--rescorer.ms_per_timestep RESCORER.MS_PER_TIMESTEP
--rescorer.boundary_character_ids RESCORER.BOUNDARY_CHARACTER_IDS
--rescorer.vocab_file RESCORER.VOCAB_FILE
Vocab file to be used with decoder
lm_decoder_cpu:
--lm_decoder_cpu.beam_search_width LM_DECODER_CPU.BEAM_SEARCH_WIDTH
--lm_decoder_cpu.decoder_type LM_DECODER_CPU.DECODER_TYPE
--lm_decoder_cpu.padding_size LM_DECODER_CPU.PADDING_SIZE
padding_size
--lm_decoder_cpu.language_model_file LM_DECODER_CPU.LANGUAGE_MODEL_FILE
Language model file in binary format to be used by
KenLM
--lm_decoder_cpu.max_supported_transcripts LM_DECODER_CPU.MAX_SUPPORTED_TRANSCRIPTS
--lm_decoder_cpu.asr_model_delay LM_DECODER_CPU.ASR_MODEL_DELAY
(Advanced) Number of time steps by which the acoustic
model output should be shifted when computing
timestamps. This parameter must be tuned since the CTC
model is not guaranteed to predict correct alignment.
--lm_decoder_cpu.language_model_alpha LM_DECODER_CPU.LANGUAGE_MODEL_ALPHA
--lm_decoder_cpu.language_model_beta LM_DECODER_CPU.LANGUAGE_MODEL_BETA
--lm_decoder_cpu.ms_per_timestep LM_DECODER_CPU.MS_PER_TIMESTEP
--lm_decoder_cpu.vocab_file LM_DECODER_CPU.VOCAB_FILE
Vocab file to be used with decoder
--lm_decoder_cpu.lexicon_file LM_DECODER_CPU.LEXICON_FILE
Lexicon file to be used with decoder
--lm_decoder_cpu.beam_size LM_DECODER_CPU.BEAM_SIZE
Maximum number of hypotheses the decoder holds after
each step
--lm_decoder_cpu.beam_size_token LM_DECODER_CPU.BEAM_SIZE_TOKEN
Maximum number of tokens the decoder considers at each
step
--lm_decoder_cpu.beam_threshold LM_DECODER_CPU.BEAM_THRESHOLD
Threshold to prune hypotheses
--lm_decoder_cpu.lm_weight LM_DECODER_CPU.LM_WEIGHT
Weight of language model
--lm_decoder_cpu.word_insertion_score LM_DECODER_CPU.WORD_INSERTION_SCORE
Word insertion score
--lm_decoder_cpu.forerunner_beam_size LM_DECODER_CPU.FORERUNNER_BEAM_SIZE
Maximum number of hypotheses the decoder holds after
each step, for forerunner transcript
--lm_decoder_cpu.forerunner_beam_size_token LM_DECODER_CPU.FORERUNNER_BEAM_SIZE_TOKEN
Maximum number of tokens the decoder considers at each
step, for forerunner transcript
--lm_decoder_cpu.forerunner_beam_threshold LM_DECODER_CPU.FORERUNNER_BEAM_THRESHOLD
Threshold to prune hypotheses, for forerunner
transcript
--lm_decoder_cpu.smearing_mode LM_DECODER_CPU.SMEARING_MODE
Decoder smearing mode. Can be logadd, max or none
--lm_decoder_cpu.forerunner_use_lm LM_DECODER_CPU.FORERUNNER_USE_LM
Boolean that controls whether the forerunner decoder
should use a language model
asr_ensemble_backend:
--asr_ensemble_backend.max_sequence_idle_microseconds ASR_ENSEMBLE_BACKEND.MAX_SEQUENCE_IDLE_MICROSECONDS
Global timeout, in microseconds
--asr_ensemble_backend.max_batch_size ASR_ENSEMBLE_BACKEND.MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--asr_ensemble_backend.min_batch_size ASR_ENSEMBLE_BACKEND.MIN_BATCH_SIZE
--asr_ensemble_backend.opt_batch_size ASR_ENSEMBLE_BACKEND.OPT_BATCH_SIZE
--asr_ensemble_backend.preferred_batch_size ASR_ENSEMBLE_BACKEND.PREFERRED_BATCH_SIZE
Preferred batch size, must be smaller than Max batch
size
--asr_ensemble_backend.batching_type ASR_ENSEMBLE_BACKEND.BATCHING_TYPE
--asr_ensemble_backend.preserve_ordering ASR_ENSEMBLE_BACKEND.PRESERVE_ORDERING
Preserve ordering
--asr_ensemble_backend.instance_group_count ASR_ENSEMBLE_BACKEND.INSTANCE_GROUP_COUNT
How many instances in a group
--asr_ensemble_backend.max_queue_delay_microseconds ASR_ENSEMBLE_BACKEND.MAX_QUEUE_DELAY_MICROSECONDS
Maximum amount of time to allow requests to queue to
form a batch in microseconds
--asr_ensemble_backend.optimization_graph_level ASR_ENSEMBLE_BACKEND.OPTIMIZATION_GRAPH_LEVEL
The Graph optimization level to use in Triton model
configuration
--asr_ensemble_backend.language_code ASR_ENSEMBLE_BACKEND.LANGUAGE_CODE
Language of the model
--asr_ensemble_backend.streaming ASR_ENSEMBLE_BACKEND.STREAMING
Execute model in streaming mode
--asr_ensemble_backend.offline
Mark the model to be used with the offline API in Riva
--asr_ensemble_backend.type
Mark the model to be used with the offline API in Riva