Custom Models#
Model Deployment#
Like all Riva models, Riva TTS requires the following steps:

1. Create a .riva file for each model from its .nemo file, as outlined in the NeMo section.
2. Use riva-build to create an .rmir file for each Riva Speech AI skill (for example, ASR, NLP, and TTS).
3. Use riva-deploy to create the model directory.
4. Use riva-server to deploy the model directory.

The following sections provide examples of steps 1 and 2 above. For steps 3 and 4, refer to Using riva-deploy and the Riva Speech Container (Advanced).
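The four steps above can be sketched end to end as follows. This is a sketch only: the filenames and encryption key are placeholders, and the exact server start command depends on how Riva was installed (see the advanced deployment documentation for steps 3 and 4).

```shell
# 1. Convert each .nemo checkpoint to a .riva file (run inside the NeMo container)
nemo2riva --key <encryption_key> --out fastpitch.riva tts_fastpitch.nemo
nemo2riva --key <encryption_key> --out hifigan.riva tts_hifigan.nemo

# 2. Build a single .rmir for the TTS pipeline (run inside the servicemaker container)
riva-build speech_synthesis \
    tts.rmir:<encryption_key> \
    fastpitch.riva:<encryption_key> \
    hifigan.riva:<encryption_key>

# 3. Generate the Triton model directory from the .rmir
riva-deploy tts.rmir:<encryption_key> /data/models

# 4. Point the Riva server at /data/models and start it (installation-specific)
```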
Creating Riva Files#
Riva files can be created from .nemo files. As mentioned previously in the NeMo section, generating Riva files from .nemo files must be done on a Linux x86_64 workstation only.
The following is an example of converting a HiFi-GAN model from a .nemo file to a .riva file.
Download the .nemo file from NGC to the host system. Run the NeMo container, sharing the .nemo file with the container by including the -v option.
wget --content-disposition https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_hifigan/versions/1.0.0rc1/zip -O tts_hifigan_1.0.0rc1.zip
unzip tts_hifigan_1.0.0rc1.zip
docker run --gpus all -it --rm \
-v $(pwd):/NeMo \
--shm-size=8g \
-p 8888:8888 \
-p 6006:6006 \
--ulimit memlock=-1 \
--ulimit stack=67108864 \
--device=/dev/snd \
nvcr.io/nvidia/nemo:22.08
After the container starts, use nemo2riva to convert the .nemo file to a .riva file.
pip3 install nvidia-pyindex
ngc registry resource download-version "nvidia/riva/riva_quickstart:2.18.0"
pip3 install "riva_quickstart_v2.18.0/nemo2riva-2.18.0-py3-none-any.whl"
nemo2riva --key encryption_key --out /NeMo/hifigan.riva /NeMo/tts_hifigan.nemo
Repeat this process for each .nemo model to generate the .riva files. It is recommended that you do this for FastPitch first before continuing to the next step. When performing the steps above, make sure you use the latest tts_hifigan.nemo checkpoint, the latest nvcr.io/nvidia/nemo container version, and the latest nemo2riva-2.18.0_beta-py3-none-any.whl version.
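Because the conversion is repeated per checkpoint, it can be convenient to loop over all .nemo files in the shared directory. A minimal sketch, assuming every checkpoint uses the same encryption key and lives under the /NeMo mount from the docker run command above:

```shell
# Convert every .nemo checkpoint in /NeMo to a matching .riva file
for nemo_file in /NeMo/*.nemo; do
    riva_file="/NeMo/$(basename "$nemo_file" .nemo).riva"
    nemo2riva --key encryption_key --out "$riva_file" "$nemo_file"
done
```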
Customization#
After creating the .riva files and before running riva-build, there are several customization options you can adjust. These are optional; if you would rather build the default Riva pipeline, skip ahead to Riva-build Pipeline Instructions.
Custom Pronunciation#
Speech synthesis models deployed in Riva are configured with a language-specific pronunciation dictionary that maps a large vocabulary from written form (graphemes) to sequences of perceptually distinct sounds (phonemes). In cases where the pronunciation is ambiguous, such as for heteronyms like bass (the fish) and bass (the instrument), the dictionary is ignored and the synthesis model uses contextual cues in the sentence to predict a suitable pronunciation.
Modern speech synthesis algorithms are surprisingly good at predicting the pronunciation of new and unseen words. Sometimes, however, it is necessary or desirable to give the model additional context.
While custom pronunciations can be supplied at request time using SSML, request-time overrides are best suited for one-off adjustments. For domain-specific terms with fixed pronunciations, configure Riva with those pronunciations when deploying the server.
Two key parameters that affect the phoneme path can be configured through riva-build or the preprocessor configuration:

- --phone_dictionary_file is the path to the pronunciation dictionary. Leave this parameter empty at first: if the .riva file was created from a .nemo model containing a dictionary artifact and this parameter is unset, Riva uses the NeMo dictionary file the model was trained with. To add custom entries and modify pronunciations, edit the NeMo dictionary artifact, save it to another file, and pass that file path to riva-build with this parameter.
- --preprocessor.g2p_ignore_ambiguous, if True, prevents words with multiple phonetic representations in the pronunciation dictionary (such as "read") from being converted to phonemes. Defaults to True.

If ipa is used, --upper_case_chars should be set to True. This affects grapheme inputs, because the ipa phone set contains lowercase English characters. --phone_set can be used to specify whether the model was trained with arpabet or ipa. If this flag is not used, Riva attempts to auto-detect the correct phone set.
Note
--arpabet_file is deprecated as of Riva 2.8.0 and has been replaced by --phone_dictionary_file.
To determine a suitable phoneme sequence, experiment with phoneme sequences using the SSML API and evaluate the quality. Once a mapping sounds correct, add the discovered mapping to the dictionary on a new line.
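For example, a request-time override with the SSML phoneme tag might look like the following; the ARPABET sequence shown is illustrative, so verify it by listening to the synthesized output:

```xml
<speak>You say <phoneme ph="t ah0 m ey1 t ow2">tomato</phoneme>, I say tomato.</speak>
```

Once the mapping sounds correct, the equivalent dictionary entry would be a new line of the form WORD followed by its phone sequence, for example TOMATO  T AH0 M EY1 T OW2 (this entry format is assumed to follow the CMUdict-style dictionaries referenced elsewhere on this page).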
Multi-Speaker Models#
Riva supports models with multiple speakers.
To enable this feature, specify the following parameters before building the model.
- --voice_name is the name of the model. Defaults to English-US.Female-1.
- --subvoices is a comma-separated list of names for each subvoice, with length equal to the number of subvoices specified in the FastPitch model. For example, for a model with a "male" subvoice in the 0th speaker embedding and a "female" subvoice in the first, include the option --subvoices=Male:0,Female:1. If not provided, the desired embedding can be requested by integer index.
The voice name and subvoices are stored in the generated .rmir file and carried through to the generated Triton repository. During inference, the requested voice name is formed by appending a period and a valid subvoice to voice_name, for example, <voice_name>.<subvoice>.
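The naming scheme can be sketched in shell; English-US and Male-1 are example values matching the defaults described above:

```shell
# Compose the voice name requested at inference time: <voice_name>.<subvoice>
voice_name="English-US"
subvoice="Male-1"
requested_voice="${voice_name}.${subvoice}"
echo "${requested_voice}"   # prints English-US.Male-1
```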
Custom Voice#
Riva is voice-agnostic and can run with any English-US TTS voice. To train a custom voice model, data must first be collected. We recommend at least 30 minutes of high-quality data. For data collection, refer to the Riva Custom Voice Recorder. After the data is collected, the FastPitch and HiFi-GAN models need to be fine-tuned on that dataset. For how to train these models, refer to the Riva fine-tuning tutorial. A Riva pipeline using these models can then be built with the instructions on this page.
Custom Text Normalization#
Riva supports custom text normalization (TN) rules built from NeMo's WFST-based text normalization tool. For details on customizing TN, refer to the NeMo WFST tutorial. After customizing the WFST, deploy it with NeMo using its export_grammar script; refer to the documentation for more information. This produces two files: tokenize_and_classify.far and verbalize.far. These files are passed to the riva-build step using the --wfst_tokenizer_model and --wfst_verbalizer_model parameters. In addition, riva-build also supports the --wfst_pre_process_model and --wfst_post_process_model parameters for passing pre-processing and post-processing FAR files used in text normalization.
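As a rough sketch, exporting the grammars from a NeMo text processing checkout might look like the following; the script location and argument names here are assumptions that vary between NeMo versions, so consult the NeMo documentation for the exact invocation:

```shell
# Hypothetical invocation -- check your NeMo version for the exact script name and arguments
cd NeMo-text-processing/tools/text_processing_deployment
bash export_grammars.sh --GRAMMARS=tn_grammars --LANGUAGE=en --MODE=export
# Expected artifacts: tokenize_and_classify.far and verbalize.far
```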
Riva-build Pipeline Instructions#
FastPitch and HiFi-GAN#
Deploy the FastPitch and HiFi-GAN TTS pipeline from within the Riva container as follows:
riva-build speech_synthesis \
/servicemaker-dev/<rmir_filename>:<encryption_key> \
/servicemaker-dev/<fastpitch_riva_filename>:<encryption_key> \
/servicemaker-dev/<hifigan_riva_filename>:<encryption_key> \
--voice_name=<pipeline_name> \
--abbreviations_file=/servicemaker-dev/<abbr_file> \
--arpabet_file=/servicemaker-dev/<dictionary_file> \
--wfst_tokenizer_model=/servicemaker-dev/<tokenizer_far_file> \
--wfst_verbalizer_model=/servicemaker-dev/<verbalizer_far_file> \
--sample_rate=<sample_rate> \
--subvoices=<subvoices>
where:

- <rmir_filename> is the name of the generated Riva rmir file.
- <encryption_key> is the key used to encrypt the files. The encryption key for the pretrained Riva models uploaded to NGC is tlt_encode, unless otherwise specified under a particular model in the Pretrained Quick-Start Pipelines list.
- pipeline_name is an optional user-defined name for the components in the model repository.
- <fastpitch_riva_filename> is the name of the riva file for FastPitch.
- <hifigan_riva_filename> is the name of the riva file for HiFi-GAN.
- <abbr_file> is the name of the file containing abbreviations and their corresponding expansions.
- <dictionary_file> is the name of the file containing the pronunciation dictionary that maps words to their phonetic representations in ARPABET.
- <voice_name> is the name of the model.
- <subvoices> is a comma-separated list of names for each subvoice. Defaults to naming by integer index. This is required for, and only used by, multi-speaker models.
- <wfst_tokenizer_model> is the location of the tokenize_and_classify.far file generated by running the NeMo text processing export_grammar.sh script.
- <wfst_verbalizer_model> is the location of the verbalize.far file generated by running the NeMo text processing export_grammar.sh script.
- <sample_rate> is the sample rate of the audio the model was trained on.
Upon successful completion of this command, a file named <rmir_filename> is created in the /servicemaker-dev/ folder. If your .riva archives are encrypted, you need to include :<encryption_key> at the end of the RMIR and riva filenames; otherwise, it is unnecessary.
For embedded platforms, a batch size of 1 is recommended because it achieves the lowest memory footprint. To use a batch size of 1, refer to the Riva-build Optional Parameters section and set the various min_batch_size, max_batch_size, and opt_batch_size parameters to 1 when executing the riva-build command.
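For example, building with a memory-minimizing batch size of 1 adds flags of the following form to the riva-build command; the component prefixes shown follow the optional-parameter list on this page, so include the prefixes that apply to the components in your pipeline:

```shell
riva-build speech_synthesis \
    /servicemaker-dev/<rmir_filename>:<encryption_key> \
    /servicemaker-dev/<fastpitch_riva_filename>:<encryption_key> \
    /servicemaker-dev/<hifigan_riva_filename>:<encryption_key> \
    --max_batch_size=1 \
    --preprocessor.min_batch_size=1 --preprocessor.opt_batch_size=1 --preprocessor.max_batch_size=1 \
    --encoderFastPitch.min_batch_size=1 --encoderFastPitch.opt_batch_size=1 --encoderFastPitch.max_batch_size=1 \
    --hifigan.min_batch_size=1 --hifigan.opt_batch_size=1 --hifigan.max_batch_size=1 \
    --postprocessor.min_batch_size=1 --postprocessor.opt_batch_size=1 --postprocessor.max_batch_size=1
```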
Pretrained Quick-Start Pipelines#
FastPitch + HiFi-GAN IPA (en-US Multi-speaker):
riva-build speech_synthesis \
<rmir_filename>:<key> \
<riva_fastpitch_file>:<key> \
<riva_hifigan_file>:<key> \
--language_code=en-US \
--num_speakers=12 \
--phone_set=ipa \
--phone_dictionary_file=<txt_phone_dictionary_file> \
--sample_rate 44100 \
--voice_name English-US \
--subvoices Female-1:0,Male-1:1,Female-Neutral:2,Male-Neutral:3,Female-Angry:4,Male-Angry:5,Female-Calm:6,Male-Calm:7,Female-Fearful:10,Female-Happy:12,Male-Happy:13,Female-Sad:14 \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--upper_case_chars=True \
--preprocessor.enable_emphasis_tag=True \
--preprocessor.start_of_emphasis_token='[' \
--preprocessor.end_of_emphasis_token=']' \
--abbreviations_file=<txt_abbreviations_file>
FastPitch + HiFi-GAN IPA (zh-CN Multi-speaker):
riva-build speech_synthesis \
<rmir_filename>:<key> \
<riva_fastpitch_file>:<key> \
<riva_hifigan_file>:<key> \
--language_code=zh-CN \
--num_speakers=10 \
--phone_set=ipa \
--phone_dictionary_file=<txt_phone_dictionary_file> \
--sample_rate 44100 \
--voice_name Mandarin-CN \
--subvoices Female-1:0,Male-1:1,Female-Neutral:2,Male-Neutral:3,Male-Angry:5,Female-Calm:6,Male-Calm:7,Male-Fearful:11,Male-Happy:13,Male-Sad:15 \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--wfst_post_process_model=<far_post_process_file> \
--preprocessor.enable_emphasis_tag=True \
--preprocessor.start_of_emphasis_token='[' \
--preprocessor.end_of_emphasis_token=']'
FastPitch + HiFi-GAN IPA (es-ES Female):
riva-build speech_synthesis \
<rmir_filename>:<key> \
<riva_fastpitch_file>:BSzv7YAjcH4nJS \
<riva_hifigan_file>:BSzv7YAjcH4nJS \
--language_code=es-ES \
--phone_dictionary_file=<dict_file> \
--sample_rate 22050 \
--voice_name Spanish-ES-Female-1 \
--phone_set=ipa \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--abbreviations_file=<txt_file>
FastPitch + HiFi-GAN IPA (es-ES Male):
riva-build speech_synthesis \
<rmir_filename>:<key> \
<riva_fastpitch_file>:PPihyG3Moru5in \
<riva_hifigan_file>:PPihyG3Moru5in \
--language_code=es-ES \
--phone_dictionary_file=<dict_file> \
--sample_rate 22050 \
--voice_name Spanish-ES-Male-1 \
--phone_set=ipa \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--abbreviations_file=<txt_file>
FastPitch + HiFi-GAN IPA (es-US Multi-speaker):
riva-build speech_synthesis \
<rmir_filename>:<key> \
<riva_fastpitch_file>:<key> \
<riva_hifigan_file>:<key> \
--language_code=es-US \
--num_speakers=12 \
--phone_set=ipa \
--phone_dictionary_file=<txt_phone_dictionary_file> \
--sample_rate 44100 \
--voice_name Spanish-US \
--subvoices Female-1:0,Male-1:1,Female-Neutral:2,Male-Neutral:3,Female-Angry:4,Male-Angry:5,Female-Calm:6,Male-Calm:7,Male-Fearful:11,Male-Happy:13,Female-Sad:14,Male-Sad:15 \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--preprocessor.enable_emphasis_tag=True \
--preprocessor.start_of_emphasis_token='[' \
--preprocessor.end_of_emphasis_token=']'
FastPitch + HiFi-GAN IPA (it-IT Female):
riva-build speech_synthesis \
<rmir_filename>:<key> \
<riva_fastpitch_file>:R62srgxeXBgVxg \
<riva_hifigan_file>:R62srgxeXBgVxg \
--language_code=it-IT \
--phone_dictionary_file=<dict_file> \
--sample_rate 22050 \
--voice_name Italian-IT-Female-1 \
--phone_set=ipa \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--abbreviations_file=<txt_file>
FastPitch + HiFi-GAN IPA (it-IT Male):
riva-build speech_synthesis \
<rmir_filename>:<key> \
<riva_fastpitch_file>:dVRvg47ZqCdQrR \
<riva_hifigan_file>:dVRvg47ZqCdQrR \
--language_code=it-IT \
--phone_dictionary_file=<dict_file> \
--sample_rate 22050 \
--voice_name Italian-IT-Male-1 \
--phone_set=ipa \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--abbreviations_file=<txt_file>
FastPitch + HiFi-GAN IPA (de-DE Male):
riva-build speech_synthesis \
<rmir_filename>:<key> \
<riva_fastpitch_file>:ZzZjce65zzGZ9o \
<riva_hifigan_file>:ZzZjce65zzGZ9o \
--language_code=de-DE \
--phone_dictionary_file=<dict_file> \
--sample_rate 22050 \
--voice_name German-DE-Male-1 \
--phone_set=ipa \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--abbreviations_file=<txt_file>
T5TTS + AudioCodec IPA:
riva-build speech_synthesis \
<rmir_filename>:<key> \
<riva_t5tts_file>:<key> \
<riva_audiocodec_file>:<key> \
<riva_neuralg2p_file>:<key> \
--num_speakers=11 \
--phone_dictionary_file=<txt_phone_dictionary_file> \
--sample_rate 22050 \
--voice_name English-US-T5TTS \
--subvoices Female-1:0,Male-1:1,Male-Calm:8,Female-Calm:9,Female-Fearful:11,Male-Neutral:12,Male-Angry:14,Female-Angry:16,Female-Neutral:17,Male-Fearful:20,Female-Happy:21 \
--phone_set=ipa \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--preprocessor.g2p_ignore_ambiguous=False \
--abbreviations_file=<txt_abbreviations_file>
RadTTS + HiFi-GAN IPA:
riva-build speech_synthesis \
<rmir_filename>:<key> \
<riva_radtts_file>:<key> \
<riva_hifigan_file>:<key> \
--num_speakers=12 \
--phone_dictionary_file=<txt_phone_dictionary_file> \
--sample_rate 44100 \
--voice_name English-US-RadTTS \
--subvoices Female-1:0,Male-1:1,Female-Neutral:2,Male-Neutral:3,Female-Angry:4,Male-Angry:5,Female-Calm:6,Male-Calm:7,Female-Fearful:10,Female-Happy:12,Male-Happy:13,Female-Sad:14 \
--phone_set=ipa \
--upper_case_chars=True \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--preprocessor.enable_emphasis_tag=True \
--preprocessor.start_of_emphasis_token='[' \
--preprocessor.end_of_emphasis_token=']' \
--abbreviations_file=<txt_abbreviations_file>
FastPitch + HiFi-GAN ARPABET:
riva-build speech_synthesis \
<rmir_filename>:<key> \
<riva_fastpitch_file>:<key> \
<riva_hifigan_file>:<key> \
--arpabet_file=cmudict-0.7b_nv22.08 \
--sample_rate 44100 \
--voice_name English-US \
--subvoices Male-1:0,Female-1:1 \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--preprocessor.enable_emphasis_tag=True \
--preprocessor.start_of_emphasis_token='[' \
--preprocessor.end_of_emphasis_token=']' \
--abbreviations_file=<txt_file>
FastPitch + HiFi-GAN LJSpeech:
riva-build speech_synthesis \
<rmir_filename>:<key> \
<riva_fastpitch_file>:<key> \
<riva_hifigan_file>:<key> \
--arpabet_file=cmudict-0.7b_nv22.08 \
--voice_name ljspeech \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--abbreviations_file=<txt_file>
All text normalization .far files are available on the Riva TTS English Normalization Grammar page on NGC. All other auxiliary files that are not .riva files (for example, pronunciation dictionaries) are available on the Riva TTS en-US Auxiliary Files page on NGC.
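These auxiliary files can be fetched with the NGC CLI in the same way as the quick-start resource earlier on this page; the resource name and version below are placeholders to be copied from the NGC catalog pages mentioned above:

```shell
# Placeholder resource/version -- copy the exact values from the NGC catalog page
ngc registry resource download-version "nvidia/riva/<auxiliary_files_resource>:<version>"
```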
Riva-build Optional Parameters#
For details about the parameters passed to riva-build to customize the TTS pipeline, issue:
riva-build speech_synthesis -h
The following list includes descriptions of all optional parameters currently recognized by riva-build:
usage: riva-build speech_synthesis [-h] [-f] [-v]
[--language_code LANGUAGE_CODE]
[--instance_group_count INSTANCE_GROUP_COUNT]
[--kind KIND]
[--max_batch_size MAX_BATCH_SIZE]
[--max_queue_delay_microseconds MAX_QUEUE_DELAY_MICROSECONDS]
[--batching_type BATCHING_TYPE]
[--voice_name VOICE_NAME]
[--num_speakers NUM_SPEAKERS]
[--subvoices SUBVOICES]
[--sample_rate SAMPLE_RATE]
[--chunk_length CHUNK_LENGTH]
[--chunk_ms CHUNK_MS]
[--overlap_length OVERLAP_LENGTH]
[--num_mels NUM_MELS]
[--num_samples_per_frame NUM_SAMPLES_PER_FRAME]
[--abbreviations_file ABBREVIATIONS_FILE]
[--has_mapping_file HAS_MAPPING_FILE]
[--mapping_file MAPPING_FILE]
[--wfst_tokenizer_model WFST_TOKENIZER_MODEL]
[--wfst_verbalizer_model WFST_VERBALIZER_MODEL]
[--wfst_pre_process_model WFST_PRE_PROCESS_MODEL]
[--wfst_post_process_model WFST_POST_PROCESS_MODEL]
[--arpabet_file ARPABET_FILE]
[--phone_dictionary_file PHONE_DICTIONARY_FILE]
[--phone_set PHONE_SET]
[--upper_case_chars UPPER_CASE_CHARS]
[--upper_case_g2p UPPER_CASE_G2P]
[--mel_basis_file_path MEL_BASIS_FILE_PATH]
[--voice_map_file VOICE_MAP_FILE]
[--history_future HISTORY_FUTURE]
[--postprocessor.max_sequence_idle_microseconds POSTPROCESSOR.MAX_SEQUENCE_IDLE_MICROSECONDS]
[--postprocessor.max_batch_size POSTPROCESSOR.MAX_BATCH_SIZE]
[--postprocessor.min_batch_size POSTPROCESSOR.MIN_BATCH_SIZE]
[--postprocessor.opt_batch_size POSTPROCESSOR.OPT_BATCH_SIZE]
[--postprocessor.preferred_batch_size POSTPROCESSOR.PREFERRED_BATCH_SIZE]
[--postprocessor.batching_type POSTPROCESSOR.BATCHING_TYPE]
[--postprocessor.preserve_ordering POSTPROCESSOR.PRESERVE_ORDERING]
[--postprocessor.instance_group_count POSTPROCESSOR.INSTANCE_GROUP_COUNT]
[--postprocessor.max_queue_delay_microseconds POSTPROCESSOR.MAX_QUEUE_DELAY_MICROSECONDS]
[--postprocessor.optimization_graph_level POSTPROCESSOR.OPTIMIZATION_GRAPH_LEVEL]
[--postprocessor.fade_length POSTPROCESSOR.FADE_LENGTH]
[--preprocessor.max_sequence_idle_microseconds PREPROCESSOR.MAX_SEQUENCE_IDLE_MICROSECONDS]
[--preprocessor.max_batch_size PREPROCESSOR.MAX_BATCH_SIZE]
[--preprocessor.min_batch_size PREPROCESSOR.MIN_BATCH_SIZE]
[--preprocessor.opt_batch_size PREPROCESSOR.OPT_BATCH_SIZE]
[--preprocessor.preferred_batch_size PREPROCESSOR.PREFERRED_BATCH_SIZE]
[--preprocessor.batching_type PREPROCESSOR.BATCHING_TYPE]
[--preprocessor.preserve_ordering PREPROCESSOR.PRESERVE_ORDERING]
[--preprocessor.instance_group_count PREPROCESSOR.INSTANCE_GROUP_COUNT]
[--preprocessor.max_queue_delay_microseconds PREPROCESSOR.MAX_QUEUE_DELAY_MICROSECONDS]
[--preprocessor.optimization_graph_level PREPROCESSOR.OPTIMIZATION_GRAPH_LEVEL]
[--preprocessor.mapping_path PREPROCESSOR.MAPPING_PATH]
[--preprocessor.g2p_ignore_ambiguous PREPROCESSOR.G2P_IGNORE_AMBIGUOUS]
[--preprocessor.language PREPROCESSOR.LANGUAGE]
[--preprocessor.max_sequence_length PREPROCESSOR.MAX_SEQUENCE_LENGTH]
[--preprocessor.max_input_length PREPROCESSOR.MAX_INPUT_LENGTH]
[--preprocessor.mapping PREPROCESSOR.MAPPING]
[--preprocessor.tolower PREPROCESSOR.TOLOWER]
[--preprocessor.pad_with_space PREPROCESSOR.PAD_WITH_SPACE]
[--preprocessor.enable_emphasis_tag PREPROCESSOR.ENABLE_EMPHASIS_TAG]
[--preprocessor.start_of_emphasis_token PREPROCESSOR.START_OF_EMPHASIS_TOKEN]
[--preprocessor.end_of_emphasis_token PREPROCESSOR.END_OF_EMPHASIS_TOKEN]
[--encoderFastPitch.max_sequence_idle_microseconds ENCODERFASTPITCH.MAX_SEQUENCE_IDLE_MICROSECONDS]
[--encoderFastPitch.max_batch_size ENCODERFASTPITCH.MAX_BATCH_SIZE]
[--encoderFastPitch.min_batch_size ENCODERFASTPITCH.MIN_BATCH_SIZE]
[--encoderFastPitch.opt_batch_size ENCODERFASTPITCH.OPT_BATCH_SIZE]
[--encoderFastPitch.preferred_batch_size ENCODERFASTPITCH.PREFERRED_BATCH_SIZE]
[--encoderFastPitch.batching_type ENCODERFASTPITCH.BATCHING_TYPE]
[--encoderFastPitch.preserve_ordering ENCODERFASTPITCH.PRESERVE_ORDERING]
[--encoderFastPitch.instance_group_count ENCODERFASTPITCH.INSTANCE_GROUP_COUNT]
[--encoderFastPitch.max_queue_delay_microseconds ENCODERFASTPITCH.MAX_QUEUE_DELAY_MICROSECONDS]
[--encoderFastPitch.optimization_graph_level ENCODERFASTPITCH.OPTIMIZATION_GRAPH_LEVEL]
[--encoderFastPitch.trt_max_workspace_size ENCODERFASTPITCH.TRT_MAX_WORKSPACE_SIZE]
[--encoderFastPitch.use_onnx_runtime]
[--encoderFastPitch.use_torchscript]
[--encoderFastPitch.use_trt_fp32]
[--encoderFastPitch.fp16_needs_obey_precision_pass]
[--encoderRadTTS.max_sequence_idle_microseconds ENCODERRADTTS.MAX_SEQUENCE_IDLE_MICROSECONDS]
[--encoderRadTTS.max_batch_size ENCODERRADTTS.MAX_BATCH_SIZE]
[--encoderRadTTS.min_batch_size ENCODERRADTTS.MIN_BATCH_SIZE]
[--encoderRadTTS.opt_batch_size ENCODERRADTTS.OPT_BATCH_SIZE]
[--encoderRadTTS.preferred_batch_size ENCODERRADTTS.PREFERRED_BATCH_SIZE]
[--encoderRadTTS.batching_type ENCODERRADTTS.BATCHING_TYPE]
[--encoderRadTTS.preserve_ordering ENCODERRADTTS.PRESERVE_ORDERING]
[--encoderRadTTS.instance_group_count ENCODERRADTTS.INSTANCE_GROUP_COUNT]
[--encoderRadTTS.max_queue_delay_microseconds ENCODERRADTTS.MAX_QUEUE_DELAY_MICROSECONDS]
[--encoderRadTTS.optimization_graph_level ENCODERRADTTS.OPTIMIZATION_GRAPH_LEVEL]
[--encoderRadTTS.trt_max_workspace_size ENCODERRADTTS.TRT_MAX_WORKSPACE_SIZE]
[--encoderRadTTS.use_onnx_runtime]
[--encoderRadTTS.use_torchscript]
[--encoderRadTTS.use_trt_fp32]
[--encoderRadTTS.fp16_needs_obey_precision_pass]
[--encoderPflow.max_sequence_idle_microseconds ENCODERPFLOW.MAX_SEQUENCE_IDLE_MICROSECONDS]
[--encoderPflow.max_batch_size ENCODERPFLOW.MAX_BATCH_SIZE]
[--encoderPflow.min_batch_size ENCODERPFLOW.MIN_BATCH_SIZE]
[--encoderPflow.opt_batch_size ENCODERPFLOW.OPT_BATCH_SIZE]
[--encoderPflow.preferred_batch_size ENCODERPFLOW.PREFERRED_BATCH_SIZE]
[--encoderPflow.batching_type ENCODERPFLOW.BATCHING_TYPE]
[--encoderPflow.preserve_ordering ENCODERPFLOW.PRESERVE_ORDERING]
[--encoderPflow.instance_group_count ENCODERPFLOW.INSTANCE_GROUP_COUNT]
[--encoderPflow.max_queue_delay_microseconds ENCODERPFLOW.MAX_QUEUE_DELAY_MICROSECONDS]
[--encoderPflow.optimization_graph_level ENCODERPFLOW.OPTIMIZATION_GRAPH_LEVEL]
[--encoderPflow.trt_max_workspace_size ENCODERPFLOW.TRT_MAX_WORKSPACE_SIZE]
[--encoderPflow.use_onnx_runtime]
[--encoderPflow.use_torchscript]
[--encoderPflow.use_trt_fp32]
[--encoderPflow.fp16_needs_obey_precision_pass]
[--t5tts.max_sequence_idle_microseconds T5TTS.MAX_SEQUENCE_IDLE_MICROSECONDS]
[--t5tts.max_batch_size T5TTS.MAX_BATCH_SIZE]
[--t5tts.min_batch_size T5TTS.MIN_BATCH_SIZE]
[--t5tts.opt_batch_size T5TTS.OPT_BATCH_SIZE]
[--t5tts.preferred_batch_size T5TTS.PREFERRED_BATCH_SIZE]
[--t5tts.batching_type T5TTS.BATCHING_TYPE]
[--t5tts.preserve_ordering T5TTS.PRESERVE_ORDERING]
[--t5tts.instance_group_count T5TTS.INSTANCE_GROUP_COUNT]
[--t5tts.max_queue_delay_microseconds T5TTS.MAX_QUEUE_DELAY_MICROSECONDS]
[--t5tts.optimization_graph_level T5TTS.OPTIMIZATION_GRAPH_LEVEL]
[--t5tts.chunk_ms T5TTS.CHUNK_MS]
[--t5tts.history_future T5TTS.HISTORY_FUTURE]
[--t5tts.fade_ms T5TTS.FADE_MS]
[--chunkerFastPitch.max_sequence_idle_microseconds CHUNKERFASTPITCH.MAX_SEQUENCE_IDLE_MICROSECONDS]
[--chunkerFastPitch.max_batch_size CHUNKERFASTPITCH.MAX_BATCH_SIZE]
[--chunkerFastPitch.min_batch_size CHUNKERFASTPITCH.MIN_BATCH_SIZE]
[--chunkerFastPitch.opt_batch_size CHUNKERFASTPITCH.OPT_BATCH_SIZE]
[--chunkerFastPitch.preferred_batch_size CHUNKERFASTPITCH.PREFERRED_BATCH_SIZE]
[--chunkerFastPitch.batching_type CHUNKERFASTPITCH.BATCHING_TYPE]
[--chunkerFastPitch.preserve_ordering CHUNKERFASTPITCH.PRESERVE_ORDERING]
[--chunkerFastPitch.instance_group_count CHUNKERFASTPITCH.INSTANCE_GROUP_COUNT]
[--chunkerFastPitch.max_queue_delay_microseconds CHUNKERFASTPITCH.MAX_QUEUE_DELAY_MICROSECONDS]
[--chunkerFastPitch.optimization_graph_level CHUNKERFASTPITCH.OPTIMIZATION_GRAPH_LEVEL]
[--hifigan.max_sequence_idle_microseconds HIFIGAN.MAX_SEQUENCE_IDLE_MICROSECONDS]
[--hifigan.max_batch_size HIFIGAN.MAX_BATCH_SIZE]
[--hifigan.min_batch_size HIFIGAN.MIN_BATCH_SIZE]
[--hifigan.opt_batch_size HIFIGAN.OPT_BATCH_SIZE]
[--hifigan.preferred_batch_size HIFIGAN.PREFERRED_BATCH_SIZE]
[--hifigan.batching_type HIFIGAN.BATCHING_TYPE]
[--hifigan.preserve_ordering HIFIGAN.PRESERVE_ORDERING]
[--hifigan.instance_group_count HIFIGAN.INSTANCE_GROUP_COUNT]
[--hifigan.max_queue_delay_microseconds HIFIGAN.MAX_QUEUE_DELAY_MICROSECONDS]
[--hifigan.optimization_graph_level HIFIGAN.OPTIMIZATION_GRAPH_LEVEL]
[--hifigan.trt_max_workspace_size HIFIGAN.TRT_MAX_WORKSPACE_SIZE]
[--hifigan.use_onnx_runtime]
[--hifigan.use_torchscript]
[--hifigan.use_trt_fp32]
[--hifigan.fp16_needs_obey_precision_pass]
[--neuralg2p.max_sequence_idle_microseconds NEURALG2P.MAX_SEQUENCE_IDLE_MICROSECONDS]
[--neuralg2p.max_batch_size NEURALG2P.MAX_BATCH_SIZE]
[--neuralg2p.min_batch_size NEURALG2P.MIN_BATCH_SIZE]
[--neuralg2p.opt_batch_size NEURALG2P.OPT_BATCH_SIZE]
[--neuralg2p.preferred_batch_size NEURALG2P.PREFERRED_BATCH_SIZE]
[--neuralg2p.batching_type NEURALG2P.BATCHING_TYPE]
[--neuralg2p.preserve_ordering NEURALG2P.PRESERVE_ORDERING]
[--neuralg2p.instance_group_count NEURALG2P.INSTANCE_GROUP_COUNT]
[--neuralg2p.max_queue_delay_microseconds NEURALG2P.MAX_QUEUE_DELAY_MICROSECONDS]
[--neuralg2p.optimization_graph_level NEURALG2P.OPTIMIZATION_GRAPH_LEVEL]
[--neuralg2p.trt_max_workspace_size NEURALG2P.TRT_MAX_WORKSPACE_SIZE]
[--neuralg2p.use_onnx_runtime]
[--neuralg2p.use_torchscript]
[--neuralg2p.use_trt_fp32]
[--neuralg2p.fp16_needs_obey_precision_pass]
output_path source_path [source_path ...]
Generate a Riva Model from a speech_synthesis model trained with NVIDIA NeMo.
positional arguments:
output_path Location to write compiled Riva pipeline
source_path Source file(s)
options:
-h, --help show this help message and exit
-f, --force Overwrite existing artifacts if they exist
-v, --verbose Verbose log outputs
--language_code LANGUAGE_CODE
Language of the model
--instance_group_count INSTANCE_GROUP_COUNT
How many instances in a group
--kind KIND Backend runs on CPU or GPU
--max_batch_size MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--max_queue_delay_microseconds MAX_QUEUE_DELAY_MICROSECONDS
Maximum amount of time to allow requests to queue to
form a batch in microseconds
--batching_type BATCHING_TYPE
--voice_name VOICE_NAME
Set the voice name for speech synthesis
--num_speakers NUM_SPEAKERS
Number of unique speakers.
--subvoices SUBVOICES
Comma-separated list of subvoices (no whitespace).
--sample_rate SAMPLE_RATE
Sample rate of the output signal
--chunk_length CHUNK_LENGTH
Chunk length in mel frames to synthesize at one time
--chunk_ms CHUNK_MS For T5 TTS only, chunk length (in ms) to synthesize at
a time.
--overlap_length OVERLAP_LENGTH
Chunk length in mel frames to overlap neighboring
chunks
--num_mels NUM_MELS number of mels
--num_samples_per_frame NUM_SAMPLES_PER_FRAME
number of samples per frame
--abbreviations_file ABBREVIATIONS_FILE
Path to file with list of abbreviations and
corresponding expansions
--has_mapping_file HAS_MAPPING_FILE
--mapping_file MAPPING_FILE
Path to phoneme mapping file
--wfst_tokenizer_model WFST_TOKENIZER_MODEL
Sparrowhawk model to use for tokenization and
classification, must be in .far format
--wfst_verbalizer_model WFST_VERBALIZER_MODEL
Sparrowhawk model to use for verbalizer, must be in
.far format.
--wfst_pre_process_model WFST_PRE_PROCESS_MODEL
Sparrowhawk model to use for pre process, must be in
.far format.
--wfst_post_process_model WFST_POST_PROCESS_MODEL
Sparrowhawk model to use for post process, must be in
.far format.
--arpabet_file ARPABET_FILE
Path to pronunciation dictionary (deprecated)
--phone_dictionary_file PHONE_DICTIONARY_FILE
Path to pronunciation dictionary
--phone_set PHONE_SET
Phonetic set that the model was trained on. An unset
value will attempt to auto-detect the phone set used
during training. Supports either "arpabet", "ipa",
"none".
--upper_case_chars UPPER_CASE_CHARS
Whether character representations for this model are
upper case or lower case.
--upper_case_g2p UPPER_CASE_G2P
Whether character representations for this model are
upper case or lower case.
--mel_basis_file_path MEL_BASIS_FILE_PATH
Pre calculated Mel basis file for Audio to Mel
--voice_map_file VOICE_MAP_FILE
Default voice name to filepath map
--history_future HISTORY_FUTURE
Number of Codec Future/History frames
postprocessor:
--postprocessor.max_sequence_idle_microseconds POSTPROCESSOR.MAX_SEQUENCE_IDLE_MICROSECONDS
Global timeout, in ms
--postprocessor.max_batch_size POSTPROCESSOR.MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--postprocessor.min_batch_size POSTPROCESSOR.MIN_BATCH_SIZE
--postprocessor.opt_batch_size POSTPROCESSOR.OPT_BATCH_SIZE
--postprocessor.preferred_batch_size POSTPROCESSOR.PREFERRED_BATCH_SIZE
Preferred batch size, must be smaller than Max batch
size
--postprocessor.batching_type POSTPROCESSOR.BATCHING_TYPE
--postprocessor.preserve_ordering POSTPROCESSOR.PRESERVE_ORDERING
Preserve ordering
--postprocessor.instance_group_count POSTPROCESSOR.INSTANCE_GROUP_COUNT
How many instances in a group
--postprocessor.max_queue_delay_microseconds POSTPROCESSOR.MAX_QUEUE_DELAY_MICROSECONDS
max queue delta in microseconds
--postprocessor.optimization_graph_level POSTPROCESSOR.OPTIMIZATION_GRAPH_LEVEL
The Graph optimization level to use in Triton model
configuration
--postprocessor.fade_length POSTPROCESSOR.FADE_LENGTH
Cross fade length in samples used in between audio
chunks
preprocessor:
--preprocessor.max_sequence_idle_microseconds PREPROCESSOR.MAX_SEQUENCE_IDLE_MICROSECONDS
Global timeout, in ms
--preprocessor.max_batch_size PREPROCESSOR.MAX_BATCH_SIZE
Use Batched Forward calls
--preprocessor.min_batch_size PREPROCESSOR.MIN_BATCH_SIZE
--preprocessor.opt_batch_size PREPROCESSOR.OPT_BATCH_SIZE
--preprocessor.preferred_batch_size PREPROCESSOR.PREFERRED_BATCH_SIZE
Preferred batch size, must be smaller than Max batch
size
--preprocessor.batching_type PREPROCESSOR.BATCHING_TYPE
--preprocessor.preserve_ordering PREPROCESSOR.PRESERVE_ORDERING
Preserve ordering
--preprocessor.instance_group_count PREPROCESSOR.INSTANCE_GROUP_COUNT
How many instances in a group
--preprocessor.max_queue_delay_microseconds PREPROCESSOR.MAX_QUEUE_DELAY_MICROSECONDS
max queue delta in microseconds
--preprocessor.optimization_graph_level PREPROCESSOR.OPTIMIZATION_GRAPH_LEVEL
The Graph optimization level to use in Triton model
configuration
--preprocessor.mapping_path PREPROCESSOR.MAPPING_PATH
--preprocessor.g2p_ignore_ambiguous PREPROCESSOR.G2P_IGNORE_AMBIGUOUS
--preprocessor.language PREPROCESSOR.LANGUAGE
--preprocessor.max_sequence_length PREPROCESSOR.MAX_SEQUENCE_LENGTH
maximum length of every emitted sequence
--preprocessor.max_input_length PREPROCESSOR.MAX_INPUT_LENGTH
maximum length of input string
--preprocessor.mapping PREPROCESSOR.MAPPING
--preprocessor.tolower PREPROCESSOR.TOLOWER
--preprocessor.pad_with_space PREPROCESSOR.PAD_WITH_SPACE
--preprocessor.enable_emphasis_tag PREPROCESSOR.ENABLE_EMPHASIS_TAG
Boolean flag that controls if the emphasis tag should
be parsed or not during pre-processing
--preprocessor.start_of_emphasis_token PREPROCESSOR.START_OF_EMPHASIS_TOKEN
field to indicate start of emphasis in the given text
--preprocessor.end_of_emphasis_token PREPROCESSOR.END_OF_EMPHASIS_TOKEN
field to indicate end of emphasis in the given text
encoderFastPitch:
--encoderFastPitch.max_sequence_idle_microseconds ENCODERFASTPITCH.MAX_SEQUENCE_IDLE_MICROSECONDS
Global timeout, in ms
--encoderFastPitch.max_batch_size ENCODERFASTPITCH.MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--encoderFastPitch.min_batch_size ENCODERFASTPITCH.MIN_BATCH_SIZE
--encoderFastPitch.opt_batch_size ENCODERFASTPITCH.OPT_BATCH_SIZE
--encoderFastPitch.preferred_batch_size ENCODERFASTPITCH.PREFERRED_BATCH_SIZE
Preferred batch size, must be smaller than Max batch
size
--encoderFastPitch.batching_type ENCODERFASTPITCH.BATCHING_TYPE
--encoderFastPitch.preserve_ordering ENCODERFASTPITCH.PRESERVE_ORDERING
Preserve ordering
--encoderFastPitch.instance_group_count ENCODERFASTPITCH.INSTANCE_GROUP_COUNT
How many instances in a group
--encoderFastPitch.max_queue_delay_microseconds ENCODERFASTPITCH.MAX_QUEUE_DELAY_MICROSECONDS
Maximum amount of time to allow requests to queue to
form a batch in microseconds
--encoderFastPitch.optimization_graph_level ENCODERFASTPITCH.OPTIMIZATION_GRAPH_LEVEL
The Graph optimization level to use in Triton model
configuration
--encoderFastPitch.trt_max_workspace_size ENCODERFASTPITCH.TRT_MAX_WORKSPACE_SIZE
Maximum workspace size (in Mb) to use for model export
to TensorRT
--encoderFastPitch.use_onnx_runtime
Use ONNX runtime instead of TensorRT
--encoderFastPitch.use_torchscript
Use TorchScript instead of TensorRT
--encoderFastPitch.use_trt_fp32
Use TensorRT engine with FP32 instead of FP16
--encoderFastPitch.fp16_needs_obey_precision_pass
Flag to explicitly mark layers as float when parsing
the ONNX network
encoderRadTTS:
--encoderRadTTS.max_sequence_idle_microseconds ENCODERRADTTS.MAX_SEQUENCE_IDLE_MICROSECONDS
Global timeout, in ms
--encoderRadTTS.max_batch_size ENCODERRADTTS.MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--encoderRadTTS.min_batch_size ENCODERRADTTS.MIN_BATCH_SIZE
--encoderRadTTS.opt_batch_size ENCODERRADTTS.OPT_BATCH_SIZE
--encoderRadTTS.preferred_batch_size ENCODERRADTTS.PREFERRED_BATCH_SIZE
Preferred batch size, must be smaller than Max batch
size
--encoderRadTTS.batching_type ENCODERRADTTS.BATCHING_TYPE
--encoderRadTTS.preserve_ordering ENCODERRADTTS.PRESERVE_ORDERING
Preserve ordering
--encoderRadTTS.instance_group_count ENCODERRADTTS.INSTANCE_GROUP_COUNT
How many instances in a group
--encoderRadTTS.max_queue_delay_microseconds ENCODERRADTTS.MAX_QUEUE_DELAY_MICROSECONDS
Maximum amount of time to allow requests to queue to
form a batch in microseconds
--encoderRadTTS.optimization_graph_level ENCODERRADTTS.OPTIMIZATION_GRAPH_LEVEL
The Graph optimization level to use in Triton model
configuration
--encoderRadTTS.trt_max_workspace_size ENCODERRADTTS.TRT_MAX_WORKSPACE_SIZE
Maximum workspace size (in Mb) to use for model export
to TensorRT
--encoderRadTTS.use_onnx_runtime
Use ONNX runtime instead of TensorRT
--encoderRadTTS.use_torchscript
Use TorchScript instead of TensorRT
--encoderRadTTS.use_trt_fp32
Use TensorRT engine with FP32 instead of FP16
--encoderRadTTS.fp16_needs_obey_precision_pass
Flag to explicitly mark layers as float when parsing
the ONNX network
encoderPflow:
--encoderPflow.max_sequence_idle_microseconds ENCODERPFLOW.MAX_SEQUENCE_IDLE_MICROSECONDS
Global timeout, in ms
--encoderPflow.max_batch_size ENCODERPFLOW.MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--encoderPflow.min_batch_size ENCODERPFLOW.MIN_BATCH_SIZE
--encoderPflow.opt_batch_size ENCODERPFLOW.OPT_BATCH_SIZE
--encoderPflow.preferred_batch_size ENCODERPFLOW.PREFERRED_BATCH_SIZE
Preferred batch size, must be smaller than Max batch
size
--encoderPflow.batching_type ENCODERPFLOW.BATCHING_TYPE
--encoderPflow.preserve_ordering ENCODERPFLOW.PRESERVE_ORDERING
Preserve ordering
--encoderPflow.instance_group_count ENCODERPFLOW.INSTANCE_GROUP_COUNT
How many instances in a group
--encoderPflow.max_queue_delay_microseconds ENCODERPFLOW.MAX_QUEUE_DELAY_MICROSECONDS
Maximum amount of time to allow requests to queue to
form a batch in microseconds
--encoderPflow.optimization_graph_level ENCODERPFLOW.OPTIMIZATION_GRAPH_LEVEL
The Graph optimization level to use in Triton model
configuration
--encoderPflow.trt_max_workspace_size ENCODERPFLOW.TRT_MAX_WORKSPACE_SIZE
Maximum workspace size (in Mb) to use for model export
to TensorRT
--encoderPflow.use_onnx_runtime
Use ONNX runtime instead of TensorRT
--encoderPflow.use_torchscript
Use TorchScript instead of TensorRT
--encoderPflow.use_trt_fp32
Use TensorRT engine with FP32 instead of FP16
--encoderPflow.fp16_needs_obey_precision_pass
Flag to explicitly mark layers as float when parsing
the ONNX network
t5tts:
--t5tts.max_sequence_idle_microseconds T5TTS.MAX_SEQUENCE_IDLE_MICROSECONDS
Global timeout, in ms
--t5tts.max_batch_size T5TTS.MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--t5tts.min_batch_size T5TTS.MIN_BATCH_SIZE
--t5tts.opt_batch_size T5TTS.OPT_BATCH_SIZE
--t5tts.preferred_batch_size T5TTS.PREFERRED_BATCH_SIZE
Preferred batch size, must be smaller than Max batch
size
--t5tts.batching_type T5TTS.BATCHING_TYPE
--t5tts.preserve_ordering T5TTS.PRESERVE_ORDERING
Preserve ordering
--t5tts.instance_group_count T5TTS.INSTANCE_GROUP_COUNT
How many instances in a group
--t5tts.max_queue_delay_microseconds T5TTS.MAX_QUEUE_DELAY_MICROSECONDS
Maximum amount of time to allow requests to queue to
form a batch in microseconds
--t5tts.optimization_graph_level T5TTS.OPTIMIZATION_GRAPH_LEVEL
The Graph optimization level to use in Triton model
configuration
--t5tts.chunk_ms T5TTS.CHUNK_MS
Chunk size in ms
--t5tts.history_future T5TTS.HISTORY_FUTURE
Number of codec frames to use as history/future
--t5tts.fade_ms T5TTS.FADE_MS
Fade-in/Fade-out for the chunk in ms
chunkerFastPitch:
--chunkerFastPitch.max_sequence_idle_microseconds CHUNKERFASTPITCH.MAX_SEQUENCE_IDLE_MICROSECONDS
Global timeout, in microseconds
--chunkerFastPitch.max_batch_size CHUNKERFASTPITCH.MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--chunkerFastPitch.min_batch_size CHUNKERFASTPITCH.MIN_BATCH_SIZE
--chunkerFastPitch.opt_batch_size CHUNKERFASTPITCH.OPT_BATCH_SIZE
--chunkerFastPitch.preferred_batch_size CHUNKERFASTPITCH.PREFERRED_BATCH_SIZE
Preferred batch size, must be smaller than Max batch
size
--chunkerFastPitch.batching_type CHUNKERFASTPITCH.BATCHING_TYPE
--chunkerFastPitch.preserve_ordering CHUNKERFASTPITCH.PRESERVE_ORDERING
Preserve ordering
--chunkerFastPitch.instance_group_count CHUNKERFASTPITCH.INSTANCE_GROUP_COUNT
How many instances in a group
--chunkerFastPitch.max_queue_delay_microseconds CHUNKERFASTPITCH.MAX_QUEUE_DELAY_MICROSECONDS
Maximum amount of time to allow requests to queue to
form a batch in microseconds
--chunkerFastPitch.optimization_graph_level CHUNKERFASTPITCH.OPTIMIZATION_GRAPH_LEVEL
The Graph optimization level to use in Triton model
configuration
hifigan:
--hifigan.max_sequence_idle_microseconds HIFIGAN.MAX_SEQUENCE_IDLE_MICROSECONDS
Global timeout, in microseconds
--hifigan.max_batch_size HIFIGAN.MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--hifigan.min_batch_size HIFIGAN.MIN_BATCH_SIZE
--hifigan.opt_batch_size HIFIGAN.OPT_BATCH_SIZE
--hifigan.preferred_batch_size HIFIGAN.PREFERRED_BATCH_SIZE
Preferred batch size, must be smaller than Max batch
size
--hifigan.batching_type HIFIGAN.BATCHING_TYPE
--hifigan.preserve_ordering HIFIGAN.PRESERVE_ORDERING
Preserve ordering
--hifigan.instance_group_count HIFIGAN.INSTANCE_GROUP_COUNT
How many instances in a group
--hifigan.max_queue_delay_microseconds HIFIGAN.MAX_QUEUE_DELAY_MICROSECONDS
Maximum amount of time to allow requests to queue to
form a batch in microseconds
--hifigan.optimization_graph_level HIFIGAN.OPTIMIZATION_GRAPH_LEVEL
The Graph optimization level to use in Triton model
configuration
--hifigan.trt_max_workspace_size HIFIGAN.TRT_MAX_WORKSPACE_SIZE
Maximum workspace size (in MB) to use for model export
to TensorRT
--hifigan.use_onnx_runtime
Use ONNX runtime instead of TensorRT
--hifigan.use_torchscript
Use TorchScript instead of TensorRT
--hifigan.use_trt_fp32
Use TensorRT engine with FP32 instead of FP16
--hifigan.fp16_needs_obey_precision_pass
Flag to explicitly mark layers as float when parsing
the ONNX network
neuralg2p:
--neuralg2p.max_sequence_idle_microseconds NEURALG2P.MAX_SEQUENCE_IDLE_MICROSECONDS
Global timeout, in microseconds
--neuralg2p.max_batch_size NEURALG2P.MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--neuralg2p.min_batch_size NEURALG2P.MIN_BATCH_SIZE
--neuralg2p.opt_batch_size NEURALG2P.OPT_BATCH_SIZE
--neuralg2p.preferred_batch_size NEURALG2P.PREFERRED_BATCH_SIZE
Preferred batch size, must be smaller than Max batch
size
--neuralg2p.batching_type NEURALG2P.BATCHING_TYPE
--neuralg2p.preserve_ordering NEURALG2P.PRESERVE_ORDERING
Preserve ordering
--neuralg2p.instance_group_count NEURALG2P.INSTANCE_GROUP_COUNT
How many instances in a group
--neuralg2p.max_queue_delay_microseconds NEURALG2P.MAX_QUEUE_DELAY_MICROSECONDS
Maximum amount of time to allow requests to queue to
form a batch in microseconds
--neuralg2p.optimization_graph_level NEURALG2P.OPTIMIZATION_GRAPH_LEVEL
The Graph optimization level to use in Triton model
configuration
--neuralg2p.trt_max_workspace_size NEURALG2P.TRT_MAX_WORKSPACE_SIZE
Maximum workspace size (in MB) to use for model export
to TensorRT
--neuralg2p.use_onnx_runtime
Use ONNX runtime instead of TensorRT
--neuralg2p.use_torchscript
Use TorchScript instead of TensorRT
--neuralg2p.use_trt_fp32
Use TensorRT engine with FP32 instead of FP16
--neuralg2p.fp16_needs_obey_precision_pass
Flag to explicitly mark layers as float when parsing
the ONNX network
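The per-component flags above are passed directly to `riva-build` when building the `.rmir` file. A minimal sketch follows; the `.rmir` and `.riva` paths and the encryption key (`tlt_encode`) are placeholders to substitute with your own, and the flag values shown are illustrative rather than recommended defaults:

```shell
# Build a TTS .rmir, overriding a few HiFi-GAN deployment options.
# Paths and key are hypothetical; only the flag names come from the listing above.
riva-build speech_synthesis \
    /servicemaker-dev/custom_tts.rmir:tlt_encode \
    /servicemaker-dev/fastpitch.riva:tlt_encode \
    /servicemaker-dev/hifigan.riva:tlt_encode \
    --hifigan.max_batch_size=8 \
    --hifigan.use_trt_fp32 \
    --hifigan.instance_group_count=1 \
    --hifigan.max_queue_delay_microseconds=100
```

The same pattern applies to the other component prefixes (for example, `--t5tts.chunk_ms` or `--neuralg2p.use_onnx_runtime`): prefix the flag with the component name, then pass it alongside the model arguments.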