Custom Models#
Model Deployment#
Like all Riva models, Riva TTS requires the following steps:

1. Create a .riva file for each model from its .nemo file, as outlined in the NeMo section.
2. Use riva-build to create an .rmir file for each Riva Speech AI skill (for example, ASR, NLP, and TTS).
3. Use riva-deploy to create the model directory.
4. Use riva-server to deploy the model directory.

The following sections provide examples of steps 1 and 2 above. For steps 3 and 4, refer to Using riva-deploy and the Riva Speech Container (Advanced).
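The four steps above can be sketched end to end as follows. This is a sketch only: the filenames and encryption key are placeholders, and the exact server start command depends on how Riva was installed (see the advanced deployment documentation for steps 3 and 4).

```shell
# 1. Convert each .nemo checkpoint to a .riva file (run inside the NeMo container)
nemo2riva --key <encryption_key> --out fastpitch.riva tts_fastpitch.nemo
nemo2riva --key <encryption_key> --out hifigan.riva tts_hifigan.nemo

# 2. Build a single .rmir for the TTS pipeline (run inside the servicemaker container)
riva-build speech_synthesis \
    tts.rmir:<encryption_key> \
    fastpitch.riva:<encryption_key> \
    hifigan.riva:<encryption_key>

# 3. Generate the Triton model directory from the .rmir
riva-deploy tts.rmir:<encryption_key> /data/models

# 4. Point the Riva server at /data/models and start it (installation-specific)
```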
Creating Riva Files#
Riva files can be created from .nemo files. As mentioned previously in the NeMo section, generating Riva files from .nemo files must be done on a Linux x86_64 workstation only.
The following is an example of converting a HiFi-GAN model from a .nemo file to a .riva file.
Download the .nemo file from NGC to the host system. Run the NeMo container, sharing the .nemo file with the container by including the -v option.
wget --content-disposition https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_hifigan/versions/1.0.0rc1/zip -O tts_hifigan_1.0.0rc1.zip
unzip tts_hifigan_1.0.0rc1.zip
docker run --gpus all -it --rm \
-v $(pwd):/NeMo \
--shm-size=8g \
-p 8888:8888 \
-p 6006:6006 \
--ulimit memlock=-1 \
--ulimit stack=67108864 \
--device=/dev/snd \
nvcr.io/nvidia/nemo:22.08
After the container starts, use nemo2riva to convert the .nemo file to a .riva file.
pip3 install nvidia-pyindex
ngc registry resource download-version "nvidia/riva/riva_quickstart:2.18.0"
pip3 install "riva_quickstart_v2.18.0/nemo2riva-2.18.0-py3-none-any.whl"
nemo2riva --key encryption_key --out /NeMo/hifigan.riva /NeMo/tts_hifigan.nemo
Repeat this process for each .nemo model to generate the .riva files. It is recommended that you do this for FastPitch first before continuing to the next step. When performing the steps above, make sure you use the latest tts_hifigan.nemo checkpoint, the latest nvcr.io/nvidia/nemo container version, and the latest nemo2riva-2.18.0_beta-py3-none-any.whl version.
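Because the conversion is repeated per checkpoint, it can be convenient to loop over all .nemo files in the shared directory. A minimal sketch, assuming every checkpoint uses the same encryption key and lives under the /NeMo mount from the docker run command above:

```shell
# Convert every .nemo checkpoint in /NeMo to a matching .riva file
for nemo_file in /NeMo/*.nemo; do
    riva_file="/NeMo/$(basename "$nemo_file" .nemo).riva"
    nemo2riva --key encryption_key --out "$riva_file" "$nemo_file"
done
```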
Customization#
After creating the .riva files and before running riva-build, there are several customization options you can adjust. These are optional; if you would rather build the default Riva pipeline, skip ahead to Riva-build Pipeline Instructions.
Custom Pronunciation#
Speech synthesis models deployed in Riva are configured with a language-specific pronunciation dictionary that maps a large vocabulary from written form (graphemes) to sequences of perceptually distinct sounds (phonemes). In cases where the pronunciation is ambiguous, such as for heteronyms like bass (the fish) and bass (the instrument), the dictionary is ignored and the synthesis model uses contextual cues in the sentence to predict a suitable pronunciation.
Modern speech synthesis algorithms are surprisingly good at predicting the pronunciation of new and unseen words. Sometimes, however, it is necessary or desirable to give the model additional context.
While custom pronunciations can be supplied at request time using SSML, request-time overrides are best suited for one-off adjustments. For domain-specific terms with fixed pronunciations, configure Riva with those pronunciations when deploying the server.
Two key parameters that affect the phoneme path can be configured through riva-build or the preprocessor configuration:

- --phone_dictionary_file is the path to the pronunciation dictionary. Leave this parameter empty at first: if the .riva file was created from a .nemo model containing a dictionary artifact and this parameter is unset, Riva uses the NeMo dictionary file the model was trained with. To add custom entries and modify pronunciations, edit the NeMo dictionary artifact, save it to another file, and pass that file path to riva-build with this parameter.
- --preprocessor.g2p_ignore_ambiguous, if True, prevents words with multiple phonetic representations in the pronunciation dictionary (such as "read") from being converted to phonemes. Defaults to True.

If ipa is used, --upper_case_chars should be set to True. This affects grapheme inputs, because the ipa phone set contains lowercase English characters. --phone_set can be used to specify whether the model was trained with arpabet or ipa. If this flag is not used, Riva attempts to auto-detect the correct phone set.
Note
--arpabet_file is deprecated as of Riva 2.8.0 and has been replaced by --phone_dictionary_file.
To determine a suitable phoneme sequence, experiment with phoneme sequences using the SSML API and evaluate the quality. Once a mapping sounds correct, add the discovered mapping to the dictionary on a new line.
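For example, a request-time override with the SSML phoneme tag might look like the following; the ARPABET sequence shown is illustrative, so verify it by listening to the synthesized output:

```xml
<speak>You say <phoneme ph="t ah0 m ey1 t ow2">tomato</phoneme>, I say tomato.</speak>
```

Once the mapping sounds correct, the equivalent dictionary entry would be a new line of the form WORD followed by its phone sequence, for example TOMATO  T AH0 M EY1 T OW2 (this entry format is assumed to follow the CMUdict-style dictionaries referenced elsewhere on this page).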
Multi-Speaker Models#
Riva supports models with multiple speakers.
To enable this feature, specify the following parameters before building the model.
- --voice_name is the name of the model. Defaults to English-US.Female-1.
- --subvoices is a comma-separated list of names for each subvoice, with length equal to the number of subvoices specified in the FastPitch model. For example, for a model with a "male" subvoice in the 0th speaker embedding and a "female" subvoice in the first, include the option --subvoices=Male:0,Female:1. If not provided, the desired embedding can be requested by integer index.
The voice name and subvoices are stored in the generated .rmir file and carried through to the generated Triton repository. During inference, the requested voice name is formed by appending a period and a valid subvoice to voice_name, for example, <voice_name>.<subvoice>.
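The naming scheme can be sketched in shell; English-US and Male-1 are example values matching the defaults described above:

```shell
# Compose the voice name requested at inference time: <voice_name>.<subvoice>
voice_name="English-US"
subvoice="Male-1"
requested_voice="${voice_name}.${subvoice}"
echo "${requested_voice}"   # prints English-US.Male-1
```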
Custom Voice#
Riva is voice-agnostic and can run with any English-US TTS voice. To train a custom voice model, data must first be collected. We recommend at least 30 minutes of high-quality data. For data collection, refer to the Riva Custom Voice Recorder. After the data is collected, the FastPitch and HiFi-GAN models need to be fine-tuned on that dataset. For how to train these models, refer to the Riva fine-tuning tutorial. A Riva pipeline using these models can then be built with the instructions on this page.
Custom Text Normalization#
Riva supports custom text normalization (TN) rules built from NeMo's WFST-based text normalization tool. For details on customizing TN, refer to the NeMo WFST tutorial. After customizing the WFST, deploy it with NeMo using its export_grammar script; refer to the documentation for more information. This produces two files: tokenize_and_classify.far and verbalize.far. These files are passed to the riva-build step using the --wfst_tokenizer_model and --wfst_verbalizer_model parameters. In addition, riva-build also supports the --wfst_pre_process_model and --wfst_post_process_model parameters for passing pre-processing and post-processing FAR files used in text normalization.
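As a rough sketch, exporting the grammars from a NeMo text processing checkout might look like the following; the script location and argument names here are assumptions that vary between NeMo versions, so consult the NeMo documentation for the exact invocation:

```shell
# Hypothetical invocation -- check your NeMo version for the exact script name and arguments
cd NeMo-text-processing/tools/text_processing_deployment
bash export_grammars.sh --GRAMMARS=tn_grammars --LANGUAGE=en --MODE=export
# Expected artifacts: tokenize_and_classify.far and verbalize.far
```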
Riva-build Pipeline Instructions#
FastPitch and HiFi-GAN#
Deploy the FastPitch and HiFi-GAN TTS pipeline from within the Riva container as follows:
riva-build speech_synthesis \
/servicemaker-dev/<rmir_filename>:<encryption_key> \
/servicemaker-dev/<fastpitch_riva_filename>:<encryption_key> \
/servicemaker-dev/<hifigan_riva_filename>:<encryption_key> \
--voice_name=<pipeline_name> \
--abbreviations_file=/servicemaker-dev/<abbr_file> \
--arpabet_file=/servicemaker-dev/<dictionary_file> \
--wfst_tokenizer_model=/servicemaker-dev/<tokenizer_far_file> \
--wfst_verbalizer_model=/servicemaker-dev/<verbalizer_far_file> \
--sample_rate=<sample_rate> \
--subvoices=<subvoices>
where:

- <rmir_filename> is the name of the generated Riva rmir file.
- <encryption_key> is the key used to encrypt the files. The encryption key for the pretrained Riva models uploaded to NGC is tlt_encode, unless otherwise specified under a particular model in the Pretrained Quick-Start Pipelines list.
- pipeline_name is an optional user-defined name for the components in the model repository.
- <fastpitch_riva_filename> is the name of the riva file for FastPitch.
- <hifigan_riva_filename> is the name of the riva file for HiFi-GAN.
- <abbr_file> is the name of the file containing abbreviations and their corresponding expansions.
- <dictionary_file> is the name of the file containing the pronunciation dictionary that maps words to their phonetic representations in ARPABET.
- <voice_name> is the name of the model.
- <subvoices> is a comma-separated list of names for each subvoice. Defaults to naming by integer index. This is required for, and only used by, multi-speaker models.
- <wfst_tokenizer_model> is the location of the tokenize_and_classify.far file generated by running the NeMo text processing export_grammar.sh script.
- <wfst_verbalizer_model> is the location of the verbalize.far file generated by running the NeMo text processing export_grammar.sh script.
- <sample_rate> is the sample rate of the audio the model was trained on.
Upon successful completion of this command, a file named <rmir_filename> is created in the /servicemaker-dev/ folder. If your .riva archives are encrypted, you need to include :<encryption_key> at the end of the RMIR and riva filenames; otherwise, it is unnecessary.
For embedded platforms, a batch size of 1 is recommended because it achieves the lowest memory footprint. To use a batch size of 1, refer to the Riva-build Optional Parameters section and set the various min_batch_size, max_batch_size, and opt_batch_size parameters to 1 when executing the riva-build command.
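For example, building with a memory-minimizing batch size of 1 adds flags of the following form to the riva-build command; the component prefixes shown follow the optional-parameter list on this page, so include the prefixes that apply to the components in your pipeline:

```shell
riva-build speech_synthesis \
    /servicemaker-dev/<rmir_filename>:<encryption_key> \
    /servicemaker-dev/<fastpitch_riva_filename>:<encryption_key> \
    /servicemaker-dev/<hifigan_riva_filename>:<encryption_key> \
    --max_batch_size=1 \
    --preprocessor.min_batch_size=1 --preprocessor.opt_batch_size=1 --preprocessor.max_batch_size=1 \
    --encoderFastPitch.min_batch_size=1 --encoderFastPitch.opt_batch_size=1 --encoderFastPitch.max_batch_size=1 \
    --hifigan.min_batch_size=1 --hifigan.opt_batch_size=1 --hifigan.max_batch_size=1 \
    --postprocessor.min_batch_size=1 --postprocessor.opt_batch_size=1 --postprocessor.max_batch_size=1
```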
Pretrained Quick-Start Pipelines#
FastPitch + HiFi-GAN IPA (en-US Multi-speaker):
riva-build speech_synthesis \
<rmir_filename>:<key> \
<riva_fastpitch_file>:<key> \
<riva_hifigan_file>:<key> \
--language_code=en-US \
--num_speakers=12 \
--phone_set=ipa \
--phone_dictionary_file=<txt_phone_dictionary_file> \
--sample_rate 44100 \
--voice_name English-US \
--subvoices Female-1:0,Male-1:1,Female-Neutral:2,Male-Neutral:3,Female-Angry:4,Male-Angry:5,Female-Calm:6,Male-Calm:7,Female-Fearful:10,Female-Happy:12,Male-Happy:13,Female-Sad:14 \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--upper_case_chars=True \
--preprocessor.enable_emphasis_tag=True \
--preprocessor.start_of_emphasis_token='[' \
--preprocessor.end_of_emphasis_token=']' \
--abbreviations_file=<txt_abbreviations_file>
FastPitch + HiFi-GAN IPA (zh-CN Multi-speaker):
riva-build speech_synthesis \
<rmir_filename>:<key> \
<riva_fastpitch_file>:<key> \
<riva_hifigan_file>:<key> \
--language_code=zh-CN \
--num_speakers=10 \
--phone_set=ipa \
--phone_dictionary_file=<txt_phone_dictionary_file> \
--sample_rate 44100 \
--voice_name Mandarin-CN \
--subvoices Female-1:0,Male-1:1,Female-Neutral:2,Male-Neutral:3,Male-Angry:5,Female-Calm:6,Male-Calm:7,Male-Fearful:11,Male-Happy:13,Male-Sad:15 \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--wfst_post_process_model=<far_post_process_file> \
--preprocessor.enable_emphasis_tag=True \
--preprocessor.start_of_emphasis_token='[' \
--preprocessor.end_of_emphasis_token=']'
FastPitch + HiFi-GAN IPA (es-ES Female):
riva-build speech_synthesis \
<rmir_filename>:<key> \
<riva_fastpitch_file>:BSzv7YAjcH4nJS \
<riva_hifigan_file>:BSzv7YAjcH4nJS \
--language_code=es-ES \
--phone_dictionary_file=<dict_file> \
--sample_rate 22050 \
--voice_name Spanish-ES-Female-1 \
--phone_set=ipa \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--abbreviations_file=<txt_file>
FastPitch + HiFi-GAN IPA (es-ES Male):
riva-build speech_synthesis \
<rmir_filename>:<key> \
<riva_fastpitch_file>:PPihyG3Moru5in \
<riva_hifigan_file>:PPihyG3Moru5in \
--language_code=es-ES \
--phone_dictionary_file=<dict_file> \
--sample_rate 22050 \
--voice_name Spanish-ES-Male-1 \
--phone_set=ipa \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--abbreviations_file=<txt_file>
FastPitch + HiFi-GAN IPA (es-US Multi-speaker):
riva-build speech_synthesis \
<rmir_filename>:<key> \
<riva_fastpitch_file>:<key> \
<riva_hifigan_file>:<key> \
--language_code=es-US \
--num_speakers=12 \
--phone_set=ipa \
--phone_dictionary_file=<txt_phone_dictionary_file> \
--sample_rate 44100 \
--voice_name Spanish-US \
--subvoices Female-1:0,Male-1:1,Female-Neutral:2,Male-Neutral:3,Female-Angry:4,Male-Angry:5,Female-Calm:6,Male-Calm:7,Male-Fearful:11,Male-Happy:13,Female-Sad:14,Male-Sad:15 \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--preprocessor.enable_emphasis_tag=True \
--preprocessor.start_of_emphasis_token='[' \
--preprocessor.end_of_emphasis_token=']'
FastPitch + HiFi-GAN IPA (it-IT Female):
riva-build speech_synthesis \
<rmir_filename>:<key> \
<riva_fastpitch_file>:R62srgxeXBgVxg \
<riva_hifigan_file>:R62srgxeXBgVxg \
--language_code=it-IT \
--phone_dictionary_file=<dict_file> \
--sample_rate 22050 \
--voice_name Italian-IT-Female-1 \
--phone_set=ipa \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--abbreviations_file=<txt_file>
FastPitch + HiFi-GAN IPA (it-IT Male):
riva-build speech_synthesis \
<rmir_filename>:<key> \
<riva_fastpitch_file>:dVRvg47ZqCdQrR \
<riva_hifigan_file>:dVRvg47ZqCdQrR \
--language_code=it-IT \
--phone_dictionary_file=<dict_file> \
--sample_rate 22050 \
--voice_name Italian-IT-Male-1 \
--phone_set=ipa \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--abbreviations_file=<txt_file>
FastPitch + HiFi-GAN IPA (de-DE Male):
riva-build speech_synthesis \
<rmir_filename>:<key> \
<riva_fastpitch_file>:ZzZjce65zzGZ9o \
<riva_hifigan_file>:ZzZjce65zzGZ9o \
--language_code=de-DE \
--phone_dictionary_file=<dict_file> \
--sample_rate 22050 \
--voice_name German-DE-Male-1 \
--phone_set=ipa \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--abbreviations_file=<txt_file>
T5TTS + AudioCodec IPA:
riva-build speech_synthesis \
<rmir_filename>:<key> \
<riva_t5tts_file>:<key> \
<riva_audiocodec_file>:<key> \
<riva_neuralg2p_file>:<key> \
--num_speakers=11 \
--phone_dictionary_file=<txt_phone_dictionary_file> \
--sample_rate 22050 \
--voice_name English-US-T5TTS \
--subvoices Female-1:0,Male-1:1,Male-Calm:8,Female-Calm:9,Female-Fearful:11,Male-Neutral:12,Male-Angry:14,Female-Angry:16,Female-Neutral:17,Male-Fearful:20,Female-Happy:21 \
--phone_set=ipa \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--preprocessor.g2p_ignore_ambiguous=False \
--abbreviations_file=<txt_abbreviations_file>
RadTTS + HiFi-GAN IPA:
riva-build speech_synthesis \
<rmir_filename>:<key> \
<riva_radtts_file>:<key> \
<riva_hifigan_file>:<key> \
--num_speakers=12 \
--phone_dictionary_file=<txt_phone_dictionary_file> \
--sample_rate 44100 \
--voice_name English-US-RadTTS \
--subvoices Female-1:0,Male-1:1,Female-Neutral:2,Male-Neutral:3,Female-Angry:4,Male-Angry:5,Female-Calm:6,Male-Calm:7,Female-Fearful:10,Female-Happy:12,Male-Happy:13,Female-Sad:14 \
--phone_set=ipa \
--upper_case_chars=True \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--preprocessor.enable_emphasis_tag=True \
--preprocessor.start_of_emphasis_token='[' \
--preprocessor.end_of_emphasis_token=']' \
--abbreviations_file=<txt_abbreviations_file>
FastPitch + HiFi-GAN ARPABET:
riva-build speech_synthesis \
<rmir_filename>:<key> \
<riva_fastpitch_file>:<key> \
<riva_hifigan_file>:<key> \
--arpabet_file=cmudict-0.7b_nv22.08 \
--sample_rate 44100 \
--voice_name English-US \
--subvoices Male-1:0,Female-1:1 \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--preprocessor.enable_emphasis_tag=True \
--preprocessor.start_of_emphasis_token='[' \
--preprocessor.end_of_emphasis_token=']' \
--abbreviations_file=<txt_file>
FastPitch + HiFi-GAN LJSpeech:
riva-build speech_synthesis \
<rmir_filename>:<key> \
<riva_fastpitch_file>:<key> \
<riva_hifigan_file>:<key> \
--arpabet_file=cmudict-0.7b_nv22.08 \
--voice_name ljspeech \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--abbreviations_file=<txt_file>
All text normalization .far files are available on the Riva TTS English Normalization Grammar page on NGC. All other auxiliary files that are not .riva files (for example, pronunciation dictionaries) are available on the Riva TTS en-US Auxiliary Files page on NGC.
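These auxiliary files can be fetched with the NGC CLI in the same way as the quick-start resource earlier on this page; the resource name and version below are placeholders to be copied from the NGC catalog pages mentioned above:

```shell
# Placeholder resource/version -- copy the exact values from the NGC catalog page
ngc registry resource download-version "nvidia/riva/<auxiliary_files_resource>:<version>"
```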
Riva-build Optional Parameters#
For details about the parameters passed to riva-build to customize the TTS pipeline, issue:
riva-build speech_synthesis -h
The following list includes descriptions of all optional parameters currently recognized by riva-build:
usage: riva-build speech_synthesis [-h] [-f] [-v]
[--language_code LANGUAGE_CODE]
[--instance_group_count INSTANCE_GROUP_COUNT]
[--kind KIND]
[--max_batch_size MAX_BATCH_SIZE]
[--max_queue_delay_microseconds MAX_QUEUE_DELAY_MICROSECONDS]
[--batching_type BATCHING_TYPE]
[--voice_name VOICE_NAME]
[--num_speakers NUM_SPEAKERS]
[--subvoices SUBVOICES]
[--sample_rate SAMPLE_RATE]
[--chunk_length CHUNK_LENGTH]
[--chunk_ms CHUNK_MS]
[--overlap_length OVERLAP_LENGTH]
[--num_mels NUM_MELS]
[--num_samples_per_frame NUM_SAMPLES_PER_FRAME]
[--abbreviations_file ABBREVIATIONS_FILE]
[--has_mapping_file HAS_MAPPING_FILE]
[--mapping_file MAPPING_FILE]
[--wfst_tokenizer_model WFST_TOKENIZER_MODEL]
[--wfst_verbalizer_model WFST_VERBALIZER_MODEL]
[--wfst_pre_process_model WFST_PRE_PROCESS_MODEL]
[--wfst_post_process_model WFST_POST_PROCESS_MODEL]
[--arpabet_file ARPABET_FILE]
[--phone_dictionary_file PHONE_DICTIONARY_FILE]
[--phone_set PHONE_SET]
[--upper_case_chars UPPER_CASE_CHARS]
[--upper_case_g2p UPPER_CASE_G2P]
[--mel_basis_file_path MEL_BASIS_FILE_PATH]
[--voice_map_file VOICE_MAP_FILE]
[--history_future HISTORY_FUTURE]
[--postprocessor.max_sequence_idle_microseconds POSTPROCESSOR.MAX_SEQUENCE_IDLE_MICROSECONDS]
[--postprocessor.max_batch_size POSTPROCESSOR.MAX_BATCH_SIZE]
[--postprocessor.min_batch_size POSTPROCESSOR.MIN_BATCH_SIZE]
[--postprocessor.opt_batch_size POSTPROCESSOR.OPT_BATCH_SIZE]
[--postprocessor.preferred_batch_size POSTPROCESSOR.PREFERRED_BATCH_SIZE]
[--postprocessor.batching_type POSTPROCESSOR.BATCHING_TYPE]
[--postprocessor.preserve_ordering POSTPROCESSOR.PRESERVE_ORDERING]
[--postprocessor.instance_group_count POSTPROCESSOR.INSTANCE_GROUP_COUNT]
[--postprocessor.max_queue_delay_microseconds POSTPROCESSOR.MAX_QUEUE_DELAY_MICROSECONDS]
[--postprocessor.optimization_graph_level POSTPROCESSOR.OPTIMIZATION_GRAPH_LEVEL]
[--postprocessor.fade_length POSTPROCESSOR.FADE_LENGTH]
[--preprocessor.max_sequence_idle_microseconds PREPROCESSOR.MAX_SEQUENCE_IDLE_MICROSECONDS]
[--preprocessor.max_batch_size PREPROCESSOR.MAX_BATCH_SIZE]
[--preprocessor.min_batch_size PREPROCESSOR.MIN_BATCH_SIZE]
[--preprocessor.opt_batch_size PREPROCESSOR.OPT_BATCH_SIZE]
[--preprocessor.preferred_batch_size PREPROCESSOR.PREFERRED_BATCH_SIZE]
[--preprocessor.batching_type PREPROCESSOR.BATCHING_TYPE]
[--preprocessor.preserve_ordering PREPROCESSOR.PRESERVE_ORDERING]
[--preprocessor.instance_group_count PREPROCESSOR.INSTANCE_GROUP_COUNT]
[--preprocessor.max_queue_delay_microseconds PREPROCESSOR.MAX_QUEUE_DELAY_MICROSECONDS]
[--preprocessor.optimization_graph_level PREPROCESSOR.OPTIMIZATION_GRAPH_LEVEL]
[--preprocessor.mapping_path PREPROCESSOR.MAPPING_PATH]
[--preprocessor.g2p_ignore_ambiguous PREPROCESSOR.G2P_IGNORE_AMBIGUOUS]
[--preprocessor.language PREPROCESSOR.LANGUAGE]
[--preprocessor.max_sequence_length PREPROCESSOR.MAX_SEQUENCE_LENGTH]
[--preprocessor.max_input_length PREPROCESSOR.MAX_INPUT_LENGTH]
[--preprocessor.mapping PREPROCESSOR.MAPPING]
[--preprocessor.tolower PREPROCESSOR.TOLOWER]
[--preprocessor.pad_with_space PREPROCESSOR.PAD_WITH_SPACE]
[--preprocessor.enable_emphasis_tag PREPROCESSOR.ENABLE_EMPHASIS_TAG]
[--preprocessor.start_of_emphasis_token PREPROCESSOR.START_OF_EMPHASIS_TOKEN]
[--preprocessor.end_of_emphasis_token PREPROCESSOR.END_OF_EMPHASIS_TOKEN]
[--encoderFastPitch.max_sequence_idle_microseconds ENCODERFASTPITCH.MAX_SEQUENCE_IDLE_MICROSECONDS]
[--encoderFastPitch.max_batch_size ENCODERFASTPITCH.MAX_BATCH_SIZE]
[--encoderFastPitch.min_batch_size ENCODERFASTPITCH.MIN_BATCH_SIZE]
[--encoderFastPitch.opt_batch_size ENCODERFASTPITCH.OPT_BATCH_SIZE]
[--encoderFastPitch.preferred_batch_size ENCODERFASTPITCH.PREFERRED_BATCH_SIZE]
[--encoderFastPitch.batching_type ENCODERFASTPITCH.BATCHING_TYPE]
[--encoderFastPitch.preserve_ordering ENCODERFASTPITCH.PRESERVE_ORDERING]
[--encoderFastPitch.instance_group_count ENCODERFASTPITCH.INSTANCE_GROUP_COUNT]
[--encoderFastPitch.max_queue_delay_microseconds ENCODERFASTPITCH.MAX_QUEUE_DELAY_MICROSECONDS]
[--encoderFastPitch.optimization_graph_level ENCODERFASTPITCH.OPTIMIZATION_GRAPH_LEVEL]
[--encoderFastPitch.trt_max_workspace_size ENCODERFASTPITCH.TRT_MAX_WORKSPACE_SIZE]
[--encoderFastPitch.use_onnx_runtime]
[--encoderFastPitch.use_torchscript]
[--encoderFastPitch.use_trt_fp32]
[--encoderFastPitch.fp16_needs_obey_precision_pass]
[--encoderRadTTS.max_sequence_idle_microseconds ENCODERRADTTS.MAX_SEQUENCE_IDLE_MICROSECONDS]
[--encoderRadTTS.max_batch_size ENCODERRADTTS.MAX_BATCH_SIZE]
[--encoderRadTTS.min_batch_size ENCODERRADTTS.MIN_BATCH_SIZE]
[--encoderRadTTS.opt_batch_size ENCODERRADTTS.OPT_BATCH_SIZE]
[--encoderRadTTS.preferred_batch_size ENCODERRADTTS.PREFERRED_BATCH_SIZE]
[--encoderRadTTS.batching_type ENCODERRADTTS.BATCHING_TYPE]
[--encoderRadTTS.preserve_ordering ENCODERRADTTS.PRESERVE_ORDERING]
[--encoderRadTTS.instance_group_count ENCODERRADTTS.INSTANCE_GROUP_COUNT]
[--encoderRadTTS.max_queue_delay_microseconds ENCODERRADTTS.MAX_QUEUE_DELAY_MICROSECONDS]
[--encoderRadTTS.optimization_graph_level ENCODERRADTTS.OPTIMIZATION_GRAPH_LEVEL]
[--encoderRadTTS.trt_max_workspace_size ENCODERRADTTS.TRT_MAX_WORKSPACE_SIZE]
[--encoderRadTTS.use_onnx_runtime]
[--encoderRadTTS.use_torchscript]
[--encoderRadTTS.use_trt_fp32]
[--encoderRadTTS.fp16_needs_obey_precision_pass]
[--encoderPflow.max_sequence_idle_microseconds ENCODERPFLOW.MAX_SEQUENCE_IDLE_MICROSECONDS]
[--encoderPflow.max_batch_size ENCODERPFLOW.MAX_BATCH_SIZE]
[--encoderPflow.min_batch_size ENCODERPFLOW.MIN_BATCH_SIZE]
[--encoderPflow.opt_batch_size ENCODERPFLOW.OPT_BATCH_SIZE]
[--encoderPflow.preferred_batch_size ENCODERPFLOW.PREFERRED_BATCH_SIZE]
[--encoderPflow.batching_type ENCODERPFLOW.BATCHING_TYPE]
[--encoderPflow.preserve_ordering ENCODERPFLOW.PRESERVE_ORDERING]
[--encoderPflow.instance_group_count ENCODERPFLOW.INSTANCE_GROUP_COUNT]
[--encoderPflow.max_queue_delay_microseconds ENCODERPFLOW.MAX_QUEUE_DELAY_MICROSECONDS]
[--encoderPflow.optimization_graph_level ENCODERPFLOW.OPTIMIZATION_GRAPH_LEVEL]
[--encoderPflow.trt_max_workspace_size ENCODERPFLOW.TRT_MAX_WORKSPACE_SIZE]
[--encoderPflow.use_onnx_runtime]
[--encoderPflow.use_torchscript]
[--encoderPflow.use_trt_fp32]
[--encoderPflow.fp16_needs_obey_precision_pass]
[--t5tts.max_sequence_idle_microseconds T5TTS.MAX_SEQUENCE_IDLE_MICROSECONDS]
[--t5tts.max_batch_size T5TTS.MAX_BATCH_SIZE]
[--t5tts.min_batch_size T5TTS.MIN_BATCH_SIZE]
[--t5tts.opt_batch_size T5TTS.OPT_BATCH_SIZE]
[--t5tts.preferred_batch_size T5TTS.PREFERRED_BATCH_SIZE]
[--t5tts.batching_type T5TTS.BATCHING_TYPE]
[--t5tts.preserve_ordering T5TTS.PRESERVE_ORDERING]
[--t5tts.instance_group_count T5TTS.INSTANCE_GROUP_COUNT]
[--t5tts.max_queue_delay_microseconds T5TTS.MAX_QUEUE_DELAY_MICROSECONDS]
[--t5tts.optimization_graph_level T5TTS.OPTIMIZATION_GRAPH_LEVEL]
[--t5tts.chunk_ms T5TTS.CHUNK_MS]
[--t5tts.history_future T5TTS.HISTORY_FUTURE]
[--t5tts.fade_ms T5TTS.FADE_MS]
[--chunkerFastPitch.max_sequence_idle_microseconds CHUNKERFASTPITCH.MAX_SEQUENCE_IDLE_MICROSECONDS]
[--chunkerFastPitch.max_batch_size CHUNKERFASTPITCH.MAX_BATCH_SIZE]
[--chunkerFastPitch.min_batch_size CHUNKERFASTPITCH.MIN_BATCH_SIZE]
[--chunkerFastPitch.opt_batch_size CHUNKERFASTPITCH.OPT_BATCH_SIZE]
[--chunkerFastPitch.preferred_batch_size CHUNKERFASTPITCH.PREFERRED_BATCH_SIZE]
[--chunkerFastPitch.batching_type CHUNKERFASTPITCH.BATCHING_TYPE]
[--chunkerFastPitch.preserve_ordering CHUNKERFASTPITCH.PRESERVE_ORDERING]
[--chunkerFastPitch.instance_group_count CHUNKERFASTPITCH.INSTANCE_GROUP_COUNT]
[--chunkerFastPitch.max_queue_delay_microseconds CHUNKERFASTPITCH.MAX_QUEUE_DELAY_MICROSECONDS]
[--chunkerFastPitch.optimization_graph_level CHUNKERFASTPITCH.OPTIMIZATION_GRAPH_LEVEL]
[--hifigan.max_sequence_idle_microseconds HIFIGAN.MAX_SEQUENCE_IDLE_MICROSECONDS]
[--hifigan.max_batch_size HIFIGAN.MAX_BATCH_SIZE]
[--hifigan.min_batch_size HIFIGAN.MIN_BATCH_SIZE]
[--hifigan.opt_batch_size HIFIGAN.OPT_BATCH_SIZE]
[--hifigan.preferred_batch_size HIFIGAN.PREFERRED_BATCH_SIZE]
[--hifigan.batching_type HIFIGAN.BATCHING_TYPE]
[--hifigan.preserve_ordering HIFIGAN.PRESERVE_ORDERING]
[--hifigan.instance_group_count HIFIGAN.INSTANCE_GROUP_COUNT]
[--hifigan.max_queue_delay_microseconds HIFIGAN.MAX_QUEUE_DELAY_MICROSECONDS]
[--hifigan.optimization_graph_level HIFIGAN.OPTIMIZATION_GRAPH_LEVEL]
[--hifigan.trt_max_workspace_size HIFIGAN.TRT_MAX_WORKSPACE_SIZE]
[--hifigan.use_onnx_runtime]
[--hifigan.use_torchscript]
[--hifigan.use_trt_fp32]
[--hifigan.fp16_needs_obey_precision_pass]
[--neuralg2p.max_sequence_idle_microseconds NEURALG2P.MAX_SEQUENCE_IDLE_MICROSECONDS]
[--neuralg2p.max_batch_size NEURALG2P.MAX_BATCH_SIZE]
[--neuralg2p.min_batch_size NEURALG2P.MIN_BATCH_SIZE]
[--neuralg2p.opt_batch_size NEURALG2P.OPT_BATCH_SIZE]
[--neuralg2p.preferred_batch_size NEURALG2P.PREFERRED_BATCH_SIZE]
[--neuralg2p.batching_type NEURALG2P.BATCHING_TYPE]
[--neuralg2p.preserve_ordering NEURALG2P.PRESERVE_ORDERING]
[--neuralg2p.instance_group_count NEURALG2P.INSTANCE_GROUP_COUNT]
[--neuralg2p.max_queue_delay_microseconds NEURALG2P.MAX_QUEUE_DELAY_MICROSECONDS]
[--neuralg2p.optimization_graph_level NEURALG2P.OPTIMIZATION_GRAPH_LEVEL]
[--neuralg2p.trt_max_workspace_size NEURALG2P.TRT_MAX_WORKSPACE_SIZE]
[--neuralg2p.use_onnx_runtime]
[--neuralg2p.use_torchscript]
[--neuralg2p.use_trt_fp32]
[--neuralg2p.fp16_needs_obey_precision_pass]
output_path source_path [source_path ...]
Generate a Riva Model from a speech_synthesis model trained with NVIDIA NeMo.
positional arguments:
output_path Location to write compiled Riva pipeline
source_path Source file(s)
options:
-h, --help show this help message and exit
-f, --force Overwrite existing artifacts if they exist
-v, --verbose Verbose log outputs
--language_code LANGUAGE_CODE
Language of the model
--instance_group_count INSTANCE_GROUP_COUNT
How many instances in a group
--kind KIND Backend runs on CPU or GPU
--max_batch_size MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--max_queue_delay_microseconds MAX_QUEUE_DELAY_MICROSECONDS
Maximum amount of time to allow requests to queue to
form a batch in microseconds
--batching_type BATCHING_TYPE
--voice_name VOICE_NAME
Set the voice name for speech synthesis
--num_speakers NUM_SPEAKERS
Number of unique speakers.
--subvoices SUBVOICES
Comma-separated list of subvoices (no whitespace).
--sample_rate SAMPLE_RATE
Sample rate of the output signal
--chunk_length CHUNK_LENGTH
Chunk length in mel frames to synthesize at one time
--chunk_ms CHUNK_MS For T5 TTS only, chunk length (in ms) to synthesize at
a time.
--overlap_length OVERLAP_LENGTH
Chunk length in mel frames to overlap neighboring
chunks
--num_mels NUM_MELS number of mels
--num_samples_per_frame NUM_SAMPLES_PER_FRAME
number of samples per frame
--abbreviations_file ABBREVIATIONS_FILE
Path to file with list of abbreviations and
corresponding expansions
--has_mapping_file HAS_MAPPING_FILE
--mapping_file MAPPING_FILE
Path to phoneme mapping file
--wfst_tokenizer_model WFST_TOKENIZER_MODEL
Sparrowhawk model to use for tokenization and
classification, must be in .far format
--wfst_verbalizer_model WFST_VERBALIZER_MODEL
Sparrowhawk model to use for verbalizer, must be in
.far format.
--wfst_pre_process_model WFST_PRE_PROCESS_MODEL
Sparrowhawk model to use for pre process, must be in
.far format.
--wfst_post_process_model WFST_POST_PROCESS_MODEL
Sparrowhawk model to use for post process, must be in
.far format.
--arpabet_file ARPABET_FILE
Path to pronunciation dictionary (deprecated)
--phone_dictionary_file PHONE_DICTIONARY_FILE
Path to pronunciation dictionary
--phone_set PHONE_SET
Phonetic set that the model was trained on. An unset
value will attempt to auto-detect the phone set used
during training. Supports either "arpabet", "ipa",
"none".
--upper_case_chars UPPER_CASE_CHARS
Whether character representations for this model are
upper case or lower case.
--upper_case_g2p UPPER_CASE_G2P
Whether character representations for this model are
upper case or lower case.
--mel_basis_file_path MEL_BASIS_FILE_PATH
Pre calculated Mel basis file for Audio to Mel
--voice_map_file VOICE_MAP_FILE
Default voice name to filepath map
--history_future HISTORY_FUTURE
Number of Codec Future/History frames
postprocessor:
--postprocessor.max_sequence_idle_microseconds POSTPROCESSOR.MAX_SEQUENCE_IDLE_MICROSECONDS
Global timeout, in ms
--postprocessor.max_batch_size POSTPROCESSOR.MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--postprocessor.min_batch_size POSTPROCESSOR.MIN_BATCH_SIZE
--postprocessor.opt_batch_size POSTPROCESSOR.OPT_BATCH_SIZE
--postprocessor.preferred_batch_size POSTPROCESSOR.PREFERRED_BATCH_SIZE
Preferred batch size, must be smaller than Max batch
size
--postprocessor.batching_type POSTPROCESSOR.BATCHING_TYPE
--postprocessor.preserve_ordering POSTPROCESSOR.PRESERVE_ORDERING
Preserve ordering
--postprocessor.instance_group_count POSTPROCESSOR.INSTANCE_GROUP_COUNT
How many instances in a group
--postprocessor.max_queue_delay_microseconds POSTPROCESSOR.MAX_QUEUE_DELAY_MICROSECONDS
max queue delta in microseconds
--postprocessor.optimization_graph_level POSTPROCESSOR.OPTIMIZATION_GRAPH_LEVEL
The Graph optimization level to use in Triton model
configuration
--postprocessor.fade_length POSTPROCESSOR.FADE_LENGTH
Cross fade length in samples used in between audio
chunks
preprocessor:
--preprocessor.max_sequence_idle_microseconds PREPROCESSOR.MAX_SEQUENCE_IDLE_MICROSECONDS
Global timeout, in ms
--preprocessor.max_batch_size PREPROCESSOR.MAX_BATCH_SIZE
Use Batched Forward calls
--preprocessor.min_batch_size PREPROCESSOR.MIN_BATCH_SIZE
--preprocessor.opt_batch_size PREPROCESSOR.OPT_BATCH_SIZE
--preprocessor.preferred_batch_size PREPROCESSOR.PREFERRED_BATCH_SIZE
Preferred batch size, must be smaller than Max batch
size
--preprocessor.batching_type PREPROCESSOR.BATCHING_TYPE
--preprocessor.preserve_ordering PREPROCESSOR.PRESERVE_ORDERING
Preserve ordering
--preprocessor.instance_group_count PREPROCESSOR.INSTANCE_GROUP_COUNT
How many instances in a group
--preprocessor.max_queue_delay_microseconds PREPROCESSOR.MAX_QUEUE_DELAY_MICROSECONDS
max queue delta in microseconds
--preprocessor.optimization_graph_level PREPROCESSOR.OPTIMIZATION_GRAPH_LEVEL
The Graph optimization level to use in Triton model
configuration
--preprocessor.mapping_path PREPROCESSOR.MAPPING_PATH
--preprocessor.g2p_ignore_ambiguous PREPROCESSOR.G2P_IGNORE_AMBIGUOUS
--preprocessor.language PREPROCESSOR.LANGUAGE
--preprocessor.max_sequence_length PREPROCESSOR.MAX_SEQUENCE_LENGTH
maximum length of every emitted sequence
--preprocessor.max_input_length PREPROCESSOR.MAX_INPUT_LENGTH
maximum length of input string
--preprocessor.mapping PREPROCESSOR.MAPPING
--preprocessor.tolower PREPROCESSOR.TOLOWER
--preprocessor.pad_with_space PREPROCESSOR.PAD_WITH_SPACE
--preprocessor.enable_emphasis_tag PREPROCESSOR.ENABLE_EMPHASIS_TAG
Boolean flag that controls if the emphasis tag should
be parsed or not during pre-processing
--preprocessor.start_of_emphasis_token PREPROCESSOR.START_OF_EMPHASIS_TOKEN
field to indicate start of emphasis in the given text
--preprocessor.end_of_emphasis_token PREPROCESSOR.END_OF_EMPHASIS_TOKEN
field to indicate end of emphasis in the given text
encoderFastPitch:
--encoderFastPitch.max_sequence_idle_microseconds ENCODERFASTPITCH.MAX_SEQUENCE_IDLE_MICROSECONDS
Global timeout, in ms
--encoderFastPitch.max_batch_size ENCODERFASTPITCH.MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--encoderFastPitch.min_batch_size ENCODERFASTPITCH.MIN_BATCH_SIZE
--encoderFastPitch.opt_batch_size ENCODERFASTPITCH.OPT_BATCH_SIZE
--encoderFastPitch.preferred_batch_size ENCODERFASTPITCH.PREFERRED_BATCH_SIZE
Preferred batch size, must be smaller than Max batch
size
--encoderFastPitch.batching_type ENCODERFASTPITCH.BATCHING_TYPE
--encoderFastPitch.preserve_ordering ENCODERFASTPITCH.PRESERVE_ORDERING
Preserve ordering
--encoderFastPitch.instance_group_count ENCODERFASTPITCH.INSTANCE_GROUP_COUNT
How many instances in a group
--encoderFastPitch.max_queue_delay_microseconds ENCODERFASTPITCH.MAX_QUEUE_DELAY_MICROSECONDS
Maximum amount of time to allow requests to queue to
form a batch in microseconds
--encoderFastPitch.optimization_graph_level ENCODERFASTPITCH.OPTIMIZATION_GRAPH_LEVEL
The Graph optimization level to use in Triton model
configuration
--encoderFastPitch.trt_max_workspace_size ENCODERFASTPITCH.TRT_MAX_WORKSPACE_SIZE
Maximum workspace size (in Mb) to use for model export
to TensorRT
--encoderFastPitch.use_onnx_runtime
Use ONNX runtime instead of TensorRT
--encoderFastPitch.use_torchscript
Use TorchScript instead of TensorRT
--encoderFastPitch.use_trt_fp32
Use TensorRT engine with FP32 instead of FP16
--encoderFastPitch.fp16_needs_obey_precision_pass
Flag to explicitly mark layers as float when parsing
the ONNX network
encoderRadTTS:
--encoderRadTTS.max_sequence_idle_microseconds ENCODERRADTTS.MAX_SEQUENCE_IDLE_MICROSECONDS
Global timeout, in ms
--encoderRadTTS.max_batch_size ENCODERRADTTS.MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--encoderRadTTS.min_batch_size ENCODERRADTTS.MIN_BATCH_SIZE
--encoderRadTTS.opt_batch_size ENCODERRADTTS.OPT_BATCH_SIZE
--encoderRadTTS.preferred_batch_size ENCODERRADTTS.PREFERRED_BATCH_SIZE
Preferred batch size, must be smaller than Max batch
size
--encoderRadTTS.batching_type ENCODERRADTTS.BATCHING_TYPE
--encoderRadTTS.preserve_ordering ENCODERRADTTS.PRESERVE_ORDERING
Preserve ordering
--encoderRadTTS.instance_group_count ENCODERRADTTS.INSTANCE_GROUP_COUNT
How many instances in a group
--encoderRadTTS.max_queue_delay_microseconds ENCODERRADTTS.MAX_QUEUE_DELAY_MICROSECONDS
Maximum amount of time to allow requests to queue to
form a batch in microseconds
--encoderRadTTS.optimization_graph_level ENCODERRADTTS.OPTIMIZATION_GRAPH_LEVEL
The Graph optimization level to use in Triton model
configuration
--encoderRadTTS.trt_max_workspace_size ENCODERRADTTS.TRT_MAX_WORKSPACE_SIZE
Maximum workspace size (in Mb) to use for model export
to TensorRT
--encoderRadTTS.use_onnx_runtime
Use ONNX runtime instead of TensorRT
--encoderRadTTS.use_torchscript
Use TorchScript instead of TensorRT
--encoderRadTTS.use_trt_fp32
Use TensorRT engine with FP32 instead of FP16
--encoderRadTTS.fp16_needs_obey_precision_pass
Flag to explicitly mark layers as float when parsing
the ONNX network
encoderPflow:
--encoderPflow.max_sequence_idle_microseconds ENCODERPFLOW.MAX_SEQUENCE_IDLE_MICROSECONDS
Global timeout, in ms
--encoderPflow.max_batch_size ENCODERPFLOW.MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--encoderPflow.min_batch_size ENCODERPFLOW.MIN_BATCH_SIZE
--encoderPflow.opt_batch_size ENCODERPFLOW.OPT_BATCH_SIZE
--encoderPflow.preferred_batch_size ENCODERPFLOW.PREFERRED_BATCH_SIZE
Preferred batch size, must be smaller than Max batch
size
--encoderPflow.batching_type ENCODERPFLOW.BATCHING_TYPE
--encoderPflow.preserve_ordering ENCODERPFLOW.PRESERVE_ORDERING
Preserve ordering
--encoderPflow.instance_group_count ENCODERPFLOW.INSTANCE_GROUP_COUNT
How many instances in a group
--encoderPflow.max_queue_delay_microseconds ENCODERPFLOW.MAX_QUEUE_DELAY_MICROSECONDS
Maximum amount of time to allow requests to queue to
form a batch in microseconds
--encoderPflow.optimization_graph_level ENCODERPFLOW.OPTIMIZATION_GRAPH_LEVEL
The Graph optimization level to use in Triton model
configuration
--encoderPflow.trt_max_workspace_size ENCODERPFLOW.TRT_MAX_WORKSPACE_SIZE
Maximum workspace size (in Mb) to use for model export
to TensorRT
--encoderPflow.use_onnx_runtime
Use ONNX runtime instead of TensorRT
--encoderPflow.use_torchscript
Use TorchScript instead of TensorRT
--encoderPflow.use_trt_fp32
Use TensorRT engine with FP32 instead of FP16
--encoderPflow.fp16_needs_obey_precision_pass
Flag to explicitly mark layers as float when parsing
the ONNX network
t5tts:
--t5tts.max_sequence_idle_microseconds T5TTS.MAX_SEQUENCE_IDLE_MICROSECONDS
Global timeout, in ms
--t5tts.max_batch_size T5TTS.MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--t5tts.min_batch_size T5TTS.MIN_BATCH_SIZE
--t5tts.opt_batch_size T5TTS.OPT_BATCH_SIZE
--t5tts.preferred_batch_size T5TTS.PREFERRED_BATCH_SIZE
Preferred batch size, must be smaller than Max batch
size
--t5tts.batching_type T5TTS.BATCHING_TYPE
--t5tts.preserve_ordering T5TTS.PRESERVE_ORDERING
Preserve ordering
--t5tts.instance_group_count T5TTS.INSTANCE_GROUP_COUNT
How many instances in a group
--t5tts.max_queue_delay_microseconds T5TTS.MAX_QUEUE_DELAY_MICROSECONDS
Maximum amount of time to allow requests to queue to
form a batch in microseconds
--t5tts.optimization_graph_level T5TTS.OPTIMIZATION_GRAPH_LEVEL
The Graph optimization level to use in Triton model
configuration
--t5tts.chunk_ms T5TTS.CHUNK_MS
Chunk size in ms
--t5tts.history_future T5TTS.HISTORY_FUTURE
Number of codec frames to use as history/future
--t5tts.fade_ms T5TTS.FADE_MS
Fade-in/Fade-out for the chunk in ms
chunkerFastPitch:
--chunkerFastPitch.max_sequence_idle_microseconds CHUNKERFASTPITCH.MAX_SEQUENCE_IDLE_MICROSECONDS
Global timeout, in microseconds
--chunkerFastPitch.max_batch_size CHUNKERFASTPITCH.MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--chunkerFastPitch.min_batch_size CHUNKERFASTPITCH.MIN_BATCH_SIZE
--chunkerFastPitch.opt_batch_size CHUNKERFASTPITCH.OPT_BATCH_SIZE
--chunkerFastPitch.preferred_batch_size CHUNKERFASTPITCH.PREFERRED_BATCH_SIZE
Preferred batch size, must be smaller than Max batch
size
--chunkerFastPitch.batching_type CHUNKERFASTPITCH.BATCHING_TYPE
--chunkerFastPitch.preserve_ordering CHUNKERFASTPITCH.PRESERVE_ORDERING
Preserve ordering
--chunkerFastPitch.instance_group_count CHUNKERFASTPITCH.INSTANCE_GROUP_COUNT
How many instances in a group
--chunkerFastPitch.max_queue_delay_microseconds CHUNKERFASTPITCH.MAX_QUEUE_DELAY_MICROSECONDS
Maximum amount of time to allow requests to queue to
form a batch in microseconds
--chunkerFastPitch.optimization_graph_level CHUNKERFASTPITCH.OPTIMIZATION_GRAPH_LEVEL
The Graph optimization level to use in Triton model
configuration
hifigan:
--hifigan.max_sequence_idle_microseconds HIFIGAN.MAX_SEQUENCE_IDLE_MICROSECONDS
Global timeout, in microseconds
--hifigan.max_batch_size HIFIGAN.MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--hifigan.min_batch_size HIFIGAN.MIN_BATCH_SIZE
--hifigan.opt_batch_size HIFIGAN.OPT_BATCH_SIZE
--hifigan.preferred_batch_size HIFIGAN.PREFERRED_BATCH_SIZE
Preferred batch size, must be smaller than Max batch
size
--hifigan.batching_type HIFIGAN.BATCHING_TYPE
--hifigan.preserve_ordering HIFIGAN.PRESERVE_ORDERING
Preserve ordering
--hifigan.instance_group_count HIFIGAN.INSTANCE_GROUP_COUNT
How many instances in a group
--hifigan.max_queue_delay_microseconds HIFIGAN.MAX_QUEUE_DELAY_MICROSECONDS
Maximum amount of time to allow requests to queue to
form a batch in microseconds
--hifigan.optimization_graph_level HIFIGAN.OPTIMIZATION_GRAPH_LEVEL
The Graph optimization level to use in Triton model
configuration
--hifigan.trt_max_workspace_size HIFIGAN.TRT_MAX_WORKSPACE_SIZE
Maximum workspace size (in MB) to use for model export
to TensorRT
--hifigan.use_onnx_runtime
Use ONNX runtime instead of TensorRT
--hifigan.use_torchscript
Use TorchScript instead of TensorRT
--hifigan.use_trt_fp32
Use TensorRT engine with FP32 instead of FP16
--hifigan.fp16_needs_obey_precision_pass
Flag to explicitly mark layers as float when parsing
the ONNX network
neuralg2p:
--neuralg2p.max_sequence_idle_microseconds NEURALG2P.MAX_SEQUENCE_IDLE_MICROSECONDS
Global timeout, in microseconds
--neuralg2p.max_batch_size NEURALG2P.MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--neuralg2p.min_batch_size NEURALG2P.MIN_BATCH_SIZE
--neuralg2p.opt_batch_size NEURALG2P.OPT_BATCH_SIZE
--neuralg2p.preferred_batch_size NEURALG2P.PREFERRED_BATCH_SIZE
Preferred batch size, must be smaller than Max batch
size
--neuralg2p.batching_type NEURALG2P.BATCHING_TYPE
--neuralg2p.preserve_ordering NEURALG2P.PRESERVE_ORDERING
Preserve ordering
--neuralg2p.instance_group_count NEURALG2P.INSTANCE_GROUP_COUNT
How many instances in a group
--neuralg2p.max_queue_delay_microseconds NEURALG2P.MAX_QUEUE_DELAY_MICROSECONDS
Maximum amount of time to allow requests to queue to
form a batch in microseconds
--neuralg2p.optimization_graph_level NEURALG2P.OPTIMIZATION_GRAPH_LEVEL
The Graph optimization level to use in Triton model
configuration
--neuralg2p.trt_max_workspace_size NEURALG2P.TRT_MAX_WORKSPACE_SIZE
Maximum workspace size (in MB) to use for model export
to TensorRT
--neuralg2p.use_onnx_runtime
Use ONNX runtime instead of TensorRT
--neuralg2p.use_torchscript
Use TorchScript instead of TensorRT
--neuralg2p.use_trt_fp32
Use TensorRT engine with FP32 instead of FP16
--neuralg2p.fp16_needs_obey_precision_pass
Flag to explicitly mark layers as float when parsing
the ONNX network
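The per-component flags above are passed directly to `riva-build` when building the `.rmir` file. A minimal sketch follows; the `.rmir` and `.riva` paths and the encryption key (`tlt_encode`) are placeholders to substitute with your own, and the flag values shown are illustrative rather than recommended defaults:

```shell
# Build a TTS .rmir, overriding a few HiFi-GAN deployment options.
# Paths and key are hypothetical; only the flag names come from the listing above.
riva-build speech_synthesis \
    /servicemaker-dev/custom_tts.rmir:tlt_encode \
    /servicemaker-dev/fastpitch.riva:tlt_encode \
    /servicemaker-dev/hifigan.riva:tlt_encode \
    --hifigan.max_batch_size=8 \
    --hifigan.use_trt_fp32 \
    --hifigan.instance_group_count=1 \
    --hifigan.max_queue_delay_microseconds=100
```

The same pattern applies to the other component prefixes (for example, `--t5tts.chunk_ms` or `--neuralg2p.use_onnx_runtime`): prefix the flag with the component name, then pass it alongside the model arguments.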