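TTS Deployment#

This tutorial explains how to generate a TTS RMIR (Riva Model Intermediate Representation). An RMIR is an intermediate file that contains all the artifacts (models, files, configurations, and user settings) required to deploy a Riva service.

Learning Objectives#

In this tutorial, you will learn how to:

Use Riva ServiceMaker to take two .riva files and convert them to .rmir for an AMD64 (data center, x86_64) or ARM64 (embedded, AArch64) machine.
- Users who have .nemo files can use nemo2riva to generate .riva files from .nemo checkpoints.
Launch and deploy the .rmir on a local Riva server.
Send inference requests from a demo client using the Riva API bindings.

Prerequisites#

To use this tutorial, ensure that you:

Have access to NGC through the NGC command-line interface (CLI).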

Riva ServiceMaker#

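ServiceMaker is a set of tools that aggregates all the artifacts (models, files, configurations, and user settings) needed to deploy Riva to a target environment. It has two main components: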

riva-build
riva-deploy

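The first step is riva-build, which can be run on either a data center or an embedded machine to build an .rmir file.

The second step is riva-deploy, which should be run on the machine that will serve the Riva server.

If you are building an .rmir file on a data center machine for an embedded deployment, follow this tutorial up to and including the Riva-build section. Copy the built .rmir to the target embedded machine, run the Set the Configuration and Parameters section there, and then continue with the Riva-deploy section.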

Riva-build#

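This step builds a Riva-ready version of the model. Its only output is an intermediate format, called the Riva Model Intermediate Representation (.rmir), of an end-to-end pipeline for the services supported in Riva. Let's consider two TTS models:

FastPitch (spectrogram generator)
HiFi-GAN (vocoder)

riva-build combines one or more exported models (.riva files) into a single file containing an intermediate format called .rmir. This file contains a deployment-agnostic specification of the whole end-to-end pipeline, along with all the assets required for final deployment and inference. For more information, refer to the Riva documentation.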

Riva-deploy#

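The deployment tool takes one or more .rmir files and a target model repository directory as input. It creates an ensemble configuration that specifies the execution pipeline, and finally writes all of those assets to the output model repository directory.

Set the Configuration and Parameters#

Update the parameters in the following code block:

machine_type: The type of machine this tutorial is run on. Acceptable values are AMD64, ARM64_linux, and ARM64_l4t. Defaults to AMD64.
target_machine: The type of machine the RMIR will be deployed on. Acceptable values are AMD64, ARM64_linux, and ARM64_l4t. Defaults to AMD64.
acoustic_model: Full path to the acoustic model .riva file. Defaults to None. Replace it with the path to a custom acoustic model .riva checkpoint if you have one.
vocoder: Full path to the vocoder .riva file. Defaults to None. Replace it with the path to a custom vocoder .riva checkpoint if you have one.
out_dir: Directory in which to place the TTS .rmir file. The RMIR is placed in ${out_dir}/rmir/RMIR_NAME.rmir. Defaults to $pwd/out.
voice: The voice name to set for the model. Defaults to "test".
key: The encryption key used during nemo2riva. The same key is used to deploy the RMIR generated in this tutorial. Defaults to tlt_encode.
use_ipa: Set to "y" or "yes" (case-insensitive) if the model uses IPA phonemes. Any other value, such as "no", selects ARPAbet. Defaults to "yes".
lang: The model language. This is only used by the client and has no effect on the generated speech. Defaults to "en-US".
sample_rate: The sample rate of the generated audio, in Hz. Defaults to 44100.
num_speakers: The number of speakers in the model. Defaults to 2, the number of speakers in the NGC example model.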

import pathlib
import logging
import warnings

from version import __riva_version__

machine_type="AMD64" #Change this to `ARM64_linux` or `ARM64_l4t` in case of an ARM64 machine.
target_machine="AMD64" #Change this to `ARM64_linux` or `ARM64_l4t` in case of an ARM64 machine.
acoustic_model = None ##acoustic_model .riva location
vocoder = None ##vocoder .riva location
out_dir = pathlib.Path.cwd() / "out" ##Output directory to store the generated RMIR. The RMIR will be placed in `${out_dir}/rmir/RMIR_NAME.rmir`.
voice = "test" ##Voice name
key = "tlt_encode" ##Encryption key used during nemo2riva
use_ipa = "yes" ##`"y"` or `"Y"` if the model uses `ipa`, no otherwise.
lang = "en-US" ##Language
sample_rate = 44100 ##Sample rate of the audios
num_speakers = 2 ## Number of speakers

riva_aux_files = None ##Riva model repo path. In the case of a custom model repo, change this to the full path of the custom Riva model repo.
riva_tn_files = None ##Riva model repo path. In the case of a custom model repo, change this to the full path of the custom Riva model repo.

## Riva NGC, docker image config.
if machine_type.lower() in ["amd64", "arm64_linux"]:
    riva_init_image = f"nvcr.io/nvidia/riva/riva-speech:{__riva_version__}"
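elif machine_type.lower()=="arm64_l4t":
    riva_init_image = f"nvcr.io/nvidia/riva/riva-speech:{__riva_version__}-l4t-aarch64"
else:
    # Fail early instead of hitting an undefined `riva_init_image` later on.
    raise ValueError(f"Unsupported machine_type {machine_type!r}; expected AMD64, ARM64_linux, or ARM64_l4t.")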
rmir_dir = out_dir / "rmir"

if not out_dir.exists():
    out_dir.mkdir()
if not rmir_dir.exists():
    rmir_dir.mkdir()

def ngc_download_and_get_dir(ngc_resource_name, var, var_name, resource_type="model"):
    default_download_folder = "_v".join(ngc_resource_name.split("/")[-1].split(":"))
    !rm -rf ./riva_artifacts/{default_download_folder}
    ngc_output = !ngc registry {resource_type} download-version {ngc_resource_name} --dest riva_artifacts
    output = pathlib.Path(f"./riva_artifacts/{default_download_folder}")
    if not output.exists():
        ngc_output_formatted='\n'.join(ngc_output)
        logging.error(
            f"NGC was not able to download the requested model {ngc_resource_name}. "
            "Please check the NGC error message, remove all directories, and re-start the "
            f"notebook. NGC message: {ngc_output_formatted}"
        )
        return None
    if "model" in resource_type:
        riva_files_in_dir = list(output.glob("*.riva"))
        if len(riva_files_in_dir) > 0:
            output = riva_files_in_dir[0]
    if output is not None and var is not None:
        warnings.warn(
            f"`{var_name}` had a non-default value of `{var}`. `{var_name}` will be updated to `{output}`"
        )
    return output
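
The download helper above derives the local folder name from the NGC resource string by joining the resource name and its version tag with `_v`. A minimal sketch of just that step, pulled out into a standalone function so the result is easy to inspect. The function name `ngc_download_folder` is hypothetical and is not part of the notebook or of the NGC CLI.

```python
def ngc_download_folder(ngc_resource_name: str) -> str:
    """Mirror the folder-name logic used in ngc_download_and_get_dir."""
    # Keep only the last path component, e.g. "model_name:version".
    name_and_version = ngc_resource_name.split("/")[-1]
    # Join the model name and the version tag with "_v".
    return "_v".join(name_and_version.split(":"))


print(ngc_download_folder("nvidia/riva/speechsynthesis_en_us_fastpitch_ipa:deployable_v1.0"))
# speechsynthesis_en_us_fastpitch_ipa_vdeployable_v1.0
```

This is the subfolder of riva_artifacts that the helper removes before downloading and then searches for .riva files.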

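Download the Models#

The following code block downloads the default NGC models: FastPitch and HiFi-GAN. They are downloaded to a folder named riva_artifacts. If a previous download of the same model already exists in that folder, it is deleted first.

For custom models, you can skip this code block.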

riva_ngc_artifacts = pathlib.Path.cwd() / "riva_artifacts"
if not riva_ngc_artifacts.exists():
    riva_ngc_artifacts.mkdir()

acoustic_model = ngc_download_and_get_dir("nvidia/riva/speechsynthesis_en_us_fastpitch_ipa:deployable_v1.0", acoustic_model, "acoustic_model")
vocoder = ngc_download_and_get_dir("nvidia/riva/speechsynthesis_en_us_hifigan_ipa:deployable_v1.0", vocoder, "vocoder")

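The following code block downloads some additional TTS files needed for deployment. These include:

ARPAbet dictionary file
IPA dictionary file
Abbreviation mapping file
Two text normalization (TN) files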
- tokenize_and_classify.far
- verbalize.far

riva_aux_files = ngc_download_and_get_dir("nvidia/riva/speechsynthesis_en_us_auxiliary_files:deployable_v1.3", riva_aux_files, "riva_aux_files")
riva_tn_files = ngc_download_and_get_dir("nvidia/riva/normalization_en_us:deployable_v1.1", riva_tn_files, "riva_tn_files")
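
Because the riva-build command later in this tutorial refers to these files by name, it can help to confirm they are present before starting the container. The sketch below assumes the file names used in that command; the helper `missing_files` and the two constants are hypothetical names introduced here for illustration.

```python
import pathlib

# File names referenced by the riva-build command later in this tutorial.
AUX_FILES = ["abbr.txt", "ipa_cmudict-0.7b_nv22.08.txt", "cmudict-0.7b_nv22.08"]
TN_FILES = ["tokenize_and_classify.far", "verbalize.far"]


def missing_files(directory, expected_names):
    """Return the expected file names that are not present in `directory`."""
    directory = pathlib.Path(directory)
    return [name for name in expected_names if not (directory / name).is_file()]
```

After a successful download, `missing_files(riva_aux_files, AUX_FILES)` and `missing_files(riva_tn_files, TN_FILES)` would both be expected to return an empty list. A non-empty result points at a file name that riva-build will not find.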

Run riva-build#

Stop any running riva_rmir_gen container, then start a new one from the Riva image with the required paths mounted.

##Run the riva servicemaker.
!docker stop riva_rmir_gen &> /dev/null
!set -x && docker run -td --gpus all --rm -v {str(riva_aux_files.resolve())}:/riva_aux \
            -v {str(acoustic_model.parent.resolve())}/:/synt \
            -v {str(vocoder.parent.resolve())}:/voc -v {str(riva_tn_files.resolve())}:/riva_tn \
            -v {str(rmir_dir.resolve())}:/data --name riva_rmir_gen --entrypoint="/bin/bash" {riva_init_image}

Using the --force flag in riva-build replaces any existing RMIR.

warnings.warn("Using --force in riva-build will replace any existing RMIR.")
riva_build=(
    f"riva-build speech_synthesis --force --voice_name={voice} --language_code={lang} "
    f"--sample_rate={sample_rate} /data/FastPitch_HifiGan.rmir:{key} /synt/{str(acoustic_model.name)}:{key} "
    f"/voc/{str(vocoder.name)}:{key} --abbreviations_file=/riva_aux/abbr.txt "
    f"--wfst_tokenizer_model=/riva_tn/tokenize_and_classify.far --wfst_verbalizer_model=/riva_tn/verbalize.far"
)
# Limit batch sizes to 1 when targeting an ARM64 (embedded) machine.
if target_machine.lower() in ["arm64_linux", "arm64_l4t"]:
    riva_build += (
        " --max_batch_size 1 --postprocessor.max_batch_size 1 --preprocessor.max_batch_size 1"
        " --encoderFastPitch.max_batch_size 1 --chunkerFastPitch.max_batch_size 1 --hifigan.max_batch_size 1"
    )
if use_ipa.lower() in ["y", "yes"]:
    riva_build+=" --phone_set=ipa --phone_dictionary_file=/riva_aux/ipa_cmudict-0.7b_nv22.08.txt --upper_case_chars=True"
else:
    riva_build+=" --phone_set=arpabet --phone_dictionary_file=/riva_aux/cmudict-0.7b_nv22.08"
if num_speakers > 1:
    riva_build+=f" --num_speakers={num_speakers}"
    riva_build+=" --subvoices " + ",".join([f"{i}:{i}" for i in range(num_speakers)])
print(riva_build)
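
The value passed to `--subvoices` in the cell above is a comma-separated list of `i:i` pairs, one per speaker index. A small sketch of that construction in isolation; the function name `subvoices_arg` is hypothetical.

```python
def subvoices_arg(num_speakers: int) -> str:
    """Build the value passed to --subvoices, as done in the cell above."""
    return ",".join(f"{i}:{i}" for i in range(num_speakers))


print(subvoices_arg(2))  # 0:0,1:1
```

With the default num_speakers of 2, the command therefore ends with `--num_speakers=2 --subvoices 0:0,1:1`.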

Execute the riva-build command and stop the riva_rmir_gen container.

!docker exec riva_rmir_gen {riva_build}
!docker stop riva_rmir_gen
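
Before moving on to deployment, it may be worth confirming that riva-build actually wrote the RMIR. The sketch below only checks that the file exists and is not empty; it does not validate the contents. The helper name `rmir_is_ready` is hypothetical.

```python
import pathlib


def rmir_is_ready(rmir_path) -> bool:
    """Return True if the RMIR file exists and is not empty."""
    rmir_path = pathlib.Path(rmir_path)
    return rmir_path.is_file() and rmir_path.stat().st_size > 0
```

In this tutorial the file to check would be `rmir_dir / "FastPitch_HifiGan.rmir"`, since the container's /data directory is mounted from rmir_dir.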

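Run riva-deploy#

So far in this tutorial, we have learned how to generate an RMIR file from .riva files. FastPitch_HifiGan.rmir has been generated at the ${out_dir}/rmir location we defined earlier.

The RMIR file generated in this tutorial can be deployed using riva_quickstart.

Steps to Deploy the RMIR#

Download the Riva Quick Start resource.
Open config.sh and update the following parameters:
- Set service_enabled_asr to false.
- Set service_enabled_nlp to false.
- Set service_enabled_tts to true.
- Set riva_model_loc to your out_dir location.
- Set use_existing_rmirs to true.
Run riva_init.sh.
Run riva_start.sh.

Let's download the Riva Quick Start resource from NGC.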

if target_machine.lower() in ["amd64", "arm64_linux"]:
    quickstart_link = f"nvidia/riva/riva_quickstart:{__riva_version__}"
else:
    quickstart_link = f"nvidia/riva/riva_quickstart_arm64:{__riva_version__}"

quickstart_dir = ngc_download_and_get_dir(quickstart_link, None, None, resource_type="resource")

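Next, we modify the config.sh file to enable the relevant Riva service (TTS, for FastPitch and HiFi-GAN in this case) and provide the encryption key and the model repository path (riva_model_loc) generated in the previous step.

For example, if the RMIR was generated at ${out_dir}/rmir, set riva_model_loc to ${out_dir}, because the Quick Start scripts look for RMIR files in $riva_model_loc/rmir.

Here is how config.sh should look:

config.sh Snippet#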

# Enable or Disable Riva Services 
service_enabled_asr=false                                                      ## MAKE CHANGES HERE  
service_enabled_nlp=false                                                      ## MAKE CHANGES HERE  
service_enabled_tts=true                                                     ## MAKE CHANGES HERE  

# Specify one or more GPUs to use
# specifying more than one GPU is currently an experimental feature, and may result in undefined behaviours.
gpus_to_use="device=0"

# Specify the encryption key to use to deploy models
MODEL_DEPLOY_KEY="tlt_encode"                                                  ## MAKE CHANGES HERE

# Locations to use for storing models artifacts
#
# If an absolute path is specified, the data will be written to that location
# Otherwise, a docker volume will be used (default).
#
# riva_init.sh will create a `rmir` and `models` directory in the volume or
# path specified. 
#
# RMIR ($riva_model_loc/rmir)
# Riva uses an intermediate representation (RMIR) for models
# that are ready to deploy but not yet fully optimized for deployment. Pretrained
# versions can be obtained from NGC (by specifying NGC models below) and will be
# downloaded to $riva_model_loc/rmir by `riva_init.sh`
# 
# Custom models produced by NeMo and prepared using riva-build
# may also be copied manually to this location $(riva_model_loc/rmir).
#
# Models ($riva_model_loc/models)
# During the riva_init process, the RMIR files in $riva_model_loc/rmir
# are inspected and optimized for deployment. The optimized versions are
# stored in $riva_model_loc/models. The riva server exclusively uses these
# optimized versions.
riva_model_loc="<add path>"                              ## MAKE CHANGES HERE (Replace with MODEL_LOC)    

# The default RMIRs are downloaded from NGC by default in the above $riva_rmir_loc directory
# If you'd like to skip the download from NGC and use the existing RMIRs in the $riva_rmir_loc
# then set the below $use_existing_rmirs flag to true. You can also deploy your set of custom
# RMIRs by keeping them in the riva_rmir_loc dir and use this quickstart script with the
# below flag to deploy them all together.
use_existing_rmirs=false                                ## MAKE CHANGES HERE (Set to true)

Let's make the necessary changes to config.sh.

with open(f"{quickstart_dir}/config.sh", "r") as config_in:
    config_file = config_in.readlines()

for i, line in enumerate(config_file):
    # Disable services
    if "service_enabled_asr" in line:
        config_file[i] = "service_enabled_asr=false\n"
    elif "service_enabled_nlp" in line:
        config_file[i] = "service_enabled_nlp=false\n"
    elif "service_enabled_nmt" in line:
        config_file[i] = "service_enabled_nmt=false\n"
    elif "service_enabled_tts" in line:
        config_file[i] = "service_enabled_tts=true\n"
    # Update riva_model_loc to our rmir folder
    elif "riva_model_loc" in line:
        config_file[i] = config_file[i].split("riva_model_loc")[0]+f"riva_model_loc={out_dir}\n"
    elif "use_existing_rmirs" in line:
        config_file[i] = "use_existing_rmirs=true\n"
    elif "MODEL_DEPLOY_KEY" in line:
        config_file[i] = f"MODEL_DEPLOY_KEY=\"{key}\"\n"
    elif "fastpitch" in line:
        config_file[i] = f"#{line}"

with open(f"{quickstart_dir}/config.sh", "w") as config_in:
    config_in.writelines(config_file)

print("".join(config_file))
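
The loop above matches on substrings, so it also rewrites comment lines that happen to mention a variable name. The following is a sketch of a stricter alternative that only touches lines where the variable is actually assigned. It is not the method used by the Quick Start scripts, and the helper name `set_shell_var` is hypothetical.

```python
import re


def set_shell_var(lines, name, value):
    """Return a copy of `lines` with uncommented `name=...` assignments set to `value`."""
    pattern = re.compile(rf"^(\s*){re.escape(name)}=")
    updated = []
    for line in lines:
        match = pattern.match(line)
        if match:
            # Keep leading indentation; drop the old value and any trailing comment.
            updated.append(f"{match.group(1)}{name}={value}\n")
        else:
            updated.append(line)
    return updated


sample = [
    "# then set the below $use_existing_rmirs flag to true.\n",
    "use_existing_rmirs=false   ## MAKE CHANGES HERE\n",
]
print("".join(set_shell_var(sample, "use_existing_rmirs", "true")))
```

In the sample, the comment line is left as it was and only the assignment changes to `use_existing_rmirs=true`.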

# Ensure you have permission to execute these scripts
! cd {quickstart_dir} && chmod +x ./riva_init.sh && chmod +x ./riva_start.sh && chmod +x ./riva_stop.sh
! cd {quickstart_dir} && ./riva_stop.sh config.sh

# Run `riva_init.sh`. This will fetch the containers/models and run `riva-deploy`.
# YOU CAN SKIP THIS STEP IF YOU DID RIVA DEPLOY
! cd {quickstart_dir} && ./riva_init.sh config.sh

# Run `riva_start.sh`. This will start the Riva server and serve your model.
! cd {quickstart_dir} && ./riva_start.sh config.sh
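
The server can take a while to start accepting connections. The sketch below polls a TCP port until a connection succeeds or a timeout expires, using only the standard library. It assumes the server listens on localhost:50051, the address used by the client code in this tutorial. An open port only shows that something is listening; it does not prove that the models have finished loading. The helper name `wait_for_port` is hypothetical.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 120.0, interval_s: float = 2.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Connection refused or timed out; wait and try again.
            time.sleep(interval_s)
    return False
```

For example, `wait_for_port("localhost", 50051)` returns True as soon as the port accepts a connection, and False if it is still closed after two minutes.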

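Run Inference#

Once the Riva server is up and running with your models, you can send inference requests to query the server.

To send gRPC requests, install the Riva Python API bindings for the client.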

# Install client API bindings
! pip install nvidia-riva-client

Connect to the Riva Server and Run Inference#

Now we can query the Riva server. The following cell queries the Riva server (using gRPC) to produce a result.

import os
import riva.client
import IPython.display as ipd
import numpy as np

server = "localhost:50051"                # location of riva server
auth = riva.client.Auth(uri=server)
tts_service = riva.client.SpeechSynthesisService(auth)


text = "Is it recognize speech or wreck a nice beach?"
language_code = lang                   # currently required to be "en-US"
sample_rate_hz = sample_rate                    # the desired sample rate
voice_name = voice      # subvoice to generate the audio output.
data_type = np.int16                      # For RIVA version < 1.10.0 please set this to np.float32

resp = tts_service.synthesize(text, voice_name=voice_name, language_code=language_code, sample_rate_hz=sample_rate_hz)
audio = resp.audio
meta = resp.meta
processed_text = meta.processed_text
predicted_durations = meta.predicted_durations

audio_samples = np.frombuffer(resp.audio, dtype=data_type)
print(processed_text)
ipd.Audio(audio_samples, rate=sample_rate_hz)
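
To keep the result outside the notebook, the raw audio bytes can be written to a WAV file with Python's standard `wave` module. This sketch assumes the response holds mono 16-bit PCM, which matches the np.int16 data type used above; the mono assumption and the helper name `save_pcm16_wav` are not taken from the tutorial.

```python
import wave


def save_pcm16_wav(path, pcm_bytes: bytes, sample_rate_hz: int, channels: int = 1) -> None:
    """Write raw 16-bit PCM bytes to a WAV file."""
    with wave.open(str(path), "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sample_rate_hz)
        wav_file.writeframes(pcm_bytes)
```

With the variables from the cell above, a call would look like `save_pcm16_wav("output.wav", resp.audio, sample_rate_hz)`.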
