# SpeechSquad

## Local Deployment

This SpeechSquad sample application can be deployed locally. No load balancing is required, but the Riva ASR, NLP, and TTS services must be running.

### Installation

Install and start Riva locally. Refer to the Local (Docker) section.

Pull the SpeechSquad container.

docker pull nvcr.io/nvidia/jarvis/speech_squad:1.0.0-b.1

Download the dataset.

wget https://github.com/NVIDIA/speechsquad/releases/download/v1.0.0-b.1/speechsquad_sample_public_v1.tgz
tar xzf speechsquad_sample_public_v1.tgz
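
Before moving on, it can help to confirm that the archive extracted as expected. The following is a minimal sketch, assuming the two files that the performance client is pointed at later (`recorded_questions.jl` and `manifest.json`) sit directly under the extracted directory. The function name `check_dataset` is illustrative and not part of SpeechSquad.

```shell
# Hypothetical helper: verify that the extracted dataset directory contains
# the two files the performance client reads, and that neither is empty.
check_dataset() {
    dir=${1:-speechsquad_sample_public_v1}
    for f in recorded_questions.jl manifest.json; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            return 1
        fi
    done
    echo "dataset looks complete: $dir"
}

# Usage: check_dataset speechsquad_sample_public_v1
```

The check only looks at file presence and size; it does not validate the contents.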

### Running the Test Locally

Start the Riva Speech AI server, if it is not already running.

bash riva_start.sh
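
Riva can take a while to load its models, so it may be worth waiting until its port accepts connections before starting the SpeechSquad server. The sketch below is one way to do that, assuming `bash` and the coreutils `timeout` command are available; the helper names are illustrative. It only confirms that a TCP connection succeeds, not that the gRPC services behind the port are healthy.

```shell
# Hypothetical helper: return 0 if a TCP connection to host:port succeeds.
# Uses bash's /dev/tcp pseudo-device and the coreutils `timeout` command.
port_open() {
    timeout 2 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Poll until the port opens or the attempts run out.
# Arguments: host, port, attempts (default 30), delay in seconds (default 2).
wait_for_port() {
    host=$1 port=$2 attempts=${3:-30} delay=${4:-2}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if port_open "$host" "$port"; then
            echo "$host:$port is accepting connections"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "gave up waiting for $host:$port" >&2
    return 1
}

# Usage (50051 is the port used by the server command in this section):
# wait_for_port 127.0.0.1 50051
```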

Start the SpeechSquad server.

docker run -it --net=host \
      nvcr.io/nvidia/jarvis/speech_squad:1.0.0-b.1 \
      speechsquad_server  \
         -tts_service_url=0.0.0.0:50051 \
         -nlp_service_url=0.0.0.0:50051 \
         -asr_service_url=0.0.0.0:50051

Run the performance test.

docker run -it --net=host \
      -v $(pwd)/speechsquad_sample_public_v1:/work/test_files/speech_squad/ \
      nvcr.io/nvidia/jarvis/speech_squad:1.0.0-b.1 \
      speechsquad_perf_client \
         --squad_questions_json=/work/test_files/speech_squad/recorded_questions.jl \
         --squad_dataset_json=/work/test_files/speech_squad/manifest.json \
         --speech_squad_uri=0.0.0.0:1337 \
         --chunk_duration_ms=800 \
         --executor_count=1 \
         --num_iterations=1 \
         --num_parallel_requests=64 \
         --print_results=false
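
To see how results change with load, the same test can be repeated at several concurrency levels. The sketch below wraps the command above in a function and varies only `--num_parallel_requests`; the function names and the `DOCKER` override are illustrative assumptions. The `-it` flags are dropped and `--rm` is added because the runs are not interactive.

```shell
# Hypothetical wrapper around the performance test shown above: run the client
# once with the given number of parallel requests.
# DOCKER can be overridden (for example DOCKER=echo) to preview the command.
run_perf() {
    parallel=$1
    ${DOCKER:-docker} run --rm --net=host \
        -v "$(pwd)/speechsquad_sample_public_v1:/work/test_files/speech_squad/" \
        nvcr.io/nvidia/jarvis/speech_squad:1.0.0-b.1 \
        speechsquad_perf_client \
            --squad_questions_json=/work/test_files/speech_squad/recorded_questions.jl \
            --squad_dataset_json=/work/test_files/speech_squad/manifest.json \
            --speech_squad_uri=0.0.0.0:1337 \
            --chunk_duration_ms=800 \
            --executor_count=1 \
            --num_iterations=1 \
            --num_parallel_requests="$parallel" \
            --print_results=false
}

# Run the test once per concurrency level and stop at the first failure.
sweep() {
    for n in "$@"; do
        echo "=== num_parallel_requests=$n ==="
        run_perf "$n" || return 1
    done
}

# Usage (the levels are illustrative): sweep 1 8 64
```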

## Cloud Deployment

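This SpeechSquad sample application requires a fully set up Riva environment with working L7 load balancing and name resolution. To best validate scaling, we recommend a separate installation for each service type. This takes three separate Helm install invocations and at least three GPUs on the target system. Scaling each service requires additional GPUs.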

SpeechSquad can also run against a Riva Speech AI system that serves all three services from a single GPU, but do not expect optimal performance.

The following commands assume that values.yaml has already been updated with all other configuration details.

To set up an NLP-only service, run:

helm install nlp-riva-api riva-api \
    --set riva.speechServices.asr=false \
    --set riva.speechServices.tts=false

To set up an ASR-only service, run:

helm install asr-riva-api riva-api \
    --set riva.speechServices.nlp=false \
    --set riva.speechServices.tts=false

To set up a TTS-only service, run:

helm install tts-riva-api riva-api \
    --set riva.speechServices.asr=false \
    --set riva.speechServices.nlp=false

Each deployment can then be scaled independently with kubectl in the usual way.
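
As a rough illustration, scaling one service could look like the sketch below. The deployment names are an assumption (they are taken to match the Helm release names above); list the real ones with `kubectl get deployments`. Keep in mind that each additional replica needs its own GPU.

```shell
# Hypothetical helper: scale one Riva deployment to a given replica count and
# wait for the rollout to finish. The deployment name is an assumption; check
# the real names with `kubectl get deployments`.
scale_service() {
    deployment=$1 replicas=$2
    kubectl scale deployment "$deployment" --replicas="$replicas" &&
        kubectl rollout status deployment "$deployment" --timeout=300s
}

# Usage (name assumed to match the Helm release name):
# scale_service asr-riva-api 2
```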

### Installation

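At a minimum, the SpeechSquad server container (sss) must be able to route to the Riva Speech AI container. For testing at any scale, name resolution must also work, because a Layer 7 load balancer is used. Many problems can be resolved by first verifying container routing and then verifying name resolution between pods.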

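Conceptually, a successful SpeechSquad deployment has one or more of each pod type. Each pod needs correct connectivity (routing and, optionally, name resolution) to its neighboring pods.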

+---------------+     +----------------+     +---------------+
|               |     |                |     |               |
|Riva Speech AI | --- | SpeechSquad    | --- |  SpeechSquad  |
|               |     |    server      |     |     client    |
+---------------+     +----------------+     +---------------+

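To simplify troubleshooting, check that each pod is running correctly before moving on to the next one, and also make sure that:

- Riva is running and responding to client queries before SpeechSquad is installed.
- The SpeechSquad server starts and connects to Riva (using IP:PORT or FQDN:PORT).
- The SpeechSquad client connects to sss (using the appropriate IP:PORT or FQDN:PORT).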
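
The checks above are easiest to follow when they run in a fixed order and stop at the first failure. The sketch below shows one way to structure that; `check_step` is a hypothetical helper, and the commands in the usage comment are only examples of checks that could be plugged in.

```shell
# Hypothetical helper: run one named check and report the result.
# Everything after the description is executed as the check command.
check_step() {
    desc=$1
    shift
    if "$@" >/dev/null 2>&1; then
        echo "ok:   $desc"
    else
        echo "FAIL: $desc"
        return 1
    fi
}

# Usage: chain the checks with && so the first failure stops the sequence.
# The host names are the defaults from values.yaml; adjust them as needed.
# check_step "riva.nvda resolves"             getent hosts riva.nvda &&
# check_step "speechsquad.riva.nvda resolves" getent hosts speechsquad.riva.nvda &&
# check_step "sss pod is listed"              sh -c 'kubectl get pods | grep -q speechsquad'
```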

Fetch the Helm chart hosted on NGC. Add authentication options if needed.
```
helm fetch https://helm.ngc.nvidia.com/nvidia/riva/charts/speechsquad-1.0.0-b.1.tgz
```
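Alternatively, pull the Git repository from github.com/nvidia/speechsquad.

If name resolution is used, update values.yaml to match the fully qualified domain name (FQDN) of this installation. SpeechSquad should also have an FQDN set for its server (for example, speechsquad.riva.nvda). The SpeechSquad server expects an endpoint for each of the TTS, NLP, and ASR services. For example, by default, values.yaml contains the following:

- nlp_uri: "riva.nvda"
- asr_uri: "riva.nvda"
- tts_uri: "riva.nvda"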

If SpeechSquad is deployed on a single system, these URIs can be replaced with IP addresses.

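The ASR model to use is set by passing --asr_model_name to the SpeechSquad server; by default, the value is empty (--asr_model_name=""). This can be controlled with the sss.asr_model key in the values.yaml file.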

Note

If no model is specified, the Riva Speech AI server tries to select a model from its model registry. If an invalid model is specified, requests fail.

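For load balancing, the SpeechSquad server container must be able to resolve riva.nvda to the endpoint that exposes riva-speech. The chart can set up /etc/hosts in the container at startup: set .Values.sss.lb_ip to the address that riva.nvda should resolve to.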
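
If name resolution misbehaves, one thing to inspect is whether the hosts file in the sss container really carries the expected mapping. The sketch below is a hypothetical helper that checks a hosts-format file for a given name and IP; it assumes `awk` is available wherever it runs, which has not been verified for the SpeechSquad image.

```shell
# Hypothetical helper: succeed if a hosts-format file maps NAME to IP.
# Comment lines are skipped; aliases after the first name are also checked.
hosts_maps() {
    file=$1 name=$2 ip=$3
    awk -v name="$name" -v ip="$ip" '
        /^[ \t]*#/ { next }
        $1 == ip {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break      # trailing comment: stop scanning
                if ($i == name) found = 1
            }
        }
        END { exit(found ? 0 : 1) }
    ' "$file"
}

# Usage inside the sss container (10.42.0.190 is the load balancer IP from the
# example output in this section):
# hosts_maps /etc/hosts riva.nvda 10.42.0.190
```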

In the following example, the external IP of the load balancer is 10.42.0.190. Because we want load balancing, set .Values.sss.lb_ip in the SpeechSquad values.yaml file to this IP.

$ kubectl get services
NAME         TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)                                                        AGE
riva-api     LoadBalancer   10.97.16.15      10.42.0.191   8000:31308/TCP,8001:30646/TCP,8002:30042/TCP,50051:32669/TCP   3d19h
kubernetes   ClusterIP      10.96.0.1        <none>        443/TCP                                                        50d
traefik      LoadBalancer   10.107.200.114   10.42.0.190   80:32131/TCP,443:31802/TCP
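
Instead of copying the address by hand, the IP can be read from the service and passed to Helm on the command line. This is a sketch under a few assumptions: the load balancer service is named traefik as in the listing above, it reports an IP rather than a hostname, and overriding `sss.lb_ip` with `--set` is acceptable in place of editing values.yaml. The function names are illustrative.

```shell
# Hypothetical helper: print the external IP of a LoadBalancer service.
# The service name defaults to traefik, as in the listing above.
lb_ip() {
    kubectl get service "${1:-traefik}" \
        -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
}

# Install the chart with sss.lb_ip taken from the cluster instead of a
# hand-edited values.yaml entry.
install_speechsquad() {
    ip=$(lb_ip "$@")
    if [ -z "$ip" ]; then
        echo "no external IP found for the load balancer service" >&2
        return 1
    fi
    helm install speechsquad speechsquad --set sss.lb_ip="$ip"
}

# Usage: install_speechsquad            # reads the IP of the traefik service
```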

Once values.yaml correctly reflects the environment, install the Helm chart, optionally passing command-line options for further customization.

helm install speechsquad speechsquad

This should result in two pods in the cluster, one for the server (sss) and one for the client (clnt).

kubectl get pods

NAME                           READY   STATUS    RESTARTS   AGE
clnt-ss-5945877dc7-wk9fs       1/1     Running   0          5d16h
riva-api-6947945c67-4f7gt      1/1     Running   0          7d12h
speechsquad-6974455879-spxgt   1/1     Running   0          5d13h
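
The same check can be scripted so that it fails loudly when a pod is missing or not yet running. The sketch below matches pods by the name prefixes seen in the listing above (`riva-api`, `speechsquad`, `clnt`); those prefixes are an assumption and will differ if the Helm releases are named differently.

```shell
# Hypothetical helper: succeed only if a pod whose name starts with PREFIX
# is listed with STATUS "Running".
pod_running() {
    kubectl get pods --no-headers |
        awk -v p="$1" 'index($1, p) == 1 && $3 == "Running" { ok = 1 } END { exit(ok ? 0 : 1) }'
}

# Check the three pod name prefixes from the listing above and report each.
check_pods() {
    rc=0
    for prefix in riva-api speechsquad clnt; do
        if pod_running "$prefix"; then
            echo "$prefix: Running"
        else
            echo "$prefix: not running"
            rc=1
        fi
    done
    return "$rc"
}

# Usage: check_pods
```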

Verify that the ingressroutes are set up correctly.

kubectl get ingressroute

NAME                        AGE
riva-ingressroute           7d12h
speech-squad-ingressroute   5d16h
traefik-dashboard           23d

If name resolution is used, run kubectl describe ingressroute to confirm that the correct Host() clause appears. Entry Points are the ports on which the load balancer accepts traffic; here we use web, which is port 80. The Services section forwards traffic that matches the Host clause to the named service on the specified port, using the specified scheme.

kubectl describe ingressroute riva-ingressroute

...
    web
Routes:
  Kind:   Rule
  Match:  Host(`riva.nvda`)
  Services:
    Name:    riva-api
    Port:    50051
    Scheme:  h2c

and

kubectl describe ingressroute speech-squad-ingressroute

  Entry Points:
  web
Routes:
  Kind:   Rule
  Match:  Host(`speechsquad.riva.nvda`, `speechsquad.riva.nvda.nvidia.com`)
  Services:
    Name:    speech-squad
    Port:    1337
    Scheme:  h2c

You can also verify the service behind each ingressroute by pulling the service name from the route and checking that the service exists.

kubectl get service "$(kubectl get ingressroute speech-squad-ingressroute -o=json | jq -r '.spec.routes[0].services[0].name')"
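
To run this check across every ingressroute at once, a loop along the following lines may be useful. It is a sketch only: it uses kubectl's built-in `-o jsonpath` output instead of `jq`, looks at just the first service of the first route, and will report routes that point at something other than a regular Kubernetes service (as a dashboard route might) as missing.

```shell
# Hypothetical helper: for every IngressRoute in the current namespace, look up
# the first service it routes to and confirm that the service exists.
check_ingress_services() {
    rc=0
    for route in $(kubectl get ingressroute -o name); do
        svc=$(kubectl get "$route" -o jsonpath='{.spec.routes[0].services[0].name}')
        if [ -n "$svc" ] && kubectl get service "$svc" >/dev/null 2>&1; then
            echo "$route -> $svc: found"
        else
            echo "$route -> ${svc:-<none>}: missing"
            rc=1
        fi
    done
    return "$rc"
}

# Usage: check_ingress_services
```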

### Containers

Pull the container with the following command:

docker pull nvcr.io/nvidia/jarvis/speech_squad:1.0.0-b.1

### Dataset

We provide a toy dataset of five examples that can be iterated over any number of times to generate load. The data ships with the official SpeechSquad GitHub release.

wget https://github.com/NVIDIA/speechsquad/releases/download/v1.0.0-b.1/speechsquad_sample_public_v1.tgz
tar xzf speechsquad_sample_public_v1.tgz

### Running the Test

You can run the client from a Docker container or by using `kubectl exec` to open a shell in the client pod in the cluster.

1. To execute the client, ensure the dataset above is in your current working directory.

   ```
   ls
   build    client           CREDITS.md  speechsquad_sample_public_v1      LICENSE    reference
   CLA.pdf  CONTRIBUTING.md  Dockerfile  speechsquad_sample_public_v1.tgz  README.md  server
   ```

2. Run the following command to perform the test from the same directory.

   ```bash

   docker run -it --net=host -v $(pwd):/work/test_files/speech_squad/ nvcr.io/nvidia/jarvis/speech_squad:1.0.0-b.1 \
      speechsquad_perf_client \
         --squad_questions_json=/work/test_files/speech_squad/speechsquad_sample_public_v1/recorded_questions.jl \
         --squad_dataset_json=/work/test_files/speech_squad/speechsquad_sample_public_v1/manifest.json \
         --speech_squad_uri=speechsquad.riva.nvda:80 \
         --chunk_duration_ms=800 --executor_count=1 \
         --num_iterations=1 --num_parallel_requests=64 \
         --print_results=false
   ```

   If using `kubectl exec`, the client pod must have access to a data volume that contains the `speechsquad_sample_public_v1` directory.
   This location is currently hardcoded to `/sss_data`, so the `speechsquad_sample_public_v1` directory must live in `/sss_data`
   on the Kubernetes host that runs the client pod. This is controlled in the `deployment.yaml` file under
   *Volumes* and *VolumeMounts*.

   ```bash
   # On the host: find the client pod and open a shell in it.
   export CLIENT_POD=$(kubectl get pods | grep clnt  | awk '{print $1}')
   kubectl exec --stdin --tty "$CLIENT_POD" -- /bin/bash

   # Inside the pod: run the performance client.
   speechsquad_perf_client \
      --squad_questions_json=/work/test_files/speech_squad/speechsquad_sample_public_v1/recorded_questions.jl \
      --squad_dataset_json=/work/test_files/speech_squad/speechsquad_sample_public_v1/manifest.json \
      --speech_squad_uri=speechsquad.riva.nvda:80 \
      --chunk_duration_ms=800 --executor_count=1 \
      --num_iterations=1 --num_parallel_requests=64 --print_results=false
   ```

## License

For applicable licenses, refer to the {ref}`license` section.
