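How do I perform language translation using the Riva NMT APIs with out-of-the-box models?#

This tutorial walks you through the basics of the Riva Neural Machine Translation (NMT) service, specifically how to use the Riva NMT APIs with out-of-the-box (OOTB) models. It also covers Riva's Speech-to-Speech (S2S) and Speech-to-Text (S2T) translation APIs.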

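NVIDIA Riva Overview#

NVIDIA Riva is a GPU-accelerated SDK for building speech AI applications that are customized for your use case and deliver real-time performance.
Riva offers a rich set of speech and natural language understanding services, such as:

  • Automatic speech recognition (ASR)

  • Text-to-speech synthesis (TTS)

  • Neural machine translation (NMT)

  • A collection of natural language processing (NLP) services, such as named entity recognition (NER), punctuation, and intent classification.

In this tutorial, we interact with the Neural Machine Translation (NMT) APIs. We also look at how to use Riva's Speech-to-Speech (S2S) and Speech-to-Text (S2T) translation services.

For more information about Riva, refer to the Riva developer documentation.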

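Introduction to Riva NMT language translation#

Riva Neural Machine Translation (NMT) is a machine translation framework based on neural networks. NMT translates text between language pairs: given text in one language (the source language), it produces the corresponding text in another language (the target language).

Riva NMT EA provides several models for machine translation. They fall into three model architectures:

  1. Megatron models are based on the Megatron-BERT architecture, have 500 million parameters, and can translate from any supported language to English and vice versa. For example, the megatronnmt_en_any_500m model translates English into any of the supported languages.

  2. Multilingual models translate from one source language to several target languages, or vice versa. For example, the mnmt_en_deesfr_transformer24x6 model translates English into German, Spanish, and French. Multilingual models carry multiple language codes in their names. Use a multilingual model when you need to support several languages or want to optimize resource utilization: a single loaded model serves several language pairs, which avoids the overhead of loading multiple models. The 24x6 multilingual model is used by default. If you need to reduce resource consumption further and can accept some loss in translation quality, use the 12x2 model instead of the 24x6 multilingual model.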

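  3. Bilingual models translate from one source language to one target language. For example, the en_de_24x6 model translates English into German. Bilingual models carry a single language-code pair in their names. Use a bilingual model when you want the best performance for a specific language pair and direction. Bilingual models also return results faster than multilingual models.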

To learn more about Riva NMT, refer to the Riva NMT EA documentation.
For more information about NMT model architectures and training, refer to the NeMo NMT documentation.

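Supported language pairs:#

The following table lists the models for all language pairs supported by the NVIDIA Riva Speech Skills NMT service.
For each pair it gives the language codes, the model name used in the config.sh file of the Riva Quick Start Guide, and the corresponding model name to specify in API calls.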

Language pair                             Model name in config.sh        Model name in API calls
English (en) to Simplified Chinese (zh)   rmir_en_zh_24x6                en_zh_24x6
Simplified Chinese (zh) to English (en)   rmir_zh_en_24x6                zh_en_24x6
English (en) to Russian (ru)              rmir_en_ru_24x6                en_ru_24x6
Russian (ru) to English (en)              rmir_ru_en_24x6                ru_en_24x6
English (en) to German (de)               rmir_en_de_24x6                en_de_24x6
German (de) to English (en)               rmir_de_en_24x6                de_en_24x6
English (en) to Spanish (es)              rmir_en_es_24x6                en_es_24x6
Spanish (es) to English (en)              rmir_es_en_24x6                es_en_24x6
English (en) to French (fr)               rmir_en_fr_24x6                en_fr_24x6
French (fr) to English (en)               rmir_fr_en_24x6                fr_en_24x6
*Any language to English (en)             rmir_megatronnmt_any_en_500m   megatronnmt_any_en_500m
*English (en) to any language             rmir_megatronnmt_en_any_500m   megatronnmt_en_any_500m

* For the Megatron models, "any language" refers to the following 32 languages: cs, da, de, el, es, fi, fr, hu, it, lt, lv, nl, no, pl, pt, ro, ru, sk, sv, zh, ja, hi, ko, et, sl, bg, uk, hr, ar, vi, tr, id
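
The naming pattern in the table above can be captured in a small lookup helper. The following is a minimal sketch, not part of the Riva client: the function names `api_model_name` and `config_model_name` are hypothetical, and the preference for a bilingual model over a Megatron model is an assumption made for illustration.

```python
# Language codes from the footnote above (the 32 "any language" codes).
MEGATRON_LANGUAGES = {
    "cs", "da", "de", "el", "es", "fi", "fr", "hu", "it", "lt", "lv",
    "nl", "no", "pl", "pt", "ro", "ru", "sk", "sv", "zh", "ja", "hi",
    "ko", "et", "sl", "bg", "uk", "hr", "ar", "vi", "tr", "id",
}

# Bilingual (source, target) pairs listed in the table above.
BILINGUAL_PAIRS = {
    ("en", "zh"), ("zh", "en"), ("en", "ru"), ("ru", "en"),
    ("en", "de"), ("de", "en"), ("en", "es"), ("es", "en"),
    ("en", "fr"), ("fr", "en"),
}


def api_model_name(source, target):
    """Return the model name to pass in API calls (hypothetical helper)."""
    # Assumption: prefer the bilingual model when one exists for the pair.
    if (source, target) in BILINGUAL_PAIRS:
        return f"{source}_{target}_24x6"
    if source == "en" and target in MEGATRON_LANGUAGES:
        return "megatronnmt_en_any_500m"
    if target == "en" and source in MEGATRON_LANGUAGES:
        return "megatronnmt_any_en_500m"
    raise ValueError(f"No model in the table for {source} -> {target}")


def config_model_name(source, target):
    """Return the config.sh name, which adds the rmir_ prefix to the API name."""
    return "rmir_" + api_model_name(source, target)


print(api_model_name("en", "fr"))     # en_fr_24x6
print(config_model_name("ja", "en"))  # rmir_megatronnmt_any_en_500m
```

A pair such as French to German is not in the table, so the helper raises `ValueError` rather than guessing a model name.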

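Requirements and setup#

  1. Start the Riva Speech Skills server.
    To use the Riva NMT models, first deploy them on the Riva Speech Skills server. Before running this tutorial, follow the instructions in the Riva Quick Start Guide to deploy the OOTB NMT models on the Riva Speech Skills server. For this tutorial, deploy the following models:

    • English (en) to French (fr) bilingual model - the model name for this language pair is listed in config.sh of the Riva Quick Start Guide; refer to the table above.

    • English (en) to any language Megatron model - the corresponding model name is megatronnmt_en_any_500m. Uncomment the line containing this model in config.sh.

    • Spanish (es) ASR, Spanish-to-English (es-en) NMT, and English (en) TTS models - the second half of this tutorial uses the Speech-to-Speech (S2S) and Speech-to-Text (S2T) services, which need these models. Instructions for deploying the Spanish (language code es-US) ASR model and the English (en-US) TTS model are in config.sh itself. The model name for the Spanish-to-English language pair is listed in the table above.

  2. Install the Riva client library.
    Follow the steps in the "Running the Riva client" section of the Riva NMT EA tutorial, or in the overview section of README.md, to install the Riva client library.

  3. Install the additional libraries required to run this tutorial.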

# Install the Python development headers (-y avoids an interactive prompt in the notebook).
!apt-get install -y python3-dev

# Install PyAudio. portaudio19-dev is a prerequisite for PyAudio.
!apt-get update && apt-get install -y python3-pyaudio portaudio19-dev
!python -m pip install pyaudio
# If you run into errors running apt-get commands through a Jupyter notebook, run them directly in your local machine's terminal. You might need sudo access.
# For alternative ways to install PyAudio, refer to the PyAudio documentation - https://people.csail.mit.edu/hubert/pyaudio/

# Install librosa (libsndfile1 is installed first as a system dependency).
!apt-get update && apt-get install -y libsndfile1
# If you run into errors running apt-get commands through a Jupyter notebook, run them directly in your local machine's terminal. You might need sudo access.
!python -m pip install librosa

# Install nltk.
!python -m pip install nltk

Language translation with the Riva NMT APIs#

Now let's generate translations using the Riva APIs with the OOTB models.

Import the Riva client library#

import riva.client

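Create a Riva client and connect to the Riva Speech API server#

The URI below assumes a local deployment of the Riva Speech API server on the default port. If the server is deployed on a different host, or through the Helm chart on Kubernetes, use the appropriate URI.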

# `Auth` class wraps a gRPC channel.
auth = riva.client.Auth(uri='localhost:50051')

# `NeuralMachineTranslationClient` is for sending requests to a server.
riva_nmt_client = riva.client.NeuralMachineTranslationClient(auth)
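
When the server is not on localhost, one option is to build the URI from the environment rather than hard-coding it. This is a minimal sketch under stated assumptions: `RIVA_HOST` and `RIVA_PORT` are hypothetical environment variable names chosen for illustration (Riva does not define them), and `riva_uri` is a hypothetical helper.

```python
import os


def riva_uri(env=None, default_host="localhost", default_port=50051):
    """Build a host:port URI, falling back to the local default used in this tutorial."""
    env = os.environ if env is None else env
    host = env.get("RIVA_HOST", default_host)            # hypothetical variable name
    port = int(env.get("RIVA_PORT", default_port))       # hypothetical variable name
    return f"{host}:{port}"


print(riva_uri({}))  # localhost:50051

# The result can then be passed to the client exactly as above:
# auth = riva.client.Auth(uri=riva_uri())
```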

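Make gRPC requests to the Riva Speech API server#

Inference with a bilingual NMT model:#

Now let's make a gRPC request to the Riva Speech server, using the bilingual NMT model en_fr_24x6 (listed as rmir_en_fr_24x6 in config.sh), to translate from the source language English (en) to the target language French (fr).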

eng_text = (
    "Molecular Biology is the field of biology that studies the composition, structure "
    "and interactions of cellular molecules – such as nucleic acids and proteins – that "
    "carry out the biological processes essential for the cell's functions and maintenance."
)
model_name = 'en_fr_24x6'
source_language = 'en'
target_language = 'fr'

To learn more about NeuralMachineTranslationClient, refer to its docstring.

Now we submit the request to the server.

response = riva_nmt_client.translate([eng_text], model_name, source_language, target_language)
# response.translations is a list of all translations. Each entry corresponds to the
# entry at the same position in the list of texts passed to translate().

print("English Text: ", eng_text)
# Fetch the translated text from the 1st entry of response.translations
print("\nTranslated French Text: ", response.translations[0].text)

Now let's make a gRPC request to the multilingual Megatron model megatronnmt_en_any_500m to get the French translation of the same English text.

model_name = 'megatronnmt_en_any_500m'

response = riva_nmt_client.translate([eng_text], model_name, source_language, target_language)
# response.translations is a list of all translations. Each entry corresponds to the
# entry at the same position in the list of texts passed to translate().

print("English Text: ", eng_text)
# Fetch the translated text from the 1st entry of response.translations
print("\nTranslated French Text: ", response.translations[0].text)
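
Because `translate()` takes a list of texts, several sentences can be sent in one request and paired with their translations afterwards. The sketch below assumes only what the calls above already show (a `translate(texts, model_name, source_language, target_language)` method and a `translations` list whose entries have a `text` attribute). `translate_texts` is a hypothetical helper, not part of the Riva client.

```python
def translate_texts(client, texts, model_name, source_language, target_language):
    """Translate a list of texts in one request and pair each input with its output."""
    response = client.translate(texts, model_name, source_language, target_language)
    translated = [translation.text for translation in response.translations]
    # Guard against a response that does not line up with the inputs.
    if len(translated) != len(texts):
        raise RuntimeError(
            f"Expected {len(texts)} translations, received {len(translated)}"
        )
    return list(zip(texts, translated))


# Usage with the client created earlier in this tutorial:
# pairs = translate_texts(riva_nmt_client, ["Hello.", "How are you?"],
#                         "en_fr_24x6", "en", "fr")
# for source_text, french_text in pairs:
#     print(source_text, "->", french_text)
```

The length check is a design choice for this sketch: it turns a silently short or empty response into an explicit error.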

Riva NMT API - handling large input text:#

The Riva NMT API has a maximum input limit of 512 tokens. If the input is longer than 512 tokens, the NMT API does not return the complete translation.

eng_text = """
The effects of climate change span the impacts on physical environment, ecosystems and human societies due to ongoing human-caused climate change. The future impact of climate change depends on how much nations reduce greenhouse gas emissions and adapt to climate change. Effects that scientists predicted in the past—loss of sea ice, accelerated sea level rise and longer, more intense heat waves—are now occurring. The changes in climate are not expected to be uniform across the Earth. In particular, land areas change more quickly than oceans, and northern high latitudes change more quickly than the tropics. There are three major ways in which global warming will make changes to regional climate: melting ice, changing the hydrological cycle (of evaporation and precipitation) and changing currents in the oceans.
Physical changes include extreme weather, glacier retreat, sea level rise, declines in Arctic sea ice, and changes in the timing of seasonal events (such as earlier spring flowering). Since 1970, the ocean has absorbed more than 90% of the excess heat in the climate system. Even if global surface temperature is stabilized, sea levels will continue to rise and the ocean will continue to absorb excess heat from the atmosphere for many centuries. The uptake of carbon dioxide from the atmosphere is leading to ocean acidification.
Climate change has degraded land by raising temperatures, drying soils and increasing wildfire risk. Recent warming has strongly affected natural biological systems. Species worldwide are migrating poleward to colder areas. On land, species move to higher elevations, whereas marine species find colder water at greater depths. Between 1% and 50% of species on land were assessed to be at substantially higher risk of extinction due to climate change. Coral reefs and shellfish are vulnerable to the combined threat of ocean warming and acidification.
Food security and access to fresh water are at risk due to rising temperatures. Climate change has profound impacts on human health, directly via heat stress and indirectly via the spread of infectious diseases.
The effects of climate change span the impacts on physical environment, ecosystems and human societies due to ongoing human-caused climate change. The future impact of climate change depends on how much nations reduce greenhouse gas emissions and adapt to climate change. Effects that scientists predicted in the past—loss of sea ice, accelerated sea level rise and longer, more intense heat waves—are now occurring. The changes in climate are not expected to be uniform across the Earth. In particular, land areas change more quickly than oceans, and northern high latitudes change more quickly than the tropics. There are three major ways in which global warming will make changes to regional climate: melting ice, changing the hydrological cycle (of evaporation and precipitation) and changing currents in the oceans.
Physical changes include extreme weather, glacier retreat, sea level rise, declines in Arctic sea ice, and changes in the timing of seasonal events (such as earlier spring flowering). Since 1970, the ocean has absorbed more than 90% of the excess heat in the climate system. Even if global surface temperature is stabilized, sea levels will continue to rise and the ocean will continue to absorb excess heat from the atmosphere for many centuries. The uptake of carbon dioxide from the atmosphere is leading to ocean acidification.
Climate change has degraded land by raising temperatures, drying soils and increasing wildfire risk. Recent warming has strongly affected natural biological systems. Species worldwide are migrating poleward to colder areas. On land, species move to higher elevations, whereas marine species find colder water at greater depths. Between 1% and 50% of species on land were assessed to be at substantially higher risk of extinction due to climate change. Coral reefs and shellfish are vulnerable to the combined threat of ocean warming and acidification.
Food security and access to fresh water are at risk due to rising temperatures. Climate change has profound impacts on human health, directly via heat stress and indirectly via the spread of infectious diseases.
The effects of climate change span the impacts on physical environment, ecosystems and human societies due to ongoing human-caused climate change. The future impact of climate change depends on how much nations reduce greenhouse gas emissions and adapt to climate change. Effects that scientists predicted in the past—loss of sea ice, accelerated sea level rise and longer, more intense heat waves—are now occurring. The changes in climate are not expected to be uniform across the Earth. In particular, land areas change more quickly than oceans, and northern high latitudes change more quickly than the tropics. There are three major ways in which global warming will make changes to regional climate: melting ice, changing the hydrological cycle (of evaporation and precipitation) and changing currents in the oceans.
Physical changes include extreme weather, glacier retreat, sea level rise, declines in Arctic sea ice, and changes in the timing of seasonal events (such as earlier spring flowering). Since 1970, the ocean has absorbed more than 90% of the excess heat in the climate system. Even if global surface temperature is stabilized, sea levels will continue to rise and the ocean will continue to absorb excess heat from the atmosphere for many centuries. The uptake of carbon dioxide from the atmosphere is leading to ocean acidification.
Climate change has degraded land by raising temperatures, drying soils and increasing wildfire risk. Recent warming has strongly affected natural biological systems. Species worldwide are migrating poleward to colder areas. On land, species move to higher elevations, whereas marine species find colder water at greater depths. Between 1% and 50% of species on land were assessed to be at substantially higher risk of extinction due to climate change. Coral reefs and shellfish are vulnerable to the combined threat of ocean warming and acidification.
Food security and access to fresh water are at risk due to rising temperatures. Climate change has profound impacts on human health, directly via heat stress and indirectly via the spread of infectious diseases.
The effects of climate change span the impacts on physical environment, ecosystems and human societies due to ongoing human-caused climate change. The future impact of climate change depends on how much nations reduce greenhouse gas emissions and adapt to climate change. Effects that scientists predicted in the past—loss of sea ice, accelerated sea level rise and longer, more intense heat waves—are now occurring. The changes in climate are not expected to be uniform across the Earth. In particular, land areas change more quickly than oceans, and northern high latitudes change more quickly than the tropics. There are three major ways in which global warming will make changes to regional climate: melting ice, changing the hydrological cycle (of evaporation and precipitation) and changing currents in the oceans.
Physical changes include extreme weather, glacier retreat, sea level rise, declines in Arctic sea ice, and changes in the timing of seasonal events (such as earlier spring flowering). Since 1970, the ocean has absorbed more than 90% of the excess heat in the climate system. Even if global surface temperature is stabilized, sea levels will continue to rise and the ocean will continue to absorb excess heat from the atmosphere for many centuries. The uptake of carbon dioxide from the atmosphere is leading to ocean acidification.
Climate change has degraded land by raising temperatures, drying soils and increasing wildfire risk. Recent warming has strongly affected natural biological systems. Species worldwide are migrating poleward to colder areas. On land, species move to higher elevations, whereas marine species find colder water at greater depths. Between 1% and 50% of species on land were assessed to be at substantially higher risk of extinction due to climate change. Coral reefs and shellfish are vulnerable to the combined threat of ocean warming and acidification.
Food security and access to fresh water are at risk due to rising temperatures. Climate change has profound impacts on human health, directly via heat stress and indirectly via the spread of infectious diseases.
"""
model_name = 'en_fr_24x6'
source_language = 'en'
target_language = 'fr'

response = riva_nmt_client.translate([eng_text], model_name, source_language, target_language)
print("English Text: ", eng_text)
print("Translated French Text: ", response.translations[0].text)

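As shown above, the translated French text is truncated after the first 512 tokens of the English input.

The best way to handle such large inputs is to split the text and send the pieces to the NMT API as a list of texts.
Unfortunately, there is currently no exact way to find the number of tokens in an input text. On the WMT test sets, averaged across multiple sentences and models, 1 token maps to about 1.2 characters (this is only an estimate, and the mapping can vary by model and sentence pair). Based on this estimate, we should keep each text sent to the API under 615 characters (512 * 1.2 ≈ 615). We also need to respect sentence boundaries when splitting the text.
Let's look at an example of how to handle translation of large text.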

import nltk
nltk.download('punkt')  # newer NLTK releases may need nltk.download('punkt_tab') instead

from nltk import tokenize

def nmt_large_text_split(input_text, max_chars=615):
    """Split large input text into entries of at most max_chars characters."""

    def split_long_sentence(sentence_text, max_chars):
        """Split a sentence longer than max_chars into pieces, respecting word boundaries."""
        if len(sentence_text) <= max_chars:
            return [sentence_text]
        pieces = []
        for word in sentence_text.split():
            # The +1 accounts for the space that joins the word to the current piece.
            if pieces and len(pieces[-1]) + 1 + len(word) <= max_chars:
                pieces[-1] += " " + word
            else:
                pieces.append(word)
        return pieces

    # 1. Split the input text into sentences with NLTK's sentence tokenizer.
    sentences = tokenize.sent_tokenize(input_text)

    # 2. Pack the sentences into nmt_input_texts, ensuring no entry is longer than max_chars.
    nmt_input_texts = []
    for sentence in sentences:
        # 2.1. Split the sentence further if it is longer than max_chars.
        for piece in split_long_sentence(sentence, max_chars):
            # 2.2. Append to the last entry if the piece fits, otherwise start a new entry.
            if nmt_input_texts and len(nmt_input_texts[-1]) + 1 + len(piece) <= max_chars:
                nmt_input_texts[-1] += " " + piece
            else:
                nmt_input_texts.append(piece)
    return nmt_input_texts
    

eng_text = """
The effects of climate change span the impacts on physical environment, ecosystems and human societies due to ongoing human-caused climate change. The future impact of climate change depends on how much nations reduce greenhouse gas emissions and adapt to climate change. Effects that scientists predicted in the past—loss of sea ice, accelerated sea level rise and longer, more intense heat waves—are now occurring. The changes in climate are not expected to be uniform across the Earth. In particular, land areas change more quickly than oceans, and northern high latitudes change more quickly than the tropics. There are three major ways in which global warming will make changes to regional climate: melting ice, changing the hydrological cycle (of evaporation and precipitation) and changing currents in the oceans.
Physical changes include extreme weather, glacier retreat, sea level rise, declines in Arctic sea ice, and changes in the timing of seasonal events (such as earlier spring flowering). Since 1970, the ocean has absorbed more than 90% of the excess heat in the climate system. Even if global surface temperature is stabilized, sea levels will continue to rise and the ocean will continue to absorb excess heat from the atmosphere for many centuries. The uptake of carbon dioxide from the atmosphere is leading to ocean acidification.
Climate change has degraded land by raising temperatures, drying soils and increasing wildfire risk. Recent warming has strongly affected natural biological systems. Species worldwide are migrating poleward to colder areas. On land, species move to higher elevations, whereas marine species find colder water at greater depths. Between 1% and 50% of species on land were assessed to be at substantially higher risk of extinction due to climate change. Coral reefs and shellfish are vulnerable to the combined threat of ocean warming and acidification.
Food security and access to fresh water are at risk due to rising temperatures. Climate change has profound impacts on human health, directly via heat stress and indirectly via the spread of infectious diseases.
"""
model_name = 'en_fr_24x6'
source_language = 'en'
target_language = 'fr'

parts = nmt_large_text_split(eng_text)

response = riva_nmt_client.translate(parts, model_name, source_language, target_language)

print("English Text:\n", eng_text)
print("Translated French Text:\n")
for translation in response.translations:
    print(translation.text)

Warning: You cannot pass more than 8 texts to the model in one request. If you pass more than 8 inputs, the response will be empty.
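
When splitting produces more than 8 entries, the list can be sent in several requests of at most 8 texts each. The following is a minimal sketch under that assumption: `chunk_list` and `translate_in_batches` are hypothetical helpers, and the client is assumed to behave like `riva_nmt_client.translate()` as used above.

```python
MAX_TEXTS_PER_REQUEST = 8  # limit stated in the warning above


def chunk_list(items, batch_size=MAX_TEXTS_PER_REQUEST):
    """Yield consecutive slices of items, each with at most batch_size entries."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def translate_in_batches(client, texts, model_name, source_language, target_language):
    """Translate any number of texts by issuing one request per batch of at most 8."""
    translated = []
    for batch in chunk_list(texts):
        response = client.translate(batch, model_name, source_language, target_language)
        translated.extend(translation.text for translation in response.translations)
    return translated


# Usage with the splitter and client from this tutorial:
# parts = nmt_large_text_split(eng_text)
# french_parts = translate_in_batches(riva_nmt_client, parts, "en_fr_24x6", "en", "fr")
# print(" ".join(french_parts))
```

The batches are sent sequentially so that the translations come back in the same order as the input texts.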

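Riva S2T and S2S APIs#

The Riva Speech-to-Text translation (S2T) service translates audio in a source language into text in a target language. S2T takes an audio stream or audio buffer as input and returns the translated text. Internally, the Riva S2T service is composed of the Riva ASR and NMT pipelines, and it supports streaming mode.

The Riva Speech-to-Speech translation (S2S) service translates audio between language pairs, that is, from a source language to a target language. S2S takes an audio stream or audio buffer as input and returns the generated audio. Internally, the Riva S2S service is composed of the Riva ASR, NMT, and TTS pipelines, and it supports streaming mode.

Riva ASR offers state-of-the-art OOTB (out-of-the-box) models and pipelines for multiple languages, such as English, Spanish, German, Russian, and Mandarin. Riva also makes it easy to customize the ASR pipeline in various ways to meet your specific needs.
Riva TTS offers two state-of-the-art voices for English (one male and one female). Riva also makes it easy to customize TTS in various ways to meet your specific needs.

In this section, let's look at examples of how to generate translated speech and text from audio. Make sure your Riva Speech server has the Spanish ASR, Spanish-to-English NMT, and English TTS models deployed.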

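Riva Speech-to-Text service#

The Riva S2T service supports models for the following language pairs:

  1. Spanish (es) to English (en)

  2. German (de), Spanish (es), French (fr) to English (en)

  3. Simplified Chinese (zh) to English (en)

  4. Russian (ru) to English (en)

  5. German (de) to English (en)

  6. French (fr) to English (en)

Let's take translating Spanish speech into English text as an example.

Import the Riva client library#

Let's import some required libraries, including the Riva client library.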

import IPython.display as ipd
import numpy as np

# Riva ASR client import
import riva.client

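Create a Riva client and connect to the Riva Speech API server#

The URI below assumes a local deployment of the Riva Speech API server on the default port. If the server is deployed on a different host, or through the Helm chart on Kubernetes, use the appropriate URI.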

auth = riva.client.Auth(uri="localhost:50051")

# `NeuralMachineTranslationClient` is for sending requests to a server.
riva_nmt_client = riva.client.NeuralMachineTranslationClient(auth)

Load the audio file#

Let's load an audio file and create an audio chunk iterator to simulate streaming input.

my_wav_file = "./audio_samples/es-US_sample.wav"
output_device = None  # use default device

wav_parameters = riva.client.get_wav_file_parameters(my_wav_file)
audio_chunk_iterator = riva.client.AudioChunkFileIterator(
    my_wav_file, chunk_n_frames=4800)
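
To get a feel for what `chunk_n_frames=4800` means for streaming, the chunk duration and chunk count can be worked out with the standard library alone. This is a sketch: the helper names are hypothetical, the 16000 Hz rate is the `sample_rate_hertz` value used in the configuration in this tutorial, and the chunk count assumes the iterator emits a final, shorter chunk for any leftover frames.

```python
import math
import wave


def wav_frames_and_rate(path):
    """Return (number of frames, sample rate in Hz) of a WAV file."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes(), wav_file.getframerate()


def chunk_duration_seconds(chunk_n_frames, sample_rate_hz):
    """Duration of audio carried by one chunk."""
    return chunk_n_frames / sample_rate_hz


def expected_chunk_count(total_frames, chunk_n_frames):
    """Number of chunks, assuming a final partial chunk is emitted."""
    return math.ceil(total_frames / chunk_n_frames)


# 4800 frames at 16000 Hz is 0.3 seconds of audio per chunk.
print(chunk_duration_seconds(4800, 16000))  # 0.3

# For the sample file used here:
# total_frames, rate = wav_frames_and_rate("./audio_samples/es-US_sample.wav")
# print(expected_chunk_count(total_frames, 4800), "chunks at", rate, "Hz")
```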

Define the S2T configuration#

The S2T configuration is composed of a Riva ASR configuration followed by an NMT configuration.

s2t_config = riva.client.StreamingTranslateSpeechToTextConfig(
    asr_config=riva.client.StreamingRecognitionConfig(
        config=riva.client.RecognitionConfig(
            encoding=riva.client.AudioEncoding.LINEAR_PCM,
            language_code='es-US',    # Spanish ASR model
            max_alternatives=1,
            profanity_filter=False,
            enable_automatic_punctuation=False,
            verbatim_transcripts=False,
            sample_rate_hertz=16000,
            audio_channel_count=1,
        ),
        interim_results=True,
    ),
    translation_config=riva.client.TranslationConfig(
        source_language_code='es-US',    # Transcript's language is Spanish
        target_language_code='en-US',    # Target language is English
    ),
)

Make a Riva S2T gRPC request to the Riva Speech API server:#

# Create a response generator from the S2T config and the audio chunk iterator

responses = riva_nmt_client.streaming_s2t_response_generator(
            audio_chunks=audio_chunk_iterator,
            streaming_config=s2t_config)
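
The generator above yields responses only as it is iterated. A minimal sketch of consuming it follows. The field names are assumptions, not confirmed by this tutorial: the sketch assumes each response has a `results` list whose entries expose `is_final` and `alternatives[0].transcript`, as in Riva's streaming ASR results. `collect_final_translations` is a hypothetical helper, so check the fields against the client's docstrings before relying on it.

```python
def collect_final_translations(responses):
    """Collect the top alternative of every final result in a stream of responses.

    Assumed response shape (not confirmed by this tutorial):
    response.results[i].is_final and response.results[i].alternatives[0].transcript
    """
    final_texts = []
    for response in responses:
        for result in response.results:
            # Skip interim hypotheses and results without alternatives.
            if result.is_final and result.alternatives:
                final_texts.append(result.alternatives[0].transcript)
    return final_texts


# Usage with the generator created above:
# for text in collect_final_translations(responses):
#     print("Translated English Text:", text)
```

Interim results are skipped because `interim_results=True` in the configuration makes the service emit partial hypotheses before each final one.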

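Riva Speech-to-Speech service#

The Riva S2S service supports models for the following language pairs:

  1. Spanish (es) to English (en)

  2. German (de), Spanish (es), French (fr) to English (en)

  3. Simplified Chinese (zh) to English (en)

  4. Russian (ru) to English (en)

  5. German (de) to English (en)

  6. French (fr) to English (en)

Let's take translating Spanish speech into English speech as an example.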

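Create a Riva client and connect to the Riva Speech API server#

The URI below assumes a local deployment of the Riva Speech API server on the default port. If the server is deployed on a different host, or through the Helm chart on Kubernetes, use the appropriate URI.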

auth = riva.client.Auth(uri="localhost:50051")

# `NeuralMachineTranslationClient` is for sending requests to a server.
riva_nmt_client = riva.client.NeuralMachineTranslationClient(auth)

Load the audio file#

Let's load an audio file and create an audio chunk iterator to simulate streaming input.

my_wav_file = "./audio_samples/es-US_sample.wav"
output_device = None  # use default device

wav_parameters = riva.client.get_wav_file_parameters(my_wav_file)
audio_chunk_iterator = riva.client.AudioChunkFileIterator(
    my_wav_file, chunk_n_frames=4800)

Define the S2S configuration#

The S2S configuration is composed of Riva ASR, NMT, and TTS configurations, in that order.

s2s_config = riva.client.StreamingTranslateSpeechToSpeechConfig(
    asr_config=riva.client.StreamingRecognitionConfig(
        config=riva.client.RecognitionConfig(
            encoding=riva.client.AudioEncoding.LINEAR_PCM,
            language_code='es-US',    # Spanish ASR model
            max_alternatives=1,
            profanity_filter=False,
            enable_automatic_punctuation=False,
            verbatim_transcripts=False,
            sample_rate_hertz=16000,
            audio_channel_count=1,
        ),
        interim_results=True,
    ),
    translation_config=riva.client.TranslationConfig(
        source_language_code='es-US',    # Transcript's language is Spanish
        target_language_code='en-US',    # Target language is English
    ),
    tts_config=riva.client.SynthesizeSpeechConfig(
        encoding=riva.client.AudioEncoding.LINEAR_PCM,    # enum value 1, as in the original literal
        sample_rate_hz=44100,
        voice_name='English-US.Female-1',    # English female voice
        language_code='en-US',
    ),
)

Make a Riva S2S gRPC request to the Riva Speech API server:#

# Create a response generator from the S2S config and the audio chunk iterator

responses = riva_nmt_client.streaming_s2s_response_generator(
            audio_chunks=audio_chunk_iterator,
            streaming_config=s2s_config)

Listen to the streaming response#

# Create an empty array to accumulate the received audio

empty = np.array([])

# Send requests and listen to streaming response from the S2S service
for i, rep in enumerate(responses):
    audio_samples = np.frombuffer(rep.speech.audio, dtype=np.int16) / (2**15)
    print("Chunk: ",i)
    ipd.display(ipd.Audio(audio_samples, rate=44100))
    empty = np.concatenate((empty, audio_samples))

# Full translated synthesized speech
print("Final synthesis:")
ipd.display(ipd.Audio(empty, rate=44100))

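As shown above, we synthesized a stream of speech corresponding to the stream of transcripts generated by Riva ASR. Each piece of speech may be the translation of a partial or a complete sentence, depending on the sentence boundaries in the audio.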
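
To keep the synthesized speech outside the notebook, the raw audio bytes can be written to a WAV file with the standard library. This is a sketch under stated assumptions: the bytes in `rep.speech.audio` are taken to be 16-bit mono PCM (consistent with the `np.int16` decoding in the loop above) at the 44100 Hz rate set in `tts_config`. `save_pcm16_chunks` and the output filename are hypothetical.

```python
import wave


def save_pcm16_chunks(chunks, path, sample_rate_hz=44100, channels=1):
    """Write raw 16-bit PCM byte chunks to a WAV file; return the number of frames written."""
    pcm = b"".join(chunks)
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sample_rate_hz)
        wav_file.writeframes(pcm)
    return len(pcm) // (2 * channels)


# Usage: collect the bytes inside the same loop that plays them, because the
# response generator can only be iterated once.
# raw_chunks = []
# for rep in responses:
#     raw_chunks.append(rep.speech.audio)
# save_pcm16_chunks(raw_chunks, "translated_speech.wav")
```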