Advanced Usage

This section walks advanced users through the inference script in detail.

The Studio Voice NIM uses a gRPC endpoint. Import the compiled gRPC protos to invoke the NIM.

import os
import sys
import grpc

sys.path.append(os.path.join(os.getcwd(), "../interfaces/studio_voice"))
# Import the gRPC-compiler auto-generated Maxine Studio Voice modules
import studiovoice_pb2, studiovoice_pb2_grpc  # noqa: E402

The NIM call uses bidirectional gRPC streaming. To produce the request data stream, define a Python generator function: a plain function that, once called, yields its results lazily as it is iterated. Each yield returns the next chunk to be streamed.

from typing import Iterator

def generate_request_for_inference(
    input_filepath: os.PathLike,
) -> Iterator[studiovoice_pb2.EnhanceAudioRequest]:
    """Generator to produce the request data stream.

    Args:
      input_filepath: Path to input file

    Yields:
      EnhanceAudioRequest messages carrying successive chunks of the file
    """
    DATA_CHUNKS = 64 * 1024  # bytes; the wav file is sent in 64 KB chunks
    with open(input_filepath, "rb") as fd:
        while True:
            buffer = fd.read(DATA_CHUNKS)
            if not buffer:
                break
            yield studiovoice_pb2.EnhanceAudioRequest(audio_stream_data=buffer)
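
If you want to sanity-check the stream before wiring it into a call, the generator can be consumed on its own. This is a minimal sketch, reusing the input path that appears later in this section, that counts the request messages it yields:

# Minimal sanity check: iterate the generator and tally the chunked payload.
chunks = list(generate_request_for_inference("../assets/studio_voice_48k_input.wav"))
total_bytes = sum(len(c.audio_stream_data) for c in chunks)
print(f"{len(chunks)} request messages, {total_bytes} bytes total")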

Before invoking the NIM, define a function that consumes the incoming stream and writes it to the output file.

from typing import Iterator

def write_output_file_from_response(
    response_iter: Iterator[studiovoice_pb2.EnhanceAudioResponse],
    output_filepath: os.PathLike
) -> None:
    """Function to write the output file from the incoming gRPC data stream.

    Args:
      response_iter: Responses from the server to write into output file
      output_filepath: Path to output file
    """
    with open(output_filepath, "wb") as fd:
        for response in response_iter:
            if response.HasField("audio_stream_data"):
                fd.write(response.audio_stream_data)

Now that the request generator and output writer are in place, connect to the NIM and invoke it. The input file path is stored in the variable input_filepath, and the output file is written to the location given by output_filepath. Wait for the message confirming that the invocation has completed before inspecting the output file. Fill in the correct host and port for your target in the snippet below.

import time

input_filepath = "../assets/studio_voice_48k_input.wav"
output_filepath = "studio_voice_48k_output.wav"

with grpc.insecure_channel(target="localhost:8001") as channel:
    try:
        stub = studiovoice_pb2_grpc.MaxineStudioVoiceStub(channel)
        start_time = time.time()

        responses = stub.EnhanceAudio(
            generate_request_for_inference(input_filepath=input_filepath),
            metadata=None,
        )

        write_output_file_from_response(response_iter=responses, output_filepath=output_filepath)

        end_time = time.time()
        print(
            f"Function invocation completed in {end_time-start_time:.2f}s, the output file is generated."
        )
    except grpc.RpcError as e:
        print(f"gRPC call failed: {e}")
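
If your NIM sits behind TLS or an authenticating gateway rather than on localhost, you can swap the insecure channel for a secure one. This is a minimal sketch, assuming a TLS-terminated endpoint and a hypothetical bearer token passed as gRPC metadata; adjust the target and credentials to your deployment:

# Sketch only: secure variant of the call above. The target host and the
# authorization metadata entry are placeholders for your deployment.
credentials = grpc.ssl_channel_credentials()  # uses the system root certificates

with grpc.secure_channel(target="<host>:<port>", credentials=credentials) as channel:
    stub = studiovoice_pb2_grpc.MaxineStudioVoiceStub(channel)
    responses = stub.EnhanceAudio(
        generate_request_for_inference(input_filepath=input_filepath),
        metadata=(("authorization", "Bearer <token>"),),  # hypothetical auth header
    )
    write_output_file_from_response(response_iter=responses, output_filepath=output_filepath)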

Compiling Protos

The NVIDIA Maxine NIM Clients package ships with precompiled protos. To compile the protos locally instead, install the required dependencies and run the platform-specific commands below; a sketch of what they do follows the commands.

Linux

To compile the protos on Linux, run:

# Go to studio-voice/protos folder
cd studio-voice/protos

chmod +x compile_protos.sh
./compile_protos.sh

Windows

To compile the protos on Windows, run:

# Go to studio-voice/protos folder
cd studio-voice/protos

compile_protos.bat
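
Both scripts wrap the gRPC Python compiler. Assuming grpcio-tools provides it (installed with pip install grpcio-tools), their effect is roughly equivalent to the following Python sketch; the proto filename is illustrative:

# Hypothetical equivalent of the compile scripts, assuming grpcio-tools is
# installed and this is run from the studio-voice/protos folder.
from grpc_tools import protoc

protoc.main([
    "grpc_tools.protoc",
    "-I.",
    "--python_out=.",
    "--grpc_python_out=.",
    "studiovoice.proto",  # illustrative filename; use the actual .proto file
])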

Model Caching

When the container starts for the first time, it downloads the required models from NGC. To avoid downloading them again on subsequent runs, cache them locally by using a cache directory:

# Create the cache directory on the host machine
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"
chmod 777 "$LOCAL_NIM_CACHE"

# Run the container with the cache directory mounted in the appropriate location
docker run -it --rm --name=maxine-studio-voice \
    --net host \
    --runtime=nvidia \
    --gpus all \
    --shm-size=8GB \
    -e NGC_API_KEY=$NGC_API_KEY \
    -e NIM_MODEL_PROFILE=<nim_model_profile> \
    -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
    nvcr.io/nim/nvidia/maxine-studio-voice:latest

Ensure that the nim_model_profile is compatible with your GPU. For more information about nim_model_profile, refer to the NIM model profiles table.