Advanced Usage

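This section describes the inference script in detail for advanced users.

Make sure the provided client software is on PYTHONPATH, then run the following code to set up the client.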

import os
import sys
import grpc

sys.path.append(os.path.join(os.getcwd(), "../interfaces"))
# Importing gRPC compiler auto-generated maxine eyecontact library
from eye_contact import eyecontact_pb2, eyecontact_pb2_grpc

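The NIM is invoked over bidirectional gRPC streaming. To produce the request data stream, define a Python generator function: a plain function that returns an iterator when called, where each yield emits the next chunk to be streamed. When parameters are supplied, the first item in the stream is a configuration object that sets the NVIDIA Maxine Eye Contact feature parameters.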

def generate_request_for_inference(
    input_filepath: str = "input.mp4", params: dict = {}
):
    """Generator to produce the request data stream
    Args:
      input_filepath: Path to input file
      params: Parameters for the feature
    """
    DATA_CHUNKS = 64 * 1024  # bytes, we send the mp4 file in 64KB chunks
    if params:  # if params is supplied, the first item in the input stream is a config object with parameters
        yield eyecontact_pb2.RedirectGazeRequest(
            config=eyecontact_pb2.RedirectGazeConfig(**params)
        )
    with open(input_filepath, "rb") as fd:
        while True:
            buffer = fd.read(DATA_CHUNKS)
            if buffer == b"":
                break
            yield eyecontact_pb2.RedirectGazeRequest(video_file_data=buffer)
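
The ordering that the generator enforces (an optional config item first, then fixed-size chunks until end of file) can be tried without a server. The following is a minimal sketch under one assumption: plain tuples stand in for the generated RedirectGazeRequest messages. The name iter_request_items and the "config"/"data" tags are illustrative and are not part of the NIM client.

```python
import io
from typing import BinaryIO, Iterator, Optional, Tuple

CHUNK_SIZE = 64 * 1024  # same 64KB chunk size as the generator above


def iter_request_items(
    stream: BinaryIO, params: Optional[dict] = None, chunk_size: int = CHUNK_SIZE
) -> Iterator[Tuple[str, object]]:
    """Yield ("config", params) first when params is non-empty, then ("data", chunk) items.

    Hypothetical stand-in: tuples take the place of RedirectGazeRequest messages.
    """
    if params:
        yield ("config", dict(params))
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty bytes means end of stream
            break
        yield ("data", chunk)


# 150,000 bytes split into 64KB chunks: 65536 + 65536 + 18928
fake_video = io.BytesIO(b"\x00" * 150_000)
items = list(iter_request_items(fake_video, params={"detect_closure": 1}))
kinds = [kind for kind, _ in items]
sizes = [len(payload) for kind, payload in items if kind == "data"]
print(kinds)
print(sizes)
```

Because the function body only runs as the consumer pulls items, the file is read lazily, one chunk per request, rather than being loaded into memory up front.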

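The following parameters are available in this NIM:

temporal - (UINT32) Flag that controls temporal filtering (default 0xffffffff). When set to true, the landmark computation for eye contact is temporally optimized.
detect_closure - (UINT32) Flag that toggles detection of eye closure and occlusion. Value is 0 or 1 (default 0).
eye_size_sensitivity - (UINT32) Eye size sensitivity parameter; an integer between 2 and 6 (default 3).
enable_lookaway - (UINT32) Flag that toggles the look-away feature. When enabled, the eyes are occasionally redirected to look away in a random direction for a period of time, to avoid staring. Value is 0 or 1 (default 0).
lookaway_max_offset - (UINT32) Maximum gaze offset angle (in degrees) during a random look-away; an integer between 1 and 10 (default 5).
lookaway_interval_min - (UINT32) Minimum limit on the number of frames at which a random look-away occurs; an integer between 1 and 600 (default 100).
lookaway_interval_range - (UINT32) Range, in frames, from which the occurrence of a random look-away is picked; an integer between 1 and 600 (default 250).
gaze_pitch_threshold_low - (FP32) Gaze pitch threshold (in degrees) at which redirection starts transitioning away from the camera toward the estimated gaze; a float between 10 and 35 (default 20).
gaze_pitch_threshold_high - (FP32) Gaze pitch threshold (in degrees) at which redirection equals the estimated gaze; a float between 10 and 35 (default 30).
gaze_yaw_threshold_low - (FP32) Gaze yaw threshold (in degrees) at which redirection starts transitioning away from the camera toward the estimated gaze; a float between 10 and 35 (default 20).
gaze_yaw_threshold_high - (FP32) Gaze yaw threshold (in degrees) at which redirection equals the estimated gaze; a float between 10 and 35 (default 30).
head_pitch_threshold_low - (FP32) Head pose pitch threshold (in degrees) at which redirection starts transitioning away from the camera toward the estimated gaze; a float between 10 and 35 (default 15).
head_pitch_threshold_high - (FP32) Head pose pitch threshold (in degrees) at which redirection equals the estimated gaze; a float between 10 and 35 (default 25).
head_yaw_threshold_low - (FP32) Head pose yaw threshold (in degrees) at which redirection starts transitioning away from the camera toward the estimated gaze; a float between 10 and 35 (default 25).
head_yaw_threshold_high - (FP32) Head pose yaw threshold (in degrees) at which redirection equals the estimated gaze; a float between 10 and 35 (default 30).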
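
Since every parameter above has a documented range, it can be convenient to check a params dictionary on the client before sending it. The following is a minimal sketch of such a check. The helper validate_params and the table PARAM_RANGES are hypothetical names introduced here, not part of the NIM client, and how the server itself reacts to out-of-range values is not covered by this sketch.

```python
# Documented (min, max) ranges copied from the parameter list above.
PARAM_RANGES = {
    "detect_closure": (0, 1),
    "eye_size_sensitivity": (2, 6),
    "enable_lookaway": (0, 1),
    "lookaway_max_offset": (1, 10),
    "lookaway_interval_min": (1, 600),
    "lookaway_interval_range": (1, 600),
}
# All eight gaze/head threshold parameters share the range 10 to 35 degrees.
for part in ("gaze", "head"):
    for axis in ("pitch", "yaw"):
        for level in ("low", "high"):
            PARAM_RANGES[f"{part}_{axis}_threshold_{level}"] = (10, 35)

# temporal is documented without a range, so it is accepted but not range-checked.
UNCHECKED_PARAMS = {"temporal"}


def validate_params(params: dict) -> list:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for name, value in params.items():
        if name in UNCHECKED_PARAMS:
            continue
        if name not in PARAM_RANGES:
            problems.append(f"{name}: unknown parameter")
            continue
        low, high = PARAM_RANGES[name]
        if not low <= value <= high:
            problems.append(f"{name}: {value} is outside [{low}, {high}]")
    return problems


print(validate_params({"eye_size_sensitivity": 4, "detect_closure": 1}))
print(validate_params({"eye_size_sensitivity": 9}))
```

Returning a list of problems instead of raising on the first one lets a caller report every invalid value in a single pass.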

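Before invoking the NIM, define a function that consumes the incoming stream and writes it to an output file. For more details on the technical aspects of this algorithm, see the technical blog.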

from typing import Iterator

def write_output_file_from_response(
    response_iter: Iterator[eyecontact_pb2.RedirectGazeResponse],
    output_filepath: str = "output.mp4",
) -> None:
    """Function to write the output file from the incoming gRPC data stream.
    Args:
      response_iter: Responses from the server
      output_filepath: Path to output file
    """
    with open(output_filepath, "wb") as fd:
        for response in response_iter:
            fd.write(response.video_file_data)

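With the request generator and the output writer defined, connect to the NIM and invoke it. The input file path is held in the variable input_filepath, and the output file is written to the location held in the variable output_filepath. params is a Python dictionary of feature parameter names and values. If params is empty, the default values are used.

Wait for the message confirming that the function invocation has completed before checking the output file. In the code snippet below, fill in the correct host and port for your target.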

import time

input_filepath = "../assets/sample_input.mp4"
output_filepath = "output.mp4"
params = {}
# params = {"eye_size_sensitivity": 4, "detect_closure": 1 } # example of setting parameters

with grpc.insecure_channel(target="localhost:8004") as channel:
    try:
        stub = eyecontact_pb2_grpc.MaxineEyeContactServiceStub(channel)
        start_time = time.time()
        responses = stub.RedirectGaze(
            generate_request_for_inference(input_filepath=input_filepath, params=params)
        )
        if params:
            _ = next(responses)  # if we passed the config, the first output
                                # in the stream will be an echo which we will ignore
        write_output_file_from_response(
            response_iter=responses, output_filepath=output_filepath
        )
        end_time = time.time()
        print(
            f"Function invocation completed in {end_time-start_time:.2f}s, the output file is generated."
        )
    except Exception as e:  # report gRPC and file errors, but let Ctrl+C propagate
        print(e)
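
The echo handling in the snippet above (discard the first response only when a config was sent) can be isolated into a small helper. The following is a minimal sketch. The name drop_config_echo is hypothetical, and plain Python values stand in for RedirectGazeResponse messages so that it runs without a server.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def drop_config_echo(responses: Iterable[T], config_sent: bool) -> Iterator[T]:
    """Skip the first response when a config was sent; otherwise pass everything through."""
    return islice(iter(responses), 1 if config_sent else 0, None)


# Stand-in values: a string for the config echo, bytes for video chunks.
fake_responses = ["config-echo", b"chunk-1", b"chunk-2"]
print(list(drop_config_echo(fake_responses, config_sent=True)))
print(list(drop_config_echo([b"chunk-1", b"chunk-2"], config_sent=False)))
```

Unlike a bare next() call, islice does not raise StopIteration when the stream is empty, so a stream that ends early simply produces no output chunks.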

Multiple Concurrent Inputs

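To run the server in multi-input concurrent mode, set the environment variable MAXINE_MAX_CONCURRENCY_PER_GPU to an integer greater than 1 in the server container. The server then accepts as many concurrent inputs per GPU as the MAXINE_MAX_CONCURRENCY_PER_GPU variable specifies.

Because Triton distributes the workload evenly across all GPUs, a server with NUM_GPUS GPUs supports a total of NUM_GPUS * MAXINE_MAX_CONCURRENCY_PER_GPU concurrent inputs.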
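
On the client side, the product NUM_GPUS * MAXINE_MAX_CONCURRENCY_PER_GPU is a natural upper bound on how many inputs to submit at once. The following is a minimal sketch of that idea using a thread pool. The names max_concurrent_inputs, run_all, and fake_process are hypothetical. In a real client, the process_one callable would be assumed to open its own gRPC channel and run one RedirectGaze stream per input file.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def max_concurrent_inputs(num_gpus: int, max_concurrency_per_gpu: int) -> int:
    """Total concurrent inputs: NUM_GPUS * MAXINE_MAX_CONCURRENCY_PER_GPU."""
    return num_gpus * max_concurrency_per_gpu


def run_all(
    inputs: Sequence[str],
    process_one: Callable[[str], str],
    num_gpus: int,
    max_concurrency_per_gpu: int,
) -> List[str]:
    """Process inputs with at most as many worker threads as the server accepts."""
    workers = max_concurrent_inputs(num_gpus, max_concurrency_per_gpu)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the inputs
        return list(pool.map(process_one, inputs))


def fake_process(path: str) -> str:
    """Stand-in for one NIM invocation; only derives an output file name."""
    return path.replace(".mp4", "_out.mp4")


results = run_all(
    ["a.mp4", "b.mp4", "c.mp4"], fake_process, num_gpus=2, max_concurrency_per_gpu=2
)
print(max_concurrent_inputs(2, 2))
print(results)
```

Threads are a reasonable fit here because each worker spends most of its time waiting on network I/O rather than on Python computation.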

Model Caching

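When the container starts for the first time, it downloads the required models from NGC. To avoid downloading the models again on subsequent runs, you can cache them locally in a cache directory: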

# Create the cache directory on the host machine
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"
chmod 777 "$LOCAL_NIM_CACHE"

# Run the container with the cache directory mounted in the appropriate location
docker run -it --rm --name=maxine-eye-contact-nim \
  --net host \
  --runtime=nvidia \
  --gpus all \
  --shm-size=8GB \
  -e NGC_API_KEY=$NGC_API_KEY \
  -e MAXINE_MAX_CONCURRENCY_PER_GPU=1 \
  -e NIM_HTTP_API_PORT=9000 \
  -e NIM_GRPC_API_PORT=50051 \
  -p 9000:9000 \
  -p 50051:50051 \
  -v "$LOCAL_NIM_CACHE:/home/nvs/.cache/nim" \
  nvcr.io/nim/nvidia/maxine-eye-contact:latest