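A2F-3D NIM Manual Container Deployment and Configuration

We provide Docker containers through the NGC registry for deployment purposes. This guide demonstrates how to deploy, configure, and run the Docker image of the Audio2Face-3D NIM.

Before continuing, read the Architecture Overview page to become familiar with the concepts, services, and requirements needed to run Audio2Face-3D.

Audio2Face-3D is highly configurable through configuration files and environment variables. To configure Audio2Face-3D, you need to use a custom entrypoint.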

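Prerequisites

To run the microservice, you need access to the NGC Docker registry.

Make sure you have NVAIE access and your personal key, and that you are logged in to the nvcr.io registry.

You also need the NVIDIA Container Toolkit configured with Docker.

For more information about hardware and software requirements, see the Support Matrix page.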

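Configuration Files

There are 3 kinds of A2F-3D configuration files. Each one is aimed at a specific type of user.

  1. Artists: the stylization config contains the parameters that only artists would adjust.

  2. DevOps: the deployment config contains the parameters that only DevOps need to consider.

  3. Advanced users: the remaining parameters are updated less often, but they are necessary in specific scenarios.

Warning

These configuration files are deployment-time configuration files. Although they look similar to runtime configuration files, the two kinds must not be confused. They differ in key casing (snake_case vs. camelCase; the deployment-time files on this page use snake_case) and in structure. Here is an example of a runtime configuration: config_james.yml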
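
One way to catch a runtime-style file being passed where a deployment-time file is expected is to scan its keys for camelCase. The sketch below is only an illustration: the helper name `find_camel_case_keys` and the `runtime_style` keys are hypothetical and are not taken from the actual runtime schema. Blendshape names such as `EyeBlinkLeft` start with an uppercase letter, so they are deliberately not flagged.

```python
import re

# Matches lowerCamelCase such as "faceParams"; snake_case keys and
# PascalCase blendshape names like "EyeBlinkLeft" do not match.
CAMEL_CASE = re.compile(r"^[a-z][a-z0-9]*(?:[A-Z][a-z0-9]*)+$")


def find_camel_case_keys(config, prefix=""):
    """Return dotted paths of all camelCase keys in a nested dict."""
    found = []
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if CAMEL_CASE.match(key):
            found.append(path)
        if isinstance(value, dict):
            found.extend(find_camel_case_keys(value, path))
    return found


# Deployment-time style, using keys that appear on this page.
deployment_style = {
    "a2f": {
        "face_params": {"skin_strength": 1.0},
        "blendshape_params": {"weight_multipliers": {"EyeBlinkLeft": 1.0}},
    }
}

# Hypothetical runtime-style keys, invented for this illustration only.
runtime_style = {"faceParams": {"skinStrength": 1.0}}

print(find_camel_case_keys(deployment_style))
print(find_camel_case_keys(runtime_style))
```

An empty result for a file is a hint, not a proof, that it follows the deployment-time convention.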

1. Stylization Configuration File

This configuration file has 3 variants:

  1. Claire

  2. James

  3. Mark

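Each variant corresponds to a specific AI model and contains the default values that will be used. By default, the microservice uses the James configuration.

Claire Configuration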

claire_stylization_config.yaml
# These are the default emotions applied at the beginning of any audio clip, and it also defines the default preferred emotion.
# Their values range from 0.0 to 1.0
default_beginning_emotions:
  amazement: 0.0
  anger: 0.0
  cheekiness: 0.0
  disgust: 0.0
  fear: 0.0
  grief: 0.0
  joy: 0.0
  outofbreath: 0.0
  pain: 0.0
  sadness: 0.0

a2e:
  enabled: true # Enable audio2emotion, ai-generated audio-driven emotion
  live_transition_time: 0.5 # Controls the smoothness of the output transition toward the target value across frames; higher values result in smoother transitions. Each frame updates at a rate of <frame time length> / <live transition time> (capped at 1.0) toward the raw result.
  post_processing_params:
    emotion_contrast: 1.0 # Increases the spread between emotion values by pushing them higher or lower
    emotion_strength: 0.6 # Sets the strength of generated emotions relative to neutral emotion
    enable_preferred_emotion: true # Activate blending preferred emotion with auto-emotion
    live_blend_coef: 0.7 # Coefficient for exponential smoothing of emotion
    max_emotions: 3 # Sets a firm limit on the quantity of emotion sliders engaged by A2E - emotions with the highest weight will be prioritized
    preferred_emotion_strength: 0.5 # Sets the strength of the preferred emotion (if one is loaded) relative to generated emotions

a2f:
  # A2F model, can be one of james_v2.3, claire_v2.3 or mark_v2.3
  inference_model_id: claire_v2.3
  blendshape_id: claire_topo1_v2.1

  face_params:
    eyelid_offset: 0.0 # Adjusts the default pose of eyelid open-close
    face_mask_level: 0.6 # Determines the boundary between the upper and lower regions of the face
    face_mask_softness: 0.0085 # Determines how smoothly the upper and lower face regions blend on the boundary
    input_strength: 1.0 # Controls the magnitude of the input audio
    lip_close_offset: 0.0 # Adjusts the default pose of lip close-open
    lower_face_smoothing: 0.006 # Applies temporal smoothing to the lower face motion
    lower_face_strength: 1.25 # Controls the range of motion on the lower regions of the face
    skin_strength: 1.0 # Controls the range of motion of the skin
    upper_face_smoothing: 0.001 # Applies temporal smoothing to the upper face motion
    upper_face_strength: 1.0 # Controls the range of motion on the upper regions of the face

  blendshape_params: # Modulates the effect of each blendshape. Gain * w + offset
    enable_clamping_bs_weight: false

    # Multiplier for each blendshape output. This list depends on the blendshape model.
    weight_multipliers:
      EyeBlinkLeft: 1.0
      EyeLookDownLeft: 1.0
      EyeLookInLeft: 1.0
      EyeLookOutLeft: 1.0
      EyeLookUpLeft: 1.0
      EyeSquintLeft: 1.0
      EyeWideLeft: 1.0
      EyeBlinkRight: 1.0
      EyeLookDownRight: 1.0
      EyeLookInRight: 1.0
      EyeLookOutRight: 1.0
      EyeLookUpRight: 1.0
      EyeSquintRight: 1.0
      EyeWideRight: 1.0
      JawForward: 1.0
      JawLeft: 1.0
      JawRight: 1.0
      JawOpen: 1.0
      MouthClose: 1.0
      MouthFunnel: 1.0
      MouthPucker: 1.0
      MouthLeft: 1.0
      MouthRight: 1.0
      MouthSmileLeft: 1.0
      MouthSmileRight: 1.0
      MouthFrownLeft: 1.0
      MouthFrownRight: 1.0
      MouthDimpleLeft: 1.0
      MouthDimpleRight: 1.0
      MouthStretchLeft: 1.0
      MouthStretchRight: 1.0
      MouthRollLower: 1.0
      MouthRollUpper: 1.0
      MouthShrugLower: 1.0
      MouthShrugUpper: 1.0
      MouthPressLeft: 1.0
      MouthPressRight: 1.0
      MouthLowerDownLeft: 1.0
      MouthLowerDownRight: 1.0
      MouthUpperUpLeft: 1.0
      MouthUpperUpRight: 1.0
      BrowDownLeft: 1.0
      BrowDownRight: 1.0
      BrowInnerUp: 1.0
      BrowOuterUpLeft: 1.0
      BrowOuterUpRight: 1.0
      CheekPuff: 1.0
      CheekSquintLeft: 1.0
      CheekSquintRight: 1.0
      NoseSneerLeft: 1.0
      NoseSneerRight: 1.0
      TongueOut: 1.0

    # Constant offset for each blendshape output. This list depends on the blendshape model.
    weight_offsets:
      EyeBlinkLeft: 0.0
      EyeLookDownLeft: 0.0
      EyeLookInLeft: 0.0
      EyeLookOutLeft: 0.0
      EyeLookUpLeft: 0.0
      EyeSquintLeft: 0.0
      EyeWideLeft: 0.0
      EyeBlinkRight: 0.0
      EyeLookDownRight: 0.0
      EyeLookInRight: 0.0
      EyeLookOutRight: 0.0
      EyeLookUpRight: 0.0
      EyeSquintRight: 0.0
      EyeWideRight: 0.0
      JawForward: 0.0
      JawLeft: 0.0
      JawRight: 0.0
      JawOpen: 0.0
      MouthClose: 0.0
      MouthFunnel: 0.0
      MouthPucker: 0.0
      MouthLeft: 0.0
      MouthRight: 0.0
      MouthSmileLeft: 0.0
      MouthSmileRight: 0.0
      MouthFrownLeft: 0.0
      MouthFrownRight: 0.0
      MouthDimpleLeft: 0.0
      MouthDimpleRight: 0.0
      MouthStretchLeft: 0.0
      MouthStretchRight: 0.0
      MouthRollLower: 0.0
      MouthRollUpper: 0.0
      MouthShrugLower: 0.0
      MouthShrugUpper: 0.0
      MouthPressLeft: 0.0
      MouthPressRight: 0.0
      MouthLowerDownLeft: 0.0
      MouthLowerDownRight: 0.0
      MouthUpperUpLeft: 0.0
      MouthUpperUpRight: 0.0
      BrowDownLeft: 0.0
      BrowDownRight: 0.0
      BrowInnerUp: 0.0
      BrowOuterUpLeft: 0.0
      BrowOuterUpRight: 0.0
      CheekPuff: 0.0
      CheekSquintLeft: 0.0
      CheekSquintRight: 0.0
      NoseSneerLeft: 0.0
      NoseSneerRight: 0.0
      TongueOut: 0.0
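
The `blendshape_params` comment in the config above summarizes the adjustment as `Gain * w + offset`, where the gain comes from `weight_multipliers` and the offset from `weight_offsets`. Below is a minimal Python sketch of that arithmetic. The function name is hypothetical, and clamping to the range [0, 1] when `enable_clamping_bs_weight` is set is an assumption; the source does not state the clamping bounds.

```python
def apply_blendshape_params(weights, multipliers, offsets, clamp=False):
    """Apply gain * w + offset to each blendshape weight.

    weights, multipliers and offsets are dicts keyed by blendshape name.
    clamp mirrors enable_clamping_bs_weight; the [0, 1] range is assumed.
    """
    adjusted = {}
    for name, w in weights.items():
        value = multipliers.get(name, 1.0) * w + offsets.get(name, 0.0)
        if clamp:
            value = min(1.0, max(0.0, value))
        adjusted[name] = value
    return adjusted


raw = {"JawOpen": 0.5, "MouthSmileLeft": 0.8}
multipliers = {"JawOpen": 1.5, "MouthSmileLeft": 1.0}
offsets = {"JawOpen": 0.0, "MouthSmileLeft": 0.5}

print(apply_blendshape_params(raw, multipliers, offsets))
print(apply_blendshape_params(raw, multipliers, offsets, clamp=True))
```

With the default multipliers of 1.0 and offsets of 0.0, the weights pass through unchanged.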

James Configuration

james_stylization_config.yaml
# These are the default emotions applied at the beginning of any audio clip.
# Their values range from 0.0 to 1.0
default_beginning_emotions:
  amazement: 0.0
  anger: 0.0
  cheekiness: 0.0
  disgust: 0.0
  fear: 0.0
  grief: 0.0
  joy: 0.0
  outofbreath: 0.0
  pain: 0.0
  sadness: 0.0

a2e:
  enabled: true
  live_transition_time: 0.5
  post_processing_params:
    emotion_contrast: 1.0 # Increases the spread between emotion values by pushing them higher or lower
    emotion_strength: 0.6 # Sets the strength of generated emotions relative to neutral emotion
    enable_preferred_emotion: true # Activate blending preferred emotion with auto-emotion
    live_blend_coef: 0.7 # Coefficient for exponential smoothing of emotion
    max_emotions: 3 # Sets a firm limit on the quantity of emotion sliders engaged by A2E - emotions with the highest weight will be prioritized
    preferred_emotion_strength: 0.5 # Sets the strength of the preferred emotion (if one is loaded) relative to generated emotions

a2f:
  # A2F model, can be one of james_v2.3, claire_v2.3 or mark_v2.3
  inference_model_id: james_v2.3
  blendshape_id: james_topo2_v2.2

  face_params:
    eyelid_offset: 0.06 # Adjusts the default pose of eyelid open-close
    face_mask_level: 0.6 # Determines the boundary between the upper and lower regions of the face
    face_mask_softness: 0.0085 # Determines how smoothly the upper and lower face regions blend on the boundary
    input_strength: 1.0 # Controls the magnitude of the input audio
    lip_close_offset: -0.02 # Adjusts the default pose of lip close-open
    lower_face_smoothing: 0.006 # Applies temporal smoothing to the lower face motion
    lower_face_strength: 1.2 # Controls the range of motion on the lower regions of the face
    skin_strength: 1.0 # Controls the range of motion of the skin
    upper_face_smoothing: 0.001 # Applies temporal smoothing to the upper face motion
    upper_face_strength: 1.0 # Controls the range of motion on the upper regions of the face

  blendshape_params: # Modulates the effect of each blendshape. Gain * w + offset
    enable_clamping_bs_weight: false

    weight_multipliers:
      EyeBlinkLeft: 1.0
      EyeLookDownLeft: 1.0
      EyeLookInLeft: 1.0
      EyeLookOutLeft: 1.0
      EyeLookUpLeft: 1.0
      EyeSquintLeft: 1.0
      EyeWideLeft: 1.0
      EyeBlinkRight: 1.0
      EyeLookDownRight: 1.0
      EyeLookInRight: 1.0
      EyeLookOutRight: 1.0
      EyeLookUpRight: 1.0
      EyeSquintRight: 1.0
      EyeWideRight: 1.0
      JawForward: 1.0
      JawLeft: 1.0
      JawRight: 1.0
      JawOpen: 1.0
      MouthClose: 1.0
      MouthFunnel: 1.0
      MouthPucker: 1.0
      MouthLeft: 1.0
      MouthRight: 1.0
      MouthSmileLeft: 1.0
      MouthSmileRight: 1.0
      MouthFrownLeft: 1.0
      MouthFrownRight: 1.0
      MouthDimpleLeft: 1.0
      MouthDimpleRight: 1.0
      MouthStretchLeft: 1.0
      MouthStretchRight: 1.0
      MouthRollLower: 1.0
      MouthRollUpper: 1.0
      MouthShrugLower: 1.0
      MouthShrugUpper: 1.0
      MouthPressLeft: 1.0
      MouthPressRight: 1.0
      MouthLowerDownLeft: 1.0
      MouthLowerDownRight: 1.0
      MouthUpperUpLeft: 1.0
      MouthUpperUpRight: 1.0
      BrowDownLeft: 1.0
      BrowDownRight: 1.0
      BrowInnerUp: 1.0
      BrowOuterUpLeft: 1.0
      BrowOuterUpRight: 1.0
      CheekPuff: 1.0
      CheekSquintLeft: 1.0
      CheekSquintRight: 1.0
      NoseSneerLeft: 1.0
      NoseSneerRight: 1.0
      TongueOut: 1.0

    weight_offsets:
      EyeBlinkLeft: 0.0
      EyeLookDownLeft: 0.0
      EyeLookInLeft: 0.0
      EyeLookOutLeft: 0.0
      EyeLookUpLeft: 0.0
      EyeSquintLeft: 0.0
      EyeWideLeft: 0.0
      EyeBlinkRight: 0.0
      EyeLookDownRight: 0.0
      EyeLookInRight: 0.0
      EyeLookOutRight: 0.0
      EyeLookUpRight: 0.0
      EyeSquintRight: 0.0
      EyeWideRight: 0.0
      JawForward: 0.0
      JawLeft: 0.0
      JawRight: 0.0
      JawOpen: 0.0
      MouthClose: 0.0
      MouthFunnel: 0.0
      MouthPucker: 0.0
      MouthLeft: 0.0
      MouthRight: 0.0
      MouthSmileLeft: 0.0
      MouthSmileRight: 0.0
      MouthFrownLeft: 0.0
      MouthFrownRight: 0.0
      MouthDimpleLeft: 0.0
      MouthDimpleRight: 0.0
      MouthStretchLeft: 0.0
      MouthStretchRight: 0.0
      MouthRollLower: 0.0
      MouthRollUpper: 0.0
      MouthShrugLower: 0.0
      MouthShrugUpper: 0.0
      MouthPressLeft: 0.0
      MouthPressRight: 0.0
      MouthLowerDownLeft: 0.0
      MouthLowerDownRight: 0.0
      MouthUpperUpLeft: 0.0
      MouthUpperUpRight: 0.0
      BrowDownLeft: 0.0
      BrowDownRight: 0.0
      BrowInnerUp: 0.0
      BrowOuterUpLeft: 0.0
      BrowOuterUpRight: 0.0
      CheekPuff: 0.0
      CheekSquintLeft: 0.0
      CheekSquintRight: 0.0
      NoseSneerLeft: 0.0
      NoseSneerRight: 0.0
      TongueOut: 0.0
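
The `live_transition_time` setting under `a2e` is described in the Claire config comment as follows: each frame moves toward the raw result at a rate of frame time length divided by live transition time, capped at 1.0. The Python sketch below illustrates that update rule. The function name, the 30 FPS frame time, and the handling of a non-positive transition time are assumptions made for illustration.

```python
def smooth_towards(current, raw, frame_time, live_transition_time):
    """Move current toward raw by frame_time / live_transition_time, capped at 1.0."""
    if live_transition_time <= 0:
        rate = 1.0  # assumption: no smoothing when the transition time is not positive
    else:
        rate = min(1.0, frame_time / live_transition_time)
    return current + rate * (raw - current)


frame_time = 1.0 / 30.0  # assumed frame length, 30 frames per second
value = 0.0
for _ in range(15):  # 15 frames, i.e. 0.5 seconds of animation
    value = smooth_towards(value, 1.0, frame_time, live_transition_time=0.5)

print(value)
```

A larger `live_transition_time` lowers the per-frame rate, which is why higher values give smoother transitions.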

Mark Configuration

mark_stylization_config.yaml
# These are the default emotions applied at the beginning of any audio clip.
# Their values range from 0.0 to 1.0
default_beginning_emotions:
  amazement: 0.0
  anger: 0.0
  cheekiness: 0.0
  disgust: 0.0
  fear: 0.0
  grief: 0.0
  joy: 0.0
  outofbreath: 0.0
  pain: 0.0
  sadness: 0.0

a2e:
  enabled: true
  live_transition_time: 0.5
  post_processing_params:
    emotion_contrast: 1.0 # Increases the spread between emotion values by pushing them higher or lower
    emotion_strength: 0.6 # Sets the strength of generated emotions relative to neutral emotion
    enable_preferred_emotion: true # Activate blending preferred emotion with auto-emotion
    live_blend_coef: 0.7 # Coefficient for exponential smoothing of emotion
    max_emotions: 3 # Sets a firm limit on the quantity of emotion sliders engaged by A2E - emotions with the highest weight will be prioritized
    preferred_emotion_strength: 0.5 # Sets the strength of the preferred emotion (if one is loaded) relative to generated emotions

a2f:
  # A2F model, can be one of james_v2.3, claire_v2.3 or mark_v2.3
  inference_model_id: mark_v2.3
  blendshape_id: mark_topo1_v2.1

  face_params:
    eyelid_offset: 0.06 # Adjusts the default pose of eyelid open-close
    face_mask_level: 0.6 # Determines the boundary between the upper and lower regions of the face
    face_mask_softness: 0.0085 # Determines how smoothly the upper and lower face regions blend on the boundary
    input_strength: 1.3 # Controls the magnitude of the input audio
    lip_close_offset: -0.03 # Adjusts the default pose of lip close-open
    lower_face_smoothing: 0.0023 # Applies temporal smoothing to the lower face motion
    lower_face_strength: 1.4 # Controls the range of motion on the lower regions of the face
    skin_strength: 1.1 # Controls the range of motion of the skin
    upper_face_smoothing: 0.001 # Applies temporal smoothing to the upper face motion
    upper_face_strength: 1.0 # Controls the range of motion on the upper regions of the face

  blendshape_params: # Modulates the effect of each blendshape. Gain * w + offset
    enable_clamping_bs_weight: false

    weight_multipliers:
      EyeBlinkLeft: 1.0
      EyeLookDownLeft: 1.0
      EyeLookInLeft: 1.0
      EyeLookOutLeft: 1.0
      EyeLookUpLeft: 1.0
      EyeSquintLeft: 1.0
      EyeWideLeft: 1.0
      EyeBlinkRight: 1.0
      EyeLookDownRight: 1.0
      EyeLookInRight: 1.0
      EyeLookOutRight: 1.0
      EyeLookUpRight: 1.0
      EyeSquintRight: 1.0
      EyeWideRight: 1.0
      JawForward: 1.0
      JawLeft: 1.0
      JawRight: 1.0
      JawOpen: 1.0
      MouthClose: 1.0
      MouthFunnel: 1.0
      MouthPucker: 1.0
      MouthLeft: 1.0
      MouthRight: 1.0
      MouthSmileLeft: 1.0
      MouthSmileRight: 1.0
      MouthFrownLeft: 1.0
      MouthFrownRight: 1.0
      MouthDimpleLeft: 1.0
      MouthDimpleRight: 1.0
      MouthStretchLeft: 1.0
      MouthStretchRight: 1.0
      MouthRollLower: 1.0
      MouthRollUpper: 1.0
      MouthShrugLower: 1.0
      MouthShrugUpper: 1.0
      MouthPressLeft: 1.0
      MouthPressRight: 1.0
      MouthLowerDownLeft: 1.0
      MouthLowerDownRight: 1.0
      MouthUpperUpLeft: 1.0
      MouthUpperUpRight: 1.0
      BrowDownLeft: 1.0
      BrowDownRight: 1.0
      BrowInnerUp: 1.0
      BrowOuterUpLeft: 1.0
      BrowOuterUpRight: 1.0
      CheekPuff: 1.0
      CheekSquintLeft: 1.0
      CheekSquintRight: 1.0
      NoseSneerLeft: 1.0
      NoseSneerRight: 1.0
      TongueOut: 1.0

    weight_offsets:
      EyeBlinkLeft: 0.0
      EyeLookDownLeft: 0.0
      EyeLookInLeft: 0.0
      EyeLookOutLeft: 0.0
      EyeLookUpLeft: 0.0
      EyeSquintLeft: 0.0
      EyeWideLeft: 0.0
      EyeBlinkRight: 0.0
      EyeLookDownRight: 0.0
      EyeLookInRight: 0.0
      EyeLookOutRight: 0.0
      EyeLookUpRight: 0.0
      EyeSquintRight: 0.0
      EyeWideRight: 0.0
      JawForward: 0.0
      JawLeft: 0.0
      JawRight: 0.0
      JawOpen: 0.0
      MouthClose: 0.0
      MouthFunnel: 0.0
      MouthPucker: 0.0
      MouthLeft: 0.0
      MouthRight: 0.0
      MouthSmileLeft: 0.0
      MouthSmileRight: 0.0
      MouthFrownLeft: 0.0
      MouthFrownRight: 0.0
      MouthDimpleLeft: 0.0
      MouthDimpleRight: 0.0
      MouthStretchLeft: 0.0
      MouthStretchRight: 0.0
      MouthRollLower: 0.0
      MouthRollUpper: 0.0
      MouthShrugLower: 0.0
      MouthShrugUpper: 0.0
      MouthPressLeft: 0.0
      MouthPressRight: 0.0
      MouthLowerDownLeft: 0.0
      MouthLowerDownRight: 0.0
      MouthUpperUpLeft: 0.0
      MouthUpperUpRight: 0.0
      BrowDownLeft: 0.0
      BrowDownRight: 0.0
      BrowInnerUp: 0.0
      BrowOuterUpLeft: 0.0
      BrowOuterUpRight: 0.0
      CheekPuff: 0.0
      CheekSquintLeft: 0.0
      CheekSquintRight: 0.0
      NoseSneerLeft: 0.0
      NoseSneerRight: 0.0
      TongueOut: 0.0

2. Deployment Configuration File

deployment_config.yaml
common:
  # Number of streams to use simultaneously
  # The recommended value depends on the GPU and your latency constraints
  # Higher value means: more concurrent users and higher overall throughput
  # Lower value means: fewer concurrent users, higher throughput per stream, lower latencies
  stream_number: 10

  # Pad each audio file with some 1.5 seconds of silent audio
  add_silence_padding_after_audio: false

logging:
  # Level of log wanted, info is recommended
  # Can be one of:
  # => trace
  # => debug
  # => info
  # => warn
  # => err
  # => critical
  # => off
  log_level: info
  # How often should FPS logs be printed per stream
  fps_logging_interval_second: 1

endpoints:
  # use the bidirectional endpoint instead of 2 connections (server to receive audio + client to send animation data)
  use_bidirectional: true

  # server to perform the bidirectional streaming connection
  # Used only if use_bidirectional == true
  bidirectional:
    server:
      # port to open
      url: 0.0.0.0:52000

  unidirectional:
    # Server that receives the audio
    # Used only if use_bidirectional == false
    server:
      # port to open
      url: 0.0.0.0:50000

    # Client that sends the animation data
    # Used only if use_bidirectional == false
    client:
      # url of the server to contact
      url: 0.0.0.0:51000

# Configs specific to telemetry
telemetry:
  # Name of the service
  service_name: audio2face
  # Whether to enable metrics
  metrics_enabled: false
  # Whether to enable traces
  traces_enabled: false
  # Can be prometheus or otlp
  metrics_exporter: prometheus
  # Export interval in milliseconds
  otel_metric_export_interval: 60000
  # Export timeout in milliseconds
  otel_metric_export_timeout: 30000

  otlp_http_metrics_endpoint: https://127.0.0.1:4318/v1/metrics
  
  otlp_http_traces_endpoint: https://127.0.0.1:4318/v1/traces
  
  prometheus_endpoint: 0.0.0.0:9464
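
The comments in the `endpoints` section say that the bidirectional server is used only when `use_bidirectional` is true, and that the unidirectional server and client are used only when it is false. The Python sketch below expresses that selection over the default values. The function name and the labels of the returned dict are hypothetical.

```python
def active_endpoints(endpoints):
    """Return the URLs that are in use for a given endpoints section."""
    if endpoints["use_bidirectional"]:
        return {"bidirectional_server": endpoints["bidirectional"]["server"]["url"]}
    unidirectional = endpoints["unidirectional"]
    return {
        "audio_server": unidirectional["server"]["url"],
        "animation_client": unidirectional["client"]["url"],
    }


# Default values copied from deployment_config.yaml above.
default_endpoints = {
    "use_bidirectional": True,
    "bidirectional": {"server": {"url": "0.0.0.0:52000"}},
    "unidirectional": {
        "server": {"url": "0.0.0.0:50000"},
        "client": {"url": "0.0.0.0:51000"},
    },
}

print(active_endpoints(default_endpoints))
print(active_endpoints({**default_endpoints, "use_bidirectional": False}))
```

This is useful when deciding which ports to publish if you bind ports individually instead of using host networking.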

3. Advanced Configuration File

advanced_config.yaml
input_sanitization:
  # max size of UUID
  max_len_uuid: 50
  # Maximum samplerate
  max_sample_rate: 144000
  # Minimum samplerate
  min_sample_rate: 16000
  # Maximum amount in second for the processing time
  # After this timeout the connection to A2F will be cut
  max_processing_duration_second: 300
  # Maximum size of 1 audio buffer sent over the grpc stream
  max_audio_buffer_size_second: 10
  # Maximum size of the audio clip to process
  max_audio_clip_size_second: 300
  # Maximum amount of time that A2F Controller will wait when not
  # receiving data from A2F, before cutting the connection
  max_wait_time_idle_ms: 30000
  # Will stop serving a user if their fps is lower than low_fps
  # for more than low_fps_max_duration_second seconds
  # For real time application less than 30 FPS means slower than realtime
  # So if users provide audio to the service at less than 30 FPS then
  # the interactive experience will stutter.
  low_fps: 29
  low_fps_max_duration_second: 7

garbage_collector:
  # enable or disable the garbage collector
  # This is only used with bidirectional connection where the service is holding data
  # waiting for the client to pick them up.
  enabled: true
  # how often the garbage collector should run
  interval_run_second: 10
  # If the garbage collector finds streams holding
  # more than N seconds of data, it will delete data
  # until the amount falls below this threshold.
  # Clients are expected to retrieve data promptly so that
  # the service doesn't retain the data excessively.
  max_size_stored_data_second: 60


pipeline_parameters:
  # Queues between pipeline components
  # Can be tweaked:
  # Higher values can lead to higher throughput but leads to higher latencies
  # Lower values leads to lower latencies; and potentially lower overall throughput
  # Leave these values to default in case of doubt
  queue_size_after_a2e: 1
  queue_size_after_a2f: 300
  queue_size_after_streammux: 1

  streammux:
    # Do not change this config; this is internal
    adaptive_batching: 0
    # Minimum FPS for all streams
    # Pipeline will not slow down under this value if:
    # * compute allows it
    # * upload speed of audio allows it
    # Here 40 FPS
    # Numerator for that config:
    overall_min_fps_n: 40
    # Denominator for that config:
    overall_min_fps_d: 1

a2f:
  # Temporal smoothing of the generated frames
  # Disabling it is used for debugging individual generated frames
  temporal_smoothing: true
  device_id: 0 # Which gpu id to use

a2e:
  inference_interval: 10
  device_id: 0 # Which gpu id to use


trt_model_generation:
  a2e:
    precision: "fp16"
    min_shape: 1
    optimal_shape: 10
    maximum_shape: 128
  a2f:
    precision: "fp16"
    min_shape: 1
    optimal_shape: 10
    maximum_shape: 128

The configuration files above show the default values used by the microservice.
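
The `input_sanitization` section of advanced_config.yaml states that a user is no longer served if their FPS stays below `low_fps` for more than `low_fps_max_duration_second` seconds. The Python sketch below shows one possible reading of that rule over a list of timestamped FPS measurements. How the service actually samples FPS is not described in the source, so the function name and the sample format are assumptions.

```python
def should_stop_serving(samples, low_fps=29, low_fps_max_duration_second=7):
    """Return True if fps stays below low_fps for longer than the allowed duration.

    samples is an iterable of (timestamp_seconds, fps) in chronological order.
    """
    low_since = None  # timestamp at which the current low-FPS run started
    for timestamp, fps in samples:
        if fps < low_fps:
            if low_since is None:
                low_since = timestamp
            if timestamp - low_since > low_fps_max_duration_second:
                return True
        else:
            low_since = None  # FPS recovered, reset the run
    return False


slow_client = [(t, 25) for t in range(9)]  # 25 FPS for 8 seconds
recovering_client = [(t, 25) for t in range(6)] + [(6, 30)] + [(t, 25) for t in range(7, 13)]

print(should_stop_serving(slow_client))
print(should_stop_serving(recovering_client))
```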

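To apply your own configuration, start the A2F-3D NIM with a custom entrypoint and mount your configuration files inside the container. See the next section of this guide for detailed instructions.

How to Use the Configuration Files

To override any configuration, mount the files into the container as a Docker volume at the /mnt/configs path. For convenience, export an environment variable named LOCAL_CONFIGS that holds the path of the directory containing the overriding configuration files. For example: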

$ mkdir -p ~/.cache/audio2face-3d-configs
$ export LOCAL_CONFIGS=~/.cache/audio2face-3d-configs

Copy the configuration files shown above into the LOCAL_CONFIGS directory.

You should then have:

$ ls $LOCAL_CONFIGS
advanced_config.yaml
claire_stylization_config.yaml
deployment_config.yaml
james_stylization_config.yaml
mark_stylization_config.yaml

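Model Caching

You can cache the models locally so that they do not have to be generated or downloaded the next time the service runs. To cache the models, use a Docker volume mount. Make sure the local path has read, write, and execute permissions (777). The following commands set up a local cache path with the right permissions: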

$ mkdir -p ~/.cache/audio2face-3d
$ chmod 777 ~/.cache/audio2face-3d
$ export LOCAL_NIM_CACHE=~/.cache/audio2face-3d

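On a second run, if you want to use the NIM entrypoint rather than the custom entrypoint, make sure to set the environment variable NIM_DISABLE_MODEL_DOWNLOAD=true, which disables the model download.

Launch the A2F-3D NIM with a Custom Entrypoint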

$ docker run -it --rm --name audio2face-3d \
 --gpus all \
 --network=host \
 --entrypoint /bin/bash -w /opt/nvidia/a2f_pipeline \
 -e NIM_DISABLE_MODEL_DOWNLOAD=true \
 -e NIM_SKIP_A2F_START=true \
 -v "$LOCAL_NIM_CACHE:/tmp/a2x" \
 -v "$LOCAL_CONFIGS:/mnt/configs/" \
 nvcr.io/nim/nvidia/audio2face-3d:1.2

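The command above creates a Docker container for the Audio2Face-3D NIM. Note that --gpus all is specified to expose the GPUs to the Docker container. You can customize this option as needed.

We also use --network=host to bind all ports on the local network. For fine-grained control over port binding, use the -p option with the appropriate ports.

Note the two volume mounts: -v "$LOCAL_NIM_CACHE:/tmp/a2x" caches the models, and -v "$LOCAL_CONFIGS:/mnt/configs/" overrides the configuration. Skip the corresponding mount if you do not want to cache the models or change the default configuration.

You should then land in a shell: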

triton-server@host-name:/opt/nvidia/a2f_pipeline$

Inside the container, start the NIM server by running:

$ /opt/nim/start_server.sh &

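The & operator starts the server as a background process so that you can run further commands inside the container. If you are not returned to the shell prompt immediately, press Enter to get the prompt back and continue.

The following commands are run inside the container unless stated otherwise.

Generate the TRT Engines

The first step is to generate the TRT engines for the AI models, which are specific to the GPU in your machine, using the provided Python application. Its help output is shown below: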

usage: generate_trt_models.py [-h] [--stylization-config STYLIZATION_CONFIG] [--advanced-config ADVANCED_CONFIG]

Generates TRT models for A2F Service.

options:
  -h, --help            show this help message and exit
  --stylization-config  STYLIZATION_CONFIG
                        file path to the stylization config
  --advanced-config     ADVANCED_CONFIG
                        file path to the advanced config

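If you want to keep the defaults, you do not need to specify anything.

Note

You can back up the generated TRT engines to skip model generation at NIM startup, but note that each model is specific to a GPU. The generated models are located in the /tmp/a2x directory inside the Docker container.

Generate the Audio2Emotion and Audio2Face TRT engines with the default configuration: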

$ ./service/generate_trt_models.py

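The TRT engines must be regenerated when the deployment environment changes. This is especially true for GPU changes, including a different architecture or compute capability. Generated TRT engines may be reusable on machines with exactly the same controlled configuration (same hardware and Docker setup). We recommend regenerating the TRT engines on every hardware change.

Launch the Service

The second step is to launch the service. The help menu of the Audio2Face-3D service is shown below: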

$ a2f_pipeline.run -h
Usage: a2f_pipeline.run [--help] [--version] [--stylization-config] [--deployment-config] [--advanced-config]

Optional arguments:
  -h, --help                     shows help message and exits
  -v, --version                  prints version information and exits
  --stylization-config           file path to the stylization config
  --deployment-config            file path to the deployment config
  --advanced-config              file path to the advanced config

To use the default configuration, simply run the following inside the container:

$ /usr/local/bin/a2f_pipeline.run

You should see a log line like the following, indicating that the A2F-3D service started correctly.

[2024-04-23 12:44:33.066] [  global  ] [info] Running...

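Note

When starting the service, you may see warnings labeled GStreamer-WARNING. They appear because some libraries are missing from the container. These warnings can be safely ignored because Audio2Face-3D does not use those libraries.

Changing the Configuration - Shortest Path

The following commands are run inside the container.

Suppose you decide to use the Claire model. You can run the following commands: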

$ ./service/generate_trt_models.py --stylization-config /mnt/configs/claire_stylization_config.yaml \
   --advanced-config /mnt/configs/advanced_config.yaml
$ a2f_pipeline.run --stylization-config /mnt/configs/claire_stylization_config.yaml \
                   --deployment-config /mnt/configs/deployment_config.yaml \
                   --advanced-config /mnt/configs/advanced_config.yaml

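Warning

The current ./service/generate_trt_models.py does not support cache invalidation. If you update the configuration files and want to regenerate the models, you need to delete the corresponding TRT models from the /tmp/a2x/ cache folder.

You will then have a running container that uses the custom parameters you provided.

Changing the Configuration - Flexible Way

Configuration files work by overriding values. You therefore do not have to repeat default values in your configuration file; it only needs to contain a subset of the default configuration file.

In addition, for the a2f section of the stylization config, specifying inference_model_id automatically loads the default face parameters matching that ID, and specifying blendshape_id automatically loads the default blendshape parameters.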
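
Override behaviour of this kind can be pictured as a recursive merge of a partial file over the defaults. The Python sketch below is only a mental model: the function name is hypothetical, and the service's actual merge implementation is not described in the source.

```python
import copy


def merge_overrides(defaults, overrides):
    """Return defaults with every key present in overrides replaced, recursively."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_overrides(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# A subset of the default deployment_config.yaml values.
defaults = {
    "common": {"stream_number": 10},
    "endpoints": {
        "use_bidirectional": True,
        "bidirectional": {"server": {"url": "0.0.0.0:52000"}},
    },
}

# A partial file that only changes one key.
overrides = {"endpoints": {"use_bidirectional": False}}

merged = merge_overrides(defaults, overrides)
print(merged["endpoints"]["use_bidirectional"])
print(merged["common"]["stream_number"])
```

Keys that the partial file does not mention keep their default values, which is why a short file is enough.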

The following examples make this concrete.

Example 1: Set the Stylization Config to Use Mark

On the host, create a file named short_mark_stylization_config.yaml in the $LOCAL_CONFIGS directory and add the following lines:

a2f:
  inference_model_id: mark_v2.3
  blendshape_id: mark_topo1_v2.1

Then, inside the container, run:

$ ./service/generate_trt_models.py --stylization-config /mnt/configs/short_mark_stylization_config.yaml
$ a2f_pipeline.run --stylization-config /mnt/configs/short_mark_stylization_config.yaml

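Warning

The current ./service/generate_trt_models.py does not support cache invalidation. If you update the configuration files and want to regenerate the models, you need to delete the corresponding TRT models from the /tmp/a2x/ cache folder.

These commands have exactly the same effect as providing the full default Mark configuration file, because under the hood inference_model_id and blendshape_id are used to load those defaults.

Example 2: Update the Endpoint Type to Unidirectional

Here we set only part of deployment_config.yaml.

On the host, create a file named unidirectional_deployment_config.yaml in the $LOCAL_CONFIGS directory and add the following lines: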

endpoints:
  use_bidirectional: false

Then, inside the container, run:

$ ./service/generate_trt_models.py
$ a2f_pipeline.run --deployment-config /mnt/configs/unidirectional_deployment_config.yaml

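This overrides the endpoint type from bidirectional to unidirectional.

This method works for any key of the provided YAML files.

Warning

Make sure to use the option that matches your configuration file.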

--stylization-config # for the <any>_stylization_config.yaml
--deployment-config # for the deployment_config.yaml
--advanced-config # for the advanced_config.yaml
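
To avoid passing a file to the wrong option, the mapping above can be encoded in a small helper that picks the flag from the file name. The Python sketch below assumes your files keep the name suffixes used on this page; `config_flag` and `build_command` are hypothetical helper names, and the sketch only builds the argument list without running anything.

```python
from pathlib import Path

# File name suffix -> a2f_pipeline.run option, as listed above.
FLAG_BY_SUFFIX = {
    "stylization_config.yaml": "--stylization-config",
    "deployment_config.yaml": "--deployment-config",
    "advanced_config.yaml": "--advanced-config",
}


def config_flag(path):
    """Return the option matching a config file, based on its name suffix."""
    name = Path(path).name
    for suffix, flag in FLAG_BY_SUFFIX.items():
        if name.endswith(suffix):
            return flag
    raise ValueError(f"cannot infer the config type of {name!r}")


def build_command(config_paths):
    """Build the a2f_pipeline.run argument list for the given config files."""
    command = ["a2f_pipeline.run"]
    for path in config_paths:
        command += [config_flag(path), path]
    return command


print(build_command([
    "/mnt/configs/short_mark_stylization_config.yaml",
    "/mnt/configs/unidirectional_deployment_config.yaml",
]))
```

A file whose name does not end with one of the three suffixes raises an error instead of being passed to an arbitrary option.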

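Advanced Stylization

The blendshape tuning in the stylization configs above has been simplified for new users.

For advanced users, the following section provides more information.

Advanced Blendshape Tuning

Three more parameters can be set for blendshape tuning:

  • active_poses: which blendshapes should be active. 1 means active; 0 means inactive.

  • cancel_poses: which blendshapes cancel each other out. Matching numbers indicate which poses are paired; -1 means no-op.

  • symmetry_poses: which blendshapes are symmetric with one another. Matching numbers indicate which poses are paired; -1 means no-op.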
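
The matching-number convention used by cancel_poses and symmetry_poses can be read by grouping blendshape names that share the same number and skipping -1. The Python sketch below does this for a few symmetry_poses entries taken from the Claire config on this page. The function name is hypothetical, and the sketch only decodes the pairing; it does not model what the service does with a pair.

```python
from collections import defaultdict


def group_matching_poses(pose_ids):
    """Group blendshape names by their pairing number, ignoring -1 (no-op)."""
    groups = defaultdict(list)
    for name, pose_id in pose_ids.items():
        if pose_id != -1:
            groups[pose_id].append(name)
    return dict(groups)


# A few symmetry_poses entries from the Claire config.
symmetry_poses = {
    "EyeBlinkLeft": 0,
    "EyeLookDownLeft": -1,
    "EyeWideLeft": 1,
    "EyeBlinkRight": 0,
    "EyeLookDownRight": -1,
    "EyeWideRight": 1,
}

print(group_matching_poses(symmetry_poses))
```

Here EyeBlinkLeft and EyeBlinkRight share the number 0 and EyeWideLeft and EyeWideRight share the number 1, so each forms a symmetric pair.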

claire_stylization_config.yaml
# These are the default emotions applied at the beginning of any audio clip.
# Their values range from 0.0 to 1.0
default_beginning_emotions:
  amazement: 0.0
  anger: 0.0
  cheekiness: 0.0
  disgust: 0.0
  fear: 0.0
  grief: 0.0
  joy: 0.0
  outofbreath: 0.0
  pain: 0.0
  sadness: 0.0

a2e:
  enabled: true
  live_transition_time: 0.5
  post_processing_params:
    emotion_contrast: 1.0 # Increases the spread between emotion values by pushing them higher or lower
    emotion_strength: 0.6 # Sets the strength of generated emotions relative to neutral emotion
    enable_preferred_emotion: true # Activate blending preferred emotion with auto-emotion
    live_blend_coef: 0.7 # Coefficient for exponential smoothing of emotion
    max_emotions: 3 # Sets a firm limit on the quantity of emotion sliders engaged by A2E - emotions with the highest weight will be prioritized
    preferred_emotion_strength: 0.5 # Sets the strength of the preferred emotion (if one is loaded) relative to generated emotions

a2f:
  # A2F model, can be one of james_v2.3, claire_v2.3 or mark_v2.3
  inference_model_id: claire_v2.3
  blendshape_id: claire_topo1_v2.1

  face_params:
    eyelid_offset: 0.0 # Adjusts the default pose of eyelid open-close
    face_mask_level: 0.6 # Determines the boundary between the upper and lower regions of the face
    face_mask_softness: 0.0085 # Determines how smoothly the upper and lower face regions blend on the boundary
    input_strength: 1.0 # Controls the magnitude of the input audio
    lip_close_offset: 0.0 # Adjusts the default pose of lip close-open
    lower_face_smoothing: 0.006 # Applies temporal smoothing to the lower face motion
    lower_face_strength: 1.25 # Controls the range of motion on the lower regions of the face
    skin_strength: 1.0 # Controls the range of motion of the skin
    upper_face_smoothing: 0.001 # Applies temporal smoothing to the upper face motion
    upper_face_strength: 1.0 # Controls the range of motion on the upper regions of the face

  blendshape_params: # Modulates the effect of each blendshape. Gain * w + offset
    enable_clamping_bs_weight: false

    weight_multipliers:
      EyeBlinkLeft: 1.0
      EyeLookDownLeft: 1.0
      EyeLookInLeft: 1.0
      EyeLookOutLeft: 1.0
      EyeLookUpLeft: 1.0
      EyeSquintLeft: 1.0
      EyeWideLeft: 1.0
      EyeBlinkRight: 1.0
      EyeLookDownRight: 1.0
      EyeLookInRight: 1.0
      EyeLookOutRight: 1.0
      EyeLookUpRight: 1.0
      EyeSquintRight: 1.0
      EyeWideRight: 1.0
      JawForward: 1.0
      JawLeft: 1.0
      JawRight: 1.0
      JawOpen: 1.0
      MouthClose: 1.0
      MouthFunnel: 1.0
      MouthPucker: 1.0
      MouthLeft: 1.0
      MouthRight: 1.0
      MouthSmileLeft: 1.0
      MouthSmileRight: 1.0
      MouthFrownLeft: 1.0
      MouthFrownRight: 1.0
      MouthDimpleLeft: 1.0
      MouthDimpleRight: 1.0
      MouthStretchLeft: 1.0
      MouthStretchRight: 1.0
      MouthRollLower: 1.0
      MouthRollUpper: 1.0
      MouthShrugLower: 1.0
      MouthShrugUpper: 1.0
      MouthPressLeft: 1.0
      MouthPressRight: 1.0
      MouthLowerDownLeft: 1.0
      MouthLowerDownRight: 1.0
      MouthUpperUpLeft: 1.0
      MouthUpperUpRight: 1.0
      BrowDownLeft: 1.0
      BrowDownRight: 1.0
      BrowInnerUp: 1.0
      BrowOuterUpLeft: 1.0
      BrowOuterUpRight: 1.0
      CheekPuff: 1.0
      CheekSquintLeft: 1.0
      CheekSquintRight: 1.0
      NoseSneerLeft: 1.0
      NoseSneerRight: 1.0
      TongueOut: 1.0

    weight_offsets:
      EyeBlinkLeft: 0.0
      EyeLookDownLeft: 0.0
      EyeLookInLeft: 0.0
      EyeLookOutLeft: 0.0
      EyeLookUpLeft: 0.0
      EyeSquintLeft: 0.0
      EyeWideLeft: 0.0
      EyeBlinkRight: 0.0
      EyeLookDownRight: 0.0
      EyeLookInRight: 0.0
      EyeLookOutRight: 0.0
      EyeLookUpRight: 0.0
      EyeSquintRight: 0.0
      EyeWideRight: 0.0
      JawForward: 0.0
      JawLeft: 0.0
      JawRight: 0.0
      JawOpen: 0.0
      MouthClose: 0.0
      MouthFunnel: 0.0
      MouthPucker: 0.0
      MouthLeft: 0.0
      MouthRight: 0.0
      MouthSmileLeft: 0.0
      MouthSmileRight: 0.0
      MouthFrownLeft: 0.0
      MouthFrownRight: 0.0
      MouthDimpleLeft: 0.0
      MouthDimpleRight: 0.0
      MouthStretchLeft: 0.0
      MouthStretchRight: 0.0
      MouthRollLower: 0.0
      MouthRollUpper: 0.0
      MouthShrugLower: 0.0
      MouthShrugUpper: 0.0
      MouthPressLeft: 0.0
      MouthPressRight: 0.0
      MouthLowerDownLeft: 0.0
      MouthLowerDownRight: 0.0
      MouthUpperUpLeft: 0.0
      MouthUpperUpRight: 0.0
      BrowDownLeft: 0.0
      BrowDownRight: 0.0
      BrowInnerUp: 0.0
      BrowOuterUpLeft: 0.0
      BrowOuterUpRight: 0.0
      CheekPuff: 0.0
      CheekSquintLeft: 0.0
      CheekSquintRight: 0.0
      NoseSneerLeft: 0.0
      NoseSneerRight: 0.0
      TongueOut: 0.0

    active_poses:
      EyeBlinkLeft: 1
      EyeLookDownLeft: 0
      EyeLookInLeft: 0
      EyeLookOutLeft: 0
      EyeLookUpLeft: 0
      EyeSquintLeft: 1
      EyeWideLeft: 1
      EyeBlinkRight: 1
      EyeLookDownRight: 0
      EyeLookInRight: 0
      EyeLookOutRight: 0
      EyeLookUpRight: 0
      EyeSquintRight: 1
      EyeWideRight: 1
      JawForward: 1
      JawLeft: 1
      JawRight: 1
      JawOpen: 1
      MouthClose: 1
      MouthFunnel: 1
      MouthPucker: 1
      MouthLeft: 1
      MouthRight: 1
      MouthSmileLeft: 1
      MouthSmileRight: 1
      MouthFrownLeft: 1
      MouthFrownRight: 1
      MouthDimpleLeft: 1
      MouthDimpleRight: 1
      MouthStretchLeft: 1
      MouthStretchRight: 1
      MouthRollLower: 1
      MouthRollUpper: 1
      MouthShrugLower: 1
      MouthShrugUpper: 1
      MouthPressLeft: 1
      MouthPressRight: 1
      MouthLowerDownLeft: 1
      MouthLowerDownRight: 1
      MouthUpperUpLeft: 1
      MouthUpperUpRight: 1
      BrowDownLeft: 1
      BrowDownRight: 1
      BrowInnerUp: 1
      BrowOuterUpLeft: 1
      BrowOuterUpRight: 1
      CheekPuff: 1
      CheekSquintLeft: 1
      CheekSquintRight: 1
      NoseSneerLeft: 1
      NoseSneerRight: 1
      TongueOut: 0

    cancel_poses:
      EyeBlinkLeft: -1
      EyeLookDownLeft: -1
      EyeLookInLeft: -1
      EyeLookOutLeft: -1
      EyeLookUpLeft: -1
      EyeSquintLeft: -1
      EyeWideLeft: -1
      EyeBlinkRight: -1
      EyeLookDownRight: -1
      EyeLookInRight: -1
      EyeLookOutRight: -1
      EyeLookUpRight: -1
      EyeSquintRight: -1
      EyeWideRight: -1
      JawForward: -1
      JawLeft: -1
      JawRight: -1
      JawOpen: -1
      MouthClose: -1
      MouthFunnel: -1
      MouthPucker: -1
      MouthLeft: -1
      MouthRight: -1
      MouthSmileLeft: -1
      MouthSmileRight: -1
      MouthFrownLeft: -1
      MouthFrownRight: -1
      MouthDimpleLeft: -1
      MouthDimpleRight: -1
      MouthStretchLeft: -1
      MouthStretchRight: -1
      MouthRollLower: -1
      MouthRollUpper: -1
      MouthShrugLower: -1
      MouthShrugUpper: -1
      MouthPressLeft: -1
      MouthPressRight: -1
      MouthLowerDownLeft: -1
      MouthLowerDownRight: -1
      MouthUpperUpLeft: -1
      MouthUpperUpRight: -1
      BrowDownLeft: -1
      BrowDownRight: -1
      BrowInnerUp: -1
      BrowOuterUpLeft: -1
      BrowOuterUpRight: -1
      CheekPuff: -1
      CheekSquintLeft: -1
      CheekSquintRight: -1
      NoseSneerLeft: -1
      NoseSneerRight: -1
      TongueOut: -1

    symmetry_poses:
      EyeBlinkLeft: 0
      EyeLookDownLeft: -1
      EyeLookInLeft: -1
      EyeLookOutLeft: -1
      EyeLookUpLeft: -1
      EyeSquintLeft: -1
      EyeWideLeft: 1
      EyeBlinkRight: 0
      EyeLookDownRight: -1
      EyeLookInRight: -1
      EyeLookOutRight: -1
      EyeLookUpRight: -1
      EyeSquintRight: -1
      EyeWideRight: 1
      JawForward: -1
      JawLeft: -1
      JawRight: -1
      JawOpen: -1
      MouthClose: -1
      MouthFunnel: -1
      MouthPucker: -1
      MouthLeft: -1
      MouthRight: -1
      MouthSmileLeft: 2
      MouthSmileRight: 2
      MouthFrownLeft: 3
      MouthFrownRight: 3
      MouthDimpleLeft: 4
      MouthDimpleRight: 4
      MouthStretchLeft: 5
      MouthStretchRight: 5
      MouthRollLower: -1
      MouthRollUpper: -1
      MouthShrugLower: -1
      MouthShrugUpper: -1
      MouthPressLeft: 6
      MouthPressRight: 6
      MouthLowerDownLeft: 7
      MouthLowerDownRight: 7
      MouthUpperUpLeft: 8
      MouthUpperUpRight: 8
      BrowDownLeft: 10
      BrowDownRight: 10
      BrowInnerUp: -1
      BrowOuterUpLeft: 9
      BrowOuterUpRight: 9
      CheekPuff: -1
      CheekSquintLeft: 11
      CheekSquintRight: 11
      NoseSneerLeft: 12
      NoseSneerRight: 12
      TongueOut: -1

james_stylization_config.yaml
# These are the default emotions applied at the beginning of any audio clip.
# Their values range from 0.0 to 1.0
default_beginning_emotions:
  amazement: 0.0
  anger: 0.0
  cheekiness: 0.0
  disgust: 0.0
  fear: 0.0
  grief: 0.0
  joy: 0.0
  outofbreath: 0.0
  pain: 0.0
  sadness: 0.0

a2e:
  enabled: true
  live_transition_time: 0.5
  post_processing_params:
    emotion_contrast: 1.0 # Increases the spread between emotion values by pushing them higher or lower
    emotion_strength: 0.6 # Sets the strength of generated emotions relative to neutral emotion
    enable_preferred_emotion: true # Activate blending preferred emotion with auto-emotion
    live_blend_coef: 0.7 # Coefficient for exponential smoothing of emotion
    max_emotions: 3 # Sets a firm limit on the quantity of emotion sliders engaged by A2E - emotions with the highest weight will be prioritized
    preferred_emotion_strength: 0.5 # Sets the strength of the preferred emotion (if one is loaded) relative to generated emotions

a2f:
  # A2F model, can be one of james_v2.3, claire_v2.3 or mark_v2.3
  inference_model_id: james_v2.3
  blendshape_id: james_topo2_v2.2

  face_params:
    eyelid_offset: 0.06 # Adjusts the default pose of eyelid open-close
    face_mask_level: 0.6 # Determines the boundary between the upper and lower regions of the face
    face_mask_softness: 0.0085 # Determines how smoothly the upper and lower face regions blend on the boundary
    input_strength: 1.0 # Controls the magnitude of the input audio
    lip_close_offset: -0.02 # Adjusts the default pose of lip close-open
    lower_face_smoothing: 0.006 # Applies temporal smoothing to the lower face motion
    lower_face_strength: 1.2 # Controls the range of motion on the lower regions of the face
    skin_strength: 1.0 # Controls the range of motion of the skin
    upper_face_smoothing: 0.001 # Applies temporal smoothing to the upper face motion
    upper_face_strength: 1.0 # Controls the range of motion on the upper regions of the face

  blendshape_params: # Modulates the effect of each blendshape: gain * w + offset
    enable_clamping_bs_weight: false

    weight_multipliers:
      EyeBlinkLeft: 1.0
      EyeLookDownLeft: 1.0
      EyeLookInLeft: 1.0
      EyeLookOutLeft: 1.0
      EyeLookUpLeft: 1.0
      EyeSquintLeft: 1.0
      EyeWideLeft: 1.0
      EyeBlinkRight: 1.0
      EyeLookDownRight: 1.0
      EyeLookInRight: 1.0
      EyeLookOutRight: 1.0
      EyeLookUpRight: 1.0
      EyeSquintRight: 1.0
      EyeWideRight: 1.0
      JawForward: 1.0
      JawLeft: 1.0
      JawRight: 1.0
      JawOpen: 1.0
      MouthClose: 1.0
      MouthFunnel: 1.0
      MouthPucker: 1.0
      MouthLeft: 1.0
      MouthRight: 1.0
      MouthSmileLeft: 1.0
      MouthSmileRight: 1.0
      MouthFrownLeft: 1.0
      MouthFrownRight: 1.0
      MouthDimpleLeft: 1.0
      MouthDimpleRight: 1.0
      MouthStretchLeft: 1.0
      MouthStretchRight: 1.0
      MouthRollLower: 1.0
      MouthRollUpper: 1.0
      MouthShrugLower: 1.0
      MouthShrugUpper: 1.0
      MouthPressLeft: 1.0
      MouthPressRight: 1.0
      MouthLowerDownLeft: 1.0
      MouthLowerDownRight: 1.0
      MouthUpperUpLeft: 1.0
      MouthUpperUpRight: 1.0
      BrowDownLeft: 1.0
      BrowDownRight: 1.0
      BrowInnerUp: 1.0
      BrowOuterUpLeft: 1.0
      BrowOuterUpRight: 1.0
      CheekPuff: 1.0
      CheekSquintLeft: 1.0
      CheekSquintRight: 1.0
      NoseSneerLeft: 1.0
      NoseSneerRight: 1.0
      TongueOut: 1.0

    weight_offsets:
      EyeBlinkLeft: 0.0
      EyeLookDownLeft: 0.0
      EyeLookInLeft: 0.0
      EyeLookOutLeft: 0.0
      EyeLookUpLeft: 0.0
      EyeSquintLeft: 0.0
      EyeWideLeft: 0.0
      EyeBlinkRight: 0.0
      EyeLookDownRight: 0.0
      EyeLookInRight: 0.0
      EyeLookOutRight: 0.0
      EyeLookUpRight: 0.0
      EyeSquintRight: 0.0
      EyeWideRight: 0.0
      JawForward: 0.0
      JawLeft: 0.0
      JawRight: 0.0
      JawOpen: 0.0
      MouthClose: 0.0
      MouthFunnel: 0.0
      MouthPucker: 0.0
      MouthLeft: 0.0
      MouthRight: 0.0
      MouthSmileLeft: 0.0
      MouthSmileRight: 0.0
      MouthFrownLeft: 0.0
      MouthFrownRight: 0.0
      MouthDimpleLeft: 0.0
      MouthDimpleRight: 0.0
      MouthStretchLeft: 0.0
      MouthStretchRight: 0.0
      MouthRollLower: 0.0
      MouthRollUpper: 0.0
      MouthShrugLower: 0.0
      MouthShrugUpper: 0.0
      MouthPressLeft: 0.0
      MouthPressRight: 0.0
      MouthLowerDownLeft: 0.0
      MouthLowerDownRight: 0.0
      MouthUpperUpLeft: 0.0
      MouthUpperUpRight: 0.0
      BrowDownLeft: 0.0
      BrowDownRight: 0.0
      BrowInnerUp: 0.0
      BrowOuterUpLeft: 0.0
      BrowOuterUpRight: 0.0
      CheekPuff: 0.0
      CheekSquintLeft: 0.0
      CheekSquintRight: 0.0
      NoseSneerLeft: 0.0
      NoseSneerRight: 0.0
      TongueOut: 0.0

    active_poses:
      EyeBlinkLeft: 1
      EyeLookDownLeft: 0
      EyeLookInLeft: 0
      EyeLookOutLeft: 0
      EyeLookUpLeft: 0
      EyeSquintLeft: 1
      EyeWideLeft: 1
      EyeBlinkRight: 1
      EyeLookDownRight: 0
      EyeLookInRight: 0
      EyeLookOutRight: 0
      EyeLookUpRight: 0
      EyeSquintRight: 1
      EyeWideRight: 1
      JawForward: 1
      JawLeft: 1
      JawRight: 1
      JawOpen: 1
      MouthClose: 1
      MouthFunnel: 1
      MouthPucker: 1
      MouthLeft: 1
      MouthRight: 1
      MouthSmileLeft: 1
      MouthSmileRight: 1
      MouthFrownLeft: 1
      MouthFrownRight: 1
      MouthDimpleLeft: 1
      MouthDimpleRight: 1
      MouthStretchLeft: 1
      MouthStretchRight: 1
      MouthRollLower: 1
      MouthRollUpper: 1
      MouthShrugLower: 1
      MouthShrugUpper: 1
      MouthPressLeft: 1
      MouthPressRight: 1
      MouthLowerDownLeft: 1
      MouthLowerDownRight: 1
      MouthUpperUpLeft: 1
      MouthUpperUpRight: 1
      BrowDownLeft: 1
      BrowDownRight: 1
      BrowInnerUp: 1
      BrowOuterUpLeft: 1
      BrowOuterUpRight: 1
      CheekPuff: 1
      CheekSquintLeft: 1
      CheekSquintRight: 1
      NoseSneerLeft: 1
      NoseSneerRight: 1
      TongueOut: 0

    cancel_poses:
      EyeBlinkLeft: -1
      EyeLookDownLeft: -1
      EyeLookInLeft: -1
      EyeLookOutLeft: -1
      EyeLookUpLeft: -1
      EyeSquintLeft: -1
      EyeWideLeft: -1
      EyeBlinkRight: -1
      EyeLookDownRight: -1
      EyeLookInRight: -1
      EyeLookOutRight: -1
      EyeLookUpRight: -1
      EyeSquintRight: -1
      EyeWideRight: -1
      JawForward: -1
      JawLeft: -1
      JawRight: -1
      JawOpen: -1
      MouthClose: -1
      MouthFunnel: -1
      MouthPucker: -1
      MouthLeft: -1
      MouthRight: -1
      MouthSmileLeft: -1
      MouthSmileRight: -1
      MouthFrownLeft: -1
      MouthFrownRight: -1
      MouthDimpleLeft: -1
      MouthDimpleRight: -1
      MouthStretchLeft: -1
      MouthStretchRight: -1
      MouthRollLower: -1
      MouthRollUpper: -1
      MouthShrugLower: -1
      MouthShrugUpper: -1
      MouthPressLeft: -1
      MouthPressRight: -1
      MouthLowerDownLeft: -1
      MouthLowerDownRight: -1
      MouthUpperUpLeft: -1
      MouthUpperUpRight: -1
      BrowDownLeft: -1
      BrowDownRight: -1
      BrowInnerUp: -1
      BrowOuterUpLeft: -1
      BrowOuterUpRight: -1
      CheekPuff: -1
      CheekSquintLeft: -1
      CheekSquintRight: -1
      NoseSneerLeft: -1
      NoseSneerRight: -1
      TongueOut: -1

    symmetry_poses:
      EyeBlinkLeft: 0
      EyeLookDownLeft: -1
      EyeLookInLeft: -1
      EyeLookOutLeft: -1
      EyeLookUpLeft: -1
      EyeSquintLeft: -1
      EyeWideLeft: 1
      EyeBlinkRight: 0
      EyeLookDownRight: -1
      EyeLookInRight: -1
      EyeLookOutRight: -1
      EyeLookUpRight: -1
      EyeSquintRight: -1
      EyeWideRight: 1
      JawForward: -1
      JawLeft: -1
      JawRight: -1
      JawOpen: -1
      MouthClose: -1
      MouthFunnel: -1
      MouthPucker: -1
      MouthLeft: -1
      MouthRight: -1
      MouthSmileLeft: 2
      MouthSmileRight: 2
      MouthFrownLeft: 3
      MouthFrownRight: 3
      MouthDimpleLeft: 4
      MouthDimpleRight: 4
      MouthStretchLeft: 5
      MouthStretchRight: 5
      MouthRollLower: -1
      MouthRollUpper: -1
      MouthShrugLower: -1
      MouthShrugUpper: -1
      MouthPressLeft: 6
      MouthPressRight: 6
      MouthLowerDownLeft: 7
      MouthLowerDownRight: 7
      MouthUpperUpLeft: 8
      MouthUpperUpRight: 8
      BrowDownLeft: 10
      BrowDownRight: 10
      BrowInnerUp: -1
      BrowOuterUpLeft: 9
      BrowOuterUpRight: 9
      CheekPuff: -1
      CheekSquintLeft: 11
      CheekSquintRight: 11
      NoseSneerLeft: 12
      NoseSneerRight: 12
      TongueOut: -1

mark_stylization_config.yaml
# These are the default emotions applied at the beginning of any audio clip.
# Their values range from 0.0 to 1.0
default_beginning_emotions:
  amazement: 0.0
  anger: 0.0
  cheekiness: 0.0
  disgust: 0.0
  fear: 0.0
  grief: 0.0
  joy: 0.0
  outofbreath: 0.0
  pain: 0.0
  sadness: 0.0

a2e:
  enabled: true
  live_transition_time: 0.5
  post_processing_params:
    emotion_contrast: 1.0 # Increases the spread between emotion values by pushing them higher or lower
    emotion_strength: 0.6 # Sets the strength of generated emotions relative to neutral emotion
    enable_preferred_emotion: true # Activate blending preferred emotion with auto-emotion
    live_blend_coef: 0.7 # Coefficient for exponential smoothing of emotion
    max_emotions: 3 # Sets a firm limit on the quantity of emotion sliders engaged by A2E - emotions with the highest weight will be prioritized
    preferred_emotion_strength: 0.5 # Sets the strength of the preferred emotion (if one is loaded) relative to generated emotions

a2f:
  # A2F model, can be one of james_v2.3, claire_v2.3 or mark_v2.3
  inference_model_id: mark_v2.3
  blendshape_id: mark_topo1_v2.1

  face_params:
    eyelid_offset: 0.06 # Adjusts the default pose of eyelid open-close
    face_mask_level: 0.6 # Determines the boundary between the upper and lower regions of the face
    face_mask_softness: 0.0085 # Determines how smoothly the upper and lower face regions blend on the boundary
    input_strength: 1.3 # Controls the magnitude of the input audio
    lip_close_offset: -0.03 # Adjusts the default pose of lip close-open
    lower_face_smoothing: 0.0023 # Applies temporal smoothing to the lower face motion
    lower_face_strength: 1.4 # Controls the range of motion on the lower regions of the face
    skin_strength: 1.1 # Controls the range of motion of the skin
    upper_face_smoothing: 0.001 # Applies temporal smoothing to the upper face motion
    upper_face_strength: 1.0 # Controls the range of motion on the upper regions of the face

  blendshape_params: # Modulates the effect of each blendshape: gain * w + offset
    enable_clamping_bs_weight: false

    weight_multipliers:
      EyeBlinkLeft: 1.0
      EyeLookDownLeft: 1.0
      EyeLookInLeft: 1.0
      EyeLookOutLeft: 1.0
      EyeLookUpLeft: 1.0
      EyeSquintLeft: 1.0
      EyeWideLeft: 1.0
      EyeBlinkRight: 1.0
      EyeLookDownRight: 1.0
      EyeLookInRight: 1.0
      EyeLookOutRight: 1.0
      EyeLookUpRight: 1.0
      EyeSquintRight: 1.0
      EyeWideRight: 1.0
      JawForward: 1.0
      JawLeft: 1.0
      JawRight: 1.0
      JawOpen: 1.0
      MouthClose: 1.0
      MouthFunnel: 1.0
      MouthPucker: 1.0
      MouthLeft: 1.0
      MouthRight: 1.0
      MouthSmileLeft: 1.0
      MouthSmileRight: 1.0
      MouthFrownLeft: 1.0
      MouthFrownRight: 1.0
      MouthDimpleLeft: 1.0
      MouthDimpleRight: 1.0
      MouthStretchLeft: 1.0
      MouthStretchRight: 1.0
      MouthRollLower: 1.0
      MouthRollUpper: 1.0
      MouthShrugLower: 1.0
      MouthShrugUpper: 1.0
      MouthPressLeft: 1.0
      MouthPressRight: 1.0
      MouthLowerDownLeft: 1.0
      MouthLowerDownRight: 1.0
      MouthUpperUpLeft: 1.0
      MouthUpperUpRight: 1.0
      BrowDownLeft: 1.0
      BrowDownRight: 1.0
      BrowInnerUp: 1.0
      BrowOuterUpLeft: 1.0
      BrowOuterUpRight: 1.0
      CheekPuff: 1.0
      CheekSquintLeft: 1.0
      CheekSquintRight: 1.0
      NoseSneerLeft: 1.0
      NoseSneerRight: 1.0
      TongueOut: 1.0

    weight_offsets:
      EyeBlinkLeft: 0.0
      EyeLookDownLeft: 0.0
      EyeLookInLeft: 0.0
      EyeLookOutLeft: 0.0
      EyeLookUpLeft: 0.0
      EyeSquintLeft: 0.0
      EyeWideLeft: 0.0
      EyeBlinkRight: 0.0
      EyeLookDownRight: 0.0
      EyeLookInRight: 0.0
      EyeLookOutRight: 0.0
      EyeLookUpRight: 0.0
      EyeSquintRight: 0.0
      EyeWideRight: 0.0
      JawForward: 0.0
      JawLeft: 0.0
      JawRight: 0.0
      JawOpen: 0.0
      MouthClose: 0.0
      MouthFunnel: 0.0
      MouthPucker: 0.0
      MouthLeft: 0.0
      MouthRight: 0.0
      MouthSmileLeft: 0.0
      MouthSmileRight: 0.0
      MouthFrownLeft: 0.0
      MouthFrownRight: 0.0
      MouthDimpleLeft: 0.0
      MouthDimpleRight: 0.0
      MouthStretchLeft: 0.0
      MouthStretchRight: 0.0
      MouthRollLower: 0.0
      MouthRollUpper: 0.0
      MouthShrugLower: 0.0
      MouthShrugUpper: 0.0
      MouthPressLeft: 0.0
      MouthPressRight: 0.0
      MouthLowerDownLeft: 0.0
      MouthLowerDownRight: 0.0
      MouthUpperUpLeft: 0.0
      MouthUpperUpRight: 0.0
      BrowDownLeft: 0.0
      BrowDownRight: 0.0
      BrowInnerUp: 0.0
      BrowOuterUpLeft: 0.0
      BrowOuterUpRight: 0.0
      CheekPuff: 0.0
      CheekSquintLeft: 0.0
      CheekSquintRight: 0.0
      NoseSneerLeft: 0.0
      NoseSneerRight: 0.0
      TongueOut: 0.0

    active_poses:
      EyeBlinkLeft: 1
      EyeLookDownLeft: 0
      EyeLookInLeft: 0
      EyeLookOutLeft: 0
      EyeLookUpLeft: 0
      EyeSquintLeft: 1
      EyeWideLeft: 1
      EyeBlinkRight: 1
      EyeLookDownRight: 0
      EyeLookInRight: 0
      EyeLookOutRight: 0
      EyeLookUpRight: 0
      EyeSquintRight: 1
      EyeWideRight: 1
      JawForward: 1
      JawLeft: 1
      JawRight: 1
      JawOpen: 1
      MouthClose: 1
      MouthFunnel: 1
      MouthPucker: 1
      MouthLeft: 1
      MouthRight: 1
      MouthSmileLeft: 1
      MouthSmileRight: 1
      MouthFrownLeft: 1
      MouthFrownRight: 1
      MouthDimpleLeft: 1
      MouthDimpleRight: 1
      MouthStretchLeft: 1
      MouthStretchRight: 1
      MouthRollLower: 1
      MouthRollUpper: 1
      MouthShrugLower: 1
      MouthShrugUpper: 1
      MouthPressLeft: 1
      MouthPressRight: 1
      MouthLowerDownLeft: 1
      MouthLowerDownRight: 1
      MouthUpperUpLeft: 1
      MouthUpperUpRight: 1
      BrowDownLeft: 1
      BrowDownRight: 1
      BrowInnerUp: 1
      BrowOuterUpLeft: 1
      BrowOuterUpRight: 1
      CheekPuff: 1
      CheekSquintLeft: 1
      CheekSquintRight: 1
      NoseSneerLeft: 1
      NoseSneerRight: 1
      TongueOut: 0

    cancel_poses:
      EyeBlinkLeft: -1
      EyeLookDownLeft: -1
      EyeLookInLeft: -1
      EyeLookOutLeft: -1
      EyeLookUpLeft: -1
      EyeSquintLeft: -1
      EyeWideLeft: -1
      EyeBlinkRight: -1
      EyeLookDownRight: -1
      EyeLookInRight: -1
      EyeLookOutRight: -1
      EyeLookUpRight: -1
      EyeSquintRight: -1
      EyeWideRight: -1
      JawForward: -1
      JawLeft: -1
      JawRight: -1
      JawOpen: -1
      MouthClose: -1
      MouthFunnel: -1
      MouthPucker: -1
      MouthLeft: -1
      MouthRight: -1
      MouthSmileLeft: -1
      MouthSmileRight: -1
      MouthFrownLeft: -1
      MouthFrownRight: -1
      MouthDimpleLeft: -1
      MouthDimpleRight: -1
      MouthStretchLeft: -1
      MouthStretchRight: -1
      MouthRollLower: -1
      MouthRollUpper: -1
      MouthShrugLower: -1
      MouthShrugUpper: -1
      MouthPressLeft: -1
      MouthPressRight: -1
      MouthLowerDownLeft: -1
      MouthLowerDownRight: -1
      MouthUpperUpLeft: -1
      MouthUpperUpRight: -1
      BrowDownLeft: -1
      BrowDownRight: -1
      BrowInnerUp: -1
      BrowOuterUpLeft: -1
      BrowOuterUpRight: -1
      CheekPuff: -1
      CheekSquintLeft: -1
      CheekSquintRight: -1
      NoseSneerLeft: -1
      NoseSneerRight: -1
      TongueOut: -1

    symmetry_poses:
      EyeBlinkLeft: 0
      EyeLookDownLeft: -1
      EyeLookInLeft: -1
      EyeLookOutLeft: -1
      EyeLookUpLeft: -1
      EyeSquintLeft: -1
      EyeWideLeft: 1
      EyeBlinkRight: 0
      EyeLookDownRight: -1
      EyeLookInRight: -1
      EyeLookOutRight: -1
      EyeLookUpRight: -1
      EyeSquintRight: -1
      EyeWideRight: 1
      JawForward: -1
      JawLeft: -1
      JawRight: -1
      JawOpen: -1
      MouthClose: -1
      MouthFunnel: -1
      MouthPucker: -1
      MouthLeft: -1
      MouthRight: -1
      MouthSmileLeft: 2
      MouthSmileRight: 2
      MouthFrownLeft: 3
      MouthFrownRight: 3
      MouthDimpleLeft: 4
      MouthDimpleRight: 4
      MouthStretchLeft: 5
      MouthStretchRight: 5
      MouthRollLower: -1
      MouthRollUpper: -1
      MouthShrugLower: -1
      MouthShrugUpper: -1
      MouthPressLeft: 6
      MouthPressRight: 6
      MouthLowerDownLeft: 7
      MouthLowerDownRight: 7
      MouthUpperUpLeft: 8
      MouthUpperUpRight: 8
      BrowDownLeft: 10
      BrowDownRight: 10
      BrowInnerUp: -1
      BrowOuterUpLeft: 9
      BrowOuterUpRight: 9
      CheekPuff: -1
      CheekSquintLeft: 11
      CheekSquintRight: 11
      NoseSneerLeft: 12
      NoseSneerRight: 12
      TongueOut: -1
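
The comment on `blendshape_params` describes each output weight as gain * w + offset. The following is a minimal Python sketch of that formula, not the service's implementation. The function names are hypothetical, and the behaviour of `enable_clamping_bs_weight` is assumed here to limit the result to the range 0.0 to 1.0, which the configuration file itself does not state.

```python
def modulate_weight(w, multiplier, offset, clamp=False):
    """Apply multiplier * w + offset to one raw blendshape weight."""
    value = multiplier * w + offset
    if clamp:
        # Assumption: clamping restricts the result to [0.0, 1.0].
        value = min(1.0, max(0.0, value))
    return value


def modulate_all(raw_weights, multipliers, offsets, clamp=False):
    """Modulate every pose; a pose without an entry keeps gain 1.0 and offset 0.0."""
    return {
        pose: modulate_weight(
            w, multipliers.get(pose, 1.0), offsets.get(pose, 0.0), clamp
        )
        for pose, w in raw_weights.items()
    }


# Hypothetical raw weights for two poses.
raw = {"JawOpen": 0.5, "MouthSmileLeft": 0.9}
multipliers = {"JawOpen": 1.0, "MouthSmileLeft": 1.2}
offsets = {"JawOpen": 0.0, "MouthSmileLeft": 0.0}

unclamped = modulate_all(raw, multipliers, offsets)
clamped = modulate_all(raw, multipliers, offsets, clamp=True)
print(unclamped)
print(clamped)
```

With a multiplier above 1.0 the modulated weight can exceed 1.0, which is the case the clamping option would address under the assumption above.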

Configuration Files for Unreal Engine MetaHuman#

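If you plan to connect A2F-3D to MetaHuman characters, use the configuration files adapted for them. Compared with the default configuration files, the only changes in these files are the blendshape multipliers and offsets.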
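
To see which values differ between a default file and its MetaHuman counterpart, the two `weight_multipliers` maps can be compared key by key. The sketch below uses a small subset of the James values listed on this page, typed in by hand. The helper name `changed_multipliers` is hypothetical and is not part of A2F-3D.

```python
def changed_multipliers(default, adapted):
    """Return {pose: (default_value, adapted_value)} for every pose that differs."""
    return {
        pose: (default[pose], adapted[pose])
        for pose in default
        if pose in adapted and default[pose] != adapted[pose]
    }


# Subset of weight_multipliers from the default james_stylization_config.yaml.
default_james = {
    "EyeBlinkLeft": 1.0,
    "JawOpen": 1.0,
    "MouthClose": 1.0,
    "MouthSmileLeft": 1.0,
    "TongueOut": 1.0,
}

# The same poses from the MetaHuman james_stylization_config.yaml.
metahuman_james = {
    "EyeBlinkLeft": 1.0,
    "JawOpen": 0.8,
    "MouthClose": 0.3,
    "MouthSmileLeft": 1.2,
    "TongueOut": 0.0,
}

diff = changed_multipliers(default_james, metahuman_james)
for pose, (before, after) in diff.items():
    print(f"{pose}: {before} -> {after}")
```

To compare complete files, the same function could be applied to maps loaded with a YAML parser of your choice.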

MetaHuman stylization configuration files
claire_stylization_config.yaml
# These are the default emotions applied at the beginning of any audio clip.
# Their values range from 0.0 to 1.0
default_beginning_emotions:
  amazement: 0.0
  anger: 0.0
  cheekiness: 0.0
  disgust: 0.0
  fear: 0.0
  grief: 0.0
  joy: 0.0
  outofbreath: 0.0
  pain: 0.0
  sadness: 0.0

a2e:
  enabled: true
  live_transition_time: 0.5
  post_processing_params:
    emotion_contrast: 1.0 # Increases the spread between emotion values by pushing them higher or lower
    emotion_strength: 0.6 # Sets the strength of generated emotions relative to neutral emotion
    enable_preferred_emotion: true # Activate blending preferred emotion with auto-emotion
    live_blend_coef: 0.7 # Coefficient for exponential smoothing of emotion
    max_emotions: 3 # Sets a firm limit on the quantity of emotion sliders engaged by A2E - emotions with the highest weight will be prioritized
    preferred_emotion_strength: 0.5 # Sets the strength of the preferred emotion (if one is loaded) relative to generated emotions

a2f:
  # A2F model, can be one of james_v2.3, claire_v2.3 or mark_v2.3
  inference_model_id: claire_v2.3
  blendshape_id: claire_topo1_v2.1

  face_params:
    eyelid_offset: 0.0 # Adjusts the default pose of eyelid open-close
    face_mask_level: 0.6 # Determines the boundary between the upper and lower regions of the face
    face_mask_softness: 0.0085 # Determines how smoothly the upper and lower face regions blend on the boundary
    input_strength: 1.0 # Controls the magnitude of the input audio
    lip_close_offset: 0.0 # Adjusts the default pose of lip close-open
    lower_face_smoothing: 0.006 # Applies temporal smoothing to the lower face motion
    lower_face_strength: 1.25 # Controls the range of motion on the lower regions of the face
    skin_strength: 1.0 # Controls the range of motion of the skin
    upper_face_smoothing: 0.001 # Applies temporal smoothing to the upper face motion
    upper_face_strength: 1.0 # Controls the range of motion on the upper regions of the face

  blendshape_params: # Modulates the effect of each blendshape: gain * w + offset
    enable_clamping_bs_weight: false

    weight_multipliers:
      EyeBlinkLeft: 1.0
      EyeLookDownLeft: 0.0
      EyeLookInLeft: 0.0
      EyeLookOutLeft: 0.0
      EyeLookUpLeft: 0.0
      EyeSquintLeft: 1.0
      EyeWideLeft: 1.0
      EyeBlinkRight: 1.0
      EyeLookDownRight: 0.0
      EyeLookInRight: 0.0
      EyeLookOutRight: 0.0
      EyeLookUpRight: 0.0
      EyeSquintRight: 1.0
      EyeWideRight: 1.0
      JawForward: 0.7
      JawLeft: 0.2
      JawRight: 0.2
      JawOpen: 1.0
      MouthClose: 1.0
      MouthFunnel: 1.2
      MouthPucker: 1.2
      MouthLeft: 0.2
      MouthRight: 0.2
      MouthSmileLeft: 0.8
      MouthSmileRight: 0.8
      MouthFrownLeft: 0.4
      MouthFrownRight: 0.4
      MouthDimpleLeft: 0.7
      MouthDimpleRight: 0.7
      MouthStretchLeft: 0.1
      MouthStretchRight: 0.1
      MouthRollLower: 0.9
      MouthRollUpper: 0.5
      MouthShrugLower: 0.9
      MouthShrugUpper: 0.4
      MouthPressLeft: 0.8
      MouthPressRight: 0.8
      MouthLowerDownLeft: 0.8
      MouthLowerDownRight: 0.8
      MouthUpperUpLeft: 0.8
      MouthUpperUpRight: 0.8
      BrowDownLeft: 1.0
      BrowDownRight: 1.0
      BrowInnerUp: 1.0
      BrowOuterUpLeft: 1.0
      BrowOuterUpRight: 1.0
      CheekPuff: 0.2
      CheekSquintLeft: 1.0
      CheekSquintRight: 1.0
      NoseSneerLeft: 0.8
      NoseSneerRight: 0.8
      TongueOut: 0.0

    weight_offsets:  # Modulates the effect of each blendshape: blendshape_values * weight_multipliers + weight_offsets
      EyeBlinkLeft: 0.0
      EyeLookDownLeft: 0.0
      EyeLookInLeft: 0.0
      EyeLookOutLeft: 0.0
      EyeLookUpLeft: 0.0
      EyeSquintLeft: 0.0
      EyeWideLeft: 0.0
      EyeBlinkRight: 0.0
      EyeLookDownRight: 0.0
      EyeLookInRight: 0.0
      EyeLookOutRight: 0.0
      EyeLookUpRight: 0.0
      EyeSquintRight: 0.0
      EyeWideRight: 0.0
      JawForward: 0.0
      JawLeft: 0.0
      JawRight: 0.0
      JawOpen: 0.0
      MouthClose: 0.0
      MouthFunnel: 0.0
      MouthPucker: 0.0
      MouthLeft: 0.0
      MouthRight: 0.0
      MouthSmileLeft: 0.0
      MouthSmileRight: 0.0
      MouthFrownLeft: 0.0
      MouthFrownRight: 0.0
      MouthDimpleLeft: 0.0
      MouthDimpleRight: 0.0
      MouthStretchLeft: 0.0
      MouthStretchRight: 0.0
      MouthRollLower: 0.0
      MouthRollUpper: 0.0
      MouthShrugLower: 0.0
      MouthShrugUpper: 0.0
      MouthPressLeft: 0.0
      MouthPressRight: 0.0
      MouthLowerDownLeft: 0.0
      MouthLowerDownRight: 0.0
      MouthUpperUpLeft: 0.0
      MouthUpperUpRight: 0.0
      BrowDownLeft: 0.0
      BrowDownRight: 0.0
      BrowInnerUp: 0.0
      BrowOuterUpLeft: 0.0
      BrowOuterUpRight: 0.0
      CheekPuff: 0.0
      CheekSquintLeft: 0.0
      CheekSquintRight: 0.0
      NoseSneerLeft: 0.0
      NoseSneerRight: 0.0
      TongueOut: 0.0

    active_poses: # Define which poses are active and which are not
      EyeBlinkLeft: 1
      EyeLookDownLeft: 0
      EyeLookInLeft: 0
      EyeLookOutLeft: 0
      EyeLookUpLeft: 0
      EyeSquintLeft: 1
      EyeWideLeft: 1
      EyeBlinkRight: 1
      EyeLookDownRight: 0
      EyeLookInRight: 0
      EyeLookOutRight: 0
      EyeLookUpRight: 0
      EyeSquintRight: 1
      EyeWideRight: 1
      JawForward: 1
      JawLeft: 1
      JawRight: 1
      JawOpen: 1
      MouthClose: 1
      MouthFunnel: 1
      MouthPucker: 1
      MouthLeft: 1
      MouthRight: 1
      MouthSmileLeft: 1
      MouthSmileRight: 1
      MouthFrownLeft: 1
      MouthFrownRight: 1
      MouthDimpleLeft: 1
      MouthDimpleRight: 1
      MouthStretchLeft: 1
      MouthStretchRight: 1
      MouthRollLower: 1
      MouthRollUpper: 1
      MouthShrugLower: 1
      MouthShrugUpper: 1
      MouthPressLeft: 1
      MouthPressRight: 1
      MouthLowerDownLeft: 1
      MouthLowerDownRight: 1
      MouthUpperUpLeft: 1
      MouthUpperUpRight: 1
      BrowDownLeft: 1
      BrowDownRight: 1
      BrowInnerUp: 1
      BrowOuterUpLeft: 1
      BrowOuterUpRight: 1
      CheekPuff: 1
      CheekSquintLeft: 1
      CheekSquintRight: 1
      NoseSneerLeft: 1
      NoseSneerRight: 1
      TongueOut: 0

    cancel_poses: # Define which poses cancel each other
      EyeBlinkLeft: -1
      EyeLookDownLeft: -1
      EyeLookInLeft: -1
      EyeLookOutLeft: -1
      EyeLookUpLeft: -1
      EyeSquintLeft: -1
      EyeWideLeft: -1
      EyeBlinkRight: -1
      EyeLookDownRight: -1
      EyeLookInRight: -1
      EyeLookOutRight: -1
      EyeLookUpRight: -1
      EyeSquintRight: -1
      EyeWideRight: -1
      JawForward: -1
      JawLeft: -1
      JawRight: -1
      JawOpen: -1
      MouthClose: -1
      MouthFunnel: -1
      MouthPucker: -1
      MouthLeft: -1
      MouthRight: -1
      MouthSmileLeft: -1
      MouthSmileRight: -1
      MouthFrownLeft: -1
      MouthFrownRight: -1
      MouthDimpleLeft: -1
      MouthDimpleRight: -1
      MouthStretchLeft: -1
      MouthStretchRight: -1
      MouthRollLower: -1
      MouthRollUpper: -1
      MouthShrugLower: -1
      MouthShrugUpper: -1
      MouthPressLeft: -1
      MouthPressRight: -1
      MouthLowerDownLeft: -1
      MouthLowerDownRight: -1
      MouthUpperUpLeft: -1
      MouthUpperUpRight: -1
      BrowDownLeft: -1
      BrowDownRight: -1
      BrowInnerUp: -1
      BrowOuterUpLeft: -1
      BrowOuterUpRight: -1
      CheekPuff: -1
      CheekSquintLeft: -1
      CheekSquintRight: -1
      NoseSneerLeft: -1
      NoseSneerRight: -1
      TongueOut: -1

    symmetry_poses: # Define which poses are symmetric to each other
      EyeBlinkLeft: 0
      EyeLookDownLeft: -1
      EyeLookInLeft: -1
      EyeLookOutLeft: -1
      EyeLookUpLeft: -1
      EyeSquintLeft: -1
      EyeWideLeft: 1
      EyeBlinkRight: 0
      EyeLookDownRight: -1
      EyeLookInRight: -1
      EyeLookOutRight: -1
      EyeLookUpRight: -1
      EyeSquintRight: -1
      EyeWideRight: 1
      JawForward: -1
      JawLeft: -1
      JawRight: -1
      JawOpen: -1
      MouthClose: -1
      MouthFunnel: -1
      MouthPucker: -1
      MouthLeft: -1
      MouthRight: -1
      MouthSmileLeft: 2
      MouthSmileRight: 2
      MouthFrownLeft: 3
      MouthFrownRight: 3
      MouthDimpleLeft: 4
      MouthDimpleRight: 4
      MouthStretchLeft: 5
      MouthStretchRight: 5
      MouthRollLower: -1
      MouthRollUpper: -1
      MouthShrugLower: -1
      MouthShrugUpper: -1
      MouthPressLeft: 6
      MouthPressRight: 6
      MouthLowerDownLeft: 7
      MouthLowerDownRight: 7
      MouthUpperUpLeft: 8
      MouthUpperUpRight: 8
      BrowDownLeft: 10
      BrowDownRight: 10
      BrowInnerUp: -1
      BrowOuterUpLeft: 9
      BrowOuterUpRight: 9
      CheekPuff: -1
      CheekSquintLeft: 11
      CheekSquintRight: 11
      NoseSneerLeft: 12
      NoseSneerRight: 12
      TongueOut: -1
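
In `symmetry_poses`, left and right poses that belong together share the same non-negative number, while other poses are set to -1. One plausible reading, which is an assumption and is not stated in the file, is that the number is a group index and -1 means the pose has no symmetric partner. The following Python sketch groups poses under that assumption; `symmetry_groups` is a hypothetical helper.

```python
from collections import defaultdict


def symmetry_groups(symmetry_poses):
    """Group pose names by their symmetry index, skipping negative indices."""
    groups = defaultdict(list)
    for pose, group_id in symmetry_poses.items():
        # Assumption: -1 marks a pose without a symmetric partner.
        if group_id >= 0:
            groups[group_id].append(pose)
    return dict(groups)


# Subset of the symmetry_poses values listed above.
sample = {
    "EyeBlinkLeft": 0,
    "EyeBlinkRight": 0,
    "EyeWideLeft": 1,
    "EyeWideRight": 1,
    "JawOpen": -1,
    "MouthSmileLeft": 2,
    "MouthSmileRight": 2,
}

groups = symmetry_groups(sample)
for group_id, poses in sorted(groups.items()):
    print(group_id, poses)
```

A check like this can help confirm that every group contains exactly one left and one right pose after the file has been edited by hand.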

james_stylization_config.yaml
# These are the default emotions applied at the beginning of any audio clip.
# Their values range from 0.0 to 1.0
default_beginning_emotions:
  amazement: 0.0
  anger: 0.0
  cheekiness: 0.0
  disgust: 0.0
  fear: 0.0
  grief: 0.0
  joy: 0.0
  outofbreath: 0.0
  pain: 0.0
  sadness: 0.0

a2e:
  enabled: true
  live_transition_time: 0.5
  post_processing_params:
    emotion_contrast: 1.0 # Increases the spread between emotion values by pushing them higher or lower
    emotion_strength: 0.6 # Sets the strength of generated emotions relative to neutral emotion
    enable_preferred_emotion: true # Activate blending preferred emotion with auto-emotion
    live_blend_coef: 0.7 # Coefficient for exponential smoothing of emotion
    max_emotions: 3 # Sets a firm limit on the quantity of emotion sliders engaged by A2E - emotions with the highest weight will be prioritized
    preferred_emotion_strength: 0.5 # Sets the strength of the preferred emotion (if one is loaded) relative to generated emotions

a2f:
  # A2F model, can be one of james_v2.3, claire_v2.3 or mark_v2.3
  inference_model_id: james_v2.3
  blendshape_id: james_topo2_v2.2

  face_params:
    eyelid_offset: 0.06 # Adjusts the default pose of eyelid open-close
    face_mask_level: 0.6 # Determines the boundary between the upper and lower regions of the face
    face_mask_softness: 0.0085 # Determines how smoothly the upper and lower face regions blend on the boundary
    input_strength: 1.0 # Controls the magnitude of the input audio
    lip_close_offset: -0.02 # Adjusts the default pose of lip close-open
    lower_face_smoothing: 0.006 # Applies temporal smoothing to the lower face motion
    lower_face_strength: 1.2 # Controls the range of motion on the lower regions of the face
    skin_strength: 1.0 # Controls the range of motion of the skin
    upper_face_smoothing: 0.001 # Applies temporal smoothing to the upper face motion
    upper_face_strength: 1.0 # Controls the range of motion on the upper regions of the face

  blendshape_params: # Modulates the effect of each blendshape: gain * w + offset
    enable_clamping_bs_weight: false

    weight_multipliers:
      EyeBlinkLeft: 1.0
      EyeLookDownLeft: 0.0
      EyeLookInLeft: 0.0
      EyeLookOutLeft: 0.0
      EyeLookUpLeft: 0.0
      EyeSquintLeft: 1.0
      EyeWideLeft: 1.0
      EyeBlinkRight: 1.0
      EyeLookDownRight: 0.0
      EyeLookInRight: 0.0
      EyeLookOutRight: 0.0
      EyeLookUpRight: 0.0
      EyeSquintRight: 1.0
      EyeWideRight: 1.0
      JawForward: 0.7
      JawLeft: 0.2
      JawRight: 0.2
      JawOpen: 0.8
      MouthClose: 0.3
      MouthFunnel: 1.0
      MouthPucker: 1.0
      MouthLeft: 0.2
      MouthRight: 0.2
      MouthSmileLeft: 1.2
      MouthSmileRight: 1.2
      MouthFrownLeft: 0.5
      MouthFrownRight: 0.5
      MouthDimpleLeft: 0.8
      MouthDimpleRight: 0.8
      MouthStretchLeft: 0.05
      MouthStretchRight: 0.05
      MouthRollLower: 0.8
      MouthRollUpper: 0.5
      MouthShrugLower: 1.0
      MouthShrugUpper: 0.4
      MouthPressLeft: 0.8
      MouthPressRight: 0.8
      MouthLowerDownLeft: 0.8
      MouthLowerDownRight: 0.8
      MouthUpperUpLeft: 0.8
      MouthUpperUpRight: 0.8
      BrowDownLeft: 1.2
      BrowDownRight: 1.2
      BrowInnerUp: 1.3
      BrowOuterUpLeft: 0.8
      BrowOuterUpRight: 0.8
      CheekPuff: 0.2
      CheekSquintLeft: 1.0
      CheekSquintRight: 1.0
      NoseSneerLeft: 0.8
      NoseSneerRight: 0.8
      TongueOut: 0.0

    weight_offsets:  # Modulates the effect of each blendshape: blendshape_values * weight_multipliers + weight_offsets
      EyeBlinkLeft: 0.0
      EyeLookDownLeft: 0.0
      EyeLookInLeft: 0.0
      EyeLookOutLeft: 0.0
      EyeLookUpLeft: 0.0
      EyeSquintLeft: 0.0
      EyeWideLeft: 0.0
      EyeBlinkRight: 0.0
      EyeLookDownRight: 0.0
      EyeLookInRight: 0.0
      EyeLookOutRight: 0.0
      EyeLookUpRight: 0.0
      EyeSquintRight: 0.0
      EyeWideRight: 0.0
      JawForward: 0.0
      JawLeft: 0.0
      JawRight: 0.0
      JawOpen: 0.0
      MouthClose: 0.0
      MouthFunnel: 0.0
      MouthPucker: 0.0
      MouthLeft: 0.0
      MouthRight: 0.0
      MouthSmileLeft: 0.0
      MouthSmileRight: 0.0
      MouthFrownLeft: 0.0
      MouthFrownRight: 0.0
      MouthDimpleLeft: 0.0
      MouthDimpleRight: 0.0
      MouthStretchLeft: 0.0
      MouthStretchRight: 0.0
      MouthRollLower: 0.0
      MouthRollUpper: 0.0
      MouthShrugLower: 0.0
      MouthShrugUpper: 0.0
      MouthPressLeft: 0.0
      MouthPressRight: 0.0
      MouthLowerDownLeft: 0.0
      MouthLowerDownRight: 0.0
      MouthUpperUpLeft: 0.0
      MouthUpperUpRight: 0.0
      BrowDownLeft: 0.0
      BrowDownRight: 0.0
      BrowInnerUp: 0.0
      BrowOuterUpLeft: 0.0
      BrowOuterUpRight: 0.0
      CheekPuff: 0.0
      CheekSquintLeft: 0.0
      CheekSquintRight: 0.0
      NoseSneerLeft: 0.0
      NoseSneerRight: 0.0
      TongueOut: 0.0

    active_poses: # Defines which poses are active and which are not
      EyeBlinkLeft: 1
      EyeLookDownLeft: 0
      EyeLookInLeft: 0
      EyeLookOutLeft: 0
      EyeLookUpLeft: 0
      EyeSquintLeft: 1
      EyeWideLeft: 1
      EyeBlinkRight: 1
      EyeLookDownRight: 0
      EyeLookInRight: 0
      EyeLookOutRight: 0
      EyeLookUpRight: 0
      EyeSquintRight: 1
      EyeWideRight: 1
      JawForward: 1
      JawLeft: 1
      JawRight: 1
      JawOpen: 1
      MouthClose: 1
      MouthFunnel: 1
      MouthPucker: 1
      MouthLeft: 1
      MouthRight: 1
      MouthSmileLeft: 1
      MouthSmileRight: 1
      MouthFrownLeft: 1
      MouthFrownRight: 1
      MouthDimpleLeft: 1
      MouthDimpleRight: 1
      MouthStretchLeft: 1
      MouthStretchRight: 1
      MouthRollLower: 1
      MouthRollUpper: 1
      MouthShrugLower: 1
      MouthShrugUpper: 1
      MouthPressLeft: 1
      MouthPressRight: 1
      MouthLowerDownLeft: 1
      MouthLowerDownRight: 1
      MouthUpperUpLeft: 1
      MouthUpperUpRight: 1
      BrowDownLeft: 1
      BrowDownRight: 1
      BrowInnerUp: 1
      BrowOuterUpLeft: 1
      BrowOuterUpRight: 1
      CheekPuff: 1
      CheekSquintLeft: 1
      CheekSquintRight: 1
      NoseSneerLeft: 1
      NoseSneerRight: 1
      TongueOut: 0

    cancel_poses: # Define which poses cancel each other
      EyeBlinkLeft: -1
      EyeLookDownLeft: -1
      EyeLookInLeft: -1
      EyeLookOutLeft: -1
      EyeLookUpLeft: -1
      EyeSquintLeft: -1
      EyeWideLeft: -1
      EyeBlinkRight: -1
      EyeLookDownRight: -1
      EyeLookInRight: -1
      EyeLookOutRight: -1
      EyeLookUpRight: -1
      EyeSquintRight: -1
      EyeWideRight: -1
      JawForward: -1
      JawLeft: -1
      JawRight: -1
      JawOpen: -1
      MouthClose: -1
      MouthFunnel: -1
      MouthPucker: -1
      MouthLeft: -1
      MouthRight: -1
      MouthSmileLeft: -1
      MouthSmileRight: -1
      MouthFrownLeft: -1
      MouthFrownRight: -1
      MouthDimpleLeft: -1
      MouthDimpleRight: -1
      MouthStretchLeft: -1
      MouthStretchRight: -1
      MouthRollLower: -1
      MouthRollUpper: -1
      MouthShrugLower: -1
      MouthShrugUpper: -1
      MouthPressLeft: -1
      MouthPressRight: -1
      MouthLowerDownLeft: -1
      MouthLowerDownRight: -1
      MouthUpperUpLeft: -1
      MouthUpperUpRight: -1
      BrowDownLeft: -1
      BrowDownRight: -1
      BrowInnerUp: -1
      BrowOuterUpLeft: -1
      BrowOuterUpRight: -1
      CheekPuff: -1
      CheekSquintLeft: -1
      CheekSquintRight: -1
      NoseSneerLeft: -1
      NoseSneerRight: -1
      TongueOut: -1

    symmetry_poses: # Define which poses are symmetric to each other
      EyeBlinkLeft: 0
      EyeLookDownLeft: -1
      EyeLookInLeft: -1
      EyeLookOutLeft: -1
      EyeLookUpLeft: -1
      EyeSquintLeft: -1
      EyeWideLeft: 1
      EyeBlinkRight: 0
      EyeLookDownRight: -1
      EyeLookInRight: -1
      EyeLookOutRight: -1
      EyeLookUpRight: -1
      EyeSquintRight: -1
      EyeWideRight: 1
      JawForward: -1
      JawLeft: -1
      JawRight: -1
      JawOpen: -1
      MouthClose: -1
      MouthFunnel: -1
      MouthPucker: -1
      MouthLeft: -1
      MouthRight: -1
      MouthSmileLeft: 2
      MouthSmileRight: 2
      MouthFrownLeft: 3
      MouthFrownRight: 3
      MouthDimpleLeft: 4
      MouthDimpleRight: 4
      MouthStretchLeft: 5
      MouthStretchRight: 5
      MouthRollLower: -1
      MouthRollUpper: -1
      MouthShrugLower: -1
      MouthShrugUpper: -1
      MouthPressLeft: 6
      MouthPressRight: 6
      MouthLowerDownLeft: 7
      MouthLowerDownRight: 7
      MouthUpperUpLeft: 8
      MouthUpperUpRight: 8
      BrowDownLeft: 10
      BrowDownRight: 10
      BrowInnerUp: -1
      BrowOuterUpLeft: 9
      BrowOuterUpRight: 9
      CheekPuff: -1
      CheekSquintLeft: 11
      CheekSquintRight: 11
      NoseSneerLeft: 12
      NoseSneerRight: 12
      TongueOut: -1
mark_stylization_config.yaml
# These are the default emotions applied at the beginning of any audio clip.
# Their values range from 0.0 to 1.0
default_beginning_emotions:
  amazement: 0.0
  anger: 0.0
  cheekiness: 0.0
  disgust: 0.0
  fear: 0.0
  grief: 0.0
  joy: 0.0
  outofbreath: 0.0
  pain: 0.0
  sadness: 0.0

a2e:
  enabled: true
  live_transition_time: 0.5
  post_processing_params:
    emotion_contrast: 1.0 # Increases the spread between emotion values by pushing them higher or lower
    emotion_strength: 0.6 # Sets the strength of generated emotions relative to neutral emotion
    enable_preferred_emotion: true # Activate blending preferred emotion with auto-emotion
    live_blend_coef: 0.7 # Coefficient for exponential smoothing of emotion
    max_emotions: 3 # Sets a firm limit on the quantity of emotion sliders engaged by A2E - emotions with the highest weight will be prioritized
    preferred_emotion_strength: 0.5 # Sets the strength of the preferred emotion (if one is loaded) relative to generated emotions

a2f:
  # A2F model, can be one of james_v2.3, claire_v2.3 or mark_v2.3
  inference_model_id: mark_v2.3
  blendshape_id: mark_topo1_v2.1

  face_params:
    eyelid_offset: 0.06 # Adjusts the default pose of eyelid open-close
    face_mask_level: 0.6 # Determines the boundary between the upper and lower regions of the face
    face_mask_softness: 0.0085 # Determines how smoothly the upper and lower face regions blend on the boundary
    input_strength: 1.3 # Controls the magnitude of the input audio
    lip_close_offset: -0.03 # Adjusts the default pose of lip close-open
    lower_face_smoothing: 0.0023 # Applies temporal smoothing to the lower face motion
    lower_face_strength: 1.4 # Controls the range of motion on the lower regions of the face
    skin_strength: 1.1 # Controls the range of motion of the skin
    upper_face_smoothing: 0.001 # Applies temporal smoothing to the upper face motion
    upper_face_strength: 1.0 # Controls the range of motion on the upper regions of the face

  blendshape_params: # Modulates the effect of each blendshape: weight * multiplier + offset
    enable_clamping_bs_weight: false

    weight_multipliers:
      EyeBlinkLeft: 1.0
      EyeLookDownLeft: 0.0
      EyeLookInLeft: 0.0
      EyeLookOutLeft: 0.0
      EyeLookUpLeft: 0.0
      EyeSquintLeft: 1.0
      EyeWideLeft: 1.0
      EyeBlinkRight: 1.0
      EyeLookDownRight: 0.0
      EyeLookInRight: 0.0
      EyeLookOutRight: 0.0
      EyeLookUpRight: 0.0
      EyeSquintRight: 1.0
      EyeWideRight: 1.0
      JawForward: 0.7
      JawLeft: 0.2
      JawRight: 0.2
      JawOpen: 1.0
      MouthClose: 0.2
      MouthFunnel: 1.2
      MouthPucker: 1.2
      MouthLeft: 0.2
      MouthRight: 0.2
      MouthSmileLeft: 0.8
      MouthSmileRight: 0.8
      MouthFrownLeft: 0.5
      MouthFrownRight: 0.5
      MouthDimpleLeft: 0.8
      MouthDimpleRight: 0.8
      MouthStretchLeft: 0.05
      MouthStretchRight: 0.05
      MouthRollLower: 0.8
      MouthRollUpper: 0.5
      MouthShrugLower: 0.9
      MouthShrugUpper: 0.4
      MouthPressLeft: 0.8
      MouthPressRight: 0.8
      MouthLowerDownLeft: 0.8
      MouthLowerDownRight: 0.8
      MouthUpperUpLeft: 0.8
      MouthUpperUpRight: 0.8
      BrowDownLeft: 1.0
      BrowDownRight: 1.0
      BrowInnerUp: 1.0
      BrowOuterUpLeft: 1.0
      BrowOuterUpRight: 1.0
      CheekPuff: 0.2
      CheekSquintLeft: 1.0
      CheekSquintRight: 1.0
      NoseSneerLeft: 0.8
      NoseSneerRight: 0.8
      TongueOut: 0.0

    weight_offsets:  # Modulates the effect of each blendshape: blendshape_values * weight_multipliers + weight_offsets
      EyeBlinkLeft: 0.0
      EyeLookDownLeft: 0.0
      EyeLookInLeft: 0.0
      EyeLookOutLeft: 0.0
      EyeLookUpLeft: 0.0
      EyeSquintLeft: 0.0
      EyeWideLeft: 0.0
      EyeBlinkRight: 0.0
      EyeLookDownRight: 0.0
      EyeLookInRight: 0.0
      EyeLookOutRight: 0.0
      EyeLookUpRight: 0.0
      EyeSquintRight: 0.0
      EyeWideRight: 0.0
      JawForward: 0.0
      JawLeft: 0.0
      JawRight: 0.0
      JawOpen: 0.0
      MouthClose: 0.0
      MouthFunnel: 0.0
      MouthPucker: 0.0
      MouthLeft: 0.0
      MouthRight: 0.0
      MouthSmileLeft: 0.0
      MouthSmileRight: 0.0
      MouthFrownLeft: 0.0
      MouthFrownRight: 0.0
      MouthDimpleLeft: 0.0
      MouthDimpleRight: 0.0
      MouthStretchLeft: 0.0
      MouthStretchRight: 0.0
      MouthRollLower: 0.0
      MouthRollUpper: 0.0
      MouthShrugLower: 0.0
      MouthShrugUpper: 0.0
      MouthPressLeft: 0.0
      MouthPressRight: 0.0
      MouthLowerDownLeft: 0.0
      MouthLowerDownRight: 0.0
      MouthUpperUpLeft: 0.0
      MouthUpperUpRight: 0.0
      BrowDownLeft: 0.0
      BrowDownRight: 0.0
      BrowInnerUp: 0.0
      BrowOuterUpLeft: 0.0
      BrowOuterUpRight: 0.0
      CheekPuff: 0.0
      CheekSquintLeft: 0.0
      CheekSquintRight: 0.0
      NoseSneerLeft: 0.0
      NoseSneerRight: 0.0
      TongueOut: 0.0

    active_poses: # Defines which poses are active and which are not
      EyeBlinkLeft: 1
      EyeLookDownLeft: 0
      EyeLookInLeft: 0
      EyeLookOutLeft: 0
      EyeLookUpLeft: 0
      EyeSquintLeft: 1
      EyeWideLeft: 1
      EyeBlinkRight: 1
      EyeLookDownRight: 0
      EyeLookInRight: 0
      EyeLookOutRight: 0
      EyeLookUpRight: 0
      EyeSquintRight: 1
      EyeWideRight: 1
      JawForward: 1
      JawLeft: 1
      JawRight: 1
      JawOpen: 1
      MouthClose: 1
      MouthFunnel: 1
      MouthPucker: 1
      MouthLeft: 1
      MouthRight: 1
      MouthSmileLeft: 1
      MouthSmileRight: 1
      MouthFrownLeft: 1
      MouthFrownRight: 1
      MouthDimpleLeft: 1
      MouthDimpleRight: 1
      MouthStretchLeft: 1
      MouthStretchRight: 1
      MouthRollLower: 1
      MouthRollUpper: 1
      MouthShrugLower: 1
      MouthShrugUpper: 1
      MouthPressLeft: 1
      MouthPressRight: 1
      MouthLowerDownLeft: 1
      MouthLowerDownRight: 1
      MouthUpperUpLeft: 1
      MouthUpperUpRight: 1
      BrowDownLeft: 1
      BrowDownRight: 1
      BrowInnerUp: 1
      BrowOuterUpLeft: 1
      BrowOuterUpRight: 1
      CheekPuff: 1
      CheekSquintLeft: 1
      CheekSquintRight: 1
      NoseSneerLeft: 1
      NoseSneerRight: 1
      TongueOut: 0

    cancel_poses: # Define which poses cancel each other
      EyeBlinkLeft: -1
      EyeLookDownLeft: -1
      EyeLookInLeft: -1
      EyeLookOutLeft: -1
      EyeLookUpLeft: -1
      EyeSquintLeft: -1
      EyeWideLeft: -1
      EyeBlinkRight: -1
      EyeLookDownRight: -1
      EyeLookInRight: -1
      EyeLookOutRight: -1
      EyeLookUpRight: -1
      EyeSquintRight: -1
      EyeWideRight: -1
      JawForward: -1
      JawLeft: -1
      JawRight: -1
      JawOpen: -1
      MouthClose: -1
      MouthFunnel: -1
      MouthPucker: -1
      MouthLeft: -1
      MouthRight: -1
      MouthSmileLeft: -1
      MouthSmileRight: -1
      MouthFrownLeft: -1
      MouthFrownRight: -1
      MouthDimpleLeft: -1
      MouthDimpleRight: -1
      MouthStretchLeft: -1
      MouthStretchRight: -1
      MouthRollLower: -1
      MouthRollUpper: -1
      MouthShrugLower: -1
      MouthShrugUpper: -1
      MouthPressLeft: -1
      MouthPressRight: -1
      MouthLowerDownLeft: -1
      MouthLowerDownRight: -1
      MouthUpperUpLeft: -1
      MouthUpperUpRight: -1
      BrowDownLeft: -1
      BrowDownRight: -1
      BrowInnerUp: -1
      BrowOuterUpLeft: -1
      BrowOuterUpRight: -1
      CheekPuff: -1
      CheekSquintLeft: -1
      CheekSquintRight: -1
      NoseSneerLeft: -1
      NoseSneerRight: -1
      TongueOut: -1

    symmetry_poses: # Define which poses are symmetric to each other
      EyeBlinkLeft: 0
      EyeLookDownLeft: -1
      EyeLookInLeft: -1
      EyeLookOutLeft: -1
      EyeLookUpLeft: -1
      EyeSquintLeft: -1
      EyeWideLeft: 1
      EyeBlinkRight: 0
      EyeLookDownRight: -1
      EyeLookInRight: -1
      EyeLookOutRight: -1
      EyeLookUpRight: -1
      EyeSquintRight: -1
      EyeWideRight: 1
      JawForward: -1
      JawLeft: -1
      JawRight: -1
      JawOpen: -1
      MouthClose: -1
      MouthFunnel: -1
      MouthPucker: -1
      MouthLeft: -1
      MouthRight: -1
      MouthSmileLeft: 2
      MouthSmileRight: 2
      MouthFrownLeft: 3
      MouthFrownRight: 3
      MouthDimpleLeft: 4
      MouthDimpleRight: 4
      MouthStretchLeft: 5
      MouthStretchRight: 5
      MouthRollLower: -1
      MouthRollUpper: -1
      MouthShrugLower: -1
      MouthShrugUpper: -1
      MouthPressLeft: 6
      MouthPressRight: 6
      MouthLowerDownLeft: 7
      MouthLowerDownRight: 7
      MouthUpperUpLeft: 8
      MouthUpperUpRight: 8
      BrowDownLeft: 10
      BrowDownRight: 10
      BrowInnerUp: -1
      BrowOuterUpLeft: 9
      BrowOuterUpRight: 9
      CheekPuff: -1
      CheekSquintLeft: 11
      CheekSquintRight: 11
      NoseSneerLeft: 12
      NoseSneerRight: 12
      TongueOut: -1

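Parameter Tuning Guide#

Audio2Face-3D reads inference parameters from three sources: the inference model SDK, the deployment-time configuration files, and runtime input. In general, deployment-time parameters override the matching parameters in the model files, and runtime parameters override both the deployment-time values and the model defaults.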
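The override order described above can be pictured as three dictionaries merged in sequence. This is only a minimal sketch: the function name `resolve_params` and the sample values are hypothetical, and the service's actual merge logic is not documented here.

```python
def resolve_params(model_defaults, deployment_config, runtime_input):
    """Merge three parameter layers; later layers win for matching keys."""
    resolved = dict(model_defaults)      # lowest priority: values shipped with the model
    resolved.update(deployment_config)   # deployment-time config overrides model defaults
    resolved.update(runtime_input)       # runtime input overrides everything else
    return resolved


# Hypothetical values, chosen only to show which layer wins.
model_defaults = {"lower_face_strength": 1.0, "skin_strength": 1.0, "eyelid_offset": 0.0}
deployment_config = {"lower_face_strength": 1.2, "eyelid_offset": 0.06}
runtime_input = {"lower_face_strength": 1.4}

resolved = resolve_params(model_defaults, deployment_config, runtime_input)
print(resolved)
```

In this sketch `lower_face_strength` comes from the runtime layer, `eyelid_offset` from the deployment layer, and `skin_strength` falls back to the model default.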

For the proto definitions of the runtime parameters, see AudioStreamHeader, FaceParameters, BlendShapeParameters, EmotionParameters, and EmotionPostProcessingParameters.

FaceParameters

Only a subset of FaceParameters can be adjusted at runtime. For the supported list, see FaceParameters.

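Emotion Post-processing Parameters

The Audio2Emotion SDK automatically infers emotions from the incoming audio and generates emotion vectors that drive the character's facial animation performance. The post-processing parameters below let you tailor that performance further. The operations are listed in the order in which they run in the processing stack.

Emotion Contrast

Emotion contrast is applied to the inference output and uses a sigmoid function to control the spread of the emotion values. It pushes values higher or lower, which widens the range of the generated emotion performance.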

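Max Emotions

Max emotions sets a hard limit on the number of emotions that the Audio2Emotion SDK engages. Emotions are prioritized by strength. Once the limit is reached, only the prioritized emotions contribute to the vector and all other emotions are nullified. This helps pick out the correct emotion more accurately when the vocal performance is subtle.

For example, if Joy and Amazement are the strongest predicted emotions and you set the limit to 2, only Joy and Amazement are applied to the performance.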
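The Joy and Amazement example can be sketched as a simple top-k filter. The function name `limit_emotions`, the sample scores, and the choice to represent a nullified emotion as 0.0 are assumptions made for illustration, not the SDK's implementation.

```python
def limit_emotions(emotions, max_emotions):
    """Keep the max_emotions strongest values and set every other emotion to 0.0."""
    strongest = sorted(emotions, key=emotions.get, reverse=True)[:max_emotions]
    return {
        name: (value if name in strongest else 0.0)
        for name, value in emotions.items()
    }


# Hypothetical predicted emotion scores.
predicted = {"joy": 0.7, "amazement": 0.5, "sadness": 0.2, "anger": 0.1}
limited = limit_emotions(predicted, max_emotions=2)
print(limited)
```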

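Emotion Index Conversion

Emotion index conversion uses the emotion correspondence to remap emotions from Audio2Emotion to the Audio2Face SDK.

Smoothing

The remapped emotions are exponentially smoothed using the live blend coefficient (live_blend_coef).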
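As a rough illustration of exponential smoothing, the sketch below blends each new frame with the previous smoothed value. It assumes the coefficient weights the previous value (so a higher coefficient gives a smoother, slower response). The source does not state which side the coefficient applies to, and the function name and sample frames are hypothetical.

```python
def exponential_smoothing(frames, blend_coef):
    """Smooth a per-frame sequence: s_t = blend_coef * s_(t-1) + (1 - blend_coef) * x_t."""
    smoothed = []
    previous = None
    for value in frames:
        if previous is None:
            previous = value  # the first frame has nothing to blend with
        else:
            previous = blend_coef * previous + (1.0 - blend_coef) * value
        smoothed.append(previous)
    return smoothed


# Hypothetical per-frame values of one emotion jumping from 0.0 to 1.0.
joy_per_frame = [0.0, 1.0, 1.0, 1.0]
smoothed = exponential_smoothing(joy_per_frame, blend_coef=0.7)
print(smoothed)
```

With the default `live_blend_coef` of 0.7 the smoothed value approaches the target gradually instead of jumping to it.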

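Blending Preferred Emotion

The preferred (manual) emotion and the inferred emotion output are combined into a single composite output that carries all the emotion data.

Transition Smoothing

Transition smoothing applies exponential smoothing to the final emotion values, that is, the combination of the Audio2Emotion output and the preferred emotion.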
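The live_transition_time comment in the stylization configuration describes each frame moving toward the raw result at a rate of frame time divided by transition time, capped at 1.0. A minimal sketch of that update follows. The function name, the 30 FPS frame length, and the handling of a non-positive transition time are assumptions.

```python
def transition_step(current, target, frame_time, live_transition_time):
    """Move current toward target by frame_time / live_transition_time, capped at 1.0."""
    if live_transition_time <= 0.0:
        rate = 1.0  # assumption: no transition time means jumping straight to the target
    else:
        rate = min(frame_time / live_transition_time, 1.0)
    return current + rate * (target - current)


FRAME_TIME = 1.0 / 30.0  # hypothetical frame length (30 frames per second)

value = 0.0
trace = []
for _ in range(3):
    value = transition_step(value, 1.0, FRAME_TIME, live_transition_time=0.5)
    trace.append(value)
print(trace)
```

A larger `live_transition_time` lowers the per-frame rate, which matches the configuration comment that higher values give smoother transitions.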

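Emotion Strength

Emotion strength controls the overall intensity of the final emotion composite produced by the preceding steps. It acts as a multiplier on the final emotion result (Audio2Emotion output plus preferred emotion).

Preferred Emotion

The preferred (manual) emotion pose is created with the emotion sliders and serves as the base emotion for the character animation. It is taken from the current settings in the emotion widget and blended with the generated emotions throughout the animation.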

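Blendshape Parameters

The default blendshape parameters shipped with the model data are currently tuned for MetaHuman avatars. For our default avatars (Claire, Mark, Ben), set all 52 weight_multipliers values in the stylization configuration to 1.0.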
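The configuration comments give the formula blendshape_values * weight_multipliers + weight_offsets. The sketch below applies it to a couple of poses. The function name, the sample weights, and the interpretation of clamping as limiting results to the range 0.0 to 1.0 are assumptions. The source does not describe what enable_clamping_bs_weight does in detail.

```python
def apply_blendshape_params(weights, multipliers, offsets, clamp=False):
    """Apply weight * multiplier + offset to each blendshape weight."""
    result = {}
    for name, weight in weights.items():
        value = weight * multipliers.get(name, 1.0) + offsets.get(name, 0.0)
        if clamp:
            value = min(max(value, 0.0), 1.0)  # assumed clamping range
        result[name] = value
    return result


# Hypothetical raw weights; multipliers taken from the James configuration.
raw = {"JawOpen": 0.5, "MouthSmileLeft": 0.9}
multipliers = {"JawOpen": 0.8, "MouthSmileLeft": 1.2}
offsets = {"JawOpen": 0.0, "MouthSmileLeft": 0.0}

tuned = apply_blendshape_params(raw, multipliers, offsets)
clamped = apply_blendshape_params(raw, multipliers, offsets, clamp=True)
neutral = apply_blendshape_params(raw, {name: 1.0 for name in raw}, offsets)
print(tuned, clamped, neutral)
```

Setting every multiplier to 1.0 with zero offsets, as recommended for the default avatars, leaves the raw weights unchanged.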

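Environment variables#

The following table describes the environment variables that can be passed to the Audio2Face-3D NIM as -e arguments added to the docker run command.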

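| Variable | Required | Values | Notes |
|---|---|---|---|
| NGC_API_KEY | No | Any string that is a valid NGC API key | Required only if you want to download TRT engines from NGC. Set this variable to the value of your personal NGC API key. |
| NIM_LOGGING_JSONL | No | true / false | Enables (true) or disables (false) logging to stdout in JSON Lines format. |
| NIM_MANIFEST_PROFILE | No | Any valid manifest profile string | Select the manifest profile ID for your GPU from Supported Models. |
| NIM_DISABLE_MODEL_DOWNLOAD | No | true / false | Disables (true) or enables (false) the automatic download of TRT engines from NGC. |
| NIM_SKIP_A2F_START | No | true / false | If set to true, the A2F-3D service is not started when the container starts. |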
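To show how these variables turn into -e arguments, here is a small sketch that assembles a docker run command line. The helper name `env_flags` and the image placeholder are hypothetical. Other flags a real deployment needs (GPU access, ports, volumes) are deliberately left out.

```python
import shlex


def env_flags(env):
    """Turn a mapping of environment variables into docker '-e NAME=value' arguments."""
    flags = []
    for name, value in env.items():
        if isinstance(value, bool):
            value = "true" if value else "false"  # the table lists lowercase true / false
        flags += ["-e", f"{name}={value}"]
    return flags


env = {
    "NIM_LOGGING_JSONL": True,
    "NIM_DISABLE_MODEL_DOWNLOAD": False,
    "NIM_SKIP_A2F_START": False,
}

IMAGE = "<a2f-3d-nim-image>"  # placeholder, not a real image reference
command = ["docker", "run", "--rm", *env_flags(env), IMAGE]
print(shlex.join(command))
```

Replace the placeholder with the image reference you pulled from the NGC registry before running the printed command.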

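Volumes#

The following table describes the paths inside the container onto which local paths can be mounted. For example, you can mount a volume with the docker flag -v {LOCAL_PATH}:{PATH_IN_CONTAINER}.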

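| Container path | Required | Notes |
|---|---|---|
| /tmp/a2x/ | No, but if this volume is not mounted, the container must re-download or regenerate the models on every start | Path for the AI models. Must have execute, read, and write permissions, or 777. |
| /mnt/configs/ | Only if you want to override some configuration parameters | Path for the configuration files used as overrides |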
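The permission requirement for /tmp/a2x/ can be met by preparing the local directory before mounting it. The following sketch creates a local cache directory, sets its mode to 777, and builds the matching -v flag. The helper name and the local path are hypothetical, and the sketch assumes a POSIX host.

```python
import os
import stat
import tempfile


def prepare_model_cache(local_path, container_path="/tmp/a2x/"):
    """Create local_path with mode 777 and return the docker -v flag that mounts it."""
    os.makedirs(local_path, exist_ok=True)
    os.chmod(local_path, 0o777)  # read, write, and execute for everyone
    return ["-v", f"{os.path.abspath(local_path)}:{container_path}"]


# Hypothetical local directory; a temporary location keeps the sketch self-contained.
cache_dir = os.path.join(tempfile.mkdtemp(), "a2x_cache")
volume_flag = prepare_model_cache(cache_dir)
mode = stat.S_IMODE(os.stat(cache_dir).st_mode)
print(volume_flag, oct(mode))
```

For a real deployment, point `cache_dir` at a persistent location so that the models survive container restarts.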

Quick Deployment of Audio2Face-3D Microservices#

Instead of deploying Audio2Face-3D and starting the models manually, you can deploy them quickly with the docker-compose files by following the quick-start instructions in the NVIDIA Audio2Face-3D Samples repository.