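Important
You are viewing the NeMo 2.0 documentation. This release introduces significant changes to the API and a new library, NeMo Run. We are currently porting all features from NeMo 1.0 to 2.0. For documentation on previous versions, or on features not yet available in 2.0, refer to the NeMo 24.07 documentation.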
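NeMo Audio Configuration Files
This section describes the NeMo configuration file setup that is specific to models in the audio collection. For general information about how to set up and run experiments that is common to all NeMo models (for example, the Experiment Manager and PyTorch Lightning trainer parameters), see the NeMo Models section.
The model section of a NeMo audio configuration file generally requires information about the dataset(s) being used, parameters for any augmentation being performed, and the model architecture specification.
Example configuration files for all NeMo audio models can be found in the config directory of the examples.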
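NeMo Dataset Configuration
Training, validation, and test parameters are specified in the model.train_ds, model.validation_ds, and model.test_ds sections of the configuration file, respectively. Depending on the task, there may be parameters that specify the sample rate or the duration of the loaded audio examples. Some fields can be left out and specified on the command line at runtime. For a list of dataset classes and their respective parameters, see the Dataset Processing Classes section of the API. An example configuration for the training, validation, and test datasets is shown below: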
model:
  sample_rate: 16000
  skip_nan_grad: false

  train_ds:
    manifest_filepath: ???
    input_key: audio_filepath # key of the input signal path in the manifest
    target_key: target_filepath # key of the target signal path in the manifest
    target_channel_selector: 0 # target signal is the first channel from files in target_key
    audio_duration: 4.0 # in seconds, audio segment duration for training
    random_offset: true # if the file is longer than audio_duration, use random offset to select a subsegment
    min_duration: ${model.train_ds.audio_duration}
    batch_size: 64 # batch size may be increased based on the available memory
    shuffle: true
    num_workers: 8
    pin_memory: true

  validation_ds:
    manifest_filepath: ???
    input_key: audio_filepath # key of the input signal path in the manifest
    target_key: target_filepath # key of the target signal path in the manifest
    target_channel_selector: 0 # target signal is the first channel from files in target_key
    batch_size: 64 # batch size may be increased based on the available memory
    shuffle: false
    num_workers: 4
    pin_memory: true

  test_ds:
    manifest_filepath: ???
    input_key: audio_filepath # key of the input signal path in the manifest
    target_key: target_filepath # key of the target signal path in the manifest
    target_channel_selector: 0 # target signal is the first channel from files in target_key
    batch_size: 1 # batch size may be increased based on the available memory
    shuffle: false
    num_workers: 4
    pin_memory: true
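The manifest_filepath entries above point to manifest files. As a minimal sketch, assuming the usual NeMo layout of one JSON object per line, the following writes a small manifest whose keys match input_key and target_key from the configuration. The file paths are hypothetical, and the "duration" field is an assumption about what the duration-based filtering (min_duration) reads.

```python
import json
import tempfile
from pathlib import Path


def write_manifest(entries, path):
    """Write one JSON object per line (JSON-lines layout)."""
    with open(path, "w", encoding="utf-8") as f:
        for entry in entries:
            f.write(json.dumps(entry) + "\n")


# Hypothetical audio paths; key names match input_key / target_key in the config.
entries = [
    {
        "audio_filepath": "/data/noisy/utt_0001.wav",
        "target_filepath": "/data/clean/utt_0001.wav",
        "duration": 5.2,  # seconds (assumed field name)
    },
    {
        "audio_filepath": "/data/noisy/utt_0002.wav",
        "target_filepath": "/data/clean/utt_0002.wav",
        "duration": 7.9,
    },
]

manifest_path = Path(tempfile.mkdtemp()) / "train_manifest.json"
write_manifest(entries, manifest_path)

# Read the manifest back to confirm that every line parses on its own.
with open(manifest_path, encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f]
```

The resulting path can then be passed as model.train_ds.manifest_filepath, either in the configuration file or on the command line.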
More information about online augmentation can be found in the example configs.
Lhotse Dataset Configuration
Lhotse CutSet
An example training dataset in Lhotse CutSet format can be configured as follows:
train_ds:
  use_lhotse: true # enable Lhotse data loader
  cuts_path: ??? # path to Lhotse cuts manifest with input signals and the corresponding target signals (target signals should be in the custom "target_recording" field)
  truncate_duration: 4.00 # truncate audio to 4 seconds
  truncate_offset_type: random # if the file is longer than truncate_duration, use random offset to select a subsegment
  batch_size: 64 # batch size may be increased based on the available memory
  shuffle: true
  num_workers: 8
  pin_memory: true
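The truncation settings above (truncate_duration with truncate_offset_type: random, or audio_duration with random_offset in the NeMo dataset configuration) select a fixed-length subsegment from longer files. The following is a minimal sketch of that idea, not the NeMo or Lhotse implementation. The function name and the behavior when random_offset is false (take the segment from the start) are assumptions.

```python
import random


def select_segment(num_samples, sample_rate, segment_duration,
                   random_offset=True, rng=None):
    """Return (start, end) sample indices of a fixed-length subsegment."""
    rng = rng or random.Random()
    segment_samples = int(round(segment_duration * sample_rate))
    if num_samples <= segment_samples:
        # File is not longer than the requested segment: use all of it.
        return 0, num_samples
    max_start = num_samples - segment_samples
    # Assumption: without a random offset, the segment starts at sample 0.
    start = rng.randint(0, max_start) if random_offset else 0
    return start, start + segment_samples


# A 10-second file at 16 kHz, truncated to 4 seconds at a random offset.
start, end = select_segment(
    num_samples=10 * 16000,
    sample_rate=16000,
    segment_duration=4.0,
    rng=random.Random(0),  # seeded only to make the sketch reproducible
)
```

Input and target signals would need to share the same offset so that the pair stays time-aligned.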
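Lhotse CutSet with Online Augmentation
An example training dataset in Lhotse CutSet format that uses online augmentation with room impulse response (RIR) convolution and additive noise can be configured as follows: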
train_ds:
  use_lhotse: true # enable Lhotse data loader
  cuts_path: ??? # path to Lhotse cuts manifest with speech signals for augmentation (including custom "target_recording" field with the same signals)
  truncate_duration: 4.00 # truncate audio to 4 seconds
  truncate_offset_type: random # if the file is longer than truncate_duration, use random offset to select a subsegment
  batch_size: 64 # batch size may be increased based on the available memory
  shuffle: true
  num_workers: 8
  pin_memory: true
  rir_enabled: true # enable room impulse response augmentation
  rir_path: ??? # path to Lhotse recordings manifest with room impulse response signals
  noise_path: ??? # path to Lhotse cuts manifest with noise signals
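Conceptually, this augmentation turns a clean speech signal into a degraded input by convolving it with a room impulse response and then adding noise, while the clean signal remains the target. The following pure-Python sketch illustrates the two operations on toy signals. It is not the Lhotse implementation, and the snr_db parameter is a hypothetical illustration that does not appear in the configuration above.

```python
import math


def convolve(signal, rir):
    """Full linear convolution of a signal with a room impulse response."""
    out = [0.0] * (len(signal) + len(rir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(rir):
            out[i + j] += s * h
    return out


def mix_at_snr(speech, noise, snr_db):
    """Scale the noise so that speech power / noise power matches snr_db, then add."""
    n = min(len(speech), len(noise))
    speech, noise = speech[:n], noise[:n]
    speech_power = sum(x * x for x in speech) / n
    noise_power = sum(x * x for x in noise) / n
    if noise_power == 0.0:
        raise ValueError("noise signal has zero power")
    gain = math.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + gain * v for s, v in zip(speech, noise)]


clean = [1.0, 0.0, 0.0, 0.0]           # toy "speech": a single impulse
rir = [1.0, 0.5]                       # toy RIR: direct path plus one reflection
noise = [0.1, -0.1, 0.1, -0.1, 0.1]    # toy noise signal

reverberant = convolve(clean, rir)
noisy = mix_at_snr(reverberant, noise, snr_db=20.0)  # hypothetical SNR value
```

Here noisy plays the role of the model input and clean the role of the target signal.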
Lhotse Shar
An example training dataset in Lhotse shar format can be configured as follows:
train_ds:
  shar_path: ???
  use_lhotse: true
  truncate_duration: 4.00 # truncate audio to 4 seconds
  truncate_offset_type: random
  batch_size: 8 # batch size may be increased based on the available memory
  shuffle: true
  num_workers: 8
  pin_memory: true
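A configuration file that uses the Lhotse shar format can be found in the example configs.
Model Architecture Configuration
Each configuration file should describe the model architecture used for the experiment. An example of a simple predictive model configuration is shown below: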
model:
  type: predictive
  sample_rate: 16000
  skip_nan_grad: false
  num_outputs: 1
  normalize_input: true # normalize the input signal to 0dBFS

  train_ds:
    manifest_filepath: ???
    input_key: noisy_filepath
    target_key: clean_filepath
    audio_duration: 2.00 # trim audio to 2 seconds
    random_offset: true
    normalization_signal: input_signal
    batch_size: 8 # batch size may be increased based on the available memory
    shuffle: true
    num_workers: 8
    pin_memory: true

  validation_ds:
    manifest_filepath: ???
    input_key: noisy_filepath
    target_key: clean_filepath
    batch_size: 8
    shuffle: false
    num_workers: 4
    pin_memory: true

  encoder:
    _target_: nemo.collections.audio.modules.transforms.AudioToSpectrogram
    fft_length: 510 # Number of subbands in the STFT = fft_length // 2 + 1 = 256
    hop_length: 128
    magnitude_power: 0.5
    scale: 0.33

  decoder:
    _target_: nemo.collections.audio.modules.transforms.SpectrogramToAudio
    fft_length: ${model.encoder.fft_length}
    hop_length: ${model.encoder.hop_length}
    magnitude_power: ${model.encoder.magnitude_power}
    scale: ${model.encoder.scale}

  estimator:
    _target_: nemo.collections.audio.parts.submodules.ncsnpp.SpectrogramNoiseConditionalScoreNetworkPlusPlus
    in_channels: 1 # single-channel noisy input
    out_channels: 1 # single-channel estimate
    num_res_blocks: 3 # increased number of res blocks
    pad_time_to: 64 # pad to 64 frames for the time dimension
    pad_dimension_to: 0 # no padding in the frequency dimension

  loss:
    _target_: nemo.collections.audio.losses.MSELoss # computed in the time domain

  metrics:
    val:
      sisdr: # output SI-SDR
        _target_: torchmetrics.audio.ScaleInvariantSignalDistortionRatio

  optim:
    name: adam
    lr: 1e-4
    # optimizer arguments
    betas: [0.9, 0.999]
    weight_decay: 0.0
The complete configuration file can be found in the example configs.
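The encoder and estimator settings determine the shape of the spectrogram that the estimator sees. The following sketch works through the arithmetic for the values in the example (fft_length: 510, hop_length: 128, 2-second segments at 16 kHz). Two points are assumptions rather than documented behavior: the frame count assumes centered framing as in torch.stft with center=True, and pad_time_to is interpreted as padding the time axis up to a multiple of 64.

```python
def num_subbands(fft_length):
    """Number of one-sided STFT subbands, as stated in the config comment."""
    return fft_length // 2 + 1


def num_frames(num_samples, hop_length):
    """Frame count, assuming centered framing (torch.stft with center=True)."""
    return 1 + num_samples // hop_length


def pad_to_multiple(length, multiple):
    """Round length up to a multiple; 0 disables padding (as for pad_dimension_to: 0)."""
    if multiple <= 0:
        return length
    return -(-length // multiple) * multiple  # ceiling division


sample_rate = 16000
audio_duration = 2.0
fft_length = 510
hop_length = 128

samples = int(sample_rate * audio_duration)           # 32000 samples
subbands = num_subbands(fft_length)                   # 256 subbands
frames = num_frames(samples, hop_length)              # 251 frames
padded_frames = pad_to_multiple(frames, 64)           # 256 frames (assumed semantics)
padded_subbands = pad_to_multiple(subbands, 0)        # unchanged: 256
```

Choosing fft_length: 510 yields exactly 256 subbands, a dimension that divides evenly when a network repeatedly halves the frequency resolution.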
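Fine-tuning Configuration
All scripts support easy fine-tuning by partially or fully loading the pretrained weights from a checkpoint into the currently instantiated model. Note that the currently instantiated model should have parameters that match the pretrained checkpoint, so that the weights can be loaded properly.
Pretrained weights can be provided in the following ways: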
- Providing a path to a NeMo model (via init_from_nemo_model)
- Providing the name of a pretrained NeMo model, which will be downloaded from the cloud (via init_from_pretrained_model)
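To illustrate why the instantiated model and the checkpoint need matching parameters, the sketch below compares parameter names and shapes and reports which checkpoint entries could be loaded. This is only an illustration of the matching idea, not how NeMo implements init_from_nemo_model or init_from_pretrained_model. The function name and the parameter names are hypothetical.

```python
def select_loadable(model_shapes, checkpoint_shapes):
    """Split checkpoint parameters into loadable and skipped, by name and shape."""
    loadable, skipped = [], []
    for name, shape in checkpoint_shapes.items():
        if model_shapes.get(name) == shape:
            loadable.append(name)
        else:
            skipped.append(name)  # missing in the model, or shape mismatch
    return loadable, skipped


# Hypothetical parameter names and shapes, for illustration only.
model_shapes = {
    "estimator.conv_in.weight": (64, 2, 3, 3),
    "estimator.conv_out.weight": (2, 64, 3, 3),
}
checkpoint_shapes = {
    "estimator.conv_in.weight": (64, 2, 3, 3),    # same shape: loadable
    "estimator.conv_out.weight": (4, 64, 3, 3),   # different shape: skipped
    "estimator.extra.bias": (64,),                # not in the model: skipped
}

loadable, skipped = select_loadable(model_shapes, checkpoint_shapes)
```

A non-empty skipped list usually indicates that the configuration differs from the one used to train the checkpoint.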
Training from Scratch
A model can be trained from scratch using the following command:
python examples/audio/audio_to_audio_train.py \
    --config-path=<path to dir of configs> \
    --config-name=<name of config without .yaml> \
    model.train_ds.manifest_filepath="<path to manifest file>" \
    model.validation_ds.manifest_filepath="<path to manifest file>" \
    trainer.devices=1 \
    trainer.accelerator='gpu' \
    trainer.max_epochs=50
Fine-tuning via a NeMo Model
A model can be fine-tuned from an existing NeMo model using the following command:
python examples/audio/audio_to_audio_train.py \
    --config-path=<path to dir of configs> \
    --config-name=<name of config without .yaml> \
    model.train_ds.manifest_filepath="<path to manifest file>" \
    model.validation_ds.manifest_filepath="<path to manifest file>" \
    trainer.devices=1 \
    trainer.accelerator='gpu' \
    trainer.max_epochs=50 \
    +init_from_nemo_model="<path to .nemo model file>"
Fine-tuning via a NeMo Pretrained Model Name
A model can be fine-tuned from a pretrained NeMo model using the following command:
python examples/audio/audio_to_audio_train.py \
    --config-path=<path to dir of configs> \
    --config-name=<name of config without .yaml> \
    model.train_ds.manifest_filepath="<path to manifest file>" \
    model.validation_ds.manifest_filepath="<path to manifest file>" \
    trainer.devices=1 \
    trainer.accelerator='gpu' \
    trainer.max_epochs=50 \
    +init_from_pretrained_model="<name of pretrained checkpoint>"
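The three commands differ only in their Hydra overrides. A plain key=value argument overrides a key that already exists in the configuration, while the + prefix adds a key that the configuration does not define, which is how init_from_nemo_model and init_from_pretrained_model are passed. The sketch below assembles such a command as an argument list. The helper name, the paths, and the config name are hypothetical placeholders.

```python
import shlex


def build_train_command(config_path, config_name, overrides, new_keys=None):
    """Assemble the training command as an argument list."""
    cmd = [
        "python",
        "examples/audio/audio_to_audio_train.py",
        f"--config-path={config_path}",
        f"--config-name={config_name}",
    ]
    # Overrides for keys that already exist in the config file.
    cmd += [f"{key}={value}" for key, value in overrides.items()]
    # Keys that are absent from the config file need Hydra's "+" prefix.
    cmd += [f"+{key}={value}" for key, value in (new_keys or {}).items()]
    return cmd


cmd = build_train_command(
    config_path="/workspace/conf",            # hypothetical path
    config_name="my_audio_config",            # hypothetical config name
    overrides={
        "model.train_ds.manifest_filepath": "/data/train_manifest.json",
        "model.validation_ds.manifest_filepath": "/data/val_manifest.json",
        "trainer.devices": 1,
        "trainer.accelerator": "gpu",
        "trainer.max_epochs": 50,
    },
    new_keys={"init_from_nemo_model": "/models/pretrained.nemo"},
)

# Shell-quoted form for logging; the list itself can be passed to subprocess.run.
command_line = shlex.join(cmd)
```

Passing the list directly to subprocess.run avoids shell quoting issues with paths that contain spaces.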