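Important

You are viewing the NeMo 2.0 documentation. This release introduces significant changes to the API and a new library, NeMo Run. We are currently porting all features from NeMo 1.0 to 2.0. For documentation on previous versions or on features not yet available in 2.0, refer to the NeMo 24.07 documentation.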

Common Configuration Files

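This section gives a detailed overview of NeMo Framework configuration file setup, specifically for models in the NeMo Speech-augmented Large Language Models (SpeechLLM) collection. For the basics of setting up and running experiments that are common to all NeMo Framework models, including the Experiment Manager and PyTorch Lightning Trainer parameters, refer to the core documentation.

The configuration files for NeMo SpeechLLMs focus on key details such as datasets, augmentation, optimization parameters, and model architecture specifications. This page covers each of these aspects.

Example configuration files for all SpeechLLMs are located in the config directory of the examples.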

Dataset Configuration

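The dataset configuration is based on the NeMo ASR data configuration and the NLP data configuration.

The configuration file lets you set any initialization parameter accepted by the Dataset class used in the experiment. For a complete list of datasets and their parameters, refer to the Datasets section of the API.

A typical training configuration looks like this: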

train_ds:
    manifest_filepath: ??? # Path to a list of JSONL files corresponding to the source data.
    global_batch_size: 4
    micro_batch_size: 2
    shuffle: True
    num_workers: 0
    pin_memory: True
    max_seq_length: 2048
    min_seq_length: 1
    drop_last: True
    concat_sampling_probabilities: null # When providing a list of datasets, this arg defines the sampling probabilities from each dataset when strategy='random'
    context_key: 'context'
    answer_key: 'answer'
    add_eos: True
    end_string: null
    add_sep: False
    add_bos: False
    separate_prompt_and_response_with_newline: False
    truncation_field: "context" # Options: ['context', 'answer']
    prompt_template: "Q: {context}\nA: {answer}" # fstring to use for assistant prompt. Example: "Q: {input}\nA: {output}"
    # ASR configs
    sample_rate: 16000 #${model.audio_encoder.preprocessor.sample_rate}
    max_duration: 24 # it is set for LibriSpeech, you may need to update it for your dataset
    min_duration: 0.1
    # tarred datasets
    is_tarred: false
    tarred_audio_filepaths: null
    shuffle_n: 2048
    # bucketing params
    bucketing_strategy: "fully_randomized"
    bucketing_batch_size: null
    # multi-audio configs
    audio_locator: null

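Key configuration parameters include:

manifest_filepath: Path to the dataset in JSON Lines format, where each line of the file is a JSON object (a Python dictionary once loaded). This can be a single file or a list of files.
global_batch_size: The global batch size, that is, the number of samples per optimizer step across all data-parallel ranks and gradient accumulation steps.
micro_batch_size: The micro batch size that fits on each GPU.
shuffle: Whether to shuffle the dataset.
num_workers: The number of workers to use for data loading.
pin_memory: Whether to pin memory for faster data transfer.
max_seq_length: The maximum sequence length for the LLM.
min_seq_length: The minimum sequence length for the LLM.
drop_last: Whether to drop the last batch if it is smaller than the batch size.
context_key: The key in each JSON line that holds the context used as LLM input.
answer_key: The key in each JSON line that holds the ground-truth answer.
add_eos: Whether to add an end-of-sequence token.
add_bos: Whether to add a beginning-of-sequence token.
add_sep: Whether to add a separator token.
end_string: The string used to trigger the end of generation. Defaults to null, which uses the EOS token.
separate_prompt_and_response_with_newline: Whether to separate the prompt and the response with a newline.
truncation_field: The field to truncate if the sequence length exceeds the maximum sequence length.
prompt_template: The format string for the LLM prompt, into which the context and answer are formatted.
sample_rate: The sampling rate of the audio data.
max_duration: The maximum duration, in seconds, of audio data to include.
min_duration: The minimum duration, in seconds, of audio data to include.
is_tarred: Whether the dataset is tarred.
tarred_audio_filepaths: The path to the tarred audio files.
shuffle_n: The number of samples to shuffle in a tarred dataset. Not used for non-tarred datasets.
bucketing_strategy: The strategy used for bucketing. Options are "fully_randomized" and "synced_randomized".
bucketing_batch_size: The batch size to use for each bucket. If not provided, the micro batch size is used.
audio_locator: The special string that marks each position in the text prompt where audio is to be placed.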
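
To make the manifest and prompt parameters more concrete, here is a minimal Python sketch, not the NeMo data loader itself. It builds one JSON Lines record, applies the min_duration and max_duration filter, and fills the prompt_template. The `audio_filepath` and `duration` field names follow the usual NeMo ASR manifest convention and are assumptions here, as are the sample path, the inclusive duration bounds, and the helper names.

```python
import json

# Values copied from the train_ds example above.
CONTEXT_KEY = "context"
ANSWER_KEY = "answer"
PROMPT_TEMPLATE = "Q: {context}\nA: {answer}"
MIN_DURATION = 0.1
MAX_DURATION = 24.0


def make_manifest_line(audio_filepath, duration, context, answer):
    """Serialize one sample as a single JSON line.

    `audio_filepath` and `duration` are assumed field names (the common
    NeMo ASR manifest convention); adjust them to your own dataset.
    """
    record = {
        "audio_filepath": audio_filepath,
        "duration": duration,
        CONTEXT_KEY: context,
        ANSWER_KEY: answer,
    }
    return json.dumps(record, ensure_ascii=False)


def keep_sample(record):
    """Apply the duration filter (bounds assumed inclusive)."""
    return MIN_DURATION <= record["duration"] <= MAX_DURATION


def render_prompt(record):
    """Fill the prompt template with the context and answer fields."""
    return PROMPT_TEMPLATE.format(
        context=record[CONTEXT_KEY], answer=record[ANSWER_KEY]
    )


line = make_manifest_line(
    "audio/sample_0001.wav",  # hypothetical path
    3.2,
    "Transcribe the audio.",
    "hello world",
)
record = json.loads(line)
if keep_sample(record):
    print(render_prompt(record))
```

Each call to `make_manifest_line` yields one line of the file that manifest_filepath points to.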

Trainer Configuration

This section outlines the arguments for the PyTorch Lightning Trainer object.

trainer:
  devices: 1 # number of GPUs (0 for CPU), or list of the GPUs to use e.g. [0, 1]
  num_nodes: 1
  max_epochs: -1
  max_steps: 2500000 # precedence over max_epochs
  logger: False  # Provided by exp_manager
  precision: bf16 # bf16 mixed precision; set to 16 to use fp16 AMP instead.
  accelerator: gpu
  log_every_n_steps: 5  # Interval of logging.
  resume_from_checkpoint: null # The path to a checkpoint file to continue the training, restores the whole state including the epoch, step, LR schedulers, apex, etc.
  num_sanity_val_steps: 10 # number of steps to perform validation steps for sanity check the validation process before starting the training, setting to 0 disables it
  enable_checkpointing: False # Provided by exp_manager
  accumulate_grad_batches: 1 # do not modify, grad acc is automatic for training megatron models
  gradient_clip_val: 1.0
  benchmark: False
  enable_model_summary: True

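For a detailed list of arguments, refer to the PyTorch Lightning Trainer API section.

Experiment Manager Configuration

The NeMo Framework Experiment Manager provides a streamlined way to manage tasks such as logging, saving, and resuming.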

exp_manager:
  exp_dir: null  # exp_dir for your experiment, if None, defaults to "./nemo_experiments"
  name: ${name}
  create_wandb_logger: True  # Whether you want exp_manager to create a Wandb logger
  wandb_logger_kwargs:
    name: training-session
    project: text2img
    group: nemo
    resume: True
  create_tensorboard_logger: True  # Whether you want exp_manager to create a tb logger
  create_checkpoint_callback: True  # Whether you want exp_manager to create a modelcheckpoint callback
  checkpoint_callback_params:
    monitor: reduced_train_loss
    save_top_k: 5
    every_n_epochs: 0 # Save checkpoint frequency.
    every_n_train_steps: 1000 # Mutually exclusive with every_n_epochs. It is recommended to set this if training on large-scale dataset.
    filename: '${name}--{reduced_train_loss:.2f}-{step}-{consumed_samples}'
  resume_if_exists: True
  resume_ignore_no_checkpoint: True
  resume_from_checkpoint: ${model.resume_from_checkpoint}
  ema:
    enable: True
    decay: 0.9999
    validate_original_weights: False
    every_n_steps: 1
    cpu_offload: False
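
The `ema` block above enables an exponential moving average of the model weights. The sketch below illustrates the standard EMA update rule with the `decay` and `every_n_steps` settings. It is an illustration on plain Python lists under that assumption, not NeMo's EMA implementation, and the function names are hypothetical.

```python
def ema_update(ema_weights, weights, decay=0.9999):
    """One EMA step: ema = decay * ema + (1 - decay) * w, per weight."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, weights)]


def track_ema(weight_history, decay=0.9999, every_n_steps=1):
    """Run the EMA over a list of weight snapshots, one snapshot per step.

    The EMA starts from the first snapshot and is updated only on steps
    that are multiples of `every_n_steps`.
    """
    ema = list(weight_history[0])
    for step, weights in enumerate(weight_history[1:], start=1):
        if step % every_n_steps == 0:
            ema = ema_update(ema, weights, decay)
    return ema


# Toy example with one weight and a small decay so the effect is visible.
history = [[0.0], [1.0], [1.0]]
print(track_ema(history, decay=0.5))  # [0.75]
```

With a decay as high as 0.9999, the averaged weights change very slowly, which is why the toy example uses 0.5.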

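Optimizer Configuration

NeMo Framework provides a variety of optimizers for training neural network models. The following example shows fused_adam, the default optimizer. A learning rate scheduler can be specified in the optim.sched section.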

optim:
  name: fused_adam
  lr: 0.0001
  eps: 1e-8
  betas: [ 0.9, 0.999 ]
  weight_decay: 0.01
  sched:
    name: WarmupPolicy
    warmup_steps: 10000
    warmup_ratio: null

For more information on supported optimizers, refer to the "Optimization" section of the NeMo API documentation.
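
As a rough illustration of what a warmup schedule such as the one configured in optim.sched does, the sketch below ramps the learning rate linearly up to `lr` over `warmup_steps` and then holds it constant. The linear shape, the treatment of `warmup_steps` and `warmup_ratio` as mutually exclusive alternatives, and the function names are assumptions made for this example; consult the NeMo scheduler source for the exact behavior.

```python
def resolve_warmup_steps(warmup_steps, warmup_ratio, max_steps):
    """Pick the warmup length from either an absolute count or a ratio.

    Assumption: only one of the two may be set, and a ratio is taken
    relative to max_steps.
    """
    if warmup_steps is not None and warmup_ratio is not None:
        raise ValueError("Set only one of warmup_steps or warmup_ratio")
    if warmup_steps is not None:
        return warmup_steps
    if warmup_ratio is not None:
        return int(warmup_ratio * max_steps)
    return 0


def warmup_lr(step, base_lr, warmup_steps):
    """Linear warmup to base_lr, then a constant learning rate."""
    if warmup_steps > 0 and step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr


# Values from the optim and trainer examples on this page.
steps = resolve_warmup_steps(warmup_steps=10000, warmup_ratio=None, max_steps=2500000)
for step in (0, 4999, 9999, 20000):
    print(step, warmup_lr(step, base_lr=0.0001, warmup_steps=steps))
```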

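Model Configuration

Each configuration file should describe the model architecture used in the experiment.

The following table lists parameters commonly used in most multimodal language models.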

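Parameter	Data type	Description
`micro_batch_size`	int	The micro batch size that fits on each GPU
`global_batch_size`	int	The global batch size, which accounts for gradient accumulation and data parallelism
`tensor_model_parallel_size`	int	Intra-layer model parallelism
`pipeline_model_parallel_size`	int	Inter-layer model parallelism
`seed`	int	The seed used in training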
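
The batch-size and parallelism parameters in the table are tied together by simple arithmetic, which is also why accumulate_grad_batches in the trainer section is left at 1 and handled automatically. The sketch below shows the usual Megatron-style relationship: the data-parallel size is the world size divided by the product of the tensor and pipeline parallel sizes, and the number of gradient accumulation steps follows from the global and micro batch sizes. The helper names are hypothetical, and this is an illustration rather than NeMo's internal code.

```python
def data_parallel_size(devices, num_nodes,
                       tensor_model_parallel_size=1,
                       pipeline_model_parallel_size=1):
    """Number of model replicas: world size / (tensor * pipeline)."""
    world_size = devices * num_nodes
    model_parallel = tensor_model_parallel_size * pipeline_model_parallel_size
    if world_size % model_parallel != 0:
        raise ValueError("World size must be divisible by tensor * pipeline size")
    return world_size // model_parallel


def grad_accumulation_steps(global_batch_size, micro_batch_size, dp_size):
    """Micro batches each replica processes per optimizer step."""
    samples_per_pass = micro_batch_size * dp_size
    if global_batch_size % samples_per_pass != 0:
        raise ValueError(
            "global_batch_size must be divisible by micro_batch_size * dp_size"
        )
    return global_batch_size // samples_per_pass


# Values from the examples on this page: 1 GPU, 1 node, no model parallelism.
dp = data_parallel_size(devices=1, num_nodes=1)
accum = grad_accumulation_steps(global_batch_size=4, micro_batch_size=2, dp_size=dp)
print(dp, accum)  # 1 2
```

For instance, 16 GPUs with tensor and pipeline parallel sizes of 2 each give 4 replicas, so a global batch size of 64 with a micro batch size of 2 needs 8 accumulation steps.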

Speech-Augmented Language Model (SALM)

For SALM model-specific configuration, refer to the example.

BESt features from TwO Worlds (BESTOW)

For BESTOW model-specific configuration, refer to the example.