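Important

You are viewing the NeMo 2.0 documentation. This release introduces significant changes to the API and a new library, NeMo Run. We are currently porting all features from NeMo 1.0 to 2.0. For documentation on previous releases or on features not yet available in 2.0, refer to the NeMo 24.07 documentation.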

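Important

Before starting this tutorial, be sure to review the introduction for tips on setting up your NeMo-Aligner environment.

If you run into any problems, refer to NeMo's Known Issues page. The page lists known issues and, where appropriate, suggests workarounds.

After completing this tutorial, refer to the evaluation documentation for tips on evaluating a trained model.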

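Obtain a Pretrained Model

The NeMo Framework supports efficient model alignment through the NeMo-Aligner codebase. All algorithms in NeMo-Aligner work with any NeMo GPT-based model. For a collection of scripts that convert popular models from Hugging Face to the .nemo format, go here.

To get started, you need a pretrained model to align. Three models are recommended: 2B GPT, Llama3-8B, or Nemotron-340B. For demonstration purposes, this tutorial uses the smaller 2B model, but you can follow the rest of the tutorial with any of the three.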

2B GPT

Get the 2B checkpoint with wget http://hugging-face.cn/nvidia/GPT-2B-001/resolve/main/GPT-2B-001_bf16_tp1.nemo
Extract the NeMo file into a folder with mkdir model_checkpoint && tar -xvf GPT-2B-001_bf16_tp1.nemo -C model_checkpoint

Run the script that converts the old NeMo checkpoint to a Megatron Core checkpoint. The script is located here.

python /opt/NeMo/scripts/checkpoint_converters/convert_gpt_nemo_to_mcore.py \
   --input_name_or_path ./model_checkpoint \
   --output_path ./mcore_gpt.nemo

Llama3-8B

Download the Llama3-8B LLM and its tokenizer into the model's folder. You can use the Hugging Face CLI to do this.

Convert the Llama3 LLM to the .nemo format.

python /opt/NeMo/scripts/checkpoint_converters/convert_llama_hf_to_nemo.py \
    --input_name_or_path /path/to/llama --output_path /output_path/mcore_gpt.nemo

Nemotron-340B

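Download the model from Hugging Face.

For all scripts, point *.restore_from_path to the directory where you downloaded the files.

Note

Because of the 340B model's size, we recommend TP8 PP24 (tensor parallel size 8, pipeline parallel size 24), which is a safe setting for the algorithms in NeMo-Aligner.

After completing these steps, you will have a file named mcore_gpt.nemo that you can use in NeMo-Aligner.

Note

If you bring your own .nemo model, make sure to change model.encoder_seq_length in the NeMo-Aligner configuration file to match your model's sequence length.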
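To find the value to use, you can read the sequence length the model was saved with. The sketch below does this and rests on a few assumptions. It assumes the .nemo file is a tar archive, which the `tar -xvf` commands in this tutorial suggest. It assumes the archive contains `model_config.yaml`, and that the key is written on a plain `encoder_seq_length: <int>` line. The helper name `read_encoder_seq_length` is hypothetical and is not part of NeMo.

```python
import re
import tarfile
from typing import Optional


def read_encoder_seq_length(nemo_path: str) -> Optional[int]:
    """Return encoder_seq_length from the model_config.yaml inside a .nemo file.

    Assumptions (not NeMo API): the .nemo file is a tar archive holding a member
    named model_config.yaml, possibly prefixed with "./", and the key sits on a
    single "encoder_seq_length: <int>" line. A regex stands in for a YAML parser
    so the sketch needs only the standard library.
    """
    pattern = re.compile(r"^\s*encoder_seq_length:\s*(\d+)\s*$", re.MULTILINE)
    with tarfile.open(nemo_path, "r:*") as archive:
        for member in archive.getmembers():
            if member.isfile() and member.name.split("/")[-1] == "model_config.yaml":
                text = archive.extractfile(member).read().decode("utf-8")
                match = pattern.search(text)
                return int(match.group(1)) if match else None
    return None  # the archive has no model_config.yaml
```

For example, `read_encoder_seq_length("mcore_gpt.nemo")` returns the value you would then pass as `model.encoder_seq_length`. A result of `None` means the key was not found in the expected form, and you should inspect the extracted config by hand.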

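Note

When you use a Megatron Core model (which uses Transformer Engine as its backend), the system tries to find efficient kernels. Depending on your GPU, however, it may not always find them. If you encounter errors related to kernel lookup, consider setting these variables at the top of your script: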

export NVTE_MASKED_SOFTMAX_FUSION=0
export NVTE_FLASH_ATTN=0
export NVTE_FUSED_ATTN=0
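If you launch training from a Python driver script instead of a shell script, the same three variables must be present in the environment of the child process. The sketch below shows one way to arrange that with `subprocess`. The helper names `env_without_te_fused_kernels` and `launch` are hypothetical. Only the variable names and values come from the note above.

```python
import os
import subprocess
from typing import Dict, Mapping, Optional, Sequence

# The three Transformer Engine switches from the note above, all set to "0".
TE_FALLBACK_VARS: Dict[str, str] = {
    "NVTE_MASKED_SOFTMAX_FUSION": "0",
    "NVTE_FLASH_ATTN": "0",
    "NVTE_FUSED_ATTN": "0",
}


def env_without_te_fused_kernels(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of `base` (default: os.environ) with the fallback variables added."""
    env = dict(os.environ if base is None else base)
    env.update(TE_FALLBACK_VARS)
    return env


def launch(cmd: Sequence[str]) -> int:
    """Run `cmd` as a child process whose environment disables the fused kernels."""
    return subprocess.run(list(cmd), env=env_without_te_fused_kernels(), check=False).returncode
```

For example: `launch(["python", "examples/nlp/gpt/train_gpt_sft.py", "trainer.devices=8"])`. Passing an explicit `env` keeps the parent process unchanged, which helps when a single driver launches several jobs with different settings.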

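Model Alignment by Supervised Fine-Tuning (SFT)

SFT is the process of fine-tuning a model's parameters on supervised data of inputs and outputs. It teaches the model how to follow user-specified instructions. It is typically done after model pretraining, and it is also an important prerequisite for Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO). NeMo-Aligner supports two types of SFT formats:

Prompt-Response. In the prompt-response format, each example contains an input prompt and an annotated response. SFT fine-tunes the base model to follow the prompt instructions and to answer in the style of the annotated responses. The prompt-response format can be used for various problems, such as question answering (Q&A) and summarization.
Chat. In the chat format, each example contains a multi-turn conversation between different roles (for example, user and assistant). Fine-tuning the base model on a chat-format dataset is useful for aligning chatbots.

Fine-Tune with a Prompt-Response Dataset

Step 1: Format the data.

This example uses the Dolly dataset to demonstrate how to format your SFT data. This dataset contains 15,000 instruction-context-response triples.

Download the data by entering the following command: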

wget http://hugging-face.cn/datasets/databricks/databricks-dolly-15k/resolve/main/databricks-dolly-15k.jsonl

The downloaded data is stored in the file databricks-dolly-15k.jsonl. It follows the JSONL format, and each line is structured as shown below:

{
    "instruction": "When did Virgin Australia start operating?",
    "context": "Virgin Australia, the trading name of Virgin Australia Airlines Pty Ltd, is an Australian-based airline. It is the largest airline by fleet size to use the Virgin brand. It commenced services on 31 August 2000 as Virgin Blue, with two aircraft on a single route.[3] It suddenly found itself as a major airline in Australia's domestic market after the collapse of Ansett Australia in September 2001. The airline has since grown to directly serve 32 cities in Australia, from hubs in Brisbane, Melbourne and Sydney.[4]",
    "response": "Virgin Australia commenced services on 31 August 2000 as Virgin Blue, with two aircraft on a single route.",
    "category": "closed_qa"
}

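As this example shows, the data has no explicit "input" and "output" fields, which SFT requires.

For an example of how to process this data format into a JSONL file with "input" and "output" fields, see preprocess.py:

python preprocess.py --input databricks-dolly-15k.jsonl

This script converts the instruction, context, and response fields into input and output fields. It concatenates the instruction and context fields with a \n\n separator and randomizes the order in which they appear in the input. The result is a new JSONL file named databricks-dolly-15k-output.jsonl. An example is shown below: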

{
   "input": "When did Virgin Australia start operating?\n\nVirgin Australia, the trading name of Virgin Australia Airlines Pty Ltd, is an Australian-based airline. It is the largest airline by fleet size to use the Virgin brand. It commenced services on 31 August 2000 as Virgin Blue, with two aircraft on a single route. It suddenly found itself as a major airline in Australia's domestic market after the collapse of Ansett Australia in September 2001. The airline has since grown to directly serve 32 cities in Australia, from hubs in Brisbane, Melbourne and Sydney.",
   "output": "Virgin Australia commenced services on 31 August 2000 as Virgin Blue, with two aircraft on a single route.",
   "category": "closed_qa"
}
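The transformation described above can be sketched as follows. This is an illustrative re-implementation based only on the description in this section, not the contents of preprocess.py. One assumption beyond that description: records with an empty context keep just the instruction as input. The function names `dolly_to_sft` and `convert_jsonl` are hypothetical.

```python
import json
import random


def dolly_to_sft(record: dict, rng: random.Random) -> dict:
    """Map one Dolly record to the {"input", "output", "category"} layout.

    Instruction and context are joined with "\n\n" in a random order.
    Assumption: an empty context means the input is just the instruction.
    """
    instruction = record["instruction"].strip()
    context = record.get("context", "").strip()
    if context:
        parts = [instruction, context]
        rng.shuffle(parts)  # randomize which field comes first
        prompt = "\n\n".join(parts)
    else:
        prompt = instruction
    return {"input": prompt, "output": record["response"], "category": record["category"]}


def convert_jsonl(src: str, dst: str, seed: int = 0) -> int:
    """Convert every non-empty line of `src`, write it to `dst`, and return the count."""
    rng = random.Random(seed)  # fixed seed makes the shuffling reproducible
    count = 0
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            if line.strip():
                fout.write(json.dumps(dolly_to_sft(json.loads(line), rng)) + "\n")
                count += 1
    return count
```

For example, `convert_jsonl("databricks-dolly-15k.jsonl", "databricks-dolly-15k-output.jsonl")` writes one converted record per input line. Randomizing the field order keeps the model from relying on the question always appearing before (or after) the supporting passage.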

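The prompt-response dataset also supports sequence packing. Sequence packing is a training technique that concatenates multiple training examples into one longer sequence. This removes the need for padding and improves GPU utilization. For a detailed overview of sequence packing and its benefits, refer to the sequence packing documentation.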
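To show the idea behind packing, the sketch below groups examples into sequences of at most `max_seq_length` tokens using first-fit-decreasing bin packing. This is a generic illustration of packing, not necessarily the algorithm used by NeMo's packing script. The function name `pack_first_fit` is hypothetical.

```python
from typing import List


def pack_first_fit(lengths: List[int], max_seq_length: int) -> List[List[int]]:
    """Group example indices into bins whose total token count fits max_seq_length.

    First-fit-decreasing: the longest examples are placed first, each into the
    first bin with enough free room; a new bin is opened when none fits.
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    bins: List[List[int]] = []
    room: List[int] = []  # free tokens left in each bin
    for i in order:
        n = lengths[i]
        if n > max_seq_length:
            raise ValueError(f"example {i} has {n} tokens, more than {max_seq_length}")
        for b, free in enumerate(room):
            if n <= free:
                bins[b].append(i)
                room[b] -= n
                break
        else:
            bins.append([i])
            room.append(max_seq_length - n)
    return bins
```

For example, examples of 700, 300, 500, 200, and 300 tokens with a limit of 1000 fit into two full sequences, `[[0, 1], [2, 4, 3]]`. Padding each example separately would need five sequences.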

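NeMo provides a script for packing your SFT prompt-response dataset. For details on how to use it, refer to the Prepare Dataset section of the documentation.

Step 2: Run SFT training.

Now you will use the data for supervised fine-tuning with NeMo-Aligner.

Terminal

To run SFT directly in the terminal, use the following command. For it to run successfully, make sure the NeMo-Aligner repository is your current working directory.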

python examples/nlp/gpt/train_gpt_sft.py \
   trainer.precision=bf16 \
   trainer.num_nodes=1 \
   trainer.devices=8 \
   trainer.sft.max_steps=-1 \
   trainer.sft.limit_val_batches=40 \
   trainer.sft.val_check_interval=1000 \
   model.megatron_amp_O2=True \
   model.restore_from_path=/path/to/your/mcore_gpt.nemo \
   model.optim.lr=5e-6 \
   model.answer_only_loss=True \
   model.data.num_workers=0 \
   model.data.train_ds.micro_batch_size=1 \
   model.data.train_ds.global_batch_size=128 \
   model.data.train_ds.file_path=/path/to/databricks-dolly-15k-output.jsonl \
   model.data.validation_ds.micro_batch_size=1 \
   model.data.validation_ds.global_batch_size=128 \
   model.data.validation_ds.file_path=/path/to/databricks-dolly-15k-output.jsonl \
   exp_manager.create_wandb_logger=True \
   exp_manager.explicit_log_dir=/results \
   exp_manager.wandb_logger_kwargs.project=sft_run \
   exp_manager.wandb_logger_kwargs.name=dolly_sft_run \
   exp_manager.checkpoint_callback_params.save_nemo_on_train_end=True \
   exp_manager.resume_if_exists=True \
   exp_manager.resume_ignore_no_checkpoint=True \
   exp_manager.create_checkpoint_callback=True \
   exp_manager.checkpoint_callback_params.monitor=val_loss

Slurm

To run SFT through Slurm, use the following script:

#!/bin/bash
#SBATCH -A <<<YOUR ACCOUNT>>>
#SBATCH -p <<<YOUR PARTITION>>>
#SBATCH -N 1
#SBATCH -t 4:00:00
#SBATCH -J <<<JOB NAME>>>
#SBATCH --ntasks-per-node=8
#SBATCH --exclusive
#SBATCH --overcommit

GPFS="/path/to/nemo-aligner-repo"

TRAIN_DATA_PATH="/path/to/databricks-dolly-15k-output.jsonl"
VALID_DATA_PATH="/path/to/databricks-dolly-15k-output.jsonl"

PRETRAINED_ACTOR_NEMO_FILE="/path/to/your/mcore_gpt.nemo"

PROJECT=WANDB_PROJECT # if you want to use wandb

RESULTS_DIR="/path/to/result_dir"

OUTFILE="${RESULTS_DIR}/sft-%j_%t.out"
ERRFILE="${RESULTS_DIR}/sft-%j_%t.err"
mkdir -p ${RESULTS_DIR}

CONTAINER=<<<CONTAINER>>> # use the latest NeMo Training container, Aligner will work there

MOUNTS="--container-mounts=MOUNTS" # mounts

read -r -d '' cmd <<EOF
echo "*******STARTING********" \
&& echo "---------------" \
&& echo "Starting training" \
&& cd ${GPFS} \
&& export PYTHONPATH="${GPFS}:${PYTHONPATH}" \
&& export HYDRA_FULL_ERROR=1 \
&& python -u ${GPFS}/examples/nlp/gpt/train_gpt_sft.py \
   trainer.precision=bf16 \
   trainer.num_nodes=${SLURM_JOB_NUM_NODES} \
   trainer.devices=8 \
   trainer.sft.max_steps=-1 \
   trainer.sft.limit_val_batches=40 \
   trainer.sft.val_check_interval=100 \
   trainer.sft.save_interval=100 \
   model.megatron_amp_O2=True \
   model.restore_from_path=${PRETRAINED_ACTOR_NEMO_FILE} \
   model.optim.lr=5e-6 \
   model.answer_only_loss=True \
   model.data.num_workers=0 \
   model.data.train_ds.micro_batch_size=1 \
   model.data.train_ds.global_batch_size=128 \
   model.data.train_ds.file_path=${TRAIN_DATA_PATH} \
   model.data.validation_ds.micro_batch_size=1 \
   model.data.validation_ds.global_batch_size=128 \
   model.data.validation_ds.file_path=${VALID_DATA_PATH} \
   exp_manager.create_wandb_logger=True \
   exp_manager.explicit_log_dir=${RESULTS_DIR} \
   exp_manager.wandb_logger_kwargs.project=${PROJECT} \
   exp_manager.wandb_logger_kwargs.name=dolly_sft_run \
   exp_manager.resume_if_exists=True \
   exp_manager.resume_ignore_no_checkpoint=True \
   exp_manager.create_checkpoint_callback=True \
   exp_manager.checkpoint_callback_params.save_nemo_on_train_end=True \
   exp_manager.checkpoint_callback_params.monitor=val_loss
EOF

srun --no-container-mount-home -o $OUTFILE -e $ERRFILE --container-image=$CONTAINER $MOUNTS bash -c "${cmd}"
set +x

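If you use sequence packing, replace the data paths with the paths to your packed datasets. For each packed dataset, also set packed_sequence=True in the configuration: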

+model.data.train_ds.packed_sequence=True \
+model.data.validation_ds.packed_sequence=True

You do not need to pack both the training and validation datasets. If you pack only the training dataset, omit +model.data.validation_ds.packed_sequence=True.
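If you assemble the command line programmatically, you can add the packing flags only for the datasets you actually packed. The sketch below does this. The function name `data_overrides` is hypothetical. The override keys are the ones used in this tutorial, and the leading `+` is Hydra's syntax for appending a key that the base config does not define.

```python
from typing import List


def data_overrides(train_path: str, val_path: str,
                   pack_train: bool = False, pack_val: bool = False) -> List[str]:
    """Build the data-related Hydra overrides, adding packed_sequence only where packed."""
    overrides = [
        f"model.data.train_ds.file_path={train_path}",
        f"model.data.validation_ds.file_path={val_path}",
    ]
    # "+" appends a key that is absent from the base config.
    if pack_train:
        overrides.append("+model.data.train_ds.packed_sequence=True")
    if pack_val:
        overrides.append("+model.data.validation_ds.packed_sequence=True")
    return overrides
```

With `pack_train=True` and `pack_val=False`, the result contains the training packing flag and no validation packing flag. This matches the case above where only the training set is packed.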

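To scale to thousands of GPUs, adjust trainer.num_nodes and trainer.devices to match the size of your cluster. If you are running a larger model, you may need to change the parallelism settings. If you run out of memory with Llama3-8B, add tensor parallelism to your configuration.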
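Changing tensor or pipeline parallelism also changes the data-parallel size, and therefore the number of micro-batches each rank accumulates per optimizer step. The sketch below uses the standard Megatron-style relationship global_batch_size = micro_batch_size × data_parallel_size × accumulation_steps, with data_parallel_size = GPUs / (TP × PP). The function name is hypothetical. It assumes TP is set through `model.tensor_model_parallel_size`, as in NeMo Megatron GPT configs, and it ignores other parallelism dimensions such as context parallelism.

```python
def grad_accum_steps(global_batch_size: int, micro_batch_size: int,
                     num_nodes: int, devices: int,
                     tensor_parallel: int = 1, pipeline_parallel: int = 1) -> int:
    """Micro-batches each data-parallel rank accumulates per optimizer step."""
    world_size = num_nodes * devices
    model_parallel = tensor_parallel * pipeline_parallel
    if world_size % model_parallel:
        raise ValueError("num_nodes * devices must be divisible by TP * PP")
    data_parallel = world_size // model_parallel
    per_pass = micro_batch_size * data_parallel  # samples consumed per forward/backward pass
    if global_batch_size % per_pass:
        raise ValueError("global_batch_size must be divisible by micro_batch_size * DP size")
    return global_batch_size // per_pass
```

With this tutorial's settings (1 node, 8 GPUs, micro batch 1, global batch 128), `grad_accum_steps(128, 1, 1, 8)` gives 16. Adding `tensor_parallel=4` shrinks the data-parallel size to 2, so the result becomes 64. Memory per GPU drops, but each optimizer step takes more passes.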

In this particular run on the 2B model, the final training loss was approximately 1.536. When training finishes, you will find a file named megatron_gpt_sft.nemo ready for use.
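Both SFT commands in this section set model.answer_only_loss=True, which restricts the loss to the response tokens. The sketch below shows that masking idea on token IDs. It is a simplification, not NeMo's implementation, and it ignores the one-position shift used in next-token prediction. The function names `build_loss_mask` and `masked_mean` are hypothetical.

```python
from typing import List, Tuple


def build_loss_mask(prompt_ids: List[int], answer_ids: List[int],
                    answer_only_loss: bool = True) -> Tuple[List[int], List[int]]:
    """Concatenate prompt and answer tokens and return (input_ids, loss_mask).

    With answer_only_loss=True, prompt positions get mask 0 and contribute
    nothing to the loss; otherwise every position is trained on.
    """
    input_ids = prompt_ids + answer_ids
    prompt_weight = 0 if answer_only_loss else 1
    loss_mask = [prompt_weight] * len(prompt_ids) + [1] * len(answer_ids)
    return input_ids, loss_mask


def masked_mean(per_token_loss: List[float], loss_mask: List[int]) -> float:
    """Average the per-token loss over positions where the mask is 1."""
    total = sum(loss * m for loss, m in zip(per_token_loss, loss_mask))
    return total / max(sum(loss_mask), 1)
```

For a 3-token prompt and a 2-token answer, the mask is `[0, 0, 0, 1, 1]`. Large losses on the prompt tokens therefore do not affect the average. The model is trained to produce responses, not to reproduce prompts.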

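Note

The NeMo Framework supports WandB logging. To get started with WandB, refer to the Quickstart guide. Setting exp_manager.create_wandb_logger=True enables WandB logging, which records the job results in WandB.

The provided Slurm script relies on the pyxis Slurm extension, which requires specifying --container-image= and --container-mounts=. NeMo-Aligner can also run in a regular Python environment without this extension.

Step 3: Run inference or further fine-tuning.

With the trained SFT model, you can run inference on new examples or fine-tune the SFT model further to improve its performance (for example, with RLHF or DPO). Note that the inputs for both must follow the prompt template used by this model. The template is set by data.train_ds.prompt_template. The saved NeMo model, megatron_gpt_sft.nemo, also stores the prompt format: run tar -xvf megatron_gpt_sft.nemo and look in model_config.yaml.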

In this example, the template is "{input} {output}".
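To make the template concrete, the sketch below shows how a training sequence and an inference prompt could be rendered from "{input} {output}". It assumes that a prompt for generation is everything before the `{output}` slot. The function names are hypothetical and are not NeMo API.

```python
PROMPT_TEMPLATE = "{input} {output}"  # the template reported for this example


def training_text(example: dict, template: str = PROMPT_TEMPLATE) -> str:
    """Render a full training sequence from an {"input", "output"} record."""
    return template.format(input=example["input"], output=example["output"])


def inference_prompt(question: str, template: str = PROMPT_TEMPLATE) -> str:
    """Render a generation prompt: the template up to (not including) {output}."""
    prefix = template.split("{output}")[0]  # "{input} " for the default template
    return prefix.format(input=question)
```

`training_text({"input": "Q", "output": "A"})` returns `"Q A"`, and `inference_prompt("Q")` returns `"Q "`. The prompt keeps the trailing space so that it matches the training layout exactly up to the point where the answer begins.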

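Fine-Tune with a Chat Dataset

Step 1: Format the data.

This example uses the OpenAssistant dataset. Download the dataset and convert it to the chat format with the following script: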

python /opt/NeMo-Aligner/examples/nlp/data/steerlm/preprocess_openassistant_data.py --output_directory=data/oasst

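Step 2: Run SFT training.

Now you will use the data for supervised fine-tuning with NeMo-Aligner. Unlike SFT with a prompt-response dataset, this run requires model.data.chat=True.

Terminal

To run SFT directly in the terminal, use the following command. For it to run successfully, make sure the NeMo-Aligner repository is your current working directory.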

python examples/nlp/gpt/train_gpt_sft.py \
   trainer.precision=bf16 \
   trainer.num_nodes=1 \
   trainer.devices=8 \
   trainer.sft.max_steps=-1 \
   trainer.sft.limit_val_batches=40 \
   trainer.sft.val_check_interval=1000 \
   model.megatron_amp_O2=True \
   model.restore_from_path=/path/to/your/mcore_gpt.nemo \
   model.optim.lr=5e-6 \
   model.data.chat=True \
   model.data.num_workers=0 \
   model.data.train_ds.micro_batch_size=1 \
   model.data.train_ds.global_batch_size=128 \
   model.data.train_ds.max_seq_length=4096 \
   model.data.train_ds.file_path=data/oasst/train.jsonl \
   model.data.validation_ds.micro_batch_size=1 \
   model.data.validation_ds.global_batch_size=128 \
   model.data.validation_ds.file_path=data/oasst/val.jsonl \
   model.data.validation_ds.max_seq_length=4096 \
   exp_manager.create_wandb_logger=True \
   exp_manager.explicit_log_dir=/results \
   exp_manager.wandb_logger_kwargs.project=sft_run \
   exp_manager.wandb_logger_kwargs.name=chat_sft_run \
   exp_manager.checkpoint_callback_params.save_nemo_on_train_end=True \
   exp_manager.resume_if_exists=True \
   exp_manager.resume_ignore_no_checkpoint=True \
   exp_manager.create_checkpoint_callback=True \
   exp_manager.checkpoint_callback_params.monitor=validation_loss

Slurm

To run SFT through Slurm, use the following script:

#!/bin/bash
#SBATCH -A <<<YOUR ACCOUNT>>>
#SBATCH -p <<<YOUR PARTITION>>>
#SBATCH -N 1
#SBATCH -t 4:00:00
#SBATCH -J <<<JOB NAME>>>
#SBATCH --ntasks-per-node=8
#SBATCH --gpus-per-node=8
#SBATCH --exclusive
#SBATCH --overcommit

GPFS="/path/to/nemo-aligner-repo"

TRAIN_DATA_PATH="data/oasst/train.jsonl"
VALID_DATA_PATH="data/oasst/val.jsonl"

PRETRAINED_ACTOR_NEMO_FILE="/path/to/your/mcore_gpt.nemo"

PROJECT=WANDB_PROJECT # if you want to use wandb

RESULTS_DIR="/path/to/result_dir"

OUTFILE="${RESULTS_DIR}/sft-%j_%t.out"
ERRFILE="${RESULTS_DIR}/sft-%j_%t.err"
mkdir -p ${RESULTS_DIR}

CONTAINER=<<<CONTAINER>>> # use the latest NeMo Training container, Aligner will work there

MOUNTS="--container-mounts=MOUNTS" # mounts

read -r -d '' cmd <<EOF
echo "*******STARTING********" \
&& echo "---------------" \
&& echo "Starting training" \
&& cd ${GPFS} \
&& export PYTHONPATH="${GPFS}:${PYTHONPATH}" \
&& export HYDRA_FULL_ERROR=1 \
&& python -u ${GPFS}/examples/nlp/gpt/train_gpt_sft.py \
   trainer.precision=bf16 \
   trainer.num_nodes=${SLURM_JOB_NUM_NODES} \
   trainer.devices=8 \
   trainer.sft.max_steps=-1 \
   trainer.sft.limit_val_batches=40 \
   trainer.sft.val_check_interval=1000 \
   model.megatron_amp_O2=True \
   model.restore_from_path=${PRETRAINED_ACTOR_NEMO_FILE} \
   model.optim.lr=5e-6 \
   model.data.chat=True \
   model.data.num_workers=0 \
   model.data.train_ds.micro_batch_size=1 \
   model.data.train_ds.global_batch_size=128 \
   model.data.train_ds.file_path=${TRAIN_DATA_PATH} \
   model.data.train_ds.max_seq_length=4096 \
   model.data.validation_ds.micro_batch_size=1 \
   model.data.validation_ds.global_batch_size=128 \
   model.data.validation_ds.file_path=${VALID_DATA_PATH} \
   model.data.validation_ds.max_seq_length=4096 \
   exp_manager.create_wandb_logger=True \
   exp_manager.explicit_log_dir=${RESULTS_DIR} \
   exp_manager.wandb_logger_kwargs.project=${PROJECT} \
   exp_manager.wandb_logger_kwargs.name=chat_sft_run \
   exp_manager.resume_if_exists=True \
   exp_manager.resume_ignore_no_checkpoint=True \
   exp_manager.create_checkpoint_callback=True \
   exp_manager.checkpoint_callback_params.save_nemo_on_train_end=True \
   exp_manager.checkpoint_callback_params.monitor=validation_loss
EOF

srun --no-container-mount-home -o $OUTFILE -e $ERRFILE --container-image=$CONTAINER $MOUNTS bash -c "${cmd}"
set +x

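To scale to thousands of GPUs, adjust trainer.num_nodes and trainer.devices to match the size of your cluster.

In this particular run on the Llama3-8B model, the final validation loss was approximately 1.546. When training finishes, you will find a file named megatron_gpt_sft.nemo ready for use.

Step 3: Run inference or further fine-tuning.

With the trained SFT model, you can run inference on new examples or fine-tune the SFT model further to improve its performance (for example, with RLHF or DPO). Note that the inputs for both must follow the prompt template used by this model. The template is set by data.chat_prompt_tokens. The saved NeMo model, megatron_gpt_sft.nemo, stores the prompt format: run tar -xvf megatron_gpt_sft.nemo and look in model_config.yaml. In this example, it is: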

prompt_template: "\0System\n{system message}\n\x11User\n{turn 1 user message}\n\x11Assistant\n\x12{turn 1 assistant label}\n{turn 1 assistant message}\n\x11User\n{turn 2 user message}\n\x11Assistant\n\x12{turn 2 assistant label}\n{turn 2 assistant message}\n\x11"
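The template above can be rendered programmatically. The sketch below follows it character for character: `\0` (written `\x00` here) starts the system block, `\x11` starts each turn, and `\x12` precedes the assistant label. It also makes two assumptions. First, an inference prompt stops right after the assistant label line. Second, the label defaults to an empty string. Check how your own data fills the label slot. The function names are hypothetical and are not NeMo API.

```python
from typing import List, Tuple

SYSTEM_START = "\x00"  # written as \0 in the template
TURN_START = "\x11"
LABEL_START = "\x12"


def render_conversation(system: str, turns: List[Tuple[str, str, str]]) -> str:
    """Render (user_message, assistant_label, assistant_message) turns per the template."""
    text = f"{SYSTEM_START}System\n{system}\n"
    for user, label, reply in turns:
        text += f"{TURN_START}User\n{user}\n"
        text += f"{TURN_START}Assistant\n{LABEL_START}{label}\n{reply}\n"
    return text + TURN_START  # the template ends with a turn-start character


def inference_prompt(system: str, history: List[Tuple[str, str, str]],
                     user_message: str, label: str = "") -> str:
    """Open a new assistant turn so the model generates the assistant message.

    Assumption: generation starts right after the label line; the empty
    default label is illustrative only.
    """
    text = render_conversation(system, history)  # already ends with TURN_START
    return text + f"User\n{user_message}\n{TURN_START}Assistant\n{LABEL_START}{label}\n"
```

`render_conversation` produces full training-style conversations. `inference_prompt("...", [], "What is machine learning?")` produces a single-turn prompt like the one used for the sample answer below.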

You can run inference with megatron_gpt_sft.nemo and this prompt template. When the model is asked "What is machine learning?", its answer looks like this:

"""
Machine learning is a field of computer science that focuses on building algorithms that allow machines to improve their performance
on a task without being explicitly programmed to do so. It involves using data to train a model to make predictions or perform
tasks based on patterns in the data.\n\nExamples of machine learning include image recognition, natural language processing, and spam
filtering. Machine learning algorithms can be classified into supervised and unsupervised learning. In supervised learning, the
algorithm is provided with labeled examples of a target variable (such as the number of stars in a picture) and is tasked with learning a
function that maps input features to the target variable.  Unsupervised learning, on the other hand, involves finding structures in unlabeled
data without knowing what those structures are.\n\nMachine learning algorithms can be trained using a variety of techniques, including
gradient descent, stochastic gradient descent, and reinforcement learning. Once trained, the algorithms can be used to make predictions
or perform tasks on new data without explicit programming.\n\nMachine learning has been used in a wide range of fields, including healthcare,
finance, retail, and robotics. It has the potential to transform industries by enabling machines to process and understand vast amounts
of data, make predictions, and take actions autonomously.\n\nIn summary, machine learning is a branch of computer science that focuses on building
algorithms that allow machines to learn from data and improve their performance on a task without being explicitly programmed to do so.
"""