Configuration Guide

A guardrails configuration includes the following:

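General options: which LLM(s) to use, general instructions (similar to system prompts), a sample conversation, which rails are active, specific rail configuration options, etc.; these options are typically placed in a config.yml file.
Rails: Colang flows implementing the rails; these are typically placed in a rails folder.
Actions: custom actions implemented in Python; these are typically placed in an actions.py module in the root of the config or in an actions sub-package.
Knowledge base documents: documents that can be used in a RAG (Retrieval-Augmented Generation) scenario through the built-in knowledge base support; these documents are typically placed in a kb folder.
Initialization code: custom Python code that performs additional initialization, e.g., registering a new type of LLM.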

These files are typically included in a config folder, which is referenced when initializing a RailsConfig instance or when starting the CLI Chat or Server.

.
├── config
│   ├── rails
│   │   ├── file_1.co
│   │   ├── file_2.co
│   │   └── ...
│   ├── actions.py
│   ├── config.py
│   └── config.yml

Custom actions can be placed either in an actions.py module in the root of the config or in an actions sub-package:

.
├── config
│   ├── rails
│   │   ├── file_1.co
│   │   ├── file_2.co
│   │   └── ...
│   ├── actions
│   │   ├── file_1.py
│   │   ├── file_2.py
│   │   └── ...
│   ├── config.py
│   └── config.yml
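To check that a folder follows this layout, the following stdlib-only sketch reports which of the conventional components are present. The helper name describe_config_folder is hypothetical and not part of NeMo Guardrails; it only looks for the file and folder names listed above.

```python
from pathlib import Path
from typing import Dict


def describe_config_folder(config_dir: str) -> Dict[str, bool]:
    """Report which conventional components a guardrails config folder contains."""
    root = Path(config_dir)
    rails_dir = root / "rails"
    kb_dir = root / "kb"
    return {
        "config.yml": (root / "config.yml").is_file(),
        "config.py": (root / "config.py").is_file(),
        # Colang flows live in the rails/ folder.
        "rails/*.co": rails_dir.is_dir() and any(rails_dir.glob("*.co")),
        # Custom actions: either a single module or a sub-package.
        "actions": (root / "actions.py").is_file() or (root / "actions").is_dir(),
        # Knowledge base documents (Markdown).
        "kb/*.md": kb_dir.is_dir() and any(kb_dir.glob("*.md")),
    }


print(describe_config_folder("config"))
```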

Custom Initialization

If present, the config.py module is loaded before the LLMRails instance is initialized.

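If the config.py module contains an init function, it is called as part of the initialization of the LLMRails instance. For example, you can use the init function to initialize the connection to a database and register it as a custom action parameter using the register_action_param(...) function: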

from nemoguardrails import LLMRails

def init(app: LLMRails):
    # Initialize the database connection
    db = ...

    # Register the action parameter
    app.register_action_param("db", db)

Custom action parameters are passed on to the custom actions when they are invoked.
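The mechanism can be pictured with a toy dispatcher that injects registered parameters into an action when the action declares an argument with the same name. This is a hypothetical, stdlib-only illustration (ToyActionDispatcher and count_users are made-up names); it assumes name-based matching and is not the toolkit's actual dispatch code.

```python
import inspect
from typing import Any, Callable, Dict


class ToyActionDispatcher:
    """Hypothetical model of passing registered parameters to actions by name."""

    def __init__(self) -> None:
        self._params: Dict[str, Any] = {}

    def register_action_param(self, name: str, value: Any) -> None:
        self._params[name] = value

    def call(self, fn: Callable[..., Any], **kwargs: Any) -> Any:
        # Inject only the registered parameters that the action declares.
        accepted = inspect.signature(fn).parameters
        for name, value in self._params.items():
            if name in accepted and name not in kwargs:
                kwargs[name] = value
        return fn(**kwargs)


def count_users(db: Dict[str, list]) -> int:
    # A hypothetical action that needs the "db" parameter.
    return len(db["users"])


dispatcher = ToyActionDispatcher()
dispatcher.register_action_param("db", {"users": ["alice", "bob"]})
print(dispatcher.call(count_users))  # prints 2
```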

General Options

The following subsections describe all the configuration options you can use in the config.yml file.

The LLM Model

To configure the main LLM model that the guardrails configuration will use, set the models key as shown below:

models:
  - type: main
    engine: openai
    model: gpt-3.5-turbo-instruct

The attributes have the following meaning:

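type: set to "main" to indicate the main LLM model.
engine: the LLM provider, for example, openai, huggingface_endpoint, self_hosted, etc.
model: the name of the model, for example, gpt-3.5-turbo-instruct.
parameters: any additional parameters, for example, temperature, top_k, etc.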

Supported LLM Models

You can use any LLM provider that is supported by LangChain, for example, ai21, aleph_alpha, anthropic, anyscale, azure, cohere, huggingface_endpoint, huggingface_hub, openai, self_hosted, self_hosted_hugging_face. Check out the official LangChain documentation for the full list.

Note

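To use any of the providers, you must install additional packages; when you first try to use a configuration with a new provider, you typically receive an error from LangChain that tells you which packages to install.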

Important

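Although you can instantiate any of the previously mentioned LLM providers, the NeMo Guardrails toolkit works better with some of them than with others, depending on the capabilities of the model. The toolkit includes prompts that have been optimized for certain types of models, such as openai and nemollm. For other models, you can optimize the prompts yourself by following the information in the LLM Prompts section.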

NIM for LLM

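NVIDIA NIM is a set of easy-to-use microservices designed to accelerate the deployment of generative AI models across the cloud, data center, and workstations. NVIDIA NIM for LLMs brings the power of state-of-the-art LLMs to enterprise applications, providing unmatched natural language processing and understanding capabilities. Learn more about NIMs.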

NeMo Guardrails supports connecting to a NIM as follows:

models:
  - type: main
    engine: nim
    model: <MODEL_NAME>
    parameters:
      base_url: <NIM_ENDPOINT_URL>

For example, to connect to a locally deployed meta/llama3-8b-instruct model on port 8000, use the following model configuration:

models:
  - type: main
    engine: nim
    model: meta/llama3-8b-instruct
    parameters:
      base_url: http://localhost:8000/v1

Important

To use the nim LLM provider, install the langchain-nvidia-ai-endpoints package using the command pip install langchain-nvidia-ai-endpoints.

NVIDIA AI Endpoints

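NVIDIA AI Endpoints give users easy access to NVIDIA-hosted API endpoints for NVIDIA AI Foundation Models such as Llama 3, Mixtral 8x7B, and Stable Diffusion. These models, hosted on the NVIDIA API catalog, are optimized, tested, and hosted on the NVIDIA AI platform, making them fast and easy to evaluate, further customize, and run seamlessly at peak performance on any accelerated stack.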

To use an LLM model through NVIDIA AI Endpoints, use the following model configuration:

models:
  - type: main
    engine: nvidia_ai_endpoints
    model: <MODEL_NAME>

For example, to use the llama3-8b-instruct model, use the following model configuration:

models:
  - type: main
    engine: nvidia_ai_endpoints
    model: meta/llama3-8b-instruct

Important

To use the nvidia_ai_endpoints LLM provider, you must install the langchain-nvidia-ai-endpoints package using the command pip install langchain-nvidia-ai-endpoints and configure a valid NVIDIA_API_KEY.

For further information, see the user guide.

Here is an example configuration for using the llama3 model with Ollama:

models:
  - type: main
    engine: ollama
    model: llama3
    parameters:
      base_url: http://your_base_url

NeMo LLM Service

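In addition to the LLM providers supported by LangChain, NeMo Guardrails also supports the NeMo LLM Service. For example, to use the GPT-43B-905 model as the main LLM, use the following configuration: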

models:
  - type: main
    engine: nemollm
    model: gpt-43b-905

You can also use customized NeMo LLM models for specific tasks, e.g., self-checking the user input or the bot output. For example:

models:
  # ...
  - type: self_check_input
    engine: nemollm
    model: gpt-43b-002
    parameters:
      tokens_to_generate: 10
      customization_id: 6e5361fa-f878-4f00-8bc6-d7fbaaada915

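When using the parameters key, you can specify additional parameters. The supported parameters are:

temperature: the temperature to use when making the calls;
api_host: the NeMo LLM Service host (default 'https://api.llm.ngc.nvidia.com');
api_key: the NeMo LLM Service key to use;
organization_id: the NeMo LLM Service organization ID to use;
tokens_to_generate: the maximum number of tokens to generate;
stop: the list of stop words to use;
customization_id: the ID of the customization, if one is used.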

The api_host, api_key, and organization_id values are fetched automatically from the environment variables NGC_API_HOST, NGC_API_KEY, and NGC_ORGANIZATION_ID, respectively.

For more details, refer to the NeMo LLM Service documentation and check out the NeMo LLM example configuration.
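The precedence between explicit parameters and environment variables can be sketched as follows. The helper resolve_nemollm_settings is hypothetical, and it assumes that a value under parameters takes priority over the environment variable; check the toolkit's actual behavior before relying on this order.

```python
import os
from typing import Dict, Optional

DEFAULT_API_HOST = "https://api.llm.ngc.nvidia.com"

# Parameter name -> environment variable, as listed above.
ENV_VARS = {
    "api_host": "NGC_API_HOST",
    "api_key": "NGC_API_KEY",
    "organization_id": "NGC_ORGANIZATION_ID",
}


def resolve_nemollm_settings(parameters: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Hypothetical: explicit parameters win, then env vars, then the default host."""
    settings: Dict[str, Optional[str]] = {}
    for key, env_var in ENV_VARS.items():
        settings[key] = parameters.get(key) or os.environ.get(env_var)
    if not settings["api_host"]:
        settings["api_host"] = DEFAULT_API_HOST
    return settings


print(resolve_nemollm_settings({"organization_id": "my-org"}))
```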

TRT-LLM

NeMo Guardrails also supports connecting to a TRT-LLM server.

models:
  - type: main
    engine: trt_llm
    model: <MODEL_NAME>

Below is the list of supported parameters with their default values. Refer to the TRT-LLM documentation for more details.

models:
  - type: main
    engine: trt_llm
    model: <MODEL_NAME>
    parameters:
      server_url: <SERVER_URL>
      temperature: 1.0
      top_p: 0
      top_k: 1
      tokens: 100
      beam_width: 1
      repetition_penalty: 1.0
      length_penalty: 1.0

Custom LLM Models

To register a custom LLM provider, create a class that inherits from BaseLanguageModel and register it using register_llm_provider.

You must implement the following methods:

Required:

_call
_llm_type

Optional:

_acall
_astream
_stream
_identifying_params

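In other words, to create your custom LLM provider, you need to implement the following interface methods: _call, _llm_type, and optionally _acall, _astream, _stream, and _identifying_params. Here's how you can do it: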

from typing import Any, Iterator, List, Optional

from langchain.base_language import BaseLanguageModel
from langchain_core.callbacks.manager import (
    CallbackManagerForLLMRun,
    AsyncCallbackManagerForLLMRun,
)
from langchain_core.outputs import GenerationChunk

from nemoguardrails.llm.providers import register_llm_provider


class MyCustomLLM(BaseLanguageModel):

    def _call(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs,
    ) -> str:
        pass

    async def _acall(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[AsyncCallbackManagerForLLMRun] = None,
        **kwargs,
    ) -> str:
        pass

    def _stream(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> Iterator[GenerationChunk]:
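        pass

    @property
    def _llm_type(self) -> str:
        # Required: a short string identifying this LLM type.
        return "custom_llm"

    # rest of the implementation
    ...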

register_llm_provider("custom_llm", MyCustomLLM)

You can then use the custom LLM provider in your configuration:

models:
  - type: main
    engine: custom_llm
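A custom _call receives an optional stop list. If your backend does not support stop sequences natively, one option is to truncate the generated text yourself. The sketch below is a hypothetical helper (enforce_stop), not part of LangChain or NeMo Guardrails:

```python
from typing import List, Optional


def enforce_stop(text: str, stop: Optional[List[str]] = None) -> str:
    """Truncate text at the earliest occurrence of any stop sequence."""
    if not stop:
        return text
    cut = len(text)
    for token in stop:
        if not token:
            continue  # an empty stop sequence would cut everything
        index = text.find(token)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


# Inside MyCustomLLM._call you could then return, for example:
#     return enforce_stop(raw_completion, stop)
print(enforce_stop('bot express greeting\nuser "Hi"', stop=["\nuser"]))
# prints: bot express greeting
```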

Configuring LLMs per Task

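Interaction with the LLM is structured in a task-oriented manner: each LLM invocation is associated with a specific task. These tasks are integral to the guardrails process and include:

generate_user_intent: transforms the raw user utterance into a canonical form. For instance, "Hello there" might be transformed into express greeting.
generate_next_steps: determines the bot's response or the action to execute. Examples include bot express greeting or bot respond to question.
generate_bot_message: decides the exact bot message to return.
general: generates the next bot message based on the history of user and bot messages. It is used when no dialog rails are defined (i.e., there are no canonical forms for user messages).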

For a comprehensive list of tasks, refer to the Task type.

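You can use different LLM models for specific tasks. For example, you can use a different model, potentially from a different provider, for the self_check_input and self_check_output tasks. Here is an example configuration: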

models:
  - type: main
    model: meta/llama-3.1-8b-instruct
    engine: nim
  - type: self_check_input
    model: meta/llama3-8b-instruct
    engine: nim
  - type: self_check_output
    model: meta/llama-3.1-70b-instruct
    engine: nim

In the previous example, the self_check_input and self_check_output tasks use different models. You can get even more granular and use a different model for a task such as generate_user_intent:

models:
  - type: main
    model: meta/llama-3.1-8b-instruct
    engine: nim
  - type: self_check_input
    model: meta/llama3-8b-instruct
    engine: nim
  - type: self_check_output
    model: meta/llama-3.1-70b-instruct
    engine: nim
  - type: generate_user_intent
    model: meta/llama-3.1-8b-instruct
    engine: nim

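Keep in mind that the best model for your needs depends on your specific requirements and constraints. It is often a good idea to experiment with different models to see which one works best for your use case.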
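Conceptually, each task is served by the model whose type matches the task name. The sketch below assumes that tasks without a dedicated entry fall back to the main model; model_for_task is a hypothetical helper operating on the models list as plain Python dictionaries (that is, after the YAML has been parsed).

```python
from typing import Dict, List


def model_for_task(models: List[Dict[str, str]], task: str) -> Dict[str, str]:
    """Pick the model entry for a task, falling back to the main model (assumed)."""
    by_type = {entry["type"]: entry for entry in models}
    return by_type.get(task, by_type["main"])


models = [
    {"type": "main", "engine": "nim", "model": "meta/llama-3.1-8b-instruct"},
    {"type": "self_check_input", "engine": "nim", "model": "meta/llama3-8b-instruct"},
    {"type": "self_check_output", "engine": "nim", "model": "meta/llama-3.1-70b-instruct"},
]

print(model_for_task(models, "self_check_output")["model"])  # meta/llama-3.1-70b-instruct
print(model_for_task(models, "generate_bot_message")["model"])  # meta/llama-3.1-8b-instruct
```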

The Embeddings Model

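To configure the embeddings model used in various steps of the guardrails process (e.g., canonical form generation and next step generation), add a model configuration to the models key as shown in the following configuration file: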

models:
  - ...
  - type: embeddings
    engine: FastEmbed
    model: all-MiniLM-L6-v2

The FastEmbed engine is the default one and uses the all-MiniLM-L6-v2 model. NeMo Guardrails also supports using OpenAI models to compute the embeddings, e.g.:

models:
  - ...
  - type: embeddings
    engine: openai
    model: text-embedding-ada-002

Supported Embedding Providers

The following table lists the supported embedding providers:

Provider Name	`engine_name`	`model`
FastEmbed (default)	`FastEmbed`	`all-MiniLM-L6-v2` (default), etc.
OpenAI	`openai`	`text-embedding-ada-002`, etc.
SentenceTransformers	`SentenceTransformers`	`all-MiniLM-L6-v2`, etc.
NVIDIA AI Endpoints	`nvidia_ai_endpoints`	`nv-embed-v1`, etc.

Note

You can use any of the supported models for any of the supported embedding providers. The table above lists one example model per provider.

Custom Embedding Provider

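You can also register a custom embedding provider by using the LLMRails.register_embedding_provider function.

To register a custom embedding provider, create a class that inherits from EmbeddingModel and register it in your config.py.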

from typing import List
from nemoguardrails.embeddings.providers.base import EmbeddingModel
from nemoguardrails import LLMRails


class CustomEmbeddingModel(EmbeddingModel):
    """An implementation of a custom embedding provider."""
    engine_name = "CustomEmbeddingModel"

    def __init__(self, embedding_model: str):
        # Initialize the model
        ...

    async def encode_async(self, documents: List[str]) -> List[List[float]]:
        """Encode the provided documents into embeddings.

        Args:
            documents (List[str]): The list of documents for which embeddings should be created.

        Returns:
            List[List[float]]: The list of embeddings corresponding to the input documents.
        """
        ...

    def encode(self, documents: List[str]) -> List[List[float]]:
        """Encode the provided documents into embeddings.

        Args:
            documents (List[str]): The list of documents for which embeddings should be created.

        Returns:
            List[List[float]]: The list of embeddings corresponding to the input documents.
        """
        ...


def init(app: LLMRails):
    """Initialization function in your config.py."""
    app.register_embedding_provider(CustomEmbeddingModel, "CustomEmbeddingModel")

You can then use the custom embedding provider in your configuration:

models:
  # ...
  - type: embeddings
    engine: CustomEmbeddingModel
    model: SomeModelName      # supported by the provider.

Embedding Search Providers

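NeMo Guardrails uses embedding search (also called vector databases) to implement the guardrails process and the knowledge base functionality. The default embedding search uses FastEmbed to compute the embeddings (the all-MiniLM-L6-v2 model) and Annoy to perform the search. As shown in the previous section, the embeddings model supports both FastEmbed and OpenAI. SentenceTransformers is also supported.

For advanced use cases or integration with existing knowledge bases, you can provide a custom embedding search provider.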
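At its core, embedding search returns the stored items whose vectors are closest to the query vector. Annoy performs approximate nearest-neighbor search; the sketch below swaps in exact brute-force cosine similarity, which is enough to show the idea on a handful of items. The vectors are hand-made toy values, not real embeddings.

```python
import math
from typing import List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: List[float], index: List[Tuple[str, List[float]]], k: int = 2) -> List[str]:
    """Exact k-nearest-neighbor search by cosine similarity (no approximation)."""
    ranked = sorted(index, key=lambda item: cosine(query, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


index = [
    ("express greeting", [1.0, 0.0, 0.0]),
    ("ask about capabilities", [0.0, 1.0, 0.0]),
    ("express appreciation", [0.9, 0.1, 0.0]),
]
print(search([1.0, 0.0, 0.0], index))  # ['express greeting', 'express appreciation']
```

Brute force costs one similarity computation per stored item, which is why approximate indexes such as Annoy are preferred once the collection grows.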

General Instructions

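The general instructions (similar to a system prompt) are added at the beginning of every prompt. You can configure them as shown below: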

instructions:
  - type: general
    content: |
      Below is a conversation between the NeMo Guardrails bot and a user.
      The bot is talkative and provides lots of specific details from its context.
      If the bot does not know the answer to a question, it truthfully says it does not know.

In the future, multiple types of instructions will be supported, hence the type attribute and the array structure.

Sample Conversation

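The sample conversation sets the tone for the conversation between the user and the bot. It helps the LLM learn the format and tone of the conversation and how verbose the responses should be. This section should have at least two turns. Since this sample conversation is appended to every prompt, keep it short and relevant.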

sample_conversation: |
  user "Hello there!"
    express greeting
  bot express greeting
    "Hello! How can I assist you today?"
  user "What can you do for me?"
    ask about capabilities
  bot respond about capabilities
    "As an AI assistant, I can help provide more information on NeMo Guardrails toolkit. This includes question answering on how to set it up, use it, and customize it for your application."
  user "Tell me a bit about what the toolkit can do?"
    ask general question
  bot response for general question
    "NeMo Guardrails provides a range of options for quickly and easily adding programmable guardrails to LLM-based conversational systems. The toolkit includes examples on how you can create custom guardrails and compose them together."
  user "what kind of rails can I include?"
    request more information
  bot provide more information
    "You can include guardrails for detecting and preventing offensive language, helping the bot stay on topic, do fact checking, perform output moderation. Basically, if you want to control the output of the bot, you can do it with guardrails."
  user "thanks"
    express appreciation
  bot express appreciation and offer additional help
    "You're welcome. If you have any more questions or if there's anything else I can help you with, please don't hesitate to ask."

Actions Server URL

If an actions server is used, its URL must be configured in config.yml:

actions_server_url: ACTIONS_SERVER_URL

LLM Prompts

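You can customize the prompts used for the various LLM tasks (e.g., generate user intent, generate next step, generate bot message) using the prompts key. For example, to override the prompt used for the generate_user_intent task for the openai/gpt-3.5-turbo model: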

prompts:
  - task: generate_user_intent
    models:
      - openai/gpt-3.5-turbo
    max_length: 3000
    output_parser: user_intent
    content: |-
      <<This is a placeholder for a custom prompt for generating the user intent>>

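For each task, you can also specify the maximum length of the prompt used for the LLM call, in number of characters. This is useful if you want to limit the number of tokens used by the LLM or make sure that the prompt does not exceed the maximum context length. When the maximum length is exceeded, the prompt is truncated by removing older turns from the conversation history until the length of the prompt is less than or equal to the maximum length. The default maximum length is 16000 characters.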
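The truncation strategy described above can be sketched as follows, assuming the prompt is rendered as the instructions followed by one line per conversation turn. fit_prompt is a hypothetical helper; the toolkit's actual prompt rendering is more involved.

```python
from typing import List


def fit_prompt(instructions: str, history: List[str], max_length: int = 16000) -> str:
    """Drop the oldest turns until the prompt fits within max_length characters."""
    turns = list(history)
    prompt = "\n".join([instructions] + turns)
    while len(prompt) > max_length and turns:
        turns.pop(0)  # the oldest turn goes first
        prompt = "\n".join([instructions] + turns)
    # If the instructions alone exceed max_length, they are returned untruncated.
    return prompt


history = ["u1: hello", "b1: hi", "u2: how are you?"]
print(fit_prompt("SYS", history, max_length=30))
```

With max_length=30, the full 37-character prompt is too long, so only the oldest turn ("u1: hello") is dropped.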

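The full list of tasks used by the NeMo Guardrails toolkit is the following:

general: generate the next bot message, when no canonical forms are used;
generate_user_intent: generate the canonical user message;
generate_next_steps: generate the next thing the bot should do or say;
generate_bot_message: generate the next bot message;
generate_value: generate the value for a context variable (a.k.a. extract user-provided values);
self_check_facts: check the facts in the bot response against the provided evidence;
self_check_input: check whether the input from the user should be allowed;
self_check_output: check whether the bot response should be allowed;
self_check_hallucination: check whether the bot response is a hallucination.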

You can check the default prompts in the prompts folder.

Multi-step Generation

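For large language models (LLMs) fine-tuned for instruction following, particularly those with more than 100 billion parameters, you can enable the generation of complex, multi-step flows.

EXPERIMENTAL: this feature is experimental and should only be used for testing and evaluation purposes.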

enable_multi_step_generation: True

Lowest Temperature

This temperature is used for tasks that require deterministic behavior (some models, e.g., dolly-v2-3b, require it to be strictly positive).

lowest_temperature: 0.1

Custom Data

If you need to pass additional configuration data to any custom component of your configuration, you can use the custom_data field.

custom_data:
  custom_config_field: "some_value"

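For example, you can access the custom configuration inside the init function in your config.py (see Custom Initialization).

from nemoguardrails import LLMRails

def init(app: LLMRails):
    config = app.config

    # custom_data is exposed as a dictionary on the loaded configuration.
    custom_value = config.custom_data.get("custom_config_field")
    # Do something with custom_value ...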

Guardrails Definitions

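Guardrails (or rails for short) are implemented through flows. Depending on their role, rails fall into several main categories:

Input rails: triggered when a new input from the user is received.
Output rails: triggered when a new output should be sent to the user.
Dialog rails: triggered after a user message is interpreted, i.e., after its canonical form has been identified.
Retrieval rails: triggered after the retrieval step has been performed (i.e., the retrieve_relevant_chunks action has finished).
Execution rails: triggered before and after an action is invoked.

The active rails are configured using the rails key in config.yml. Below is a quick example: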

rails:
  # Input rails are invoked when a new message from the user is received.
  input:
    flows:
      - check jailbreak
      - check input sensitive data
      - check toxicity
      - ... # Other input rails

  # Output rails are triggered after a bot message has been generated.
  output:
    flows:
      - self check facts
      - self check hallucination
      - check output sensitive data
      - ... # Other output rails

  # Retrieval rails are invoked once `$relevant_chunks` are computed.
  retrieval:
    flows:
      - check retrieval sensitive data

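All flows that are not input, output, or retrieval flows are considered dialog rails and execution rails, i.e., flows that dictate how the dialog should go and when and how to invoke actions. Dialog/execution rail flows don't need to be enumerated explicitly in the config. However, a few other configuration options can be used to control their behavior.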

rails:
  # Dialog rails are triggered after user message is interpreted, i.e., its canonical form
  # has been computed.
  dialog:
    # Whether to try to use a single LLM call for generating the user intent, next step and bot message.
    single_call:
      enabled: False

      # If a single call fails, whether to fall back to multiple LLM calls.
      fallback_to_multiple_calls: True

    user_messages:
      # Whether to use only the embeddings when interpreting the user's message
      embeddings_only: False

Input Rails

Input rails process the message from the user. For example:

define flow self check input
  $allowed = execute self_check_input

  if not $allowed
    bot refuse to respond
    stop

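Input rails can alter the input by changing the $user_message context variable.

Output Rails

Output rails process a bot message. The message to be processed is available in the $bot_message context variable. Output rails can alter the $bot_message variable, e.g., to mask sensitive information.

You can temporarily deactivate the output rails for the next bot message by setting the $skip_output_rails context variable to True.

Retrieval Rails

Retrieval rails process the retrieved chunks, i.e., the $relevant_chunks variable.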

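Dialog Rails

Dialog rails enforce specific predefined conversational paths. To use dialog rails, you must define canonical forms for various user messages and use them to trigger the dialog flows. Check out the Hello World bot for a quick example. For a slightly more advanced example, check out the ABC bot, where dialog rails are used to ensure the bot does not talk about specific topics.

Using dialog rails involves a three-step process:

Generate the canonical user message
Decide the next step(s) and execute them
Generate the bot utterance(s)

For a detailed description, check out the Guardrails Process.

Each of the above steps may require an LLM call.

Single Call Mode

As of version 0.6.0, NeMo Guardrails also supports a "single call" mode, in which all three steps are performed using a single LLM call. To enable it, set the single_call.enabled flag to True as shown below.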

rails:
  dialog:
    # Whether to try to use a single LLM call for generating the user intent, next step and bot message.
    single_call:
      enabled: True

      # If a single call fails, whether to fall back to multiple LLM calls.
      fallback_to_multiple_calls: True

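In a typical RAG (Retrieval-Augmented Generation) scenario, using this option yields a 3x improvement in latency and uses 37% fewer tokens.

IMPORTANT: currently, the single call mode can only predict bot messages as next steps. This means that it does not work if you want the LLM to generalize and decide to execute an action on a dynamically generated user canonical form message.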
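The interaction between enabled and fallback_to_multiple_calls can be sketched as below. The two callables are hypothetical stand-ins for the single-call and multi-call code paths; signalling a failed single call by returning None, and raising when fallback is disabled, are assumptions of this sketch rather than documented toolkit behavior.

```python
from typing import Callable, Dict, Optional

Result = Dict[str, str]


def run_dialog_turn(
    user_message: str,
    single_call: Callable[[str], Optional[Result]],
    multiple_calls: Callable[[str], Result],
    enabled: bool = True,
    fallback_to_multiple_calls: bool = True,
) -> Result:
    """Hypothetical control flow for the single_call options."""
    if enabled:
        result = single_call(user_message)
        if result is not None:
            return result  # all three steps answered by one LLM call
        if not fallback_to_multiple_calls:
            raise RuntimeError("single LLM call failed and fallback is disabled")
    # Three separate calls: user intent, next step, bot message.
    return multiple_calls(user_message)


def failing_single_call(message: str) -> Optional[Result]:
    return None  # e.g., the combined output could not be parsed


def three_calls(message: str) -> Result:
    return {"user_intent": "express greeting", "bot_message": "Hello!"}


print(run_dialog_turn("Hello there", failing_single_call, three_calls))
```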

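Embeddings Only

Another option to speed up the dialog rails is to use only the embeddings of the predefined user messages to decide the canonical form of the user input. To enable this option, set the embeddings_only flag as shown below: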

rails:
  dialog:
    user_messages:
      # Whether to use only the embeddings when interpreting the user's message
      embeddings_only: True
      # Use only the embeddings when the similarity is above the specified threshold.
      embeddings_only_similarity_threshold: 0.5
      # When the fallback is set to null (None), if the similarity is below the threshold, the user intent is computed normally using the LLM.
      # When it is set to a string value, that string value will be used as the intent.
      embeddings_only_fallback_intent: null

IMPORTANT: This is recommended only when enough examples are provided.
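The decision described in the comments of the configuration above can be sketched as follows. resolve_intent is a hypothetical helper: matches holds (canonical form, similarity) pairs for the predefined user messages, and llm_intent stands in for the regular LLM-based intent generation. Treating a score exactly equal to the threshold as a miss follows the "above the threshold" wording and is an assumption.

```python
from typing import Callable, List, Optional, Tuple


def resolve_intent(
    matches: List[Tuple[str, float]],
    threshold: float,
    fallback_intent: Optional[str],
    llm_intent: Callable[[], str],
) -> str:
    """Hypothetical embeddings-only intent resolution."""
    if matches:
        intent, score = max(matches, key=lambda m: m[1])
        if score > threshold:
            return intent  # confident enough: no LLM call needed
    if fallback_intent is not None:
        return fallback_intent  # fixed intent configured as a string
    return llm_intent()  # fallback is null: compute the intent with the LLM


matches = [("express greeting", 0.82), ("ask about capabilities", 0.31)]
print(resolve_intent(matches, 0.5, None, lambda: "llm intent"))  # express greeting
print(resolve_intent(matches, 0.9, None, lambda: "llm intent"))  # llm intent
```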

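Exceptions

NeMo Guardrails supports raising exceptions from within flows. An exception is an event whose name ends with Exception, e.g., InputRailException. When an exception is raised, the final output is a message whose role is set to exception and whose content contains additional information about the exception. For example: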

define flow input rail example
  # ...
  create event InputRailException(message="Input not allowed.")

{
  "role": "exception",
  "content": {
    "type": "InputRailException",
    "uid": "45a452fa-588e-49a5-af7a-0bab5234dcc3",
    "event_created_at": "9999-99-99999:24:30.093749+00:00",
    "source_uid": "NeMoGuardrails",
    "message": "Input not allowed."
  }
}
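Client code can check the role of the returned message to tell an exception apart from a regular bot reply. This sketch assumes the message is a dictionary shaped like the example above; rail_exception is a hypothetical helper.

```python
from typing import Any, Dict, Optional


def rail_exception(message: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the exception details if `message` reports a rails exception."""
    if message.get("role") != "exception":
        return None
    content = message.get("content") or {}
    # Exception events are named with an "Exception" suffix.
    if not str(content.get("type", "")).endswith("Exception"):
        return None
    return content


response = {
    "role": "exception",
    "content": {
        "type": "InputRailException",
        "source_uid": "NeMoGuardrails",
        "message": "Input not allowed.",
    },
}
details = rail_exception(response)
if details is not None:
    print(f"{details['type']}: {details['message']}")  # InputRailException: Input not allowed.
```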

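Guardrails Library Exceptions

By default, all the guardrails included in the Guardrails Library return a predefined message when a rail is triggered. You can change this behavior by setting the enable_rails_exceptions key to True in your config.yml file:

enable_rails_exceptions: True

When this setting is enabled, the rails return an exception message when they are triggered. To better understand what happens under the hood, here is how the self check input rail is implemented: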

define flow self check input
  $allowed = execute self_check_input
  if not $allowed
    if $config.enable_rails_exceptions
      create event InputRailException(message="Input not allowed. The input was blocked by the 'self check input' flow.")
    else
      bot refuse to respond
      stop

NOTE: In Colang 2.x, you must change $config.enable_rails_exceptions to $system.config.enable_rails_exceptions and create event to send.

When the self check input rail is triggered, the following exception is returned.

{
  "role": "exception",
  "content": {
    "type": "InputRailException",
    "uid": "45a452fa-588e-49a5-af7a-0bab5234dcc3",
    "event_created_at": "9999-99-99999:24:30.093749+00:00",
    "source_uid": "NeMoGuardrails",
    "message": "Input not allowed. The input was blocked by the 'self check input' flow."
  }
}

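Knowledge Base Documents

By default, an LLMRails instance supports using a set of documents as context for generating the bot responses. To include documents in your knowledge base, place them in the kb folder inside your config folder: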

.
├── config
│   └── kb
│       ├── file_1.md
│       ├── file_2.md
│       └── ...

Currently, only the Markdown format is supported. Support for other formats will be added in the near future.
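As a rough picture of how Markdown documents can be turned into retrievable pieces, the sketch below reads kb/*.md and splits each file at its headings. This heading-based chunking is an assumption for illustration and not necessarily how the toolkit splits documents; load_kb_chunks is a hypothetical helper, and lines starting with # inside fenced code would also be treated as headings.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def load_kb_chunks(config_dir: str) -> List[Dict[str, str]]:
    """Read config/kb/*.md and split every file into heading-delimited chunks."""
    chunks: List[Dict[str, str]] = []
    kb_dir = Path(config_dir) / "kb"
    if not kb_dir.is_dir():
        return chunks
    for path in sorted(kb_dir.glob("*.md")):
        # Each section is (title, body lines); text before the first heading
        # is titled after the file name.
        sections: List[Tuple[str, List[str]]] = [(path.stem, [])]
        for line in path.read_text(encoding="utf-8").splitlines():
            if line.startswith("#"):
                sections.append((line.lstrip("#").strip(), []))
            else:
                sections[-1][1].append(line)
        for title, body in sections:
            text = "\n".join(body).strip()
            if text:  # skip headings that have no content of their own
                chunks.append({"file": path.name, "title": title, "body": text})
    return chunks


for chunk in load_kb_chunks("config"):
    print(chunk["file"], "|", chunk["title"])
```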