Structured Generation#
NIM for VLMs supports structured outputs: you can specify a JSON schema, a regular expression, or a context-free grammar, or constrain the output to a set of specific choices.
This can be useful when the NIM is part of a larger pipeline and the VLM output needs to follow a particular format, for example to:
Ensure consistent output formats for downstream processing
Validate complex data structures
Automate data extraction from unstructured text
Improve reliability in multi-step pipelines
The following examples show how to constrain the output in different ways.
Note: Only the OpenAI endpoint exposes the input fields used for structured generation.
JSON Schema#
You can constrain the output to a specific JSON schema by using the response_format parameter in the OpenAI schema with json_schema as the type. For details, see the OpenAI documentation (the section on the new options for the response_format parameter).
NVIDIA recommends that you use the json_schema type rather than json_object to specify a JSON schema. The json_object type lets the model generate any valid JSON, including an empty one.
Example: Extracting Information from a Movie Poster#
from openai import OpenAI
from pydantic import BaseModel, Field
from typing import List, Optional
client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="not-used")
# Define the pydantic models for the response format
class Date(BaseModel):
    day: int = Field(ge=1, le=31)
    month: int = Field(ge=1, le=12)
    year: Optional[int] = Field(ge=1895)

class MovieDetails(BaseModel):
    title: str
    release_date: Date
    publishers: List[str]
# Prepare the question and input image
messages = [
    {"role": "user", "content": [
        {
            "type": "text",
            "text": "Look at the poster image. Return the title and other information about this movie in JSON format."
        },
        {
            "type": "image_url",
            "image_url": {
                "url": "https://vignette1.wikia.nocookie.net/disney/images/f/f2/Walleposter.jpg"
            }
        }
    ]},
]
# Send the request with `json_schema`
response = client.chat.completions.create(
    model="meta/llama-3.2-11b-vision-instruct",
    messages=messages,
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "MovieDetails", "schema": MovieDetails.model_json_schema()}
    }
)
assistant_message = response.choices[0].message.content
print(assistant_message)
# { "title": "WALL-E", "release_date": {"year": 2008, "month": 6, "day": 27}, "publishers": ["Walt Disney Pictures", "Pixar Animation Studios"] }
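Because the completion is constrained to the schema, the returned string can be loaded straight into the Pydantic model for downstream use. The following is a minimal sketch that assumes the assistant_message and MovieDetails definitions from the example above (Pydantic v2):
# Validate and parse the schema-constrained JSON string into the pydantic model
movie = MovieDetails.model_validate_json(assistant_message)
print(movie.title)              # WALL-E
print(movie.release_date.year)  # 2008
print(movie.publishers)         # ['Walt Disney Pictures', 'Pixar Animation Studios']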
Newer versions of the OpenAI SDK provide native support for Pydantic objects, as described in the Native SDK Support section. Run pip install -U openai to install the latest SDK version.
response = client.beta.chat.completions.parse(
    model="meta/llama-3.2-11b-vision-instruct",
    messages=messages,
    response_format=MovieDetails,
)
assistant_message = response.choices[0].message.content
print(assistant_message)
# { "title": "WALL-E", "release_date": {"year": 2008, "month": 6, "day": 27}, "publishers": ["Walt Disney Pictures", "Pixar Animation Studios"] }
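With the parse helper, recent SDK versions also expose the validated Pydantic object directly on the message, so no manual parsing step is needed. A short sketch continuing the request above:
# The parse helper returns the validated pydantic instance on `message.parsed`
movie = response.choices[0].message.parsed
print(type(movie).__name__)  # MovieDetails
print(movie.title)           # WALL-E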
By using a JSON schema, you can ensure that the VLM output conforms to a specific structure, making the generated data easier to process and validate in your application's workflow.
Regular Expression#
You can specify a regular expression for the output format by using the guided_regex parameter in the nvext extension to the OpenAI schema.
from openai import OpenAI
client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="not-used")
regex = "[1-5]"
messages = [
    {"role": "user", "content": [
        {
            "type": "text",
            "text": "Return the number of cars seen in this image"
        },
        {
            "type": "image_url",
            "image_url": {
                "url": "https://cdn.ebaumsworld.com/mediaFiles/picture/202553/84419818.jpg"
            }
        }
    ]},
]
response = client.chat.completions.create(
    model="meta/llama-3.2-11b-vision-instruct",
    messages=messages,
    extra_body={"nvext": {"guided_regex": regex}},
    stream=False
)
assistant_message = response.choices[0].message.content
print(assistant_message)
# Prints:
# 2
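The same parameter accepts more elaborate patterns. The following hypothetical request (not part of the original example, reusing the client defined above and the movie poster from the JSON schema section) forces the answer into an ISO-style date:
# Hypothetical example: constrain the answer to a YYYY-MM-DD date string
date_regex = r"\d{4}-\d{2}-\d{2}"
response = client.chat.completions.create(
    model="meta/llama-3.2-11b-vision-instruct",
    messages=[
        {"role": "user", "content": [
            {"type": "text", "text": "When was the movie on this poster released?"},
            {"type": "image_url", "image_url": {"url": "https://vignette1.wikia.nocookie.net/disney/images/f/f2/Walleposter.jpg"}}
        ]},
    ],
    extra_body={"nvext": {"guided_regex": date_regex}},
    stream=False
)
print(response.choices[0].message.content)
# Prints something like:
# 2008-06-27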
Choices#
You can specify a list of choices for the output by using the guided_choice parameter in the nvext extension to the OpenAI schema.
from openai import OpenAI
client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="not-used")
choices = ["Good", "Bad", "Neutral"]
# We send the list of choices in the prompt to help the model, but this is not
# strictly necessary; the model has to produce one of the choices in any case
messages = [
    {"role": "user", "content": [
        {
            "type": "text",
            "text": f"What is the state of pollution in this image? It should be one of {choices}"
        },
        {
            "type": "image_url",
            "image_url": {
                "url": "https://m.media-amazon.com/images/I/51A5iA+lNcL._AC_.jpg"
            }
        }
    ]},
]
response = client.chat.completions.create(
    model="meta/llama-3.2-11b-vision-instruct",
    messages=messages,
    extra_body={"nvext": {"guided_choice": choices}},
    stream=False
)
assistant_message = response.choices[0].message.content
print(assistant_message)
# Prints:
# Bad
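Because guided_choice guarantees that the completion is exactly one of the provided strings, it can be used directly in downstream logic, for example as a dictionary key, without defensive parsing. A minimal sketch continuing the example above (the score mapping is illustrative):
# The answer is guaranteed to be one of the choices, so it can index a lookup table safely
pollution_score = {"Good": 0, "Neutral": 1, "Bad": 2}
print(pollution_score[assistant_message])
# Prints:
# 2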
Context-Free Grammar#
You can specify a context-free grammar in EBNF format by using the guided_grammar parameter in the nvext extension to the OpenAI schema.
The grammar is defined using the EBNF language.
from openai import OpenAI
client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="not-used")
grammar = """
?start: "There are " num " cars in this image."
?num: /[1-5]/
"""
messages = [
    {"role": "user", "content": [
        {
            "type": "text",
            "text": "What is in this image?"
        },
        {
            "type": "image_url",
            "image_url": {
                "url": "https://m.media-amazon.com/images/I/51A5iA+lNcL._AC_.jpg"
            }
        }
    ]},
]
response = client.chat.completions.create(
    model="meta/llama-3.2-11b-vision-instruct",
    messages=messages,
    extra_body={"nvext": {"guided_grammar": grammar}},
    stream=False
)
completion = response.choices[0].message.content
print(completion)
# Prints:
# There are 2 cars in this image.
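Because the grammar fixes the shape of the sentence, the numeric value can be recovered with a trivial pattern match. A short sketch continuing the example above:
import re

# The grammar guarantees the form "There are <1-5> cars in this image."
num_cars = int(re.search(r"[1-5]", completion).group())
print(num_cars)
# Prints:
# 2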