Creating Grammars for Speech Hints

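This tutorial walks you through creating custom speech hint grammars for use in inverse text normalization. The main application of speech hint grammars is to provide class-specific normalization as a post-processing step for automatic speech recognition (ASR).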

Dependencies

You need to download the speech hint grammars.

Prerequisites

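This tutorial assumes that you are familiar with finite-state acceptors and transducers, with the Pynini library, and with NeMo's weighted finite-state transducer (WFST) tutorial.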

Overview

Functionally, a speech hint should contain at least the following:

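A pass-through finite-state transducer (FST) that transduces the input text as is ($\Sigma^*$, the set of all possible strings over the alphabet $\Sigma$). This FST should have the longest distance/weight relative to the other FSTs, so that a shortest-path search prefers any class FST that matches and falls back to the pass-through only when none does.
An FST for each class of interest. The FST for a class can be built by importing other FSTs. However, the resulting exported FST is independent of the FSTs it was imported from.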
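
The role of the pass-through weight can be pictured with a small library-free model. This is only a sketch, not the speech hint implementation: `Candidate`, `toy_percent`, `best_output`, and both weight constants are hypothetical names and values chosen for illustration. Each grammar proposes an output with a weight, and the lowest weight wins, much as a shortest-path search would choose among paths in a WFST.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    source: str    # which grammar produced this output (illustrative label)
    output: str
    weight: float  # lower is better, as in a shortest-path search


# Hypothetical weights: the pass-through must cost more than any class grammar.
PASS_THROUGH_WEIGHT = 100.0
CLASS_WEIGHT = 1.0


def toy_percent(text: str) -> Optional[str]:
    """Toy stand-in for a class grammar: rewrites 'N percent' for a few N."""
    numbers = {"five": "5", "ten": "10", "fifty": "50"}
    words = text.split()
    if len(words) == 2 and words[1] == "percent" and words[0] in numbers:
        return numbers[words[0]] + "%"
    return None  # the grammar does not accept this input


def best_output(text: str) -> str:
    # The pass-through accepts every string and returns it unchanged.
    candidates: List[Candidate] = [
        Candidate("pass-through", text, PASS_THROUGH_WEIGHT)
    ]
    rewritten = toy_percent(text)
    if rewritten is not None:
        candidates.append(Candidate("percent", rewritten, CLASS_WEIGHT))
    # Picking the minimum weight mirrors taking the shortest path.
    return min(candidates, key=lambda c: c.weight).output


print(best_output("ten percent"))   # class grammar wins
print(best_output("hello there"))   # only the pass-through applies
```

Because the pass-through always produces a candidate, every input has an output; giving it the largest weight keeps it from masking a class grammar that matches.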

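Grammars in speech hints are composed dynamically. A grammar can be a standalone reference to an FST, or an FST embedded in sentence context. A grammar is denoted by $<FSTNAME>. For English, the following grammars are supported: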

$OOV_NUMERIC_SEQUENCE
$OOV_ALPHA_SEQUENCE
$OOV_ALPHA_NUMERIC_SEQUENCE
$FULLPHONENUM
$POSTALCODE
$OOV_CLASS_ORDINAL
$OOV_CLASS_NUMERIC
$PERCENT
$TIME
$MONEY
$MONTH
$DAY
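
To make the `$<FSTNAME>` notation concrete, the sketch below splits a hint string into literal context and grammar references. It is an illustration under assumptions, not the parser that speech hints use: `GRAMMAR_REF` and `split_hint` are hypothetical names, and the pattern assumes that grammar names consist of upper-case letters and underscores, as in the names listed above.

```python
import re
from typing import List, Tuple

# Assumption: a grammar reference is "$" followed by an upper-case name
# made of letters and underscores, for example $FULLPHONENUM.
GRAMMAR_REF = re.compile(r"\$([A-Z][A-Z_]*)")


def split_hint(hint: str) -> List[Tuple[str, str]]:
    """Split a hint into ("text", literal) and ("grammar", name) parts."""
    parts: List[Tuple[str, str]] = []
    pos = 0
    for match in GRAMMAR_REF.finditer(hint):
        if match.start() > pos:
            # Literal sentence context that precedes the reference.
            parts.append(("text", hint[pos:match.start()]))
        parts.append(("grammar", match.group(1)))
        pos = match.end()
    if pos < len(hint):
        # Trailing literal context after the last reference.
        parts.append(("text", hint[pos:]))
    return parts


print(split_hint("$FULLPHONENUM"))
print(split_hint("my phone number is $FULLPHONENUM"))
```

A standalone hint yields a single grammar part, while a hint in sentence context yields the literal text followed by the grammar part.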

Using Existing Speech Hint Grammars in Python

from speech_hint import apply_hint
import pynini
from pynini.lib import pynutil

# Applying `$FULLPHONENUM` Grammar on the Input Text
apply_hint("one eight hundred five five five four oh oh one","$FULLPHONENUM")

'1-800-555-4001'

apply_hint("my phone number is one eight hundred five five five four oh oh one","$FULLPHONENUM")

'my phone number is 1-800-555-4001'

# Specifying `$FULLPHONENUM` grammar in context
apply_hint("my phone number is one eight hundred five five five four oh oh one","my phone number is $FULLPHONENUM")

'my phone number is 1-800-555-4001'

apply_hint("I think my phone number is one eight hundred five five five four oh oh one","my phone number is $FULLPHONENUM")

'I think my phone number is 1-800-555-4001'

# Specifying `$FULLPHONENUM` Grammar in Context - Context Does Not Match
apply_hint("my phone number is one eight hundred five five five four oh oh one","my phone number is not $FULLPHONENUM")

'my phone number is one eight hundred five five five four oh oh one'
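
The matching behavior shown in the examples above can be modeled without any FST library. The following is a toy sketch, not how `apply_hint` is implemented: `toy_phone` and `apply_hint_toy` are hypothetical helpers, the phone grammar accepts only 11-digit numbers starting with 1, and only hints of the form `<context words> $FULLPHONENUM` are handled. The literal context must appear in the input for the grammar to apply; otherwise the input passes through unchanged.

```python
from typing import List, Optional

DIGITS = {"oh": "0", "zero": "0", "one": "1", "two": "2", "three": "3",
          "four": "4", "five": "5", "six": "6", "seven": "7",
          "eight": "8", "nine": "9"}


def toy_phone(words: List[str]) -> Optional[str]:
    """Toy stand-in for $FULLPHONENUM: 11 digits starting with 1."""
    digits = ""
    for word in words:
        if word in DIGITS:
            digits += DIGITS[word]
        elif word == "hundred" and digits:
            digits += "00"  # "eight hundred" -> "800"
        else:
            return None  # unknown word: the grammar rejects this span
    if len(digits) != 11 or digits[0] != "1":
        return None
    return f"{digits[0]}-{digits[1:4]}-{digits[4:7]}-{digits[7:]}"


def apply_hint_toy(text: str, hint: str) -> str:
    """Model hints of the form '<context words> $FULLPHONENUM'."""
    context = hint.replace("$FULLPHONENUM", "").split()  # may be empty
    words = text.split()
    for start in range(len(words)):
        after = start + len(context)
        if words[start:after] != context:
            continue  # the literal context does not match here
        # Prefer the longest span that the grammar accepts.
        for end in range(len(words), after, -1):
            written = toy_phone(words[after:end])
            if written is not None:
                return " ".join(words[:after] + [written] + words[end:])
    return text  # pass-through: nothing matched


spoken = "one eight hundred five five five four oh oh one"
print(apply_hint_toy("my phone number is " + spoken,
                     "my phone number is $FULLPHONENUM"))
print(apply_hint_toy("my phone number is " + spoken,
                     "my phone number is not $FULLPHONENUM"))
```

The last call leaves the input unchanged because the word "not" in the hint never appears in the input, which mirrors the context-mismatch example.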

Example Grammar for Handling Alpha Sequences

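Suppose we need to build a grammar that converts a sequence of spelled-out letters into a single word ('i b m' -> 'ibm'). For a detailed implementation, see the oov_class_alpha_sequence.py script.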

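# Apply an FST to an utterance and return the best (shortest-path) output string
def apply_fst(utterance, fst):
    try:
        return pynini.shortestpath(utterance @ fst).string().strip()
    except pynini.FstOpError:
        # The composition has no path, so the FST does not accept this input
        print(f"Error: no valid output for input: '{utterance}'")
        return None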

from en.primitives import NEMO_ALPHA, NEMO_WHITE_SPACE
character = NEMO_ALPHA
word_fst = pynini.closure(character)
sequence = character + pynini.closure(pynutil.delete(" ") + character, 1)
fst = sequence @ (word_fst)

apply_fst('i b m', fst)

'ibm'
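
As a sanity check on what the grammar above accepts, its behavior can be mirrored with a regular expression. This is a sketch under an assumption: it treats NEMO_ALPHA as the set of ASCII letters, and `ALPHA_SEQUENCE` and `join_alpha_sequence` are hypothetical names that are not part of the speech hint scripts. The pattern follows the structure of `sequence`: one letter, then one or more repetitions of a single space followed by a letter, with the spaces deleted in the output.

```python
import re
from typing import Optional

# One letter, then one or more groups of "single space + letter".
# Assumption: NEMO_ALPHA corresponds to ASCII letters.
ALPHA_SEQUENCE = re.compile(r"[A-Za-z](?: [A-Za-z])+")


def join_alpha_sequence(utterance: str) -> Optional[str]:
    """Return the joined word, or None if the input is not a letter sequence."""
    if ALPHA_SEQUENCE.fullmatch(utterance) is None:
        return None  # corresponds to the FST rejecting the input
    return utterance.replace(" ", "")


print(join_alpha_sequence("i b m"))
print(join_alpha_sequence("ibm"))
```

Inputs such as a single letter, an already joined word, or a sequence containing digits are rejected, because the closure in `sequence` requires at least one "space plus letter" repetition and accepts letters only.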

To use a custom FST (grammar) in speech_hints, add it under a suitable name to fst_dict in speech_hints.py. You can then export the grammar to an FST archive (.far) file with the export_to_far.py script.
