Inference Endpoints

The AlphaFold2-Multimer NIM provides the following endpoints:

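protein-structure/alphafold2/multimer/predict-structure-from-sequences - Predicts a protein structure from a list of input amino acid sequences.
protein-structure/alphafold2/multimer/predict-msa-from-sequences - Performs multiple sequence alignment (MSA) and returns the MSAs and templates used for AlphaFold2 inference. This endpoint is useful for batching long-running, CPU-intensive MSA runs ahead of structure prediction.
protein-structure/alphafold2/multimer/predict-structure-from-msa - Performs structure prediction from input MSAs and templates. This is useful when working with precomputed or custom/external MSAs.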

Usage

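The sections below outline the three API endpoints. Each includes a realistic example request that should run when the NIM is configured correctly.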

Predict Structure from Multiple Input Sequences (Multimer)

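The predict-structure-from-sequences endpoint provides the complete end-to-end structure prediction pipeline, from protein sequences to a multimeric protein structure. It requires at least 1 and at most 6 amino acid sequences. The sequences are the only required input, although many tunable parameters are available: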

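sequences: An array of valid amino acid sequences. If you are unsure whether your sequences are valid, see the amino acid code table.
databases: A list containing any of uniref90, mgnify, and small_bfd. These databases contain the sequences used to build the multiple sequence alignment (MSA), which is the input to the structure prediction neural network in AlphaFold2. In general, passing all three databases gives the most accurate structure prediction at the cost of the longest runtime.
algorithm: The algorithm used for multiple sequence alignment. Currently, only jackhmmer is supported.
e_value: The sequence e-value used to filter sequences in the MSA. Smaller values mean stricter alignments: only sequences with a higher probability of shared origin are included, which also lowers the sensitivity of the MSA. The default, 0.0001, is usually a good choice. This value ranges from 0 to 1.
bit_score: The sequence bit score used for filtering before the MSA. If this value is passed, it is used for filtering instead of the e-value. A good starting point is around 200. This value must be greater than zero.
iterations: The number of MSA iterations to perform. In general, the default of iterations=1 is sufficient and takes the least time.
relax_prediction: Set to True to run structure relaxation after prediction. This defaults to True and helps resolve clashes in the predicted structure.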
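The constraints in the parameter list above can be checked on the client before a long-running request is submitted. The following is a minimal sketch and not part of the NIM API: build_predict_payload is a hypothetical helper, and the limits it enforces are taken only from the parameter descriptions above. Parameters that are not passed are left out of the body so that the server defaults apply.

```python
from __future__ import annotations

from typing import Optional

# Database names listed in the parameter description above.
ALLOWED_DATABASES = {"uniref90", "mgnify", "small_bfd"}


def build_predict_payload(
    sequences: list[str],
    databases: Optional[list[str]] = None,
    e_value: Optional[float] = None,
    bit_score: Optional[float] = None,
    iterations: Optional[int] = None,
    relax_prediction: Optional[bool] = None,
) -> dict:
    """Hypothetical helper: validate inputs and build the JSON request body."""
    if not 1 <= len(sequences) <= 6:
        raise ValueError("expected between 1 and 6 sequences")
    payload: dict = {"sequences": list(sequences)}
    if databases is not None:
        unknown = set(databases) - ALLOWED_DATABASES
        if unknown:
            raise ValueError(f"unsupported databases: {sorted(unknown)}")
        payload["databases"] = list(databases)
    if e_value is not None:
        if not 0 <= e_value <= 1:
            raise ValueError("e_value must be between 0 and 1")
        payload["e_value"] = e_value
    if bit_score is not None:
        if bit_score <= 0:
            raise ValueError("bit_score must be greater than zero")
        payload["bit_score"] = bit_score
    if iterations is not None:
        payload["iterations"] = iterations
    if relax_prediction is not None:
        payload["relax_prediction"] = relax_prediction
    return payload
```

The returned dictionary can be serialized with json.dumps and sent as the request body in the same way as the examples in this section.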

Here is an example that uses cURL to submit two sequences with the full set of databases:

curl -X 'POST' \
    -i \
    "http://127.0.0.1:8000/protein-structure/alphafold2/multimer/predict-structure-from-sequences"  \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '{"sequences": ["MNVIDIAIAMAI", "IAMNVIDIAAI"], "databases": ["uniref90", "mgnify", "small_bfd"]}'

Here is the same example, this time using the Python requests module:

import requests
import json

url = "http://127.0.0.1:8000/protein-structure/alphafold2/multimer/predict-structure-from-sequences"
sequences = ["MNVIDIAIAMAI", "IAMNVIDIAAI"]  # Replace with the actual sequences you want to perform structure prediction on.

headers = {
    "content-type": "application/json"
}

data = {
    "sequences": sequences,
    "databases": ["uniref90", "mgnify", "small_bfd"]
}

response = requests.post(url, headers=headers, data=json.dumps(data))

# Check if the request was successful
if response.ok:
    print("Request succeeded:", response.json())
else:
    print("Request failed:", response.status_code, response.text)

The output of this endpoint is a PDB file. The PDB format can be viewed easily with PyMOL and other viewers; see the PyMOL website for documentation and usage.
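Before opening a result in a viewer, a quick check of which chains it contains can confirm that every input sequence was modeled. The sketch below assumes the PDB text has already been extracted from the response as a Python string; how the response wraps that text is not specified here. chains_in_pdb is a hypothetical helper that relies only on the fixed-column PDB layout, in which the chain identifier of an ATOM record sits in column 22.

```python
def chains_in_pdb(pdb_text: str) -> list[str]:
    """Return the sorted chain IDs found in the ATOM records of a PDB string."""
    chains = set()
    for line in pdb_text.splitlines():
        # In the fixed-column PDB format, the chain ID is column 22 (index 21).
        if line.startswith("ATOM") and len(line) > 21:
            chains.add(line[21])
    return sorted(chains)
```

For a prediction made from two input sequences, one would expect two distinct chain identifiers in the returned list.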

Predict MSA from Multiple Input Sequences (Multimer)

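The predict-msa-from-sequences endpoint generates the multiple sequence alignments (MSAs) and templates used for structure prediction. This is useful if you want to batch the MSA computation on a different (CPU-heavy) node.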

Here is an example query using cURL:

curl -X 'POST' \
    -i \
    "http://127.0.0.1:8000/protein-structure/alphafold2/multimer/predict-msa-from-sequences"  \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '{"sequences": ["MNVIDIAIAMAI", "IAMNVIDIAAI"], "databases": ["uniref90", "mgnify", "small_bfd"]}'

Here is the same query in Python using the requests module:

import requests
import json

url = "http://127.0.0.1:8000/protein-structure/alphafold2/multimer/predict-msa-from-sequences"
sequences = ["MNVIDIAIAMAI", "IAMNVIDIAAI"]  # Replace with the actual sequences you want to compute MSAs for.

headers = {
    "content-type": "application/json"
}

data = {
    "sequences": sequences,
    "databases": ["uniref90", "mgnify", "small_bfd"]
}

response = requests.post(url, headers=headers, data=json.dumps(data))

# Check if the request was successful
if response.ok:
    print("Request succeeded:", response.json())
else:
    print("Request failed:", response.status_code, response.text)

The predict-msa-from-sequences endpoint accepts the following parameters:

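sequences: An array of valid amino acid sequences. If you are unsure whether your sequences are valid, see the amino acid code table.
databases: A list containing any of uniref90, mgnify, and small_bfd. These databases contain the sequences used to build the multiple sequence alignment (MSA), which is the input to the structure prediction neural network in AlphaFold2. In general, passing all three databases gives the most accurate structure prediction at the cost of the longest runtime. If you must choose only one, uniref90 is considered the best choice, but using all three is still recommended.
algorithm: The algorithm used for multiple sequence alignment. Currently, only jackhmmer is supported.
e_value: The sequence e-value used to filter sequences in the MSA. Smaller values mean stricter alignments: only sequences with a higher probability of shared origin are included, which also lowers the sensitivity of the MSA. The default, 0.0001, is usually a good choice. This value ranges from 0 to 1.
bit_score: The sequence bit score used for filtering before the MSA. If this value is passed, it is used for filtering instead of the e-value. A good starting point is around 200. This value must be greater than zero.
iterations: The number of MSA iterations to perform. In general, the default of iterations=1 is sufficient and takes the least time.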
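When MSAs are computed on one node and structures are predicted on another, the MSA response has to be stored somewhere in between. The following is a minimal sketch of one way to do that with a JSON file; save_msa_result and load_msa_result are hypothetical helper names, and the sketch assumes only that the response body is JSON, as the response.json() call in the example above suggests.

```python
import json
from pathlib import Path


def save_msa_result(result: dict, path: str) -> None:
    """Write a parsed MSA response (for example, response.json()) to disk."""
    Path(path).write_text(json.dumps(result), encoding="utf-8")


def load_msa_result(path: str) -> dict:
    """Read a previously saved MSA response back into a dictionary."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

A typical use would be save_msa_result(response.json(), "msa_result.json") on the node that runs the MSA, followed by load_msa_result("msa_result.json") on the node that runs structure prediction. The file name is illustrative.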

Predict Protein Structure from an MSA

The predict-structure-from-msa endpoint takes the results of the predict-msa-from-sequences endpoint and runs structure prediction.

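Note: We do not recommend running MSA-to-structure prediction with cURL, because the input contains characters that must be escaped carefully in bash. For the best experience, we recommend interacting with this endpoint through the Python requests module.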

The predict-structure-from-msa endpoint accepts the following parameters:

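sequences: An array of valid amino acid sequences. If you are unsure whether your sequences are valid, see the amino acid code table.
alignments: The MSA results from predict-msa-from-sequences. This is an array of dictionaries, one per input amino acid sequence, each mapping a database name to a tuple of the form {<db name> : [<db name>, <MSA output>, <MSA output format>]}.
templates: The templates from the structural database search. These templates use a format specific to the internals of AlphaFold2; for more detail on the fields, see here.
relax_prediction: Set to True to run structure relaxation after prediction. This defaults to True and helps resolve clashes in the predicted structure.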
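The parameters above can be assembled from an MSA result obtained earlier. The sketch below is written under a stated assumption: that the parsed response of predict-msa-from-sequences is a dictionary with "alignments" and "templates" keys matching the parameter names above. That response layout is not confirmed in this section, and build_structure_request is a hypothetical helper name.

```python
def build_structure_request(sequences, msa_result, relax_prediction=True):
    """Hypothetical helper: build the body for predict-structure-from-msa."""
    # ASSUMPTION: the MSA response exposes "alignments" and "templates" keys.
    alignments = msa_result["alignments"]
    templates = msa_result["templates"]
    # The parameter description calls for one alignment entry per input sequence.
    if len(alignments) != len(sequences):
        raise ValueError("expected one alignments entry per input sequence")
    return {
        "sequences": list(sequences),
        "alignments": alignments,
        "templates": templates,
        "relax_prediction": relax_prediction,
    }
```

Checking the alignment count before sending the request catches a mismatch between sequences and MSAs on the client, rather than after the request has been submitted.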

Here is an example of a request to the predict-structure-from-msa endpoint using the Python requests module:

import requests
import json

url = "http://127.0.0.1:8000/protein-structure/alphafold2/multimer/predict-structure-from-msa"

sequences = ["STARWARSNVIDIAAAAAA"]  # Replace with the actual MSA sequences.

alignments = [{
    'uniref90': ['uniref90', '# STOCKHOLM 1.0\n\n-151285509650596177 STARWARSNVIDIAAAAAA\n#=GC RF             xxxxxxxxxxxxxxxxxxx\n//\n', 'sto'],
    'small_bfd': ['small_bfd', '# STOCKHOLM 1.0\n\n-151285509650596177 STARWARSNVIDIAAAAAA\n#=GC RF             xxxxxxxxxxxxxxxxxxx\n//\n', 'sto']
}]

templates = [
    [
        {'index': 1, 'name': '5X6U_E Ragulator complex protein LAMTOR3, Ragulator; Ragulator complex, scaffold, roadblock, lysosome; 2.4A {Homo sapiens}', 'aligned_cols': 10, 'sum_probs': 0.0, 'query': 'RSNVIDIAAA', 'hit_sequence': 'ASNIIDVSAA', 'indices_query': [6, 7, 8, 9, 10, 11, 12, 13, 14, 15], 'indices_hit': [23, 24, 25, 26, 27, 28, 29, 30, 31, 32]},
        {'index': 2, 'name': '5X6V_E Ragulator complex protein LAMTOR3, Ragulator; Ragulator Rag GTPase complex, scaffold; 2.02A {Homo sapiens}', 'aligned_cols': 10, 'sum_probs': 7.9, 'query': 'RSNVIDIAAA', 'hit_sequence': 'ASNIIDVSAA', 'indices_query': [6, 7, 8, 9, 10, 11, 12, 13, 14, 15], 'indices_hit': [23, 24, 25, 26, 27, 28, 29, 30, 31, 32]},
        {'index': 3, 'name': '6EHP_E Ragulator complex protein LAMTOR3, Ragulator; Scaffolding complex, Rag-GTPase, mTOR, Ragulator; 2.3A {Homo sapiens}', 'aligned_cols': 10, 'sum_probs': 0.0, 'query': 'RSNVIDIAAA', 'hit_sequence': 'ASNIIDVSAA', 'indices_query': [6, 7, 8, 9, 10, 11, 12, 13, 14, 15], 'indices_hit': [45, 46, 47, 48, 49, 50, 51, 52, 53, 54]},
        {'index': 4, 'name': '6EHR_E Ragulator complex protein LAMTOR3, Ragulator; Scaffolding complex, Rag-GTPases, mTOR, Ragulator; 2.898A {Homo sapiens}', 'aligned_cols': 10, 'sum_probs': 7.8, 'query': 'RSNVIDIAAA', 'hit_sequence': 'ASNIIDVSAA', 'indices_query': [6, 7, 8, 9, 10, 11, 12, 13, 14, 15], 'indices_hit': [45, 46, 47, 48, 49, 50, 51, 52, 53, 54]},
        {'index': 5, 'name': '6CTD_B Large-conductance mechanosensitive channel; Channel Mechanosensitive Mycobacterium tuberculosis, MEMBRANE; 5.8A {Mycobacterium tuberculosis (strain ATCC 25177 / H37Ra)}', 'aligned_cols': 11, 'sum_probs': 8.7, 'query': 'ARSNVIDIAAA', 'hit_sequence': 'ARGNIVDLAVA', 'indices_query': [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], 'indices_hit': [29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39]},
        # The line break that split the next 'name' string has been removed so the literal parses.
        {'index': 6, 'name': '3HZQ_A Large-conductance mechanosensitive channel; intermediate state Mechanosensitive channel osmoregulation; 3.82A {Staphylococcus aureus subsp. aureus MW2}', 'aligned_cols': 11, 'sum_probs': 8.6, 'query': 'ARSNVIDIAAA', 'hit_sequence': 'LKGNVLDLAIA', 'indices_query': [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], 'indices_hit': [29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39]},
        {'index': 7, 'name': '6B9X_A Ragulator complex protein LAMTOR1, Ragulator; Ragulator, Lamtor, SIGNALING PROTEIN; 1.42A {Homo sapiens}', 'aligned_cols': 12, 'sum_probs': 0.0, 'query': 'WARSNVIDIAAA', 'hit_sequence': 'KTASNIIDVSAA', 'indices_query': [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], 'indices_hit': [59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70]},
        {'index': 8, 'name': '4V7H_BM Ribosome; eukaryotic ribosome, 80S, RACK1 protein; HET: OMC, PSU, 5MU, 1MA, OMG, 5MC, YYG, 7MG, 2MG, H2U, M2G; 8.9A {Thermomyces lanuginosus}', 'aligned_cols': 15, 'sum_probs': 9.1, 'query': 'RWARSNVIDIAAAAA', 'hit_sequence': 'GWKAAAAAAAAAAAA', 'indices_query': [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17], 'indices_hit': [139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153]},
        {'index': 9, 'name': '6QKP_A Nucleoid-associated protein Lsr2; Tuberculosis, DNA organisation, Transcriptional regulator; NMR {Mycobacterium tuberculosis (strain ATCC 25618 / H37Rv)}', 'aligned_cols': 12, 'sum_probs': 9.2, 'query': 'RWARSNVIDIAA', 'hit_sequence': 'EWARRNGHNVST', 'indices_query': [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14], 'indices_hit': [22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33]},
        {'index': 10, 'name': "1QGN_F CYSTATHIONINE GAMMA-SYNTHASE; METHIONINE BIOSYNTHESIS, PYRIDOXAL 5'-PHOSPHATE, GAMMA-FAMILY; HET: PLP; 2.9A {Nicotiana tabacum} SCOP: c.67.1.3", 'aligned_cols': 10, 'sum_probs': 0.0, 'query': 'NVIDIAAAAA', 'hit_sequence': 'KAVDAAAAAA', 'indices_query': [8, 9, 10, 11, 12, 13, 14, 15, 16, 17], 'indices_hit': [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]},
        {'index': 11, 'name': '2OAR_E Large-conductance mechanosensitive channel; stretch activated ion channel mechanosensitive; 3.5A {Mycobacterium tuberculosis H37Ra} SCOP: f.16.1.1', 'aligned_cols': 11, 'sum_probs': 8.9, 'query': 'ARSNVIDIAAA', 'hit_sequence': 'ARGNIVDLAVA', 'indices_query': [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], 'indices_hit': [32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42]},
        {'index': 12, 'name': '5XKX_A Flavin-containing monooxygenase; Dimethylsulfoniopropionate (DMSP) lyase, LYASE; 1.5A {Acinetobacter bereziniae NIPH 3}', 'aligned_cols': 10, 'sum_probs': 8.0, 'query': 'ARWARSNVID', 'hit_sequence': 'TVWARTTAQD', 'indices_query': [2, 3, 4, 5, 6, 7, 8, 9, 10, 11], 'indices_hit': [356, 357, 358, 359, 360, 361, 362, 363, 364, 365]},
    ]
]

headers = {
    "content-type": "application/json"
}

data = {
    "sequences": sequences,
    "alignments": alignments,
    "templates": templates
}

response = requests.post(url, headers=headers, data=json.dumps(data))

# Check if the request was successful
if response.ok:
    print("Request succeeded:", response.json())
else:
    print("Request failed:", response.status_code, response.text)
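The MSA strings in the alignments values above are in Stockholm format ('sto'). When passing precomputed or external MSAs, it can help to check how many sequences each alignment contains before submitting. The following sketch uses a hypothetical helper, count_stockholm_sequences, and relies only on the basic Stockholm conventions: lines starting with "#" are markup, "//" ends the alignment, and every other non-empty line begins with a sequence name.

```python
def count_stockholm_sequences(sto_text: str) -> int:
    """Count the distinct sequence names in a Stockholm-format alignment."""
    names = set()
    for line in sto_text.splitlines():
        line = line.strip()
        # Skip blank lines, markup lines ("#..."), and the terminator ("//").
        if not line or line.startswith("#") or line == "//":
            continue
        # The first whitespace-separated token is the sequence name.
        names.add(line.split()[0])
    return len(names)
```

Applied to the uniref90 alignment string in the request above, this returns 1, because that alignment contains only the query sequence.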

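The runtime of the structure prediction module scales quadratically with sequence length. Long sequences can take several hours to predict.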
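To make the quadratic scaling concrete, the sketch below estimates how much longer a prediction might take relative to a reference run. It is a rough rule of thumb that assumes the quadratic term dominates and ignores fixed costs such as model loading and MSA generation; relative_cost is a hypothetical helper, not part of the NIM.

```python
def relative_cost(reference_length: int, new_length: int) -> float:
    """Estimate the runtime ratio between two sequence lengths under quadratic scaling."""
    if reference_length <= 0 or new_length <= 0:
        raise ValueError("sequence lengths must be positive")
    return (new_length / reference_length) ** 2
```

Under this assumption, doubling the sequence length from 200 to 400 residues gives relative_cost(200, 400) == 4.0, so the prediction would take roughly four times as long.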