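MolMIM Endpoints

MolMIM provides the following endpoints and associated functionality:

/embedding - Retrieves the embeddings of the given input molecules from MolMIM.
/hidden - Retrieves the hidden states of the given input molecules from MolMIM (shown as the "latent code" in Figure 1 of the MolMIM manuscript).
/decode - Decodes hidden state representations into SMILES string sequences.
/sampling - Samples the latent space within a given scaled radius around a seed molecule. This method generates new molecule samples from the given input in an unguided fashion.
/generate - Generates new molecules, optionally optimized for a specific property. When CMA-ES guided sampling is enabled, this method generates new, optimized molecules.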

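Notebooks

The following example notebooks demonstrate how to use these endpoints in a drug discovery context.

Clustering Molecules with MolMIM Embeddings - Uses MolMIM's /embedding endpoint to cluster molecules by similarity in MolMIM's embedding space

ClusterMolMIMEmbeddings.ipynb

Interpolating Between Molecules by Manipulating MolMIM Hidden States - Uses MolMIM's /hidden and /decode endpoints to interpolate new molecules between two different seed molecules

MolMIMInterpolation.ipynb

Sampling Chemical Space for Drug Discovery with the MolMIM NIM - Uses MolMIM's /sampling and /generate endpoints to explore the molecular space around a seed molecule and improve its Quantitative Estimate of Drug-likeness (QED) score

MolMIMGeneration.ipynb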

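Usage

The following examples include cURL and Python commands for testing each endpoint. Where applicable, the examples include commands that test the endpoint with both a single SMILES sequence and multiple SMILES sequences.

The MolMIM NIM logs requests and other information to the stdout of the terminal in which it is running. You can refer to this output to identify problems with a request or to verify that a request was handled correctly.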

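Embeddings

/embedding

Request body
- sequences: array of strings (SMILES strings)
Response
- embeddings: array of arrays of floats (embeddings)

The following command sends a POST request to the /embedding endpoint with a JSON object containing a single molecule sequence (CC(Cc1ccc(cc1)C(C(=O)O)C)C) to retrieve its embedding from MolMIM.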

Bash

curl -X 'POST' \
    -i \
    "http://localhost:8000/embedding" \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '{"sequences": ["CC(Cc1ccc(cc1)C(C(=O)O)C)C"]}'

Python

import requests
import json

url = "http://localhost:8000/embedding"

headers = {
    'accept': 'application/json',
    'Content-Type': 'application/json'
}

data = json.dumps({"sequences": ["CC(Cc1ccc(cc1)C(C(=O)O)C)C"]})

response = requests.post(url, headers=headers, data=data)

print(response.text)

The following command sends a POST request to the /embedding endpoint with a JSON object containing two molecule sequences (CN1C=NC2=C1C(=O)N(C(=O)N2C)C and CC(Cc1ccc(cc1)C(C(=O)O)C)C) to retrieve their embeddings from MolMIM.

Bash

curl -X 'POST' \
    -i \
    "http://localhost:8000/embedding" \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '{"sequences": ["CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "CC(Cc1ccc(cc1)C(C(=O)O)C)C"]}'

Python

import requests
import json

url = "http://localhost:8000/embedding"

data = {"sequences": ["CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "CC(Cc1ccc(cc1)C(C(=O)O)C)C"]}

headers = {
    'accept': 'application/json',
    'Content-Type': 'application/json'
}

response = requests.post(url, headers=headers, json=data)

print(response.text)
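
Once the embeddings are returned, a natural next step is to compare molecules in embedding space. The following is a minimal sketch of one way to do that, using only the standard library. Cosine similarity is our choice for illustration and is not necessarily what the clustering notebook uses. The helper names `cosine_similarity` and `pairwise_similarity` are hypothetical, and `example_response` is made-up data standing in for `response.json()`; real embeddings are much longer than three values.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def pairwise_similarity(embeddings):
    """Square matrix of cosine similarities, one row and column per molecule."""
    n = len(embeddings)
    return [
        [cosine_similarity(embeddings[i], embeddings[j]) for j in range(n)]
        for i in range(n)
    ]


# Made-up stand-in for response.json() from a two-molecule /embedding call.
example_response = {"embeddings": [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]}

matrix = pairwise_similarity(example_response["embeddings"])
for row in matrix:
    print(row)
```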

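Hidden States

/hidden

Request body
- sequences: array of strings (SMILES strings)
Response
- hiddens: array of arrays of arrays of floats (hidden states)
- mask: array of arrays of booleans (mask)

The following command sends a POST request to the /hidden endpoint with a JSON object containing a single molecule sequence (CC(Cc1ccc(cc1)C(C(=O)O)C)C) to retrieve its hidden state representation from MolMIM. The response is saved to the local file local-hidden-single.json.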

Bash

curl -X 'POST' \
    "http://localhost:8000/hidden" \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '{"sequences": ["CC(Cc1ccc(cc1)C(C(=O)O)C)C"]}' > local-hidden-single.json

Python

import requests
import json

url = "http://localhost:8000/hidden"
headers = {
    'accept': 'application/json',
    'Content-Type': 'application/json'
}
data = '{"sequences": ["CC(Cc1ccc(cc1)C(C(=O)O)C)C"]}'
response = requests.post(url, headers=headers, data=data)

with open('local-hidden-single.json', 'w') as f:
    json.dump(response.json(), f)

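The following command sends a POST request to the /hidden endpoint with a JSON object containing two molecule sequences (CN1C=NC2=C1C(=O)N(C(=O)N2C)C and CC(Cc1ccc(cc1)C(C(=O)O)C)C) to retrieve their hidden state representations from MolMIM. The response is saved to the local file local-hidden-multiple.json.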

Bash

curl -X 'POST' \
    "http://localhost:8000/hidden" \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '{"sequences": ["CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "CC(Cc1ccc(cc1)C(C(=O)O)C)C"]}'  > local-hidden-multiple.json

Python

import requests
import json

url = "http://localhost:8000/hidden"
headers = {
    'accept': 'application/json',
    'Content-Type': 'application/json'
}

data = {
    "sequences": ["CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "CC(Cc1ccc(cc1)C(C(=O)O)C)C"]
}

response = requests.post(url, headers=headers, json=data)

with open('local-hidden-multiple.json', 'w') as f:
    json.dump(response.json(), f)

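Decode

/decode

Request body
- hiddens: array of arrays of arrays of floats (hidden states)
- mask: array of arrays of booleans (mask)
Response
- generated: array of strings (SMILES strings)

The following command sends a POST request to the /decode endpoint with the contents of the local-hidden-single.json file (which contains the hidden state representation of a single molecule) to decode the hidden state into a SMILES string sequence.

Note

Each /decode command below requires the saved output from a previous call to the /hidden endpoint.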

Bash

curl -X 'POST' \
    -i \
    "http://localhost:8000/decode" \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '@./local-hidden-single.json'

Python

import requests
import json

with open('./local-hidden-single.json') as f:
    data = json.load(f)

response = requests.post('http://localhost:8000/decode',
                         headers={'accept': 'application/json', 'Content-Type': 'application/json'}, 
                         json=data)

print(response.text)

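The following command sends a POST request to the /decode endpoint with the contents of the local-hidden-multiple.json file (which contains the hidden state representations of multiple molecules) to decode the hidden states into SMILES string sequences.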

Bash

curl -X 'POST' \
    -i \
    "http://localhost:8000/decode" \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '@./local-hidden-multiple.json'

Python

import requests
import json

with open('./local-hidden-multiple.json', 'r') as f:
    data = json.load(f)

response = requests.post('http://localhost:8000/decode',
                         headers={'accept': 'application/json', 'Content-Type': 'application/json'}, 
                         json=data)

print(response.text)
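
Because /decode accepts the same `hiddens` and `mask` fields that /hidden returns, the two endpoints can be combined to interpolate between two molecules. The sketch below builds a /decode request body from a saved /hidden response. It makes several assumptions that are not confirmed by this page: that both molecules' hidden states have the same shape and the same mask, and that plain linear interpolation is a reasonable path through the latent space. The function name `interpolate_hiddens` and the tiny `toy_response` are hypothetical.

```python
def interpolate_hiddens(hidden_response, first, second, steps):
    """Build a /decode body with `steps` evenly spaced points between two hidden states."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    start = hidden_response["hiddens"][first]
    end = hidden_response["hiddens"][second]
    mask = hidden_response["mask"][first]
    # Assumption: interpolation only makes sense when shapes and masks agree.
    if hidden_response["mask"][second] != mask:
        raise ValueError("the two molecules have different masks")
    if len(start) != len(end) or any(len(s) != len(e) for s, e in zip(start, end)):
        raise ValueError("the two hidden states have different shapes")

    hiddens = []
    for k in range(steps):
        t = k / (steps - 1)  # 0.0 at the first molecule, 1.0 at the second
        hiddens.append([
            [(1 - t) * s + t * e for s, e in zip(row_s, row_e)]
            for row_s, row_e in zip(start, end)
        ])
    return {"hiddens": hiddens, "mask": [list(mask) for _ in range(steps)]}


# Made-up stand-in for the contents of local-hidden-multiple.json.
toy_response = {
    "hiddens": [[[0.0, 2.0]], [[4.0, 0.0]]],
    "mask": [[False], [False]],
}

payload = interpolate_hiddens(toy_response, 0, 1, steps=3)
print(payload["hiddens"])
```

The resulting `payload` has the same structure as the files used in the /decode examples above, so it could be sent with `requests.post(..., json=payload)` in the same way.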

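Sampling

/sampling

Request body
- sequences: array of strings (SMILES strings)
- beam_size: integer (beam width, between 1 and 10, default: 1)
- num_molecules: integer (number of molecules, between 1 and 10, default: 1)
- scaled_radius: float (scaled radius, between 0 and 2, default: 0.7)
Response
- generated: array of arrays of strings (SMILES strings)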
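
The documented ranges can be checked on the client before a request is sent. The following sketch builds a /sampling request body and rejects out-of-range values. The function name `build_sampling_request` is hypothetical, and treating the bounds as inclusive is an assumption; the server remains the authority on what it accepts.

```python
def build_sampling_request(sequences, beam_size=1, num_molecules=1, scaled_radius=0.7):
    """Return a /sampling request body, or raise ValueError for an out-of-range value."""
    if not sequences or not all(isinstance(s, str) and s for s in sequences):
        raise ValueError("sequences must be a non-empty list of SMILES strings")
    # Ranges and defaults are taken from the request body description;
    # inclusive bounds are an assumption.
    if not 1 <= beam_size <= 10:
        raise ValueError("beam_size must be between 1 and 10")
    if not 1 <= num_molecules <= 10:
        raise ValueError("num_molecules must be between 1 and 10")
    if not 0 <= scaled_radius <= 2:
        raise ValueError("scaled_radius must be between 0 and 2")
    return {
        "sequences": list(sequences),
        "beam_size": beam_size,
        "num_molecules": num_molecules,
        "scaled_radius": scaled_radius,
    }


body = build_sampling_request(["CC(Cc1ccc(cc1)C(C(=O)O)C)C"], scaled_radius=1.0)
print(body)
```

The returned dictionary can be passed as `json=body` to `requests.post`.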

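The following command sends a POST request to the /sampling endpoint with a JSON object containing one molecule sequence (CC(Cc1ccc(cc1)C(C(=O)O)C)C). The MolMIM server samples the latent space within the given scaled radius around the seed molecule, generating new molecule samples in an unguided fashion.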

Bash

curl -X POST \
    localhost:8000/sampling \
    --header 'Content-Type: application/json' \
    -d '{"sequences": ["CC(Cc1ccc(cc1)C(C(=O)O)C)C"]}'

Python

import requests
import json

url = "http://localhost:8000/sampling"
data = {"sequences": ["CC(Cc1ccc(cc1)C(C(=O)O)C)C"]}

headers = {
    'Content-Type': 'application/json'
}

response = requests.post(url, headers=headers, json=data)

print(response.text)

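The following command sends a POST request to the /sampling endpoint with a JSON object containing two molecule sequences (CN1C=NC2=C1C(=O)N(C(=O)N2C)C and CC(Cc1ccc(cc1)C(C(=O)O)C)C). The MolMIM server samples the latent space within the given scaled radius around each seed molecule, generating new molecule samples in an unguided fashion.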

Bash

curl -X POST \
    localhost:8000/sampling \
    --header 'Content-Type: application/json' \
    -d '{"sequences": ["CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "CC(Cc1ccc(cc1)C(C(=O)O)C)C"]}'

Python

import requests
import json

url = "http://localhost:8000/sampling"
data = {"sequences": ["CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "CC(Cc1ccc(cc1)C(C(=O)O)C)C"]}

headers = {"Content-Type": "application/json"}

response = requests.post(url, headers=headers, json=data)

print(response.text)
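
The /sampling response nests its results as an array of arrays. The sketch below pairs each seed with its samples, on the assumption (not stated on this page) that `generated` contains one inner list per input sequence, in the same order as `sequences`. The helper name `samples_by_seed` is hypothetical, and the SMILES strings in `toy_response` are made up rather than real MolMIM output.

```python
def samples_by_seed(sequences, sampling_response):
    """Map each seed SMILES to its unique samples, excluding the seed itself."""
    generated = sampling_response["generated"]
    # Assumption: one inner list per seed, in request order.
    if len(generated) != len(sequences):
        raise ValueError("expected one list of samples per input sequence")
    result = {}
    for seed, samples in zip(sequences, generated):
        # dict.fromkeys drops duplicates while keeping first-seen order.
        unique = list(dict.fromkeys(samples))
        result[seed] = [s for s in unique if s != seed]
    return result


# Made-up stand-in for response.json() from a two-seed /sampling call.
toy_sequences = ["CCO", "CCN"]
toy_response = {"generated": [["CCO", "CCCO", "CCCO"], ["CCCN"]]}

grouped = samples_by_seed(toy_sequences, toy_response)
for seed, samples in grouped.items():
    print(seed, samples)
```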

Generate

/generate

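Request body
- smi: string (SMILES string)
- algorithm: string (algorithm to use, either "CMA-ES" or "none", default: "CMA-ES")
- iterations: integer (number of iterations, between 1 and 1000, default: 10)
- min_similarity: float (minimum similarity, between 0 and 0.7, default: 0.7)
- minimize: boolean (whether to minimize the property, default: false)
- num_molecules: integer (number of molecules, between 1 and 100, default: 10)
- particles: integer (number of particles, between 2 and 1000, default: 30)
- property_name: string (property to optimize, either "QED" or "plogP", default: "QED")
- scaled_radius: float (scaled radius, between 0 and 2, default: 1.0)
Response
- generated: array of strings (SMILES strings)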

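The /generate endpoint offers two alternative options:

CMA-ES - A black-box optimization algorithm that can guide MolMIM sampling to optimize a specific property; in this case, QED or plogP.
Random sampling - Functions like the /sampling endpoint, but with less flexibility in the sampling parameters.

Parameters required by each algorithm type:

For the "CMA-ES" algorithm
- smi
- num_molecules
- property_name
- minimize
- min_similarity
- particles
- iterations
For the random sampling ("none") algorithm
- smi
- num_molecules
- particles
- scaled_radius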
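
The per-algorithm parameter lists above can be encoded as a small client-side check. This is a sketch only: `REQUIRED_FIELDS` and `build_generate_request` are hypothetical names, and the check covers only the presence of fields, not their value ranges or how the server treats extra fields.

```python
# Required fields per algorithm, copied from the lists above.
REQUIRED_FIELDS = {
    "CMA-ES": (
        "smi", "num_molecules", "property_name", "minimize",
        "min_similarity", "particles", "iterations",
    ),
    "none": ("smi", "num_molecules", "particles", "scaled_radius"),
}


def build_generate_request(algorithm, **params):
    """Return a /generate request body, or raise ValueError if a required field is missing."""
    if algorithm not in REQUIRED_FIELDS:
        raise ValueError('algorithm must be "CMA-ES" or "none"')
    missing = [name for name in REQUIRED_FIELDS[algorithm] if name not in params]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    # Extra fields are passed through unchanged.
    return {"algorithm": algorithm, **params}


body = build_generate_request(
    "none",
    smi="CN1C=NC2=C1C(=O)N(C(=O)N2C)C",
    num_molecules=5,
    particles=8,
    scaled_radius=1.0,
)
print(body)
```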

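The first set of commands uses the CMA-ES algorithm to generate five molecules that maximize the QED property, with a minimum similarity of 0.4, eight particles, and three iterations.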

Bash

curl --request POST \
    localhost:8000/generate \
    --header 'Content-Type: application/json' \
    --data-raw '{"smi":"CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "algorithm":"CMA-ES", "num_molecules":5, "property_name":"QED", "minimize": false, "min_similarity": 0.4, "particles": 8, "iterations": 3}'

Python

import requests
import json

url = 'http://localhost:8000/generate'

data = {
    "smi": "CN1C=NC2=C1C(=O)N(C(=O)N2C)C",
    "algorithm": "CMA-ES",
    "num_molecules": 5,
    "property_name": "QED",
    "minimize": False,
    "min_similarity": 0.4,
    "particles": 8,
    "iterations": 3
}

headers = {'Content-Type': 'application/json'}

response = requests.post(url, headers=headers, json=data)

print(response.text)

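The second set of commands uses the CMA-ES algorithm to generate five molecules that minimize plogP (minimize is set to true), with a minimum similarity of 0.4, eight particles, and three iterations.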

Bash

curl --request POST \
    localhost:8000/generate \
    --header 'Content-Type: application/json' \
    --data-raw '{"smi":"CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "algorithm":"CMA-ES", "num_molecules":5, "property_name":"plogP", "minimize": true, "min_similarity": 0.4, "particles": 8, "iterations": 3}'

Python

import requests
import json

url = "http://localhost:8000/generate"

data = {
    "smi": "CN1C=NC2=C1C(=O)N(C(=O)N2C)C",
    "algorithm": "CMA-ES",
    "num_molecules": 5,
    "property_name": "plogP",
    "minimize": True,
    "min_similarity": 0.4,
    "particles": 8,
    "iterations": 3
}

headers = {
    'Content-Type': 'application/json'
}

response = requests.post(url, headers=headers, json=data)

print(response.text)

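The final set of commands uses the random sampling ("none") algorithm to generate five molecules from the seed molecule specified by the SMILES string (CN1C=NC2=C1C(=O)N(C(=O)N2C)C), using eight particles and a scaled radius of 1.0.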

Bash

curl --request POST \
    localhost:8000/generate \
    --header 'Content-Type: application/json' \
    --data-raw '{"smi":"CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "algorithm":"none", "num_molecules":5, "particles": 8, "scaled_radius": 1.0}'

Python

import requests
import json

url = "http://localhost:8000/generate"

data = {
    "smi": "CN1C=NC2=C1C(=O)N(C(=O)N2C)C",
    "algorithm": "none",
    "num_molecules": 5,
    "particles": 8,
    "scaled_radius": 1.0
}

headers = {
    'Content-Type': 'application/json'
}

response = requests.post(url, headers=headers, json=data)

print(response.text)