Load

`default_ngc_client()`

Create a default NGC client.

This should load the NGC API key from ~/.ngc/config, or from environment variables passed to the Docker container.

Source code in bionemo/core/data/load.py

def default_ngc_client() -> ngcsdk.Client:
    """Create a default NGC client.

    This should load the NGC API key from ~/.ngc/config, or from environment variables passed to the docker container.
    """
    return ngcsdk.Client()

`default_pbss_client()`

Create a default S3 client for PBSS.

Source code in bionemo/core/data/load.py

def default_pbss_client():
    """Create a default S3 client for PBSS."""
    retry_config = Config(retries={"max_attempts": 10, "mode": "standard"})
    return boto3.client("s3", endpoint_url="https://pbss.s8k.io", config=retry_config)

`load(model_or_data_tag, source=DEFAULT_SOURCE, resources=None, cache_dir=None)`

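Download a resource from PBSS or NGC.

Parameters

Name	Type	Description	Default
`model_or_data_tag`	`str`	A pointer to the desired resource. Must be a key in the resources dictionary.	required
`source`	`SourceOptions`	Either "pbss" (NVIDIA-internal download) or "ngc" (NVIDIA GPU Cloud). Defaults to "pbss".	`DEFAULT_SOURCE`
`resources`	`dict[str, Resource] \| None`	A custom dictionary of resources. If None, the default resources are used. (Mostly for testing.)	`None`
`cache_dir`	`Path \| None`	The directory in which to store downloaded files. Defaults to BIONEMO_CACHE_DIR. (Mostly for testing.)	`None`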

Raises

Type	Description
`ValueError`	If the desired tag was not found, or if an NGC URL was requested but the resource does not provide one.
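
The two `ValueError` conditions can be reproduced in isolation. The following is a minimal sketch, not the library's code: `FakeResource` and `check_request` are hypothetical names introduced for illustration, the S3 URL is invented, and only the two checks quoted from `load` are mirrored.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FakeResource:
    """Hypothetical stand-in for a Resource; holds only the fields used here."""

    pbss: str
    ngc: str | None = None


def check_request(tag: str, source: str, resources: dict[str, FakeResource]) -> FakeResource:
    """Apply the same two checks that make load() raise ValueError."""
    if tag not in resources:
        raise ValueError(f"Resource '{tag}' not found.")
    if source == "ngc" and resources[tag].ngc is None:
        raise ValueError(f"Resource '{tag}' does not have an NGC URL.")
    return resources[tag]


# Invented example entry: it has a PBSS URL but no NGC URL.
resources = {"filename/tag": FakeResource(pbss="s3://example-bucket/path/file.tar.gz")}
```

With this dictionary, requesting `"filename/tag"` from `"pbss"` succeeds, while an unknown tag or a request for `"ngc"` raises `ValueError`.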

Returns

Type	Description
`Path`	A Path object pointing either at the downloaded file, or at a decompressed folder containing the file(s).
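
The file-versus-folder distinction comes from how the download result is post-processed. As a rough sketch, assuming the download step yields either a single path string or a list of unpacked member paths (`resolve_download` is a hypothetical helper, and the paths are invented):

```python
from pathlib import Path


def resolve_download(download, extract_dir):
    """Return the parent folder for unpacked archives, else the file itself."""
    if isinstance(download, list):
        # An archive was unpacked: point at the extraction folder, not at each member.
        return Path(extract_dir)
    return Path(download)


single = resolve_download("/tmp/cache/abc-file.txt", None)
folder = resolve_download(
    ["/tmp/cache/abc-data.tar.gz.untar/a.txt", "/tmp/cache/abc-data.tar.gz.untar/b.txt"],
    "/tmp/cache/abc-data.tar.gz.untar",
)
```

Callers that expect a directory should therefore check `Path.is_dir()` on the result rather than assume one shape.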

Examples

For a resource specified in 'filename.yaml' with tag 'tag', the following will download the file:

>>> load("filename/tag")
PosixPath('/tmp/bionemo/downloaded-file-name')

Source code in bionemo/core/data/load.py

def load(
    model_or_data_tag: str,
    source: SourceOptions = DEFAULT_SOURCE,
    resources: dict[str, Resource] | None = None,
    cache_dir: Path | None = None,
) -> Path:
    """Download a resource from PBSS or NGC.

    Args:
        model_or_data_tag: A pointer to the desired resource. Must be a key in the resources dictionary.
        source: Either "pbss" (NVIDIA-internal download) or "ngc" (NVIDIA GPU Cloud). Defaults to "pbss".
        resources: A custom dictionary of resources. If None, the default resources will be used. (Mostly for testing.)
        cache_dir: The directory to store downloaded files. Defaults to BIONEMO_CACHE_DIR. (Mostly for testing.)

    Raises:
        ValueError: If the desired tag was not found, or if an NGC url was requested but not provided.

    Returns:
        A Path object pointing either at the downloaded file, or at a decompressed folder containing the
        file(s).

    Examples:
        For a resource specified in 'filename.yaml' with tag 'tag', the following will download the file:
        >>> load("filename/tag")
        PosixPath(/tmp/bionemo/downloaded-file-name)
    """
    if resources is None:
        resources = get_all_resources()

    if cache_dir is None:
        cache_dir = BIONEMO_CACHE_DIR

    if model_or_data_tag not in resources:
        raise ValueError(f"Resource '{model_or_data_tag}' not found.")

    if source == "ngc" and resources[model_or_data_tag].ngc is None:
        raise ValueError(f"Resource '{model_or_data_tag}' does not have an NGC URL.")

    resource = resources[model_or_data_tag]
    filename = str(resource.pbss).split("/")[-1]

    extension = "".join(Path(filename).suffixes)
    processor = _get_processor(extension, resource.unpack, resource.decompress)

    if source == "pbss":
        download_fn = _s3_download
        url = resource.pbss

    elif source == "ngc":
        assert resource.ngc_registry is not None
        download_fn = NGCDownloader(filename=filename, ngc_registry=resource.ngc_registry)
        url = resource.ngc

    else:
        raise ValueError(f"Source '{source}' not supported.")

    download = pooch.retrieve(
        url=str(url),
        fname=f"{resource.sha256}-{filename}",
        known_hash=resource.sha256,
        path=cache_dir,
        downloader=download_fn,
        processor=processor,
    )

    # Pooch by default returns a list of unpacked files if they unpack a zipped or tarred directory. Instead of that, we
    # just want the unpacked, parent folder.
    if isinstance(download, list):
        return Path(processor.extract_dir)  # type: ignore

    else:
        return Path(download)
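
The listing above derives three names from the resource before downloading: the file name (last segment of the PBSS URL), the combined extension, and the cache file name prefixed with the SHA-256 hash. The sketch below isolates that derivation using only the standard library. `cache_names` is a hypothetical helper, and the URL and hash are invented.

```python
from __future__ import annotations

from pathlib import Path


def cache_names(url: str, sha256: str) -> tuple[str, str, str]:
    """Derive (filename, extension, cache file name) the way the listing does."""
    filename = url.split("/")[-1]
    # Join every suffix so that multi-part extensions such as ".tar.gz" are kept whole.
    extension = "".join(Path(filename).suffixes)
    cache_name = f"{sha256}-{filename}"
    return filename, extension, cache_name


names = cache_names("s3://example-bucket/models/weights.tar.gz", "abc123")
# names == ("weights.tar.gz", ".tar.gz", "abc123-weights.tar.gz")
```

One consequence of joining all suffixes is that dots elsewhere in the file name are also counted: `model.v1.2.tar.gz` yields `.v1.2.tar.gz` rather than `.tar.gz`, so dotted version strings in file names may need care when an extension-based processor is selected.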