
Preprocess

ResourcePreprocessor dataclass

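Bases: ABC

Interface defining a ResourcePreprocessor. Implementors promise to provide both a complete RemoteResource and a freeform preprocess method. This interface can be used to generically define a workflow from a config file.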

remote -> prepare -> prepared data.
Source code in bionemo/geneformer/data/preprocess.py
@dataclass
class ResourcePreprocessor(ABC):
    """Interface defining a ResourcePreprocessor. Implementors promise to provide both a complete RemoteResource and a freeform
    preprocess method. This interface can be used to generically define a workflow from a config file.

        remote -> prepare -> prepared data.
    """  # noqa: D205

    root_directory: Optional[str] = field(default_factory=RemoteResource.get_env_tmpdir)
    dest_directory: str = "data"

    def get_checksums(self) -> List[str]:  # noqa: D102
        return [resource.checksum for resource in self.get_remote_resources()]

    def get_urls(self) -> List[str]:  # noqa: D102
        return [resource.url for resource in self.get_remote_resources()]

    @abstractmethod
    def get_remote_resources(self) -> List[RemoteResource]:
        """Gets the remote resources associated with this preparor."""
        raise NotImplementedError()

    @abstractmethod
    def prepare(self) -> List:
        """Returns a list of prepared filenames."""
        raise NotImplementedError()
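
To make the remote -> prepare -> prepared data contract concrete, here is a minimal sketch of a subclass. It is illustrative only and makes several assumptions: `RemoteResource` is replaced by a hypothetical stand-in that carries just the `checksum` and `url` attributes the interface reads, `get_env_tmpdir` is assumed to fall back to the system temp directory, and `ExamplePreprocessor`, its URL, and its checksum are invented for the example. The interface itself is re-declared from the listing above so the block runs without BioNeMo installed.

```python
import os
import tempfile
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RemoteResource:
    """Hypothetical stand-in, NOT the real BioNeMo class."""

    checksum: str
    url: str

    @staticmethod
    def get_env_tmpdir() -> str:
        # Assumption: use $TMPDIR if set, else the system temp directory.
        return os.environ.get("TMPDIR", tempfile.gettempdir())


@dataclass
class ResourcePreprocessor(ABC):
    """Interface re-declared from the listing above."""

    root_directory: Optional[str] = field(default_factory=RemoteResource.get_env_tmpdir)
    dest_directory: str = "data"

    def get_checksums(self) -> List[str]:
        return [resource.checksum for resource in self.get_remote_resources()]

    def get_urls(self) -> List[str]:
        return [resource.url for resource in self.get_remote_resources()]

    @abstractmethod
    def get_remote_resources(self) -> List[RemoteResource]:
        raise NotImplementedError()

    @abstractmethod
    def prepare(self) -> List:
        raise NotImplementedError()


@dataclass
class ExamplePreprocessor(ResourcePreprocessor):
    """Hypothetical implementor with a single made-up resource."""

    def get_remote_resources(self) -> List[RemoteResource]:
        return [RemoteResource(checksum="abc123", url="https://example.com/cells.h5ad")]

    def prepare(self) -> List:
        # A real implementation would download and transform the data here;
        # this sketch only computes the paths the prepared files would have.
        root = self.root_directory or RemoteResource.get_env_tmpdir()
        base = os.path.join(root, self.dest_directory)
        return [os.path.join(base, os.path.basename(r.url)) for r in self.get_remote_resources()]
```

Because both abstract methods are overridden, `ExamplePreprocessor(root_directory="/tmp/x")` can be instantiated, and `get_urls()` and `get_checksums()` work without further code since they are derived from `get_remote_resources()`. Instantiating `ResourcePreprocessor` directly raises `TypeError`.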

get_remote_resources() abstractmethod

Gets the remote resources associated with this preprocessor.

Source code in bionemo/geneformer/data/preprocess.py
@abstractmethod
def get_remote_resources(self) -> List[RemoteResource]:
    """Gets the remote resources associated with this preparor."""
    raise NotImplementedError()

prepare() abstractmethod

Returns a list of prepared filenames.

Source code in bionemo/geneformer/data/preprocess.py
@abstractmethod
def prepare(self) -> List:
    """Returns a list of prepared filenames."""
    raise NotImplementedError()