Single cell row dataset

SingleCellRowDataset

Bases: SingleCellRowDatasetCore, Dataset

One row in an ann dataframe (hdf5 file with a sparse array format).

Source code in bionemo/scdl/api/single_cell_row_dataset.py
class SingleCellRowDataset(SingleCellRowDatasetCore, Dataset):
    """One row in an ann dataframe (hdf5 file with a spare array format)."""

    @abstractmethod
    def load(self, data_path: str) -> None:
        """Loads the data from datapath.

        Calls to __len__ and __getitem__ Must be valid after a call to
        this method.
        """
        raise NotImplementedError()

    @abstractmethod
    def save(self, data_path: str) -> None:
        """Saves the class to an archive at datapath."""
        raise NotImplementedError()

    pass
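
The load/save contract above can be sketched with a small round trip. This is only an illustration under stated assumptions: `JsonRowStore` and `round_trip` are hypothetical names, the JSON archive format is chosen for brevity, and the class deliberately does not inherit from SingleCellRowDataset so that it runs without bionemo or torch installed. A real subclass would also have to implement the SingleCellRowDatasetCore methods.

```python
import json
import os
import tempfile
from typing import Dict, List


class JsonRowStore:
    """Hypothetical stand-in that mimics the load/save contract only."""

    def __init__(self) -> None:
        # Each row maps a column index (a string, because JSON keys are
        # strings) to its non-zero value.
        self.rows: List[Dict[str, float]] = []

    def save(self, data_path: str) -> None:
        """Write all rows to an archive at data_path."""
        with open(data_path, "w", encoding="utf-8") as handle:
            json.dump(self.rows, handle)

    def load(self, data_path: str) -> None:
        """Read rows back; __len__ and __getitem__ are valid afterwards."""
        with open(data_path, "r", encoding="utf-8") as handle:
            self.rows = json.load(handle)

    def __len__(self) -> int:
        return len(self.rows)

    def __getitem__(self, index: int) -> Dict[str, float]:
        return self.rows[index]


def round_trip(rows: List[Dict[str, float]]) -> JsonRowStore:
    """Save rows to a temporary file and load them into a fresh store."""
    source = JsonRowStore()
    source.rows = rows
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "rows.json")
        source.save(path)
        restored = JsonRowStore()
        restored.load(path)
    return restored


restored = round_trip([{"0": 1.0}, {"2": 3.0, "5": 1.0}])
print(len(restored), restored[1])
```

The point of the sketch is the ordering guarantee: nothing is read from `restored` until `load` has returned.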

load(data_path) abstractmethod

Loads the data from data_path.

Calls to __len__ and __getitem__ must be valid after a call to this method.

Source code in bionemo/scdl/api/single_cell_row_dataset.py
@abstractmethod
def load(self, data_path: str) -> None:
    """Loads the data from datapath.

    Calls to __len__ and __getitem__ Must be valid after a call to
    this method.
    """
    raise NotImplementedError()

save(data_path) abstractmethod

Saves the class to an archive at data_path.

Source code in bionemo/scdl/api/single_cell_row_dataset.py
@abstractmethod
def save(self, data_path: str) -> None:
    """Saves the class to an archive at datapath."""
    raise NotImplementedError()

SingleCellRowDatasetCore

Bases: ABC

Implements the actual ann data-like interface.

Source code in bionemo/scdl/api/single_cell_row_dataset.py
class SingleCellRowDatasetCore(ABC):
    """Implements the actual ann data-like interface."""

    @abstractmethod
    def load_h5ad(self, h5ad_path: str) -> None:
        """Loads an H5AD file and converts it into the backing representation.

        Calls to __len__ and __getitem__ Must be valid after a call to
        this method.
        """
        raise NotImplementedError()

    @abstractmethod
    def number_nonzero_values(self) -> int:
        """Return the number of non-zero values in the data."""
        raise NotImplementedError()

    @abstractmethod
    def number_of_values(self) -> int:
        """Return the total number of values in the data."""
        raise NotImplementedError()

    @abstractmethod
    def number_of_rows(self) -> int:
        """Return the number of rows in the data."""
        raise NotImplementedError()

    @abstractmethod
    def shape(self) -> Tuple[int, List[int]]:
        """Returns the shape of the object, which may be ragged.

        A ragged dataset is where the number and dimension of features
        can be different at every row.
        """
        raise NotImplementedError()

    def sparsity(self) -> float:
        """Return the sparsity of the underlying data.

        Sparsity is defined as the fraction of zero values in the data.
        It is within the range [0, 1.0]. If there are no values, the
        sparsity is defined as 0.0.
        """
        total_values = self.number_of_values()
        if total_values == 0:
            return 0.0

        nonzero_values = self.number_nonzero_values()
        zero_values = total_values - nonzero_values
        sparsity_value = zero_values / total_values
        return sparsity_value

    @abstractmethod
    def version(self) -> str:
        """Returns a version number.

        (following <major>.<minor>.<point> convention).
        """
        pass
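
To make the counting methods concrete, here is a toy container that mirrors three of the method names. It is a sketch under assumptions, not the library's implementation: `ToyCsrRows` is a hypothetical name, the CSR-style layout (`data`, `col_index`, `row_ptr`) is assumed for illustration, and the class does not inherit from SingleCellRowDatasetCore so that it runs without bionemo installed.

```python
from typing import List


class ToyCsrRows:
    """Hypothetical CSR-style container mirroring the Core method names."""

    def __init__(
        self,
        data: List[float],
        col_index: List[int],
        row_ptr: List[int],
        num_cols: int,
    ) -> None:
        self.data = data            # stored (non-zero) values
        self.col_index = col_index  # column of each stored value
        self.row_ptr = row_ptr      # row i spans data[row_ptr[i]:row_ptr[i + 1]]
        self.num_cols = num_cols    # assumed equal for every row in this toy

    def number_of_rows(self) -> int:
        # row_ptr holds one more entry than there are rows.
        return len(self.row_ptr) - 1

    def number_nonzero_values(self) -> int:
        # Assumes no explicit zeros are stored in data.
        return len(self.data)

    def number_of_values(self) -> int:
        # Total cells, counting the implicit zeros as well.
        return self.number_of_rows() * self.num_cols


# A 3 x 4 matrix with one stored value per row.
toy = ToyCsrRows(
    data=[3.0, 1.0, 2.0],
    col_index=[2, 0, 3],
    row_ptr=[0, 1, 2, 3],
    num_cols=4,
)
print(toy.number_of_rows(), toy.number_of_values(), toy.number_nonzero_values())
# prints: 3 12 3
```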

load_h5ad(h5ad_path) abstractmethod

Loads an H5AD file and converts it into the backing representation.

Calls to __len__ and __getitem__ must be valid after a call to this method.

Source code in bionemo/scdl/api/single_cell_row_dataset.py
@abstractmethod
def load_h5ad(self, h5ad_path: str) -> None:
    """Loads an H5AD file and converts it into the backing representation.

    Calls to __len__ and __getitem__ Must be valid after a call to
    this method.
    """
    raise NotImplementedError()

number_nonzero_values() abstractmethod

Return the number of non-zero values in the data.

Source code in bionemo/scdl/api/single_cell_row_dataset.py
@abstractmethod
def number_nonzero_values(self) -> int:
    """Return the number of non-zero values in the data."""
    raise NotImplementedError()

number_of_rows() abstractmethod

Return the number of rows in the data.

Source code in bionemo/scdl/api/single_cell_row_dataset.py
@abstractmethod
def number_of_rows(self) -> int:
    """Return the number of rows in the data."""
    raise NotImplementedError()

number_of_values() abstractmethod

Return the total number of values in the data.

Source code in bionemo/scdl/api/single_cell_row_dataset.py
@abstractmethod
def number_of_values(self) -> int:
    """Return the total number of values in the data."""
    raise NotImplementedError()

shape() abstractmethod

Returns the shape of the object, which may be ragged.

A ragged dataset is one where the number and dimension of features can differ from row to row.

Source code in bionemo/scdl/api/single_cell_row_dataset.py
@abstractmethod
def shape(self) -> Tuple[int, List[int]]:
    """Returns the shape of the object, which may be ragged.

    A ragged dataset is where the number and dimension of features
    can be different at every row.
    """
    raise NotImplementedError()
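
One plausible reading of the `Tuple[int, List[int]]` return type is "number of rows, then the feature count of each row". The source does not spell this out, so treat the interpretation, and the helper name `ragged_shape`, as assumptions made for illustration.

```python
from typing import List, Tuple


def ragged_shape(rows: List[List[float]]) -> Tuple[int, List[int]]:
    """Return (row count, per-row feature counts) for a list of rows.

    Assumed interpretation of a ragged shape; each row may have a
    different length.
    """
    return len(rows), [len(row) for row in rows]


rows = [[1.0, 0.0, 2.0], [0.5], [0.0, 0.0, 0.0, 4.0]]
print(ragged_shape(rows))  # (3, [3, 1, 4])
```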

sparsity()

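Return the sparsity of the underlying data.

Sparsity is defined as the fraction of zero values in the data. It lies in the range [0, 1.0]. If there are no values, the sparsity is defined as 0.0.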

Source code in bionemo/scdl/api/single_cell_row_dataset.py
def sparsity(self) -> float:
    """Return the sparsity of the underlying data.

    Sparsity is defined as the fraction of zero values in the data.
    It is within the range [0, 1.0]. If there are no values, the
    sparsity is defined as 0.0.
    """
    total_values = self.number_of_values()
    if total_values == 0:
        return 0.0

    nonzero_values = self.number_nonzero_values()
    zero_values = total_values - nonzero_values
    sparsity_value = zero_values / total_values
    return sparsity_value
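
As a worked example of the formula, the function below repeats the same arithmetic on plain counts. The free function `sparsity_from_counts` is a hypothetical helper written for this illustration; in the class, the two counts come from number_of_values() and number_nonzero_values().

```python
def sparsity_from_counts(total_values: int, nonzero_values: int) -> float:
    """Fraction of zero values, with the empty case defined as 0.0."""
    if total_values == 0:
        # Avoids a division by zero when there is no data at all.
        return 0.0
    zero_values = total_values - nonzero_values
    return zero_values / total_values


# 2 rows x 5 columns = 10 values, 4 of them non-zero, so 6 zeros.
print(sparsity_from_counts(10, 4))  # 0.6
# No values at all: defined as 0.0 rather than raising.
print(sparsity_from_counts(0, 0))  # 0.0
```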

version() abstractmethod

Returns a version number.

(following <major>.<minor>.<point> convention).

Source code in bionemo/scdl/api/single_cell_row_dataset.py
@abstractmethod
def version(self) -> str:
    """Returns a version number.

    (following <major>.<minor>.<point> convention).
    """
    pass
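
A caller that needs to compare versions may want to turn the `<major>.<minor>.<point>` string into integers first, since plain string comparison orders "0.1.12" before "0.1.2". The helper below is a minimal sketch; `parse_version` is a hypothetical name and is not part of the documented interface.

```python
import re
from typing import Tuple

# Three dot-separated non-negative integers, e.g. "0.1.12".
_VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)")


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a <major>.<minor>.<point> string into an integer tuple."""
    match = _VERSION_RE.fullmatch(version)
    if match is None:
        raise ValueError(f"not a <major>.<minor>.<point> version: {version!r}")
    major, minor, point = (int(part) for part in match.groups())
    return major, minor, point


# Tuples compare numerically, component by component.
print(parse_version("0.1.12") > parse_version("0.1.2"))  # True
```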