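Important
You are viewing the NeMo 2.0 documentation. This release introduces significant changes to the API and a new library, NeMo Run. We are currently porting all features from NeMo 1.0 to 2.0. For documentation on previous versions, or on features not yet available in 2.0, please refer to the NeMo 24.07 documentation.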
Task Decontamination
Base Classes
- class nemo_curator.tasks.DownstreamTask
- class nemo_curator.tasks.import_task(task_path)
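The entry for `import_task(task_path)` gives only a signature. As a rough illustration, and assuming that `task_path` is a dotted path of the form `package.module.ClassName` (the listing above does not confirm this), a loader of this kind could be sketched with the standard library as follows. The helper name `load_class_by_path` is hypothetical and is not part of NeMo Curator.

```python
import importlib


def load_class_by_path(dotted_path):
    """Resolve a dotted path such as 'package.module.ClassName' to the object it names.

    Hypothetical helper for illustration only; not the NeMo Curator implementation.
    """
    # Everything before the last dot is treated as the module path.
    module_path, _, attr_name = dotted_path.rpartition(".")
    if not module_path:
        raise ValueError(f"expected 'module.ClassName', got {dotted_path!r}")
    module = importlib.import_module(module_path)
    return getattr(module, attr_name)


# A standard-library class is used here so the sketch runs without NeMo Curator installed.
ordered_dict_cls = load_class_by_path("collections.OrderedDict")
```

With NeMo Curator installed, the same pattern would presumably accept a path to one of the task classes listed on this page, but that usage is an assumption rather than documented behavior.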
Modules
- class nemo_curator.TaskDecontamination(tasks: DownstreamTask | Iterable[DownstreamTask], text_field='text', max_ngram_size=13, max_matches=10, min_document_length=200, remove_char_each_side=200, max_splits=10, removed_dir=None)
  - call(dataset: DocumentDataset)
    Performs an arbitrary operation on a dataset.
    - Parameters:
      dataset (DocumentDataset) – The dataset to operate on.
  - prepare_task_ngram_count() → dict
    Computes a dictionary whose keys are all the n-grams found in each task, with every value set to 0.
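The description of `prepare_task_ngram_count()` can be pictured with a small, self-contained sketch. It builds a zero-initialized n-gram dictionary from a few task strings and then counts how often those n-grams occur in a document. The function names (`word_ngrams`, `build_zeroed_ngram_counts`, `count_matches`), the whitespace tokenization, and the lowercasing are all assumptions made for illustration; this is not the NeMo Curator implementation.

```python
def word_ngrams(text, n):
    """Return every run of n consecutive whitespace-separated words in text."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def build_zeroed_ngram_counts(task_texts, ngram_size):
    """Map every n-gram found in any task text to an initial count of 0."""
    counts = {}
    for text in task_texts:
        for ngram in word_ngrams(text, ngram_size):
            counts[ngram] = 0
    return counts


def count_matches(document, counts, ngram_size):
    """Return a copy of counts, incremented once per occurrence in document."""
    updated = dict(counts)
    for ngram in word_ngrams(document, ngram_size):
        if ngram in updated:
            updated[ngram] += 1
    return updated


# Toy data; a size of 3 keeps the example readable (the documented default is 13).
task_texts = [
    "what is the capital of france",
    "the capital of france is paris",
]
zeroed = build_zeroed_ngram_counts(task_texts, ngram_size=3)
matched = count_matches(
    "Everyone knows the capital of France is Paris", zeroed, ngram_size=3
)
```

Starting every key at 0 means the same dictionary shape can be reused for each document or partition, and any n-gram whose count stays at 0 never appeared in the data being checked.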
Tasks
- class nemo_curator.tasks.Race(min_ngram_size=8, max_ngram_size=13)
- class nemo_curator.tasks.Squad(min_ngram_size=8, max_ngram_size=13)
- class nemo_curator.tasks.ArcEasy(min_ngram_size=8, max_ngram_size=13)
- class nemo_curator.tasks.ArcChallenge(min_ngram_size=8, max_ngram_size=13)
- class nemo_curator.tasks.OpenBookQA(min_ngram_size=8, max_ngram_size=13)
- class nemo_curator.tasks.BoolQ(min_ngram_size=8, max_ngram_size=13)
- class nemo_curator.tasks.Copa(min_ngram_size=8, max_ngram_size=13)
- class nemo_curator.tasks.RTE(min_ngram_size=8, max_ngram_size=13)
- class nemo_curator.tasks.MultiRC(min_ngram_size=8, max_ngram_size=13)
- class nemo_curator.tasks.WSC(min_ngram_size=8, max_ngram_size=13)
- class nemo_curator.tasks.CB(min_ngram_size=8, max_ngram_size=13)
- class nemo_curator.tasks.ANLI(min_ngram_size=8, max_ngram_size=13)
- class nemo_curator.tasks.Record(min_ngram_size=8, max_ngram_size=13)
- class nemo_curator.tasks.COQA(file_path, min_ngram_size=8, max_ngram_size=13)
- class nemo_curator.tasks.TriviaQA(min_ngram_size=8, max_ngram_size=13)
- class nemo_curator.tasks.Quac(min_ngram_size=8, max_ngram_size=13)
- class nemo_curator.tasks.WebQA(min_ngram_size=8, max_ngram_size=13)
- class nemo_curator.tasks.Drop(min_ngram_size=8, max_ngram_size=13)
- class nemo_curator.tasks.WiC(min_ngram_size=8, max_ngram_size=13)
- class nemo_curator.tasks.MMLU(path=None, min_ngram_size=8, max_ngram_size=13)
- class nemo_curator.tasks.BigBenchHard(path=None, min_ngram_size=8, max_ngram_size=13)
- class nemo_curator.tasks.BigBenchLight(path=None, min_ngram_size=8, max_ngram_size=13)
- class nemo_curator.tasks.Multilingual(path=None, min_ngram_size=8, max_ngram_size=13)
- class nemo_curator.tasks.PIQA(min_ngram_size=8, max_ngram_size=13)
- class nemo_curator.tasks.Winogrande(min_ngram_size=8, max_ngram_size=13)
- class nemo_curator.tasks.Lambada(file_path, min_ngram_size=8, max_ngram_size=13)
- class nemo_curator.tasks.NumDasc(n, file_path, min_ngram_size=8, max_ngram_size=13)
- class nemo_curator.tasks.StoryCloze(file_path, min_ngram_size=8, max_ngram_size=13)
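Every task above takes `min_ngram_size` and `max_ngram_size`, but the listing does not say how the two bounds are applied. The sketch below shows one plausible policy, stated purely as an assumption: examples shorter than `min_ngram_size` words are skipped, and longer examples contribute sliding windows of at most `max_ngram_size` words. `ToyTask` and its `generate_ngrams` method are hypothetical names and do not reproduce any NeMo Curator class.

```python
class ToyTask:
    """Hypothetical stand-in for a downstream task; not a NeMo Curator class."""

    def __init__(self, examples, min_ngram_size=8, max_ngram_size=13):
        self.examples = examples
        self.min_ngram_size = min_ngram_size
        self.max_ngram_size = max_ngram_size

    def generate_ngrams(self):
        """Collect n-grams from every example, each mapped to a count of 0."""
        ngrams = {}
        for text in self.examples:
            words = text.lower().split()
            if len(words) < self.min_ngram_size:
                # Assumed policy: too short to be a reliable contamination signal.
                continue
            # Assumed policy: slide a window of max_ngram_size words,
            # or use the whole example when it is shorter than that.
            size = min(len(words), self.max_ngram_size)
            for start in range(len(words) - size + 1):
                ngrams[" ".join(words[start:start + size])] = 0
        return ngrams


# Small bounds keep the toy data readable (the documented defaults are 8 and 13).
task = ToyTask(
    ["a b c", "a b c d e", "a b c d e f g"],
    min_ngram_size=4,
    max_ngram_size=6,
)
toy_ngrams = task.generate_ngrams()
```

Under this assumed policy, the three-word example is dropped, the five-word example is kept whole, and the seven-word example yields two six-word windows.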