Clara Parabricks v4.4.0

FQ2BAM Tutorial

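This tutorial shows you how to run our core alignment tool, FQ2BAM, which aligns FASTQ files at very high speed following GATK best practices. It uses the gold-standard aligner BWA-MEM, has coordinate sorting of the output file built in, and can optionally apply base quality score recalibration (BQSR) and mark duplicate reads.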

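The fq2bam tool aligns paired-end FASTQ data, sorts it by coordinate, and marks duplicates. The data files used in this example come from the sample data downloaded in the previous section.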

Note

By default, the fq2bam tool requires at least 38 GB of GPU memory; the --low-memory option reduces this requirement to 16 GB of GPU memory, at the cost of slower processing.
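As a rough illustration of the memory note above, the following sketch suggests whether to add --low-memory based on a GPU's total memory. The function name `fq2bam_memory_flag` is hypothetical, and treating "38 GB" as 38 * 1024 MiB (the unit nvidia-smi reports) is an assumption, not something the Parabricks documentation specifies.

```shell
# Hypothetical helper: print "--low-memory" when the GPU's total memory
# (in MiB) is below the default fq2bam requirement, otherwise print nothing.
# Assumption: the tutorial's "38 GB" is interpreted as 38 * 1024 MiB.
fq2bam_memory_flag() {
    total_mib=$1
    required_mib=$((38 * 1024))
    # Reject empty or non-numeric input instead of guessing.
    case $total_mib in
        ''|*[!0-9]*) return 1 ;;
    esac
    if [ "$total_mib" -lt "$required_mib" ]; then
        echo "--low-memory"
    fi
}

# Query the first GPU only when nvidia-smi is installed on this machine.
if command -v nvidia-smi >/dev/null 2>&1; then
    gpu_mib=$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits 2>/dev/null | head -n 1)
    if flag=$(fq2bam_memory_flag "$gpu_mib"); then
        echo "GPU memory: ${gpu_mib} MiB; suggested extra option: ${flag:-none}"
    fi
fi
```

The printed option, if any, could then be appended to the pbrun fq2bam command shown in this tutorial.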

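If you run the following command on the NVIDIA Parabricks sample data, you should get the same results as those shown here.

Before running this command, make sure your current directory is the one where you extracted the sample data; it should contain a parabricks_sample subdirectory.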
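A small pre-flight check can catch a wrong working directory before the container starts. This is only a sketch: the function name `check_sample_data` is hypothetical, and the three paths are the ones passed to --ref and --in-fq in the tutorial command.

```shell
# Hypothetical helper: return 0 if the three input files used by the
# tutorial command exist under the given directory (default: current
# directory), and non-zero otherwise.
check_sample_data() {
    base=${1:-.}
    missing=0
    for f in \
        parabricks_sample/Ref/Homo_sapiens_assembly38.fasta \
        parabricks_sample/Data/sample_1.fq.gz \
        parabricks_sample/Data/sample_2.fq.gz
    do
        if [ ! -f "$base/$f" ]; then
            echo "missing: $base/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Running `check_sample_data && echo "sample data found"` from the extraction directory would confirm that the inputs are in place.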


$ docker run \
    --gpus all \
    --rm \
    --volume $(pwd):/workdir \
    --volume $(pwd):/outputdir \
    nvcr.io/nvidia/clara/clara-parabricks:4.4.0-1 \
    pbrun fq2bam \
    --ref /workdir/parabricks_sample/Ref/Homo_sapiens_assembly38.fasta \
    --in-fq /workdir/parabricks_sample/Data/sample_1.fq.gz /workdir/parabricks_sample/Data/sample_2.fq.gz \
    --out-bam /outputdir/fq2bam_output.bam
[Parabricks Options Mesg]: Checking argument compatibility
[Parabricks Options Mesg]: Automatically generating ID prefix
[Parabricks Options Mesg]: Read group created for /workdir/parabricks_sample/Data/sample_1.fq.gz and /workdir/parabricks_sample/Data/sample_2.fq.gz
[Parabricks Options Mesg]: @RG\tID:HK3TJBCX2.1\tLB:lib1\tPL:bar\tSM:sample\tPU:HK3TJBCX2.1
[PB Info 2022-Sep-02 19:49:27] ------------------------------------------------------------------------------
[PB Info 2022-Sep-02 19:49:27] || Parabricks accelerated Genomics Pipeline ||
[PB Info 2022-Sep-02 19:49:27] || Version 4.0.0-1 ||
[PB Info 2022-Sep-02 19:49:27] || GPU-BWA mem, Sorting Phase-I ||
[PB Info 2022-Sep-02 19:49:27] ------------------------------------------------------------------------------
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[PB Warning 2022-Sep-02 19:50:02][ParaBricks/src/pbOpts.cu:325] WARNING The system has 12 threads, however recommended number of threads with 1 GPU is 16. The run might not finish or might have less than expected performance.
[PB Info 2022-Sep-02 19:50:02] GPU-BWA mem
[PB Info 2022-Sep-02 19:50:02] ProgressMeter   Reads   Base Pairs Aligned
[PB Info 2022-Sep-02 19:50:45] 5043564   580000000
[PB Info 2022-Sep-02 19:51:21] 10087128   1160000000
[PB Info 2022-Sep-02 19:51:59] 15130692   1740000000
[PB Info 2022-Sep-02 19:52:39] 20174256   2320000000
[PB Info 2022-Sep-02 19:53:20] 25217820   2900000000
[PB Info 2022-Sep-02 19:53:58] 30261384   3480000000
[PB Info 2022-Sep-02 19:54:36] 35304948   4060000000
[PB Info 2022-Sep-02 19:55:13] 40348512   4640000000
[PB Info 2022-Sep-02 19:55:53] 45392076   5220000000
[PB Info 2022-Sep-02 19:56:36] 50435640   5800000000
[PB Info 2022-Sep-02 19:57:02] GPU-BWA Mem time: 420.426442 seconds
[PB Info 2022-Sep-02 19:57:02] GPU-BWA Mem is finished.
[main] CMD: /usr/local/parabricks/binaries//bin/bwa mem -Z ./pbOpts.txt /workdir/parabricks_sample/Ref/Homo_sapiens_assembly38.fasta /workdir/parabricks_sample/Data/sample_1.fq.gz /workdir/parabricks_sample/Data/sample_2.fq.gz @RG\tID:HK3TJBCX2.1\tLB:lib1\tPL:bar\tSM:sample\tPU:HK3TJBCX2.1
[main] Real time: 455.468 sec; CPU: 4766.384 sec
[PB Info 2022-Sep-02 19:57:02] ------------------------------------------------------------------------------
[PB Info 2022-Sep-02 19:57:02] || Program: GPU-BWA mem, Sorting Phase-I ||
[PB Info 2022-Sep-02 19:57:02] || Version: 4.0.0-1 ||
[PB Info 2022-Sep-02 19:57:02] || Start Time: Fri Sep 2 19:49:27 2022 ||
[PB Info 2022-Sep-02 19:57:02] || End Time: Fri Sep 2 19:57:02 2022 ||
[PB Info 2022-Sep-02 19:57:02] || Total Time: 7 minutes 35 seconds ||
[PB Info 2022-Sep-02 19:57:02] ------------------------------------------------------------------------------
[PB Info 2022-Sep-02 19:57:03] ------------------------------------------------------------------------------
[PB Info 2022-Sep-02 19:57:03] || Parabricks accelerated Genomics Pipeline ||
[PB Info 2022-Sep-02 19:57:03] || Version 4.0.0-1 ||
[PB Info 2022-Sep-02 19:57:03] || Sorting Phase-II ||
[PB Info 2022-Sep-02 19:57:03] ------------------------------------------------------------------------------
[PB Info 2022-Sep-02 19:57:03] progressMeter - Percentage
[PB Info 2022-Sep-02 19:57:03] 0.0   0.00 GB
[PB Info 2022-Sep-02 19:57:13] 72.8   0.00 GB
[PB Info 2022-Sep-02 19:57:23] Sorting and Marking: 20.001 seconds
[PB Info 2022-Sep-02 19:57:23] ------------------------------------------------------------------------------
[PB Info 2022-Sep-02 19:57:23] || Program: Sorting Phase-II ||
[PB Info 2022-Sep-02 19:57:23] || Version: 4.0.0-1 ||
[PB Info 2022-Sep-02 19:57:23] || Start Time: Fri Sep 2 19:57:03 2022 ||
[PB Info 2022-Sep-02 19:57:23] || End Time: Fri Sep 2 19:57:23 2022 ||
[PB Info 2022-Sep-02 19:57:23] || Total Time: 20 seconds ||
[PB Info 2022-Sep-02 19:57:23] ------------------------------------------------------------------------------
[PB Info 2022-Sep-02 19:57:23] ------------------------------------------------------------------------------
[PB Info 2022-Sep-02 19:57:23] || Parabricks accelerated Genomics Pipeline ||
[PB Info 2022-Sep-02 19:57:23] || Version 4.0.0-1 ||
[PB Info 2022-Sep-02 19:57:23] || Marking Duplicates, BQSR ||
[PB Info 2022-Sep-02 19:57:23] ------------------------------------------------------------------------------
[PB Info 2022-Sep-02 19:57:24] progressMeter - Percentage
[PB Info 2022-Sep-02 19:57:34] 13.6   16.60 GB
[PB Info 2022-Sep-02 19:57:44] 31.1   13.45 GB
[PB Info 2022-Sep-02 19:57:54] 46.8   10.22 GB
[PB Info 2022-Sep-02 19:58:04] 61.1   7.05 GB
[PB Info 2022-Sep-02 19:58:14] 77.3   3.84 GB
[PB Info 2022-Sep-02 19:58:24] 91.4   0.60 GB
[PB Info 2022-Sep-02 19:58:34] 100.0   0.00 GB
[PB Info 2022-Sep-02 19:59:18] BQSR and writing final BAM: 113.592 seconds
[PB Info 2022-Sep-02 19:59:18] ------------------------------------------------------------------------------
[PB Info 2022-Sep-02 19:59:18] || Program: Marking Duplicates, BQSR ||
[PB Info 2022-Sep-02 19:59:18] || Version: 4.0.0-1 ||
[PB Info 2022-Sep-02 19:59:18] || Start Time: Fri Sep 2 19:57:23 2022 ||
[PB Info 2022-Sep-02 19:59:18] || End Time: Fri Sep 2 19:59:18 2022 ||
[PB Info 2022-Sep-02 19:59:18] || Total Time: 1 minute 55 seconds ||
[PB Info 2022-Sep-02 19:59:18] ------------------------------------------------------------------------------
Please visit https://docs.nvda.net.cn/clara/#parabricks for detailed documentation

On an AWS g4dn.8xlarge instance (32 vCPUs, one T4 GPU, 128 GB of memory), this takes approximately six minutes.

If you get an out-of-memory error, make sure your machine has enough RAM and that no other programs are using a large amount of memory.

The fq2bam command generates three output files:


$ ls -l
total 14330820
-rw-r--r-- 1 root root 4819386804 Sep  2 15:58 fq2bam_output.bam
-rw-r--r-- 1 root root    6882792 Sep  2 15:59 fq2bam_output.bam.bai
-rw-r--r-- 1 root root      87690 Sep  2 15:59 fq2bam_output_chrs.txt
(input files not shown)
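The listing above can also be checked from a script. The sketch below only confirms that the three files exist and are non-empty; it does not validate the BAM contents. The function name `check_fq2bam_outputs` is hypothetical, and the default prefix matches the --out-bam value used in this tutorial.

```shell
# Hypothetical helper: confirm that the BAM, its index, and the
# _chrs.txt file exist and are non-empty for a given output prefix.
check_fq2bam_outputs() {
    prefix=${1:-fq2bam_output}
    status=0
    for f in "${prefix}.bam" "${prefix}.bam.bai" "${prefix}_chrs.txt"; do
        if [ -s "$f" ]; then
            echo "ok: $f"
        else
            echo "missing or empty: $f" >&2
            status=1
        fi
    done
    return "$status"
}
```

Called with no arguments from the output directory, it checks the three file names shown in the listing.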

The first line of fq2bam_output.bam (viewed with the samtools view fq2bam_output.bam command) looks like this:


HWI-D00127:570:HK3TJBCX2:1:1202:9643:76055 99 chr1 10027 26 24M5I86M = 10178 231 ACCCTAACCCTAACCCTAACCCGACCCCGACCCCGACCCAAACCCAAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTACCCTAACCCTAACCCTAACCCTAACC DDDDDHGHIIIIIHIIHHIHHHIHIIIIIIHDHHIHHHIHIHIIIIFHIEHHIIHHIIIIEHIIIIHHIHIIICHE@1FHH?1GEFE1111D11<FH11<FD11<<FFE111<11 MD:Z:22T5T0A4T5T41A27 PG:Z:MarkDuplicates RG:Z:HK3TJBCX2.1 NM:i:11 AS:i:69 XS:i:72 ....
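The second field of the record above (99) is the SAM FLAG, a bit field defined by the SAM specification. As an illustration, the sketch below decodes it with plain shell arithmetic; the function name `decode_sam_flag` and the short bit labels are hypothetical, chosen here for readability.

```shell
# Hypothetical helper: decode a SAM FLAG value into comma-separated labels.
# Bit values follow the SAM specification; the label names are informal.
decode_sam_flag() {
    flag=$1
    out=""
    for entry in \
        1:paired 2:proper_pair 4:unmapped 8:mate_unmapped \
        16:reverse 32:mate_reverse 64:first_in_pair 128:second_in_pair \
        256:secondary 512:qc_fail 1024:duplicate 2048:supplementary
    do
        bit=${entry%%:*}    # numeric bit value before the colon
        name=${entry#*:}    # label after the colon
        if [ $((flag & bit)) -ne 0 ]; then
            out="${out:+$out,}$name"
        fi
    done
    echo "$out"
}

decode_sam_flag 99   # prints: paired,proper_pair,mate_reverse,first_in_pair
```

A read that fq2bam has marked as a duplicate would additionally carry the 1024 bit in its FLAG.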

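Note

If the fq2bam command is run on a system with insufficient memory, you will see this message after the initial header:

WARNING
The system has 62 GB of RAM, however the recommended amount of RAM with 1 GPU is 64 GB.
The run might not finish or might have less than expected performance.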
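The same comparison can be approximated before launching a run. This is a sketch only: the function name `check_host_ram` is hypothetical, the 64 GB default comes from the warning text above, and converting MemTotal from kB to GB by integer division is an assumption that may not match how Parabricks computes the figure it reports.

```shell
# Hypothetical helper: compare host RAM (in GB) with a recommended amount
# (default 64 GB, the single-GPU figure quoted in the warning).
check_host_ram() {
    ram_gb=$1
    recommended_gb=${2:-64}
    if [ "$ram_gb" -lt "$recommended_gb" ]; then
        echo "warn: ${ram_gb} GB RAM, ${recommended_gb} GB recommended"
        return 1
    fi
    echo "ok: ${ram_gb} GB RAM"
}

# On Linux, MemTotal in /proc/meminfo is reported in kB.
if [ -r /proc/meminfo ]; then
    ram_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    check_host_ram $(( ${ram_kb:-0} / 1024 / 1024 )) || true
fi
```

If the check prints a warning, closing other memory-heavy programs before starting fq2bam is the remedy this tutorial suggests.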

© Copyright 2025, Nvidia. Last updated on Jan 13, 2025.