DOCA Bench Example Invocations
This guide provides examples of various invocations of the tool, to help give guidance and insight into both the tool and the features under test.
To keep the examples clear, some verbose output and repeated information has been removed or shortened; in particular, the configuration and default-value output emitted on DOCA Bench's first execution has been removed.
Command-line options may need to be updated to suit your environment (e.g., TCP addresses, port numbers, interface names, usernames). Refer to the "Command-line Arguments" section for more information.
This test invokes DOCA Bench in Ethernet receive mode, configured to receive Ethernet frames of 1500 bytes.
The test runs on a single core for 3 seconds, with a maximum burst size of 512 frames.
The test runs in the default throughput mode; throughput figures are displayed at the end of the test run.
The companion application uses core 6 to continuously transmit Ethernet frames of 1500 bytes until stopped by DOCA Bench.
Command Line
doca_bench --core-mask 0x02 \
--pipeline-steps doca_eth::rx \
--device b1:00.1 \
--data-provider random-data \
--uniform-job-size 1500 \
--run-limit-seconds 3 \
--attribute doca_eth.max-burst-size=512 \
--companion-connection-string proto=tcp,addr=10.10.10.10,port=12345,user=bob,dev=ens4f1np1 \
--attribute doption.companion_app.path=/opt/mellanox/doca/tools/doca_bench_companion \
--companion-core-list 6 \
--job-output-buffer-size 1500 \
--mtu-size raw_eth
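The `--companion-connection-string` value above is a comma-separated list of key=value fields (protocol, address, port, user, device). As an illustration only (this is not DOCA Bench's actual parser), such a string can be split like this:

```python
def parse_connection_string(conn: str) -> dict:
    """Split a comma-separated key=value connection string into a dict."""
    fields = {}
    for pair in conn.split(","):
        key, _, value = pair.partition("=")
        fields[key] = value
    return fields

conn = "proto=tcp,addr=10.10.10.10,port=12345,user=bob,dev=ens4f1np1"
fields = parse_connection_string(conn)
print(fields["addr"])  # 10.10.10.10
print(fields["dev"])   # ens4f1np1
```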
Results Output
[main] doca_bench : 2.7.0084
[main] release build
+ + + + + + + + + + + + + + + + + + + + + + + + + +
DOCA bench supported modules: [doca_comm_channel, doca_compress, doca_dma, doca_ec, doca_eth, doca_sha, doca_comch, doca_rdma, doca_aes_gcm]
+ + + + + + + + + + + + + + + + + + + + + + + + + +
DOCA bench configuration
Static configuration: [
Attributes: [doca_eth.l4-chksum-offload:false, doca_eth.max-burst-size:512, doption.companion_app.path:/opt/mellanox/doca/tools/doca_bench_companion, doca_eth.l3-chksum-offload:false]
Companion configuration: [
Device: ens4f1np1
Remote IP address: "bob@10.10.10.10"
Core set: [6]
]
Pipelines: [
Steps: [
name: "doca_eth::rx"
attributes: []
]
Use remote input buffers: no
Use remote output buffers: no
Latency bucket_range: 10000ns-110000ns
]
Run limits: [
Max execution time: 3seconds
Max jobs executed: -- not configured --
Max bytes processed: -- not configured --
]
Data provider: [
Name: "random-data"
Job output buffer size: 1500
]
Device: "b1:00.1"
Device representor: "-- not configured --"
Warm up job count: 100
Input files dir: "-- not configured --"
Output files dir: "-- not configured --"
Core set: [1]
Benchmark mode: throughput
Warnings as errors: no
CSV output: [
File name: -- not configured --
Selected stats: []
Deselected stats: []
Separate dynamic values: no
Collect environment information: no
Append to stats file: no
]
]
Test permutations: [
Attributes: []
Uniform job size: 1500
Core count: 1
Per core thread count: 1
Task pool size: 1024
Data provider job count: 128
MTU size: ETH_FRAME
SQ depth: -- not configured --
RQ depth: -- not configured --
Input data file: -- not configured --
]
[main] Initialize framework...
[main] Start execution...
Preparing...
EAL: Detected CPU lcores: 36
EAL: Detected NUMA nodes: 4
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /run/user/48679/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: VFIO support initialized
TELEMETRY: No legacy callbacks, legacy socket not created
EAL: Probe PCI driver: mlx5_pci (15b3:a2d6) device: 0000:b1:00.1 (socket 2)
[08:19:32:110524][398304][DOCA][WRN][engine_model.c:90][adapt_queue_depth] adapting queue depth to 128.
Executing...
Data path thread [0] started...
WT[0] Executing 100 warm-up tasks using 100 unique tasks
Cleanup...
[main] Completed! tearing down...
Aggregate stats
Duration: 3000633 micro seconds
Enqueued jobs: 611215
Dequeued jobs: 611215
Throughput: 000.204 MOperations/s
Ingress rate: 002.276 Gib/s
Egress rate: 002.276 Gib/s
Results Overview
Since a single core was specified, only a single stats output section is displayed.
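The reported rates can be cross-checked from the job counts: 611,215 dequeued jobs of 1500 bytes over the reported duration give the figures shown above. A quick sketch of that arithmetic:

```python
# Reported figures from the run above.
jobs = 611_215             # dequeued jobs
job_size = 1500            # bytes per Ethernet frame
duration_us = 3_000_633    # reported duration in microseconds

ops_per_sec = jobs / (duration_us / 1e6)
gib_per_sec = ops_per_sec * job_size * 8 / 2**30   # bits per second -> Gib/s

print(f"{ops_per_sec / 1e6:.3f} MOperations/s")    # 0.204 MOperations/s
print(f"{gib_per_sec:.3f} Gib/s")                  # 2.276 Gib/s
```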
This test invokes DOCA Bench in Ethernet transmit mode, configured to transmit Ethernet frames of 1500 bytes.
Random data is used to populate the Ethernet frames.
The test runs on a single core for 3 seconds, with a maximum burst size of 512 frames.
L3 and L4 checksum offloads are disabled.
The test runs in the default throughput mode; throughput figures are displayed at the end of the test run.
The companion application uses core 6 to continuously receive Ethernet frames of 1500 bytes until stopped by DOCA Bench.
Command Line
doca_bench --core-mask 0x02 \
--pipeline-steps doca_eth::tx \
--device b1:00.1 \
--data-provider random-data \
--uniform-job-size 1500 \
--run-limit-seconds 3 \
--attribute doca_eth.max-burst-size=512 \
--attribute doca_eth.l4-chksum-offload=false \
--attribute doca_eth.l3-chksum-offload=false \
--companion-connection-string proto=tcp,addr=10.10.10.10,port=12345,user=bob,dev=ens4f1np1 \
--attribute doption.companion_app.path=/opt/mellanox/doca/tools/doca_bench_companion \
--companion-core-list 6 \
--job-output-buffer-size 1500
Results Output
[main] doca_bench : 2.7.0084
[main] release build
+ + + + + + + + + + + + + + + + + + + + + + + + + +
DOCA bench supported modules: [doca_comm_channel, doca_compress, doca_dma, doca_ec, doca_eth, doca_sha, doca_comch, doca_rdma, doca_aes_gcm]
+ + + + + + + + + + + + + + + + + + + + + + + + + +
DOCA bench configuration
Static configuration: [
Attributes: [doca_eth.l4-chksum-offload:false, doca_eth.max-burst-size:512, doption.companion_app.path:/opt/mellanox/doca/tools/doca_bench_companion, doca_eth.l3-chksum-offload:false]
Companion configuration: [
Device: ens4f1np1
Remote IP address: "bob@10.10.10.10"
Core set: [6]
]
Pipelines: [
Steps: [
name: "doca_eth::tx"
attributes: []
]
Use remote input buffers: no
Use remote output buffers: no
Latency bucket_range: 10000ns-110000ns
]
Run limits: [
Max execution time: 3seconds
Max jobs executed: -- not configured --
Max bytes processed: -- not configured --
]
Data provider: [
Name: "random-data"
Job output buffer size: 1500
]
Device: "b1:00.1"
Device representor: "-- not configured --"
Warm up job count: 100
Input files dir: "-- not configured --"
Output files dir: "-- not configured --"
Core set: [1]
Benchmark mode: throughput
Warnings as errors: no
CSV output: [
File name: -- not configured --
Selected stats: []
Deselected stats: []
Separate dynamic values: no
Collect environment information: no
Append to stats file: no
]
]
Test permutations: [
Attributes: []
Uniform job size: 1500
Core count: 1
Per core thread count: 1
Task pool size: 1024
Data provider job count: 128
MTU size: -- not configured --
SQ depth: -- not configured --
RQ depth: -- not configured --
Input data file: -- not configured --
]
[main] Initialize framework...
[main] Start execution...
Preparing...
Executing...
Data path thread [0] started...
WT[0] Executing 100 warm-up tasks using 100 unique tasks
Cleanup...
[main] Completed! tearing down...
Aggregate stats
Duration: 3000049 micro seconds
Enqueued jobs: 17135128
Dequeued jobs: 17135128
Throughput: 005.712 MOperations/s
Ingress rate: 063.832 Gib/s
Egress rate: 063.832 Gib/s
Results Overview
Since a single core was specified, only a single stats output section is displayed.
This test invokes DOCA Bench on the x86 host side to run an AES-GCM decrypt step.
A file-set file indicates the files to be decrypted; its contents list the names of the files to decrypt.
The key used for encryption and decryption is specified with the doca_aes_gcm.key-file attribute, which names the file containing the key to use.
The test runs until 5000 jobs have been processed.
It runs in precision-latency mode; latency and throughput figures are displayed at the end of the test run.
A core mask is specified to indicate that cores 12, 13, 14, and 15 will be used for this test.
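The mapping from `--core-mask 0xf000` to cores 12-15 is simply the positions of the set bits in the mask. A minimal sketch:

```python
def cores_from_mask(mask: int) -> list:
    """Return the core numbers corresponding to the set bits of a core mask."""
    return [bit for bit in range(mask.bit_length()) if mask & (1 << bit)]

print(cores_from_mask(0xF000))  # [12, 13, 14, 15]
print(cores_from_mask(0x02))    # [1]
```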
Command Line
doca_bench --mode precision-latency \
--core-mask 0xf000 \
--warm-up-jobs 32 \
--device 17:00.0 \
--data-provider file-set \
--data-provider-input-file aes_64_128.fileset \
--run-limit-jobs 5000 \
--pipeline-steps doca_aes_gcm::decrypt \
--attribute doca_aes_gcm.key-file='aes128.key' \
--job-output-buffer-size 80
Results Output
[main] Completed! tearing down...
Worker thread[0](core: 12) stats:
Duration: 10697 micro seconds
Enqueued jobs: 5000
Dequeued jobs: 5000
Throughput: 000.467 MOperations/s
Ingress rate: 000.265 Gib/s
Egress rate: 000.223 Gib/s
Worker thread[1](core: 13) stats:
Duration: 10700 micro seconds
Enqueued jobs: 5000
Dequeued jobs: 5000
Throughput: 000.467 MOperations/s
Ingress rate: 000.265 Gib/s
Egress rate: 000.223 Gib/s
Worker thread[2](core: 14) stats:
Duration: 10733 micro seconds
Enqueued jobs: 5000
Dequeued jobs: 5000
Throughput: 000.466 MOperations/s
Ingress rate: 000.264 Gib/s
Egress rate: 000.222 Gib/s
Worker thread[3](core: 15) stats:
Duration: 10788 micro seconds
Enqueued jobs: 5000
Dequeued jobs: 5000
Throughput: 000.463 MOperations/s
Ingress rate: 000.262 Gib/s
Egress rate: 000.221 Gib/s
Aggregate stats
Duration: 10788 micro seconds
Enqueued jobs: 20000
Dequeued jobs: 20000
Throughput: 001.854 MOperations/s
Ingress rate: 001.050 Gib/s
Egress rate: 000.884 Gib/s
min: 1878 ns
max: 4956 ns
median: 2134 ns
mean: 2145 ns
90th %ile: 2243 ns
95th %ile: 2285 ns
99th %ile: 2465 ns
99.9th %ile: 3193 ns
99.99th %ile: 4487 ns
Results Overview
Since a core mask was specified without a core count, all cores in the mask are used.
A stats section is displayed for each core used, along with the aggregate stats.
This test invokes DOCA Bench on the BlueField side to run an AES-GCM encrypt step.
A 2KB text file is the input to the encrypt stage.
The key used for encryption and decryption is specified with the doca_aes_gcm.key attribute.
The test runs until 2000 jobs have been processed.
It runs in bulk-latency mode; latency and throughput figures are displayed at the end of the test run.
A single core with 2 threads per core is specified.
Command Line
doca_bench --mode bulk-latency \
--core-list 3 \
--threads-per-core 2 \
--warm-up-jobs 32 \
--device 03:00.0 \
--data-provider file \
--data-provider-input-file plaintext_2k.txt \
--run-limit-jobs 2000 \
--pipeline-steps doca_aes_gcm::encrypt \
--attribute doca_aes_gcm.key="0123456789abcdef0123456789abcdef" \
--uniform-job-size 2048 \
--job-output-buffer-size 4096
Results Output
[main] Completed! tearing down...
Worker thread[0](core: 3) stats:
Duration: 501 micro seconds
Enqueued jobs: 2048
Dequeued jobs: 2048
Throughput: 004.082 MOperations/s
Ingress rate: 062.279 Gib/s
Egress rate: 062.644 Gib/s
Worker thread[1](core: 3) stats:
Duration: 466 micro seconds
Enqueued jobs: 2048
Dequeued jobs: 2048
Throughput: 004.386 MOperations/s
Ingress rate: 066.922 Gib/s
Egress rate: 067.314 Gib/s
Aggregate stats
Duration: 501 micro seconds
Enqueued jobs: 4096
Dequeued jobs: 4096
Throughput: 008.163 MOperations/s
Ingress rate: 124.558 Gib/s
Egress rate: 125.287 Gib/s
Latency report:
:
:
:
:
:
::
::
::
::
.::. . . ..
------------------------------------------------------------------------------------------------------
[<10000ns]: 0
.. OUTPUT REDACTED (SHORTENED) ..
[26000ns -> 26999ns]: 0
[27000ns -> 27999ns]: 128
[28000ns -> 28999ns]: 2176
[29000ns -> 29999ns]: 1152
[30000ns -> 30999ns]: 128
[31000ns -> 31999ns]: 0
[32000ns -> 32999ns]: 0
[33000ns -> 33999ns]: 128
[34000ns -> 34999ns]: 0
[35000ns -> 35999ns]: 0
[36000ns -> 36999ns]: 0
[37000ns -> 37999ns]: 0
[38000ns -> 38999ns]: 128
[39000ns -> 39999ns]: 0
[40000ns -> 40999ns]: 0
[41000ns -> 41999ns]: 0
[42000ns -> 42999ns]: 0
[43000ns -> 43999ns]: 128
[44000ns -> 44999ns]: 128
[45000ns -> 45999ns]: 0
.. OUTPUT REDACTED (SHORTENED) ..
[>110000ns]: 0
Results Overview
Since a single core with 2 threads was specified, a stats section is displayed for each thread, along with the aggregate stats.
This test invokes DOCA Bench on the host side to run two AES-GCM steps in a pipeline, first encrypting a text file and then decrypting the output of the encrypt step.
A 2KB text file is the input to the encrypt stage.
The input-cwd option directs DOCA Bench to look for input files in a different location, in this case the parent directory.
The key used for encryption and decryption is specified with the doca_aes_gcm.key-file attribute, indicating that the key can be found in the named file.
The test runs until 204,800 bytes have been processed.
It runs in the default throughput mode; throughput figures are displayed at the end of the test run.
Command Line
doca_bench --core-mask 0xf00 \
--core-count 1 \
--warm-up-jobs 32 \
--device 17:00.0 \
--data-provider file \
--input-cwd ../. \
--data-provider-input-file plaintext_2k.txt \
--run-limit-bytes 204800 \
--pipeline-steps doca_aes_gcm::encrypt,doca_aes_gcm::decrypt \
--attribute doca_aes_gcm.key-file='aes128.key' \
--uniform-job-size 2048 \
--job-output-buffer-size 4096
Results Output
Executing...
Worker thread[0](core: 8) [doca_aes_gcm::encrypt>>doca_aes_gcm::decrypt] started...
Worker thread[0] Executing 32 warm-up tasks using 32 unique tasks
Cleanup...
[main] Completed! tearing down...
Aggregate stats
Duration: 79 micro seconds
Enqueued jobs: 214
Dequeued jobs: 214
Throughput: 002.701 MOperations/s
Ingress rate: 041.214 Gib/s
Egress rate: 041.214 Gib/s
Results Overview
Since a single core was specified, only a single stats output section is displayed.
This test invokes DOCA Bench on the host side to perform SHA operations using the SHA256 algorithm, creating a CSV file containing the test configuration and statistics.
A single core is specified (core mask 2, i.e., core 1), with a per-core thread count of 2.
Command Line
doca_bench --core-mask 2 \
--threads-per-core 2 \
--pipeline-steps doca_sha \
--device d8:00.0 \
--data-provider random-data \
--uniform-job-size 2048 \
--job-output-buffer-size 2048 \
--run-limit-seconds 3 \
--attribute doca_sha.algorithm=sha256 \
--warm-up-jobs 100 \
--csv-output-file /tmp/sha_256_test.csv
Results Output
Executing...
Data path thread [0] started...
WT[0] Executing 100 warm-up tasks using 100 unique tasks
Data path thread [1] started...
WT[1] Executing 100 warm-up tasks using 100 unique tasks
Cleanup...
[main] Completed! tearing down...
Stats for thread[0](core: 1)
Duration: 3000064 micro seconds
Enqueued jobs: 3713935
Dequeued jobs: 3713935
Throughput: 001.238 MOperations/s
Ingress rate: 018.890 Gib/s
Egress rate: 000.295 Gib/s
Stats for thread[1](core: 1)
Duration: 3000056 micro seconds
Enqueued jobs: 3757335
Dequeued jobs: 3757335
Throughput: 001.252 MOperations/s
Ingress rate: 019.110 Gib/s
Egress rate: 000.299 Gib/s
Aggregate stats
Duration: 3000064 micro seconds
Enqueued jobs: 7471270
Dequeued jobs: 7471270
Throughput: 002.490 MOperations/s
Ingress rate: 038.000 Gib/s
Egress rate: 000.594 Gib/s
Results Overview
Since a single core with a thread count of 2 was specified, stats are displayed for each thread, along with the aggregate stats.
It can also be observed that 2 threads were started on core 1, each executing warm-up jobs.
The contents of /tmp/sha_256_test.csv are shown below. The configuration used by the test is listed, along with the associated stats from the test run:
cfg.companion.connection_string,cfg.pipeline.steps,cfg.pipeline.use_remote_input_buffers,cfg.pipeline.use_remote_output_buffers,cfg.pipeline.bulk_latency.lower_bound,cfg.pipeline.bulk_latency.bucket_width,cfg.run_limit.duration,cfg.run_limit.jobs,cfg.run_limit.bytes,cfg.data_provider.type,cfg.data_provider.output_buffer_size,cfg.device.pci_address,cfg.input.cwd,cfg.output.cwd,cfg.warmup_job_count,cfg.core_set,cfg.benchmark_mode,cfg.warnings_are_errors,cfg.attribute.doca_compress.algorithm,cfg.attribute.doca_ec.matrix_type,cfg.attribute.doca_ec.data_block_count,cfg.attribute.doca_ec.redundancy_block_count,cfg.attribute.doca_ec.use_precomputed_matrix,cfg.attribute.doca_eth.l3_chksum_offload,cfg.attribute.doca_eth.l4_chksum_offload,cfg.attribute.doca_sha.algorithm,cfg.uniform_job_size,cfg.core_count,cfg.per_core_thread_count,cfg.task_pool_size,cfg.data_provider_job_count,cfg.sg_config,cfg.mtu-size,cfg.send-queue-size,cfg.receive-queue-size,cfg.data-provider-input-file,cfg.attribute.mmo.log_qp_depth,cfg.attribute.mmo.log_num_qps,stats.input.job_count,stats.output.job_count,stats.input.byte_count,stats.output.byte_count,stats.input.throughput.bytes,stats.output.throughput.bytes,stats.input.throughput.rate,stats.output.throughput.rate
,[doca_sha],0,0,10000,1000,3,,,random-data,2048,d8:00.0,,,100,[1],throughput,0,,,,,,,,sha256,2048,1,2,1024,128,1 fragments,,,,,,,7471270,7471270,15301160960,239109312,038.000 Gib/s,000.594 Gib/s,2.490370 MOperations/s,2.490370 MOperations/s
This test invokes DOCA Bench on the host side to perform SHA operations using the SHA512 algorithm, creating a CSV file containing the test configuration and statistics.
The command is then repeated with the csv-append-mode option added. This directs DOCA Bench to append the test run stats to the existing CSV file.
A single core is specified, with a per-core thread count of 2.
Command Line
Create the initial /tmp/sha_512_test.csv file:
doca_bench --core-list 2 \
--threads-per-core 2 \
--pipeline-steps doca_sha \
--device d8:00.0 \
--data-provider random-data \
--uniform-job-size 2048 \
--job-output-buffer-size 2048 \
--run-limit-seconds 3 \
--attribute doca_sha.algorithm=sha512 \
--warm-up-jobs 100 \
--csv-output-file /tmp/sha_512_test.csv
The second command is:
./doca_bench --core-list 2 \
--threads-per-core 2 \
--pipeline-steps doca_sha \
--device d8:00.0 \
--data-provider random-data \
--uniform-job-size 2048 \
--job-output-buffer-size 2048 \
--run-limit-seconds 3 \
--attribute doca_sha.algorithm=sha512 \
--warm-up-jobs 100 \
--csv-output-file /tmp/sha_512_test.csv \
--csv-append-mode
This causes DOCA Bench to append the configuration and stats from the second command's run to the /tmp/sha_512_test.csv file.
Results Output
This is a snapshot of the results output from the first command run:
Executing...
Data path thread [0] started...
WT[0] Executing 100 warm-up tasks using 100 unique tasks
Data path thread [1] started...
WT[1] Executing 100 warm-up tasks using 100 unique tasks
Cleanup...
[main] Completed! tearing down...
Stats for thread[0](core: 2)
Duration: 3015185 micro seconds
Enqueued jobs: 3590717
Dequeued jobs: 3590717
Throughput: 001.191 MOperations/s
Ingress rate: 018.171 Gib/s
Egress rate: 000.568 Gib/s
Stats for thread[1](core: 2)
Duration: 3000203 micro seconds
Enqueued jobs: 3656044
Dequeued jobs: 3656044
Throughput: 001.219 MOperations/s
Ingress rate: 018.594 Gib/s
Egress rate: 000.581 Gib/s
Aggregate stats
Duration: 3015185 micro seconds
Enqueued jobs: 7246761
Dequeued jobs: 7246761
Throughput: 002.403 MOperations/s
Ingress rate: 036.673 Gib/s
Egress rate: 001.146 Gib/s
This is a snapshot of the results output from the second command run:
Executing...
Data path thread [0] started...
WT[0] Executing 100 warm-up tasks using 100 unique tasks
Data path thread [1] started...
WT[1] Executing 100 warm-up tasks using 100 unique tasks
Cleanup...
[main] Completed! tearing down...
Stats for thread[0](core: 2)
Duration: 3000072 micro seconds
Enqueued jobs: 3602562
Dequeued jobs: 3602562
Throughput: 001.201 MOperations/s
Ingress rate: 018.323 Gib/s
Egress rate: 000.573 Gib/s
Stats for thread[1](core: 2)
Duration: 3000062 micro seconds
Enqueued jobs: 3659148
Dequeued jobs: 3659148
Throughput: 001.220 MOperations/s
Ingress rate: 018.611 Gib/s
Egress rate: 000.582 Gib/s
Aggregate stats
Duration: 3000072 micro seconds
Enqueued jobs: 7261710
Dequeued jobs: 7261710
Throughput: 002.421 MOperations/s
Ingress rate: 036.934 Gib/s
Egress rate: 001.154 Gib/s
Results Overview
Since a single core with a thread count of 2 was specified, stats are displayed for each thread, along with the aggregate stats.
It can also be observed that 2 threads were started on core 2, each executing warm-up jobs.
The contents of /tmp/sha_512_test.csv after the first command run are shown below. The configuration used by the test is listed, along with the associated stats from the test run:
cfg.companion.connection_string,cfg.pipeline.steps,cfg.pipeline.use_remote_input_buffers,cfg.pipeline.use_remote_output_buffers,cfg.pipeline.bulk_latency.lower_bound,cfg.pipeline.bulk_latency.bucket_width,cfg.run_limit.duration,cfg.run_limit.jobs,cfg.run_limit.bytes,cfg.data_provider.type,cfg.data_provider.output_buffer_size,cfg.device.pci_address,cfg.input.cwd,cfg.output.cwd,cfg.warmup_job_count,cfg.core_set,cfg.benchmark_mode,cfg.warnings_are_errors,cfg.attribute.doca_compress.algorithm,cfg.attribute.doca_ec.matrix_type,cfg.attribute.doca_ec.data_block_count,cfg.attribute.doca_ec.redundancy_block_count,cfg.attribute.doca_ec.use_precomputed_matrix,cfg.attribute.doca_eth.l3_chksum_offload,cfg.attribute.doca_eth.l4_chksum_offload,cfg.attribute.doca_sha.algorithm,cfg.uniform_job_size,cfg.core_count,cfg.per_core_thread_count,cfg.task_pool_size,cfg.data_provider_job_count,cfg.sg_config,cfg.mtu-size,cfg.send-queue-size,cfg.receive-queue-size,cfg.data-provider-input-file,cfg.attribute.mmo.log_qp_depth,cfg.attribute.mmo.log_num_qps,stats.input.job_count,stats.output.job_count,stats.input.byte_count,stats.output.byte_count,stats.input.throughput.bytes,stats.output.throughput.bytes,stats.input.throughput.rate,stats.output.throughput.rate
,[doca_sha],0,0,10000,1000,3,,,random-data,2048,d8:00.0,,,100,[2],throughput,0,,,,,,,,sha512,2048,1,2,1024,128,1 fragments,,,,,,,7246761,7246761,14841366528,463850048,036.673 Gib/s,001.146 Gib/s,2.403422 MOperations/s,2.403422 MOperations/s
The contents of /tmp/sha_512_test.csv after the second command run are shown below. A second entry has been added detailing the configuration used by the test, along with the associated stats from the test run:
cfg.companion.connection_string,cfg.pipeline.steps,cfg.pipeline.use_remote_input_buffers,cfg.pipeline.use_remote_output_buffers,cfg.pipeline.bulk_latency.lower_bound,cfg.pipeline.bulk_latency.bucket_width,cfg.run_limit.duration,cfg.run_limit.jobs,cfg.run_limit.bytes,cfg.data_provider.type,cfg.data_provider.output_buffer_size,cfg.device.pci_address,cfg.input.cwd,cfg.output.cwd,cfg.warmup_job_count,cfg.core_set,cfg.benchmark_mode,cfg.warnings_are_errors,cfg.attribute.doca_compress.algorithm,cfg.attribute.doca_ec.matrix_type,cfg.attribute.doca_ec.data_block_count,cfg.attribute.doca_ec.redundancy_block_count,cfg.attribute.doca_ec.use_precomputed_matrix,cfg.attribute.doca_eth.l3_chksum_offload,cfg.attribute.doca_eth.l4_chksum_offload,cfg.attribute.doca_sha.algorithm,cfg.uniform_job_size,cfg.core_count,cfg.per_core_thread_count,cfg.task_pool_size,cfg.data_provider_job_count,cfg.sg_config,cfg.mtu-size,cfg.send-queue-size,cfg.receive-queue-size,cfg.data-provider-input-file,cfg.attribute.mmo.log_qp_depth,cfg.attribute.mmo.log_num_qps,stats.input.job_count,stats.output.job_count,stats.input.byte_count,stats.output.byte_count,stats.input.throughput.bytes,stats.output.throughput.bytes,stats.input.throughput.rate,stats.output.throughput.rate
,[doca_sha],0,0,10000,1000,3,,,random-data,2048,d8:00.0,,,100,[2],throughput,0,,,,,,,,sha512,2048,1,2,1024,128,1 fragments,,,,,,,7246761,7246761,14841366528,463850048,036.673 Gib/s,001.146 Gib/s,2.403422 MOperations/s,2.403422 MOperations/s
,[doca_sha],0,0,10000,1000,3,,,random-data,2048,d8:00.0,,,100,[2],throughput,0,,,,,,,,sha512,2048,1,2,1024,128,1 fragments,,,,,,,7261710,7261710,14871982080,464806784,036.934 Gib/s,001.154 Gib/s,2.420512 MOperations/s,2.420512 MOperations/s
This test invokes DOCA Bench on the BlueField side to perform SHA operations using the SHA1 algorithm, displaying stats every 2000 milliseconds during the test run.
A list containing 3 cores is provided, with a per-core thread count of 2 and a core count of 1.
The core count directs DOCA Bench to use only the first core number from the core list, in this case core number 2.
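The interaction between a core list and a core count described above (the count selects the leading entries of the list) can be sketched as follows; this is an illustration of the documented behavior, not DOCA Bench code:

```python
def select_cores(core_list, core_count=None):
    """Pick the cores a run will use: the first core_count entries of the
    core list, or the whole list when no count is given."""
    if core_count is None:
        return list(core_list)
    return list(core_list)[:core_count]

print(select_cores([2, 3, 4], 1))     # [2]
print(select_cores([2, 3, 4]))        # [2, 3, 4]
```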
Command Line
doca_bench --core-list 2,3,4 \
--core-count 1 \
--threads-per-core 2 \
--pipeline-steps doca_sha \
--device 03:00.0 \
--data-provider random-data \
--uniform-job-size 2048 \
--job-output-buffer-size 2048 \
--run-limit-seconds 3 \
--attribute doca_sha.algorithm=sha1 \
--warm-up-jobs 100 \
--rt-stats-interval 2000
Results Output
Executing...
Data path thread [0] started...
WT[0] Executing 100 warm-up tasks using 100 unique tasks
Data path thread [1] started...
WT[1] Executing 100 warm-up tasks using 100 unique tasks
Stats for thread[0](core: 2)
Duration: 965645 micro seconds
Enqueued jobs: 1171228
Dequeued jobs: 1171228
Throughput: 001.213 MOperations/s
Ingress rate: 018.505 Gib/s
Egress rate: 000.181 Gib/s
Stats for thread[1](core: 2)
Duration: 965645 micro seconds
Enqueued jobs: 1171754
Dequeued jobs: 1171754
Throughput: 001.213 MOperations/s
Ingress rate: 018.514 Gib/s
Egress rate: 000.181 Gib/s
Aggregate stats
Duration: 965645 micro seconds
Enqueued jobs: 2342982
Dequeued jobs: 2342982
Throughput: 002.426 MOperations/s
Ingress rate: 037.019 Gib/s
Egress rate: 000.362 Gib/s
Stats for thread[0](core: 2)
Duration: 2968088 micro seconds
Enqueued jobs: 3653691
Dequeued jobs: 3653691
Throughput: 001.231 MOperations/s
Ingress rate: 018.783 Gib/s
Egress rate: 000.183 Gib/s
Stats for thread[1](core: 2)
Duration: 2968088 micro seconds
Enqueued jobs: 3689198
Dequeued jobs: 3689198
Throughput: 001.243 MOperations/s
Ingress rate: 018.965 Gib/s
Egress rate: 000.185 Gib/s
Aggregate stats
Duration: 2968088 micro seconds
Enqueued jobs: 7342889
Dequeued jobs: 7342889
Throughput: 002.474 MOperations/s
Ingress rate: 037.748 Gib/s
Egress rate: 000.369 Gib/s
Cleanup...
[main] Completed! tearing down...
Stats for thread[0](core: 2)
Duration: 3000122 micro seconds
Enqueued jobs: 3694128
Dequeued jobs: 3694128
Throughput: 001.231 MOperations/s
Ingress rate: 018.789 Gib/s
Egress rate: 000.184 Gib/s
Stats for thread[1](core: 2)
Duration: 3000089 micro seconds
Enqueued jobs: 3751128
Dequeued jobs: 3751128
Throughput: 001.250 MOperations/s
Ingress rate: 019.079 Gib/s
Egress rate: 000.186 Gib/s
Aggregate stats
Duration: 3000122 micro seconds
Enqueued jobs: 7445256
Dequeued jobs: 7445256
Throughput: 002.482 MOperations/s
Ingress rate: 037.867 Gib/s
Egress rate: 000.370 Gib/s
Results Overview
Although a core list containing 3 cores was specified, the core count of 1 directs DOCA Bench to use only the first entry of the core list.
Since a thread count of 2 was specified, 2 threads were created on core 2.
A real-time stats interval of 2000 milliseconds was specified; interim stats for each thread can be seen, followed by the final aggregate stats.
This test invokes DOCA Bench to perform local DMA operations on the host.
It specifies that a core-count sweep should be performed with core counts of 1, 2, and 4, using the option --sweep core-count,1,4,*2.
The test output is saved in the CSV file /tmp/dma_sweep.csv, with a filter applied so that only stats are recorded; no configuration information is recorded.
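Reading `--sweep core-count,1,4,*2` as start 1, end 4, multiply by 2 each step (which matches the three permutations shown below), the value expansion can be sketched as:

```python
def expand_sweep(start, end, factor):
    """Expand a start,end,*factor sweep into its permutation values.
    Assumes factor > 1 so the loop terminates."""
    values = []
    value = start
    while value <= end:
        values.append(value)
        value *= factor
    return values

print(expand_sweep(1, 4, 2))  # [1, 2, 4]
```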
Command Line
doca_bench --core-mask 0xff \
--sweep core-count,1,4,*2 \
--pipeline-steps doca_dma \
--device d8:00.0 \
--data-provider random-data \
--uniform-job-size 2048 \
--job-output-buffer-size 2048 \
--run-limit-seconds 5 \
--csv-output-file /tmp/dma_sweep.csv \
--csv-stats "stats.*"
Results Output
Test permutations: [
Attributes: []
Uniform job size: 2048
Core count: 1
Per core thread count: 1
Task pool size: 1024
Data provider job count: 128
MTU size: -- not configured --
SQ depth: -- not configured --
RQ depth: -- not configured --
Input data file: -- not configured --
--------------------------------
Attributes: []
Uniform job size: 2048
Core count: 2
Per core thread count: 1
Task pool size: 1024
Data provider job count: 128
MTU size: -- not configured --
SQ depth: -- not configured --
RQ depth: -- not configured --
Input data file: -- not configured --
--------------------------------
Attributes: []
Uniform job size: 2048
Core count: 4
Per core thread count: 1
Task pool size: 1024
Data provider job count: 128
MTU size: -- not configured --
SQ depth: -- not configured --
RQ depth: -- not configured --
Input data file: -- not configured --
]
[main] Initialize framework...
[main] Start execution...
Preparing permutation 1 of 3...
Executing permutation 1 of 3...
Data path thread [0] started...
WT[0] Executing 100 warm-up tasks using 100 unique tasks
Cleanup permutation 1 of 3...
Aggregate stats
Duration: 5000191 micro seconds
Enqueued jobs: 22999128
Dequeued jobs: 22999128
Throughput: 004.600 MOperations/s
Ingress rate: 070.185 Gib/s
Egress rate: 070.185 Gib/s
Preparing permutation 2 of 3...
Executing permutation 2 of 3...
Data path thread [0] started...
WT[0] Executing 100 warm-up tasks using 100 unique tasks
Data path thread [1] started...
WT[1] Executing 100 warm-up tasks using 100 unique tasks
Cleanup permutation 2 of 3...
Stats for thread[0](core: 0)
Duration: 5000066 micro seconds
Enqueued jobs: 14409794
Dequeued jobs: 14409794
Throughput: 002.882 MOperations/s
Ingress rate: 043.975 Gib/s
Egress rate: 043.975 Gib/s
Stats for thread[1](core: 1)
Duration: 5000188 micro seconds
Enqueued jobs: 14404708
Dequeued jobs: 14404708
Throughput: 002.881 MOperations/s
Ingress rate: 043.958 Gib/s
Egress rate: 043.958 Gib/s
Aggregate stats
Duration: 5000188 micro seconds
Enqueued jobs: 28814502
Dequeued jobs: 28814502
Throughput: 005.763 MOperations/s
Ingress rate: 087.932 Gib/s
Egress rate: 087.932 Gib/s
Preparing permutation 3 of 3...
Executing permutation 3 of 3...
Data path thread [1] started...
Data path thread [0] started...
WT[0] Executing 100 warm-up tasks using 100 unique tasks
WT[1] Executing 100 warm-up tasks using 100 unique tasks
Data path thread [3] started...
WT[3] Executing 100 warm-up tasks using 100 unique tasks
Data path thread [2] started...
WT[2] Executing 100 warm-up tasks using 100 unique tasks
Cleanup permutation 3 of 3...
[main] Completed! tearing down...
Stats for thread[0](core: 0)
Duration: 5000092 micro seconds
Enqueued jobs: 7227025
Dequeued jobs: 7227025
Throughput: 001.445 MOperations/s
Ingress rate: 022.055 Gib/s
Egress rate: 022.055 Gib/s
Stats for thread[1](core: 1)
Duration: 5000081 micro seconds
Enqueued jobs: 7223269
Dequeued jobs: 7223269
Throughput: 001.445 MOperations/s
Ingress rate: 022.043 Gib/s
Egress rate: 022.043 Gib/s
Stats for thread[2](core: 2)
Duration: 5000047 micro seconds
Enqueued jobs: 7229678
Dequeued jobs: 7229678
Throughput: 001.446 MOperations/s
Ingress rate: 022.063 Gib/s
Egress rate: 022.063 Gib/s
Stats for thread[3](core: 3)
Duration: 5000056 micro seconds
Enqueued jobs: 7223037
Dequeued jobs: 7223037
Throughput: 001.445 MOperations/s
Ingress rate: 022.043 Gib/s
Egress rate: 022.043 Gib/s
Aggregate stats
Duration: 5000092 micro seconds
Enqueued jobs: 28903009
Dequeued jobs: 28903009
Throughput: 005.780 MOperations/s
Ingress rate: 088.203 Gib/s
Egress rate: 088.203 Gib/s
Results Overview
The output gives a summary of the permutations being executed, then goes on to display the stats for each permutation.
The CSV output file contents can be seen to contain only stats; no configuration information is included.
There is one entry per sweep permutation:
stats.input.job_count,stats.output.job_count,stats.input.byte_count,stats.output.byte_count,stats.input.throughput.bytes,stats.output.throughput.bytes,stats.input.throughput.rate,stats.output.throughput.rate
22999128,22999128,47102214144,47102214144,070.185 Gib/s,070.185 Gib/s,4.599650 MOperations/s,4.599650 MOperations/s
28814502,28814502,59012100096,59012100096,087.932 Gib/s,087.932 Gib/s,5.762683 MOperations/s,5.762683 MOperations/s
28903009,28903009,59193362432,59193362432,088.203 Gib/s,088.203 Gib/s,5.780495 MOperations/s,5.780495 MOperations/s
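Stats-only CSV output like the above is straightforward to post-process. A minimal sketch using Python's csv module, with the data inlined and abbreviated to a few of the columns shown in the header above:

```python
import csv
import io

# Header and rows as produced by the sweep run above, abbreviated
# to three of the stats columns for readability.
data = """stats.input.job_count,stats.input.throughput.bytes,stats.input.throughput.rate
22999128,070.185 Gib/s,4.599650 MOperations/s
28814502,087.932 Gib/s,5.762683 MOperations/s
28903009,088.203 Gib/s,5.780495 MOperations/s
"""

rows = list(csv.DictReader(io.StringIO(data)))
for row in rows:
    # Strip the unit suffix to get a numeric rate.
    rate = float(row["stats.input.throughput.rate"].split()[0])
    print(row["stats.input.job_count"], rate)
```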
This test invokes DOCA Bench to perform local DMA operations on the host.
It specifies that a uniform-job-size sweep should be performed with job sizes of 1024 and 2048, using the option --sweep uniform-job-size,1024,2048.
The test output is saved in the CSV file /tmp/dma_sweep_job_size.csv, and environment information collection is enabled.
Command Line
doca_bench --core-mask 0xff \
--core-count 1 \
--pipeline-steps doca_dma \
--device d8:00.0 \
--data-provider random-data \
--sweep uniform-job-size,1024,2048 \
--job-output-buffer-size 2048 \
--run-limit-seconds 5 \
--csv-output-file /tmp/dma_sweep_job_size.csv \
--enable-environment-information
Results Output
Test permutations: [
Attributes: []
Uniform job size: 1024
Core count: 1
Per core thread count: 1
Task pool size: 1024
Data provider job count: 128
MTU size: -- not configured --
SQ depth: -- not configured --
RQ depth: -- not configured --
Input data file: -- not configured --
--------------------------------
Attributes: []
Uniform job size: 2048
Core count: 1
Per core thread count: 1
Task pool size: 1024
Data provider job count: 128
MTU size: -- not configured --
SQ depth: -- not configured --
RQ depth: -- not configured --
Input data file: -- not configured --
]
[main] Initialize framework...
[main] Start execution...
Preparing permutation 1 of 2...
Executing permutation 1 of 2...
Data path thread [0] started...
WT[0] Executing 100 warm-up tasks using 100 unique tasks
Cleanup permutation 1 of 2...
Aggregate stats
Duration: 5000083 micro seconds
Enqueued jobs: 23645128
Dequeued jobs: 23645128
Throughput: 004.729 MOperations/s
Ingress rate: 036.079 Gib/s
Egress rate: 036.079 Gib/s
Preparing permutation 2 of 2...
Executing permutation 2 of 2...
Data path thread [0] started...
WT[0] Executing 100 warm-up tasks using 100 unique tasks
Cleanup permutation 2 of 2...
[main] Completed! tearing down...
Aggregate stats
Duration: 5000027 micro seconds
Enqueued jobs: 22963128
Dequeued jobs: 22963128
Throughput: 004.593 MOperations/s
Ingress rate: 070.078 Gib/s
Egress rate: 070.078 Gib/s
Results Overview
The output gives a summary of the permutations being executed, then goes on to display the stats for each permutation.
The CSV output file contents can be seen to contain both stats and environment information.
There is one entry per sweep permutation:
cfg.companion.connection_string,cfg.pipeline.steps,cfg.pipeline.use_remote_input_buffers,cfg.pipeline.use_remote_output_buffers,cfg.pipeline.bulk_latency.lower_bound,cfg.pipeline.bulk_latency.bucket_width,cfg.run_limit.duration,cfg.run_limit.jobs,cfg.run_limit.bytes,cfg.data_provider.type,cfg.data_provider.output_buffer_size,cfg.device.pci_address,cfg.input.cwd,cfg.output.cwd,cfg.warmup_job_count,cfg.core_set,cfg.benchmark_mode,cfg.warnings_are_errors,cfg.attribute.doca_compress.algorithm,cfg.attribute.doca_ec.matrix_type,cfg.attribute.doca_ec.data_block_count,cfg.attribute.doca_ec.redundancy_block_count,cfg.attribute.doca_ec.use_precomputed_matrix,cfg.attribute.doca_eth.l3_chksum_offload,cfg.attribute.doca_eth.l4_chksum_offload,cfg.attribute.doca_sha.algorithm,cfg.uniform_job_size,cfg.core_count,cfg.per_core_thread_count,cfg.task_pool_size,cfg.data_provider_job_count,cfg.sg_config,cfg.mtu-size,cfg.send-queue-size,cfg.receive-queue-size,cfg.data-provider-input-file,cfg.attribute.mmo.log_qp_depth,cfg.attribute.mmo.log_num_qps,stats.input.job_count,stats.output.job_count,stats.input.byte_count,stats.output.byte_count,stats.input.throughput.bytes,stats.output.throughput.bytes,stats.input.throughput.rate,stats.output.throughput.rate,host.pci.3.address,host.pci.3.ext_tag,host.pci.3.link_type,host.pci.2.ext_tag,host.pci.2.address,host.cpu.0.model,host.ofed_version,host.pci.4.max_read_request,host.pci.2.width,host.cpu.1.logical_cores,host.pci.2.eswitch_mode,host.pci.3.max_read_request,host.pci.4.address,host.pci.2.link_type,host.pci.1.max_read_request,host.pci.4.link_type,host.cpu.socket_count,host.pci.0.ext_tag,host.pci.6.port_speed,host.cpu.0.physical_cores,host.pci.7.port_speed,host.memory.dimm_slot_count,host.cpu.1.model,host.pci.0.max_payload_size,host.pci.6.relaxed_ordering,host.doca_host_package_version,host.pci.6.max_payload_size,host.pci.0.gen,host.pci.4.width,host.pci.2.gen,host.pci.1.max_payload_size,host.pci.4.relaxed_ordering,host.pci.3.width,host.cpu.0.logica
l_cores,host.cpu.0.arch,host.pci.4.port_speed,host.pci.4.eswitch_mode,host.pci.7.address,host.pci.5.eswitch_mode,host.pci.5.address,host.cpu.1.arch,host.pci.0.eswitch_mode,host.pci.7.width,host.pci.7.link_type,host.pci.1.link_type,host.pci.3.gen,host.pci.7.max_read_request,host.pci.7.eswitch_mode,host.pci.6.gen,host.pci.2.port_speed,host.pci.7.gen,host.pci.2.relaxed_ordering,host.pci.6.width,host.pci.4.gen,host.pci.6.address,host.hostname,host.pci.5.link_type,host.pci.6.link_type,host.pci.6.max_read_request,host.pci.7.max_payload_size,host.pci.5.gen,host.pci.6.eswitch_mode,host.pci.5.width,host.pci.3.relaxed_ordering,host.pci.4.ext_tag,host.pci.0.width,host.pci.5.port_speed,host.pci.2.max_payload_size,host.pci.3.max_payload_size,host.pci.5.max_payload_size,host.pci.2.max_read_request,host.pci.0.address,host.pci.gen,host.os.family,host.pci.1.gen,host.pci.5.relaxed_ordering,host.pci.1.port_speed,host.pci.7.ext_tag,host.pci.1.address,host.pci.3.eswitch_mode,host.pci.3.port_speed,host.pci.0.max_read_request,host.pci.1.ext_tag,host.pci.0.relaxed_ordering,host.pci.0.link_type,host.pci.5.max_read_request,host.pci.4.max_payload_size,host.pci.device_count,host.memory.populated_dimm_count,host.memory.installed_capacity,host.pci.6.ext_tag,host.os.kernel_version,host.pci.0.port_speed,host.pci.1.width,host.pci.7.relaxed_ordering,host.pci.1.relaxed_ordering,host.os.version,host.os.name,host.cpu.1.physical_cores,host.numa_node_count,host.pci.5.ext_tag,host.pci.1.eswitch_mode
,[doca_dma],0,0,10000,1000,5,,,random-data,2048,d8:00.0,,,100,"[0, 1, 2, 3, 4, 5, 6, 7]",throughput,0,,,,,,,,,1024,1,1,1024,128,1 fragments,,,,,,,23645128,23645128,24212611072,24212611072,036.079 Gib/s,036.079 Gib/s,4.728947 MOperations/s,4.728947 MOperations/s,0000:5e:00.1,true,Infiniband,true,0000:5e:00.0,N/A,OFED-internal-24.04-0.4.8,N/A,x63,N/A,N/A,N/A,0000:af:00.0,Infiniband,N/A,Ethernet,2,true,N/A,N/A,N/A,N/A,N/A,N/A,true,<none>,N/A,Gen15,x63,Gen15,N/A,true,x63,N/A,x86_64,104857600000,N/A,0000:d8:00.1,N/A,0000:af:00.1,x86_64,N/A,x63,Ethernet,Infiniband,Gen15,N/A,N/A,Gen15,N/A,Gen15,true,x63,Gen15,0000:d8:00.0,zibal,Ethernet,Ethernet,N/A,N/A,Gen15,N/A,x63,true,true,x63,104857600000,N/A,N/A,N/A,N/A,0000:3b:00.0,N/A,Linux,Gen15,true,N/A,true,0000:3b:00.1,N/A,N/A,N/A,true,true,Infiniband,N/A,N/A,8,N/A,270049112064,true,5.4.0-174-generic,N/A,x63,true,true,20.04.1 LTS (Focal Fossa),Ubuntu,N/A,2,true,N/A
,[doca_dma],0,0,10000,1000,5,,,random-data,2048,d8:00.0,,,100,"[0, 1, 2, 3, 4, 5, 6, 7]",throughput,0,,,,,,,,,2048,1,1,1024,128,1 fragments,,,,,,,22963128,22963128,47028486144,47028486144,070.078 Gib/s,070.078 Gib/s,4.592600 MOperations/s,4.592600 MOperations/s,0000:5e:00.1,true,Infiniband,true,0000:5e:00.0,N/A,OFED-internal-24.04-0.4.8,N/A,x63,N/A,N/A,N/A,0000:af:00.0,Infiniband,N/A,Ethernet,2,true,N/A,N/A,N/A,N/A,N/A,N/A,true,<none>,N/A,Gen15,x63,Gen15,N/A,true,x63,N/A,x86_64,104857600000,N/A,0000:d8:00.1,N/A,0000:af:00.1,x86_64,N/A,x63,Ethernet,Infiniband,Gen15,N/A,N/A,Gen15,N/A,Gen15,true,x63,Gen15,0000:d8:00.0,zibal,Ethernet,Ethernet,N/A,N/A,Gen15,N/A,x63,true,true,x63,104857600000,N/A,N/A,N/A,N/A,0000:3b:00.0,N/A,Linux,Gen15,true,N/A,true,0000:3b:00.1,N/A,N/A,N/A,true,true,Infiniband,N/A,N/A,8,N/A,270049112064,true,5.4.0-174-generic,N/A,x63,true,true,20.04.1 LTS (Focal Fossa),Ubuntu,N/A,2,true,N/A
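The CSV output above pairs one header row with one row per test permutation, which makes individual fields hard to read by eye. A minimal sketch of pairing columns with values, using a three-column excerpt of the header and the first data row above:

```python
import csv
import io

# Excerpt of the CSV emitted above: the real rows carry 170+ columns,
# but the pairing logic is identical for the full header.
header = "cfg.uniform_job_size,stats.output.job_count,stats.output.throughput.rate"
row = "1024,23645128,4.728947 MOperations/s"

reader = csv.reader(io.StringIO(header + "\n" + row))
names = next(reader)
values = next(reader)

# Pair each column name with its value for easier inspection.
record = dict(zip(names, values))
print(record["stats.output.throughput.rate"])  # → 4.728947 MOperations/s
```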
This test invokes DOCA Bench to perform remote DMA operations on the host.
It specifies the companion connection details to use on the host, and that remote output buffers should be used.
Command line
doca_bench --core-list 12 \
--pipeline-steps doca_dma \
--device 03:00.0 \
--data-provider random-data \
--uniform-job-size 2048 \
--job-output-buffer-size 2048 \
--use-remote-output-buffers \
--companion-connection-string proto=tcp,port=12345,mode=host,dev=17:00.0,user=bob,addr=10.10.10.10 \
--run-limit-seconds 5
Results output
Executing...
Worker thread[0](core: 12) [doca_dma] started...
Worker thread[0] Executing 100 warm-up tasks using 100 unique tasks
Cleanup...
[main] Completed! tearing down...
Aggregate stats
Duration: 5000073 micro seconds
Enqueued jobs: 32202128
Dequeued jobs: 32202128
Throughput: 006.440 MOperations/s
Ingress rate: 098.272 Gib/s
Egress rate: 098.272 Gib/s
Results remarks
None.
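As a sanity check, the aggregate numbers above are self-consistent: the throughput and data rates follow directly from the dequeued job count, the 2048-byte uniform job size, and the duration. A small sketch reproducing them:

```python
# Values copied from the aggregate stats of the remote DMA run above.
duration_s = 5000073 / 1e6          # "Duration: 5000073 micro seconds"
jobs = 32202128                     # "Dequeued jobs: 32202128"
job_size = 2048                     # --uniform-job-size 2048

# Operation rate and data rate (DOCA Bench reports Gib/s, i.e. 2^30 bits/s).
mops = jobs / duration_s / 1e6
gib_per_s = jobs * job_size * 8 / duration_s / 2**30

print(f"{mops:.3f} MOperations/s")  # → 6.440 MOperations/s
print(f"{gib_per_s:.3f} Gib/s")     # → 98.272 Gib/s
```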
This test is relevant only to BlueField-2.
This test invokes DOCA Bench to run compression using random data as input.
The compression algorithm specified is "deflate".
Command line
doca_bench --core-list 2 \
--pipeline-steps doca_compress::compress \
--device 03:00.0 \
--data-provider random-data \
--uniform-job-size 2048 \
--job-output-buffer-size 4096 \
--run-limit-seconds 3 \
--attribute doca_compress.algorithm="deflate"
Results output
Executing...
Data path thread [0] started...
WT[0] Executing 100 warm-up tasks using 100 unique tasks
Cleanup...
[main] Completed! tearing down...
Aggregate stats
Duration: 3000146 micro seconds
Enqueued jobs: 5340128
Dequeued jobs: 5340128
Throughput: 001.780 MOperations/s
Ingress rate: 027.160 Gib/s
Egress rate: 027.748 Gib/s
Results remarks
None.
This test invokes DOCA Bench to run decompression.
It specifies a file-set data provider whose input file contains the filenames of LZ4-compressed files.
The use of remote output buffers is specified for the jobs.
It specifies the companion connection details to use on the host for the remote buffers.
Command line
doca_bench --core-list 12 \
--pipeline-steps doca_compress::decompress \
--device 03:00.0 \
--data-provider file-set \
--data-provider-input-file lz4_compressed_64b_buffers.fs \
--job-output-buffer-size 4096 \
--run-limit-seconds 3 \
--attribute doca_compress.algorithm="lz4" \
--use-remote-output-buffers \
--companion-connection-string proto=tcp,port=12345,mode=host,dev=17:00.0,user=bob,addr=10.10.10.10
Results output
Executing...
Worker thread[0](core: 12) [doca_compress::decompress] started...
Worker thread[0] Executing 100 warm-up tasks using 100 unique tasks
Cleanup...
[main] Completed! tearing down...
Aggregate stats
Duration: 3000043 micro seconds
Enqueued jobs: 15306128
Dequeued jobs: 15306128
Throughput: 005.102 MOperations/s
Ingress rate: 003.155 Gib/s
Egress rate: 002.433 Gib/s
Results remarks
None.
This test invokes DOCA Bench to run the EC create step.
It runs in bulk-latency mode and specifies the doca_ec data_block_count, redundancy_block_count, and matrix_type attributes.
Command line
doca_bench --mode bulk-latency \
--core-list 12 \
--pipeline-steps doca_ec::create \
--device 17:00.0 \
--data-provider random-data \
--uniform-job-size 1024 \
--job-output-buffer-size 1024 \
--run-limit-seconds 3 \
--attribute doca_ec.data_block_count=16 \
--attribute doca_ec.redundancy_block_count=16 \
--attribute doca_ec.matrix_type=cauchy
Results output
The bulk latency output will be similar to the output presented in the "BlueField-side LZ4 decompression example" section.
Results remarks
The bulk latency output will be similar to the output presented earlier on this page.
This test invokes DOCA Bench to run the EC create step.
It runs in precision-latency mode and specifies the doca_ec data_block_count, redundancy_block_count, and matrix_type attributes.
Command line
doca_bench --mode precision-latency \
--core-list 12 \
--pipeline-steps doca_ec::create \
--device 03:00.0 \
--data-provider random-data \
--uniform-job-size 1024 \
--job-output-buffer-size 1024 \
--run-limit-jobs 5000 \
--attribute doca_ec.data_block_count=16 \
--attribute doca_ec.redundancy_block_count=16 \
--attribute doca_ec.matrix_type=cauchy
Results output
None.
Results remarks
The precision latency output will be similar to the output presented earlier on this page.
This test invokes DOCA Bench in Comch consumer mode, using core lists on both the host and BlueField sides.
The run limit is 500 jobs.
Command line
./doca_bench --core-list 4 \
--warm-up-jobs 32 \
--pipeline-steps doca_comch::consumer \
--device ca:00.0 \
--data-provider random-data \
--run-limit-jobs 500 \
--core-count 1 \
--uniform-job-size 4096 \
--job-output-buffer-size 4096 \
--companion-connection-string proto=tcp,mode=dpu,dev=03:00.0,user=bob,addr=10.10.10.10,port=12345 \
--attribute dopt.companion_app.path=<path to DPU doca_bench_companion application location> \
--data-provider-job-count 256 \
--companion-core-list 12
Results output
[main] Completed! tearing down...
Aggregate stats
Duration: 1415 micro seconds
Enqueued jobs: 500
Dequeued jobs: 500
Throughput: 000.353 MOperations/s
Ingress rate: 000.000 Gib/s
Egress rate: 010.782 Gib/s
Results remarks
The aggregate stats show that the test completed after processing 500 jobs.
This test invokes DOCA Bench in Comch producer mode, using core lists on both the host and BlueField sides.
The run limit is 500 jobs.
Command line
doca_bench --core-list 4 \
--warm-up-jobs 32 \
--pipeline-steps doca_comch::producer \
--device ca:00.0 \
--data-provider random-data \
--run-limit-jobs 500 \
--core-count 1 \
--uniform-job-size 4096 \
--job-output-buffer-size 4096 \
--companion-connection-string proto=tcp,mode=dpu,dev=03:00.0,user=bob,addr=10.10.10.10,port=12345 \
--attribute dopt.companion_app.path=<path to DPU doca_bench_companion location> \
--data-provider-job-count 256 \
--companion-core-list 12
Results output
[main] Completed! tearing down...
Aggregate stats
Duration: 407 micro seconds
Enqueued jobs: 500
Dequeued jobs: 500
Throughput: 001.226 MOperations/s
Ingress rate: 037.402 Gib/s
Egress rate: 000.000 Gib/s
Results remarks
The aggregate stats show that the test completed after processing 500 jobs.
This test invokes DOCA Bench in RDMA send mode, using core lists on both the sender and receiver sides.
The send queue size is configured with 50 entries.
Command line
doca_bench --pipeline-steps doca_rdma::send \
--device d8:00.0 \
--data-provider random-data \
--uniform-job-size 2048 \
--job-output-buffer-size 2048 \
--run-limit-seconds 3 \
--send-queue-size 50 \
--companion-connection-string proto=tcp,addr=10.10.10.10,port=12345,user=bob,dev=ca:00.0 \
--companion-core-list 12 \
--core-list 12
Results output
Test permutations: [
Attributes: []
Uniform job size: 2048
Core count: 1
Per core thread count: 1
Task pool size: 1024
Data provider job count: 128
MTU size: -- not configured --
SQ depth: 50
RQ depth: -- not configured --
Input data file: -- not configured --
]
Results remarks
The configuration output shows the send queue size configured as 50.
This test invokes DOCA Bench in RDMA receive mode, using core lists on both the sender and receiver sides.
The receive queue size is configured with 100 entries.
Command line
doca_bench --pipeline-steps doca_rdma::receive \
--device d8:00.0 \
--data-provider random-data \
--uniform-job-size 2048 \
--job-output-buffer-size 2048 \
--run-limit-seconds 3 \
--receive-queue-size 100 \
--companion-connection-string proto=tcp,addr=10.10.10.10,port=12345,user=bob,dev=ca:00.0 \
--companion-core-list 12 \
--core-list 12
Results output
Test permutations: [
Attributes: []
Uniform job size: 2048
Core count: 1
Per core thread count: 1
Task pool size: 1024
Data provider job count: 128
MTU size: -- not configured --
SQ depth: -- not configured --
RQ depth: 100
Input data file: -- not configured --
]
Results remarks
The configuration output shows the receive queue size configured as 100.
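All of the examples above report results in the same "Aggregate stats" layout. A small hypothetical helper can scrape the headline numbers so runs can be compared side by side; the sample text below is copied from the deflate compression example:

```python
# Parse a DOCA Bench "Aggregate stats" block into a dict of numbers.
# Sample copied verbatim from the deflate compression example above.
sample = """\
Aggregate stats
Duration: 3000146 micro seconds
Enqueued jobs: 5340128
Dequeued jobs: 5340128
Throughput: 001.780 MOperations/s
Ingress rate: 027.160 Gib/s
Egress rate: 027.748 Gib/s
"""

stats = {}
for line in sample.splitlines()[1:]:  # skip the "Aggregate stats" title
    key, _, rest = line.partition(":")
    stats[key] = float(rest.split()[0])  # keep the number, drop the unit

print(stats["Throughput"], stats["Egress rate"])  # → 1.78 27.748
```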