GPUDirect Storage on chili10-101d

Overview

This report documents enabling GPUDirect Storage (GDS) on chili10-101d (dual EPYC 9754, 4× L40S, 4× Solidigm D7-PS1010 NVMe, ConnectX-7 400 GbE) and benchmarking both the local NVMe path (raid0 XFS) and the remote path (NFSoRDMA to a VAST cluster), six runs in total.

Hardware: L40S × 4 · EPYC 9754 × 2 · Solidigm D7-PS1010 × 4 (raid0) · ConnectX-7 400 GbE · VAST NFSoRDMA
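For context, the application-level path that GDS enables is NVIDIA's cuFile API: register a file descriptor and a pinned GPU buffer, then DMA file data directly into GPU memory, bypassing the CPU bounce buffer. Below is a minimal sketch of that read path; the file path, I/O size, and thin error handling are illustrative assumptions, not the actual benchmark harness used for these runs.

```c
/* Minimal cuFile (GPUDirect Storage) read sketch.
 * Illustrative only: path and sizes are assumptions, not this
 * report's harness. Build with: gcc -lcufile -lcudart */
#define _GNU_SOURCE
#include <cufile.h>
#include <cuda_runtime.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    const size_t io_size = 8 << 20;     /* 8 MiB, matching the local-NVMe block size */
    int fd = open("/mnt/raid0/testfile", O_RDONLY | O_DIRECT);
    if (fd < 0) { perror("open"); return 1; }

    cuFileDriverOpen();                 /* open the nvidia-fs driver context */

    CUfileDescr_t descr = {0};
    descr.handle.fd = fd;
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
    CUfileHandle_t handle;
    cuFileHandleRegister(&handle, &descr);

    void *gpu_buf;
    cudaMalloc(&gpu_buf, io_size);
    cuFileBufRegister(gpu_buf, io_size, 0);   /* pin GPU memory for DMA */

    /* DMA straight from storage (NVMe or NFSoRDMA) into GPU memory */
    ssize_t n = cuFileRead(handle, gpu_buf, io_size,
                           /*file_offset=*/0, /*devPtr_offset=*/0);
    printf("read %zd bytes into GPU memory\n", n);

    cuFileBufDeregister(gpu_buf);
    cuFileHandleDeregister(handle);
    cudaFree(gpu_buf);
    cuFileDriverClose();
    close(fd);
    return 0;
}
```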
Local NVMe peak read: 53.0 GiB/s
  4 GPUs concurrent, 8 MiB block, raid0 over 4× Solidigm D7-PS1010; 91% of 4-drive theoretical (58 GB/s).

NFSoRDMA peak read: 43.4 GiB/s
  4 GPUs concurrent, 64 workers each, 1 MiB block, VAST; 96% of 400 GbE line rate, single NIC saturated.

Single-GPU NFSoRDMA: 26.4 GiB/s
  1 GPU, 128 workers, 1 MiB block; 94% of PCIe 4.0 x16 practical ceiling.

Stack pinned: vastnfs-dkms 4.5.5 · MOFED 24.10 · nvidia-fs 2.28.4 · CUDA 12.9
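Since driver and library skew is the usual failure mode with GDS, it can be worth confirming at runtime which nvidia-fs version the cuFile library actually loaded. A minimal sketch using cuFileDriverGetProperties follows; the field names are from cufile.h, and the expectation that it reports 2.28.x is simply this report's pinned version.

```c
/* Sketch: query the loaded nvidia-fs driver version via the cuFile
 * driver API. Assumes CUDA 12.x cufile.h; link with -lcufile. */
#include <cufile.h>
#include <stdio.h>

int main(void) {
    CUfileError_t err = cuFileDriverOpen();
    if (err.err != CU_FILE_SUCCESS) {
        fprintf(stderr, "cuFileDriverOpen failed: %d\n", err.err);
        return 1;
    }

    CUfileDrvProps_t props;
    cuFileDriverGetProperties(&props);

    /* nvidia-fs kernel module version as seen by libcufile;
       on this host it should report the pinned 2.28.x. */
    printf("nvidia-fs %u.%u\n",
           props.nvfs.major_version, props.nvfs.minor_version);

    cuFileDriverClose();
    return 0;
}
```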
Peak read bandwidth — all runs
Local NVMe (raid0 XFS) scales with drives; NFSoRDMA on a single ConnectX-7 saturates at ~43 GiB/s against the 400 GbE line rate.
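As a sanity check on the saturation claim, a back-of-the-envelope conversion; the ~3% Ethernet/RoCE framing overhead used here is an assumption for illustration, not a measured figure:

$$
400\,\text{Gb/s} = 50\,\text{GB/s} \approx 46.6\,\text{GiB/s},
\qquad
\frac{43.4\,\text{GiB/s}}{46.6\,\text{GiB/s} \times 0.97} \approx 0.96
$$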
Run index
Each run links to a detail page with the full charts, config, and raw data.