A Cost-Effective Flexible System Optimized for DNN and ML
Time: Monday, November 12th, 7pm - 9pm
Description: Hardware accelerators (e.g., GPUs) are increasingly used for compute-intensive tasks (e.g., AI and HPC). When multiple accelerator and storage devices are present, direct data paths between the devices that bypass host memory, known as peer-to-peer (P2P) transfers, may be used. The P2P support currently provided by the NVIDIA CUDA driver is limited to NVIDIA GPUs under the same PCIe root complex, and at most 9 GPUs can take part in P2P communication.
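The two driver-level restrictions described above can be written down as a small toy model. This is only an illustrative sketch: the `Device` record, the `kind`/`vendor` labels, and the function names are hypothetical, and the checks encode the restrictions as stated in this abstract rather than the actual logic of the CUDA driver.

```python
from dataclasses import dataclass
from itertools import combinations

# Limit as stated in the abstract (assumed here to cap the whole P2P group).
MAX_P2P_GPUS = 9


@dataclass(frozen=True)
class Device:
    """Hypothetical description of a PCIe device for this toy model."""
    name: str
    kind: str          # "gpu" or "nvme" (illustrative labels)
    vendor: str        # e.g. "nvidia"
    root_complex: int  # identifier of the PCIe root complex the device sits under


def driver_p2p_allowed(a: Device, b: Device) -> bool:
    """True if the pair satisfies the restrictions described in the abstract:
    both devices are NVIDIA GPUs under the same PCIe root complex."""
    both_nvidia_gpus = all(d.kind == "gpu" and d.vendor == "nvidia" for d in (a, b))
    return both_nvidia_gpus and a.root_complex == b.root_complex


def driver_p2p_group_ok(devices) -> bool:
    """True if every pair in the group is allowed and the group is within the size limit."""
    devices = list(devices)
    if len(devices) > MAX_P2P_GPUS:
        return False
    return all(driver_p2p_allowed(a, b) for a, b in combinations(devices, 2))


if __name__ == "__main__":
    gpus = [Device(f"gpu{i}", "gpu", "nvidia", 0) for i in range(10)]
    ssd = Device("ssd0", "nvme", "other", 0)
    print(driver_p2p_group_ok(gpus[:9]))      # 9 GPUs, one root complex
    print(driver_p2p_group_ok(gpus))          # 10 GPUs exceed the stated limit
    print(driver_p2p_allowed(gpus[0], ssd))   # storage device is not covered
```

In this model, a group of ten GPUs, a pair of GPUs on different root complexes, and a GPU paired with an NVMe device are all rejected.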
In our design, we use a simplified architecture as the basic building block. The new PCIe switch allows PCIe ID translation between different PCIe domains as well as customized routing. Together with PCIe Gen 4, the blocks can be stacked to scale out. This design is especially well suited to the collective communications in DNN/ML and many HPC applications. Compared to other PCIe expansion enclosures, our design also allows a CPU card to be installed, which makes the system self-sufficient and operational on its own.
On the system software side, our solution removes the limit of 9 GPUs under the same PCIe root complex and is not restricted to NVIDIA GPUs. For example, data can be transferred directly between NVMe storage and GPU memory.
Overall, the new design provides a more cost-effective, robust, and flexible solution that is optimized for DNN/ML and HPC applications.