Deep Learning at Scale on Nvidia V100 Accelerators | SC18 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

The 9th International Workshop on Performance Modeling, Benchmarking, and Simulation of High-Performance Computer Systems (PMBS18)


Deep Learning at Scale on Nvidia V100 Accelerators

Abstract: The recent explosion in the popularity of Deep Learning (DL) is due to a combination of improved algorithms, access to large datasets, and increased computational power. This has led to a plethora of open-source DL frameworks, each with varying characteristics and capabilities. End users are then left with the difficult task of determining the software and hardware configurations that yield optimal performance from each framework.

We share our experiences and develop best practices for DL training with TensorFlow, MXNet, and Caffe2. The paper also examines DL inference with TensorRT on Nvidia V100 "Volta" GPUs. It focuses on one of the more prominent neural network architectures, ResNet-50, combined with the ImageNet dataset. We quantify the impact of hardware attributes on DL workloads, including the use of PCIe versus NVLink GPU interconnects, scaling performance beyond a single worker node, the effect of a high-speed interconnect such as InfiniBand EDR on training, and the implications and advantages of using network-attached storage.
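Scaling beyond a single worker node is typically summarized as scaling efficiency: the fraction of ideal linear speedup that the multi-node run actually achieves. A minimal sketch of that calculation (the function name and throughput figures below are hypothetical placeholders, not measurements from the paper):

```python
# Illustrative sketch (not from the paper): computing scaling
# efficiency from measured training throughput, as one would when
# comparing single-node vs multi-node ResNet-50 runs.

def scaling_efficiency(single_node_ips: float, n_nodes: int,
                       multi_node_ips: float) -> float:
    """Fraction of ideal linear speedup achieved across n_nodes.

    single_node_ips: images/sec measured on one node
    multi_node_ips:  aggregate images/sec measured on n_nodes
    """
    ideal = single_node_ips * n_nodes
    return multi_node_ips / ideal

# Hypothetical example: 1 node at 1000 img/s, 4 nodes at 3600 img/s
eff = scaling_efficiency(1000.0, 4, 3600.0)
print(f"Scaling efficiency: {eff:.0%}")  # 90%
```

An efficiency near 1.0 indicates near-linear scaling; gaps below it are where interconnect (PCIe vs NVLink, InfiniBand EDR) and storage effects of the kind studied here show up.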
