Deep Learning at Scale on Nvidia V100 Accelerators
Abstract: The recent explosion in the popularity of Deep Learning (DL) is due to a combination of improved algorithms, access to large datasets, and increased computational power. This has led to a plethora of open-source DL frameworks, each with varying characteristics and capabilities. End users are then left with the difficult task of determining the software and hardware configurations that yield optimal performance from each framework.
We share our experiences and develop best practices for DL training with TensorFlow, MXNet, and Caffe2. The paper also examines DL inferencing with TensorRT on Nvidia V100 "Volta" GPUs. It focuses on one of the more prominent neural network architectures, ResNet-50, combined with the ImageNet dataset. We quantify the impact of hardware attributes on DL workloads, including PCIe versus NVLink GPUs, scaling beyond a single worker node, the effect of a high-speed interconnect such as InfiniBand EDR on training, and the implications and advantages of using network-attached storage.
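When comparing configurations such as PCIe versus NVLink GPUs or single-node versus multi-node runs, the benchmarks above are typically reported as training throughput in images per second. The helper below is a minimal sketch (not from the paper) of how that figure is derived under the assumption of synchronous data-parallel training, where every GPU processes its own batch each step:

```python
def throughput(batch_size: int, num_gpus: int, step_time_s: float) -> float:
    """Images processed per second across all workers, assuming
    synchronous data-parallel training with a per-GPU batch."""
    return batch_size * num_gpus / step_time_s

# Example: per-GPU batch of 64 on 4 GPUs with a 0.2 s average step time
print(throughput(64, 4, 0.2))  # 1280.0 images/sec
```

Interconnect effects such as NVLink or InfiniBand EDR show up in this metric through the step time, which includes gradient-communication overhead.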
Presented at The 9th International Workshop on Performance Modeling, Benchmarking, and Simulation of High-Performance Computer Systems (PMBS18).