Paper: Anatomy of High-Performance Deep Learning Convolutions on SIMD Architectures
Event Type
Paper
Registration Categories
TP
Tags
Applications
Cosmology
Data Analytics
Deep Learning
Machine Learning
Programming Systems
Storage
Visualization
Time: Thursday, November 15th, 2:30pm - 3pm
Location: C140/142
Description: Convolution layers are prevalent in many classes of deep neural networks, including Convolutional Neural Networks (CNNs), which provide state-of-the-art results for tasks like image recognition, neural machine translation, and speech recognition. The computationally expensive nature of the convolution operation has led to a proliferation of implementations, including matrix-matrix multiplication formulations and direct convolution, primarily targeting GPUs. In this paper, we introduce direct convolution kernels for x86 architectures, in particular for Xeon and Xeon Phi systems, which are implemented via a dynamic compilation approach. Our JIT-based implementation shows close to theoretical peak performance, depending on the setting and the CPU architecture at hand. We additionally demonstrate how these JIT-optimized kernels can be integrated into a light-weight multi-node graph execution model. This illustrates that single- and multi-node runs yield high efficiencies and high image throughputs when executing state-of-the-art image recognition tasks on CPUs.
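To make the two formulations named in the description concrete, the following is a minimal, unoptimized sketch in plain Python. It is not the paper's JIT-generated kernel: the function names `direct_conv2d` and `im2col_conv2d`, the nested-list tensor layout, and the no-padding convention are all assumptions made for illustration. The direct version walks the six-deep loop nest over output channels, output pixels, input channels, and filter taps; the second version first gathers input patches into a matrix and then performs a matrix-matrix multiplication.

```python
def _shapes(inp, weights, stride):
    # inp: C x H x W nested lists; weights: K x C x R x S nested lists (assumed layout)
    C, H, W = len(inp), len(inp[0]), len(inp[0][0])
    K, R, S = len(weights), len(weights[0][0]), len(weights[0][0][0])
    P = (H - R) // stride + 1  # output height, no padding
    Q = (W - S) // stride + 1  # output width, no padding
    return C, K, R, S, P, Q


def direct_conv2d(inp, weights, stride=1):
    """Direct convolution: accumulate each output pixel straight from the input."""
    C, K, R, S, P, Q = _shapes(inp, weights, stride)
    out = [[[0.0] * Q for _ in range(P)] for _ in range(K)]
    for k in range(K):                  # output channels
        for p in range(P):              # output rows
            for q in range(Q):          # output columns
                acc = 0.0
                for c in range(C):      # input channels
                    for r in range(R):  # filter rows
                        for s in range(S):  # filter columns
                            acc += (inp[c][p * stride + r][q * stride + s]
                                    * weights[k][c][r][s])
                out[k][p][q] = acc
    return out


def im2col_conv2d(inp, weights, stride=1):
    """Matrix-matrix formulation: gather patches, then multiply."""
    C, K, R, S, P, Q = _shapes(inp, weights, stride)
    # cols has shape (C*R*S) x (P*Q): one column per output pixel.
    cols = [[inp[c][p * stride + r][q * stride + s]
             for p in range(P) for q in range(Q)]
            for c in range(C) for r in range(R) for s in range(S)]
    # wmat has shape K x (C*R*S): one row per filter, same (c, r, s) order as cols.
    wmat = [[weights[k][c][r][s]
             for c in range(C) for r in range(R) for s in range(S)]
            for k in range(K)]
    flat = [[sum(w * col[j] for w, col in zip(wrow, cols))
             for j in range(P * Q)]
            for wrow in wmat]
    # Reshape each K-row of length P*Q back into P x Q.
    return [[row[p * Q:(p + 1) * Q] for p in range(P)] for row in flat]
```

Both functions compute the same result; the difference is that the matrix formulation materializes an intermediate patch matrix with duplicated input values, whereas the direct loop nest reads the input in place. A tuned kernel would additionally block and vectorize these loops, which this sketch does not attempt.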