Authors: Volodymyr Kindratenko (University of Illinois, National Center for Supercomputing Applications), Yangang Wang (Chinese Academy of Sciences)
Abstract: Deep Learning (DL) relies heavily on fast hardware and parallel algorithms to train complex neural networks. This BoF will bring together researchers and developers working on the design of next-generation computer systems for DL and on parallel DL algorithms that can exploit the potential of these new systems. Research teams working on major deep learning systems deployed in the field are invited to discuss the latest hardware and software trends and to exchange views on the role that Artificial Intelligence (AI) in general, and DL in particular, will play in the near future for big data analytics and HPC applications.
Long Description: Deep learning (DL) has emerged in the last few years as an enabling technology for a range of novel applications that were previously considered impossible to realize due to their high computational complexity and the unavailability of sufficiently large data sets. Examples include self-driving cars, real-time speech translators, and personal assistants. The High-Performance Computing (HPC) community is also starting to make use of these techniques, with notable examples ranging from studying molecular systems for drug discovery to gravitational wave analysis for estimating the properties of colliding black holes.
Graphics Processing Units (GPUs) made the deep networks at the core of these applications trainable in an acceptable time. Parallel training algorithms are now readily available in popular frameworks, such as TensorFlow and Caffe, and are widely used. However, significant challenges remain in bringing DL closer to the edge, where real-time decisions need to be made, and in scaling up DL algorithms to enable faster processing of bigger datasets with more complex models. These challenges are being addressed through the development of novel computer architectures tailored to DL algorithms, such as Google’s TPUs, as well as scalable parallel network training algorithms that can make use of a large number of compute units, such as the Horovod framework from Uber. This BoF is envisioned as a place for leading developers of DL hardware and software to brief the community about upcoming systems and future plans, both in the academic research domain and in commercial applications, and to stimulate discussion about these systems and their potential use by the HPC community.
The BoF is envisioned as a community-building event co-organized by researchers from the National Center for Supercomputing Applications (NCSA) in the US and the Computer Network Information Center (CNIC), Chinese Academy of Sciences. Both organizations have deployed systems tailored for DL applications and are working on the development of next-generation systems for DL at scale. Leading technology developers and providers from the US, such as NVIDIA, Google, Microsoft, Intel, and IBM, and leading AI companies from China, such as Sugon, SenseTime, Face++, and Caiyun, will be invited to present on both the latest hardware for DL and software frameworks. In addition, research centers that have recently deployed major DL-oriented systems, or are working on upcoming deployments, will be invited to share their views on architectural trends, the applicability and limitations of existing DL frameworks, and case studies. These will include the latest IBM Power9-based HPC systems in the US and the DCU/Cambricon accelerators in China.
This BoF will complement the 4th Workshop on Machine Learning in HPC Environments (MLHPC’18) by focusing on future trends in next-generation computer systems for DL and parallel DL algorithms for these systems, rather than on how current HPC systems can be used for DL.