SC18 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

8th Workshop on Python for High-Performance and Scientific Computing


Data-Parallel Python for High Energy Physics Analyses

Abstract: In this paper, we explore features available in Python that are useful for data reduction tasks in High Energy Physics (HEP). High-level abstractions in Python are convenient for implementing such tasks, but to be practical they must also perform efficiently. Because the data sets we process are typically large, we care about both I/O performance and in-memory processing speed. In particular, we evaluate the use of data-parallel programming, using MPI and numpy, to process a large experimental data set (42 TiB) stored in an HDF5 file. We measure the processing speed, distinguishing between the time spent reading the data and the time spent processing it in memory, and demonstrate the scalability of both on up to 1200 KNL nodes (76800 cores) on Cori at NERSC.
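
As a rough illustration of the data-parallel pattern the abstract describes, the sketch below splits a one-dimensional array into contiguous blocks, applies a vectorized numpy selection to each block, and combines the partial results, timing the "read" and "compute" phases separately. It is not the authors' code. To keep it runnable without MPI or HDF5, a plain loop over hypothetical ranks stands in for MPI processes, and an in-memory random array stands in for a column of the HDF5 file. The names `block_range`, `process_block`, and the threshold value are assumptions made for illustration only.

```python
import time

import numpy as np


def block_range(n_rows, n_ranks, rank):
    """Return the [start, stop) rows owned by `rank` in an even block split."""
    base, extra = divmod(n_rows, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def process_block(energy, threshold):
    """Vectorized selection: count and sum the entries above `threshold`."""
    selected = energy[energy > threshold]
    return selected.size, float(selected.sum())


# Assumption: a random in-memory array stands in for one column of the file.
rng = np.random.default_rng(0)
dataset = rng.exponential(scale=1.0, size=100_003)

n_ranks = 8  # hypothetical number of processes
partials = []
read_time = 0.0
compute_time = 0.0

# Assumption: under MPI each process would run this loop body once for its
# own rank; the loop only simulates that so the sketch runs anywhere.
for rank in range(n_ranks):
    start, stop = block_range(dataset.size, n_ranks, rank)

    t0 = time.perf_counter()
    block = dataset[start:stop].copy()  # stands in for reading one slice
    t1 = time.perf_counter()
    partials.append(process_block(block, threshold=2.0))
    t2 = time.perf_counter()

    read_time += t1 - t0
    compute_time += t2 - t1

# Stands in for a sum reduction across processes.
total_count = sum(count for count, _ in partials)
total_sum = sum(partial for _, partial in partials)

print(f"selected {total_count} entries, sum {total_sum:.3f}")
print(f"read {read_time:.6f} s, compute {compute_time:.6f} s")
```

Because each block is processed independently and only small partial results are combined, the same structure maps onto one process per block, with the final sums replaced by a collective reduction.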
