Authors: Jun Makino (Kobe University, RIKEN), Kei Hiraki (University of Tokyo), John Shalf (Lawrence Berkeley National Laboratory), Jose Gracia (High Performance Computing Center Stuttgart)
Abstract: HPL has been and still is the single most widely used benchmark for HPC systems, even though it has attracted many criticisms. The most important criticism is that HPL measures only the peak floating-point performance and that its result has little correlation with real application performance. HPCG (and also HPGMG) have been proposed as alternative or complementary benchmarks. HPCG mainly measures the main memory bandwidth. In this BoF, we want to exchange opinions on which aspects of machines these benchmarks should measure, and how, in particular when they are used as requirements for new machines.
Long Description: The HPL benchmark has been the most widely accepted measure for HPC systems, at least for the last two decades. On the other hand, there have been long-standing criticisms that HPL measures only one aspect of the performance of HPC systems: the peak floating-point performance. Since the main part of the computation in HPL can be transformed into multiplications of dense matrices, the HPL performance number reflects the efficiency of the DGEMM implementation and the absolute peak performance, and not much else.
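To make the DGEMM claim concrete, here is a sketch that counts the floating-point operations of a right-looking blocked LU factorization, the kind of algorithm HPL is built on, and reports the share spent in the trailing-matrix update (the DGEMM part). It is a simplified model, not HPL's code: the function name and the block size `nb` are illustrative, pivoting is ignored, and a division counts as one flop.

```python
def blocked_lu_flops(n: int, nb: int) -> dict:
    """Flop counts for a right-looking blocked LU of an n x n matrix (pivoting ignored)."""
    panel = trsm = gemm = 0
    for j in range(0, n, nb):
        b = min(nb, n - j)   # width of the current panel
        m = n - j            # rows remaining in the panel
        rest = n - j - b     # columns to the right of the panel
        # Unblocked LU of the m x b panel: scale a column, then a rank-1 update.
        for i in range(b):
            panel += (m - i - 1) + 2 * (m - i - 1) * (b - i - 1)
        # Unit-lower triangular solve producing the b x rest block row of U.
        trsm += rest * b * (b - 1)
        # Trailing-matrix update A22 -= L21 @ U12, i.e. the DGEMM call.
        gemm += 2 * (m - b) * rest * b
    return {"panel": panel, "trsm": trsm, "gemm": gemm, "total": panel + trsm + gemm}


for n in (2_000, 20_000):
    c = blocked_lu_flops(n, nb=200)
    print(f"N = {n:6d}  DGEMM share of flops = {c['gemm'] / c['total']:.3f}")
```

The total equals the classic unblocked count, about 2/3 N^3; only the grouping changes. The non-DGEMM work grows roughly as `nb * N^2`, so the DGEMM share approaches one as N grows.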
In practice, however, achieving high efficiency on HPL is not that simple, since other parts of the calculation, such as the pivot search and the row exchange, can dominate the execution time when the matrix is small. Conversely, because the DGEMM work grows faster with the matrix size N (as N^3) than the work of these other operations, this means that if the main memory is large enough to hold a large matrix, one can always achieve high efficiency on HPL.
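The dependence on memory size can be illustrated with a toy model. It assumes, purely for illustration, that DGEMM runs at peak and that all other work (pivot search, row exchange) costs a fixed, hypothetical time per matrix element, so it scales as N^2. The largest N is set by how much of memory the double-precision matrix may fill. The names, the 80% fill fraction, and the machine numbers are all assumptions, not measurements.

```python
import math


def max_hpl_n(mem_bytes: float, fill: float = 0.8) -> int:
    """Largest N whose N x N matrix of 8-byte doubles fits in `fill` of memory."""
    return int(math.sqrt(fill * mem_bytes / 8))


def toy_efficiency(n: int, peak_flops: float, overhead_s_per_elem: float) -> float:
    """Hypothetical model: DGEMM at peak plus a fixed cost per matrix element."""
    t_dgemm = (2.0 / 3.0) * n**3 / peak_flops   # leading-order HPL flop count
    t_other = overhead_s_per_elem * n**2        # pivoting and row exchange, modeled as O(N^2)
    return t_dgemm / (t_dgemm + t_other)


peak = 1e12       # hypothetical 1 Tflop/s node
overhead = 1e-8   # hypothetical 10 ns per matrix element
for mem_gib in (16, 64, 256):
    n = max_hpl_n(mem_gib * 2**30)
    print(f"{mem_gib:4d} GiB  N = {n:7d}  efficiency ~ {toy_efficiency(n, peak, overhead):.2f}")
```

In this model the efficiency simplifies to 1 / (1 + 1.5 * overhead * peak / N), so it can be pushed arbitrarily close to one by increasing N, which is exactly what a larger memory allows.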
HPCG was proposed as a possible alternative to HPL. As its name suggests, HPCG measures the performance of HPC systems on iterations of the Conjugate Gradient method for solving a large sparse linear system. Thus, in HPCG, the most time-consuming part of the calculation is the multiplication of a sparse matrix by a vector using indirect access, which leaves little room for further optimization.
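The core kernel can be sketched as follows: a sparse matrix in CSR form, a matrix-vector product with indirect access through the column indices, and an unpreconditioned CG loop. This is a simplified stand-in, not HPCG's code. It uses a 1-D tridiagonal matrix instead of HPCG's 3-D 27-point stencil, and it leaves out the multigrid preconditioner that HPCG applies. All function names are illustrative.

```python
from typing import List, Tuple


def laplacian_1d_csr(n: int) -> Tuple[List[float], List[int], List[int]]:
    """CSR arrays (values, column indices, row pointers) of tridiag(-1, 2, -1)."""
    vals, cols, row_ptr = [], [], [0]
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                vals.append(v)
                cols.append(j)
        row_ptr.append(len(vals))
    return vals, cols, row_ptr


def spmv(vals, cols, row_ptr, x):
    """y = A x. vals and cols are read contiguously; x[cols[k]] is the indirect access."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        s = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += vals[k] * x[cols[k]]
        y[i] = s
    return y


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def cg(vals, cols, row_ptr, b, tol=1e-10, max_iter=1000):
    """Unpreconditioned conjugate gradient for a symmetric positive definite CSR matrix."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x, with the initial guess x = 0
    p = list(r)
    rr = dot(r, r)
    for it in range(max_iter):
        if rr ** 0.5 < tol:
            return x, it
        ap = spmv(vals, cols, row_ptr, p)  # the dominant cost per iteration
        alpha = rr / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x, max_iter


# Usage: build b = A @ ones, then recover x close to all ones.
n = 50
A = laplacian_1d_csr(n)
b = spmv(*A, [1.0] * n)
x, iters = cg(*A, b)
print(f"converged in {iters} iterations, max error {max(abs(xi - 1.0) for xi in x):.2e}")
```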
HPCG's choice to prohibit certain optimizations is quite different from the rules of HPL, under which implementers may apply any optimization except those that reduce the total number of floating-point operations executed.
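As a result, HPCG essentially measures a single number: the main memory bandwidth for contiguous memory access, since the matrix values and indices are streamed sequentially from memory. Of course, if the main memory is very small, we will also see the effect of the network performance, because a small problem per node makes communication relatively more expensive.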
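A roofline-style estimate shows why the sparse matrix-vector product reduces to memory bandwidth. The sketch assumes a CSR layout with 8-byte values and 4-byte column indices (HPCG's reference code does not necessarily store its matrix this way), 27 nonzeros per row as in HPCG's 3-D stencil, and perfect cache reuse of the vector x. It covers only the SpMV kernel, and the bandwidth values are hypothetical.

```python
def spmv_intensity(nnz_per_row: int, value_bytes: int = 8, index_bytes: int = 4) -> float:
    """Flops per byte of main-memory traffic for one CSR row (assumes x is reused from cache)."""
    flops = 2 * nnz_per_row                              # one multiply and one add per nonzero
    traffic = nnz_per_row * (value_bytes + index_bytes)  # matrix values and column indices
    traffic += index_bytes                               # one row pointer per row
    traffic += 2 * value_bytes                           # each x entry loaded once, y[i] written
    return flops / traffic


def bandwidth_bound_gflops(bandwidth_gb_s: float, nnz_per_row: int = 27) -> float:
    """Upper bound on SpMV speed: bandwidth (GB/s) times arithmetic intensity (flop/byte)."""
    return bandwidth_gb_s * spmv_intensity(nnz_per_row)


for bw in (100.0, 400.0, 1600.0):  # hypothetical memory bandwidths in GB/s
    print(f"{bw:6.0f} GB/s -> at most ~{bandwidth_bound_gflops(bw):.1f} Gflop/s")
```

With these assumptions the intensity is 54 flops per 344 bytes, about 0.16 flop/byte, so the predicted rate scales with bandwidth and is nearly independent of the peak floating-point rate.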
It is obvious that neither HPL nor HPCG alone is sufficient to describe the performance characteristics of HPC systems. Thus, the natural question is which set of benchmarks should be used to measure which aspects of machines. Of course, if the machine will be used for a limited number of applications, in principle we can use the applications themselves to evaluate the hardware. However, it is not always possible to run full applications on new machines, in particular if those machines do not exist yet.
This BoF will discuss what factors determine the performance of real-world applications, and how we can design benchmarks which can be used to measure these factors.