Authors: Jun Makino (Kobe University, RIKEN), Kei Hiraki (University of Tokyo), John Shalf (Lawrence Berkeley National Laboratory), Jose Gracia (High Performance Computing Center Stuttgart)
Abstract: HPL has been, and still is, the single most widely used benchmark for HPC systems, despite many criticisms. The most important criticism is that HPL measures only the peak floating-point performance and that its result has little correlation with real application performance. HPCG (and also HPGMG) have been proposed as either alternative or complementary benchmarks. HPCG measures mainly the main memory bandwidth. In this BoF, we want to exchange opinions on which aspects of machines these benchmarks should measure, and how, in particular when they are used as requirements for new machines.
Long Description: The HPL benchmark has been the most widely accepted measure for
HPC systems, at least for the last two decades. On the other
hand, there have been long-standing criticisms that HPL measures
only one aspect of the performance of HPC systems --- the peak
floating-point performance. Since the main part of computation in
HPL can be transformed into multiplications of dense matrices, the HPL
performance number reflects the efficiency of the DGEMM
implementation and absolute peak performance, and not much else.
In practice, however, it is not that simple to achieve high efficiency
on HPL, since other parts of the calculation, such as the pivot search and
the row exchange, can dominate the calculation time when the size of
the matrix is small. On the other hand, this means that if the size of
the main memory is large enough, one can always achieve high
efficiency on HPL.
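The memory-size argument above can be made concrete with a small sketch. It assumes the conventional HPL operation count of (2/3)n^3 + (3/2)n^2 for a matrix of order n and a hypothetical rule of thumb that the matrix occupies a fixed fraction of main memory; the function names and the default fraction are illustrative, not part of HPL.

```python
import math


def hpl_flops(n):
    """Operation count conventionally credited for an order-n HPL run."""
    return (2.0 / 3.0) * n**3 + 1.5 * n**2


def largest_n(mem_bytes, fraction=0.8):
    """Largest order n whose n*n doubles fit in `fraction` of memory.

    The default fraction is an assumed rule of thumb, not an HPL rule.
    """
    return math.isqrt(int(mem_bytes * fraction) // 8)


def flops_per_element(n):
    """Floating-point operations per stored matrix element."""
    return hpl_flops(n) / (n * n)


if __name__ == "__main__":
    for gib in (1, 64, 4096):
        n = largest_n(gib * 2**30)
        print(gib, "GiB -> n =", n, ", flops per element =", flops_per_element(n))
```

Because the work per stored element grows linearly with n, a larger memory lets the O(n^3) matrix multiplications outweigh the O(n^2) pivot search and row exchange.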
HPCG was proposed as a possible alternative to HPL. As its name
suggests, HPCG measures the performance of HPC systems for
iterations of the Conjugate Gradient method applied to a large sparse
linear system. Thus, in HPCG, the most time-consuming part of the
calculation is multiplication of a sparse matrix and a vector
using indirect access, without much room for further
optimization.
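As a minimal sketch of the kernel described above, the following multiplies a sparse matrix in compressed sparse row (CSR) form by a vector. The CSR layout and the small tridiagonal test matrix are assumptions chosen for illustration; this is not the HPCG reference code.

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """Return y = A @ x for a matrix A stored in CSR form."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # x is read through col_idx: this is the indirect access.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


if __name__ == "__main__":
    # Toy matrix: [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
    values = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]
    col_idx = [0, 1, 0, 1, 2, 1, 2]
    row_ptr = [0, 2, 5, 7]
    print(spmv_csr(values, col_idx, row_ptr, [1.0, 2.0, 3.0]))
```

Each stored nonzero is used for exactly one multiply and one add, so there is little data reuse to exploit.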
This choice of HPCG to prohibit certain optimizations is quite
different from the rules of HPL, under which implementers are
allowed to apply any optimization, except those that
reduce the total number of floating-point operations executed.
As a result, HPCG measures essentially one single number: the
main memory bandwidth for contiguous memory access. Of course, if
the main memory is very small, we will also see the effect of the
network performance.
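A rough roofline-style estimate illustrates why memory bandwidth dominates. The sketch assumes a CSR-like format with 8-byte values and 4-byte column indices, counts 2 floating-point operations per stored nonzero, and ignores traffic for the vectors; the machine figures in the usage lines are hypothetical and do not describe any real system.

```python
def spmv_intensity(value_bytes=8, index_bytes=4):
    """Flops per byte of matrix data streamed (vector traffic ignored)."""
    return 2.0 / (value_bytes + index_bytes)


def attainable_gflops(peak_gflops, bandwidth_gbs, intensity):
    """Simple roofline bound: limited by compute or by memory traffic."""
    return min(peak_gflops, bandwidth_gbs * intensity)


if __name__ == "__main__":
    # Hypothetical node: 1000 GFLOP/s peak, 120 GB/s memory bandwidth.
    print(attainable_gflops(1000.0, 120.0, spmv_intensity()))
```

Under these assumptions the bound is set almost entirely by the bandwidth term, so raising the peak floating-point rate alone would not change the result.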
It is obvious that neither HPL nor HPCG is sufficient to describe
the performance characteristics of HPC systems. Thus the natural
question is what set of benchmarks should be used to measure what
aspect of machines. Of course, if the machine will be used for a
limited number of applications, in principle we can use the
applications themselves to evaluate the hardware. However, it is
not always possible to run full applications on new machines, in
particular if they do not exist.
This BoF will discuss what factors determine the performance of
real-world applications, and how we can design benchmarks which
can be used to measure these factors.
URL: http://www.jmlab.jp/?p=828