Workshop: High Performance Implementation of Reproducible BLAS Routines with Tunable Accuracy Using Ozaki Scheme
Event Type: Workshop
Registration Categories: W
Tags: Exascale, Hot Topics, Reproducibility, Scientific Computing
Time: Sunday, November 11th, 2:51pm - 3:11pm
Location: D221
Description: This study presents a high-performance implementation of Basic Linear Algebra Subprograms (BLAS) routines that support reproducibility and tunable accuracy using the Ozaki scheme, an accurate matrix-multiplication method proposed by Ozaki et al. in 2011. The method guarantees reproducibility and provides tunable accuracy, up to and including correct rounding, by eliminating the effect of rounding errors in the computation. Its main advantage is that it can be built on top of level-3 BLAS. In this study, we present implementations of three routines spanning BLAS levels 1 to 3: inner product (DOT), matrix-vector multiplication (GEMV), and matrix-matrix multiplication (GEMM), together with several optimization techniques that reduce memory consumption and improve performance. A performance evaluation on a Titan V GPU demonstrates that our implementation achieves more than 73% of the expected peak performance.
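The idea behind the scheme can be illustrated with a small pure-Python sketch of an Ozaki-style inner product. This is not the authors' GPU implementation: the helper names `split` and `ozaki_dot`, the `max_slices` parameter, and the splitting constant are assumptions for illustration, following the commonly described error-free splitting for IEEE binary64 with round-to-nearest and no underflow or overflow. Each vector is split into slices with few enough significand bits that every slice-by-slice dot product is computed without rounding error, so its value cannot depend on the order of the additions; `math.fsum` then rounds the exact total once. Capping the number of slices trades accuracy for work while every computed term stays exact, which keeps the result reproducible.

```python
import math

# Sketch only: `split` and `ozaki_dot` are hypothetical names, not the paper's API.
# Assumes IEEE binary64, round-to-nearest, and no underflow or overflow.


def split(v, n, max_slices=None):
    """Split v into slices whose n-term dot products are exact in binary64.

    Without truncation, sum(slices) reproduces v element-wise exactly.
    """
    # Bits reserved per slice so that n products of two slices, and every
    # partial sum of those products, fit in a 53-bit significand.
    beta = math.ceil((53 + math.log2(n)) / 2)
    rest = list(v)
    slices = []
    while any(r != 0.0 for r in rest):
        if max_slices is not None and len(slices) == max_slices:
            break  # tunable accuracy: drop the remaining low-order bits
        _, tau = math.frexp(max(abs(r) for r in rest))  # max|rest| < 2**tau
        sigma = math.ldexp(0.75, tau + beta)
        hi = [(r + sigma) - sigma for r in rest]  # keep only the leading bits
        rest = [r - h for r, h in zip(rest, hi)]  # exact remainder
        slices.append(hi)
    return slices


def ozaki_dot(x, y, max_slices=None):
    """Reproducible inner product; correctly rounded when max_slices is None."""
    n = len(x)
    if n == 0:
        return 0.0
    xs = split(x, n, max_slices)
    ys = split(y, n, max_slices)
    # Each slice-pair sum is exact, so the addition order does not matter
    # (this is where an optimized BLAS call could be used instead).
    partials = [sum(a * b for a, b in zip(xi, yj)) for xi in xs for yj in ys]
    # fsum returns the correctly rounded value of the exact total.
    return math.fsum(partials)


x = [1e16, 1.0, -1e16]
y = [1.0, 1.0, 1.0]
print(sum(a * b for a, b in zip(x, y)))  # 0.0: the 1.0 is lost to rounding
print(ozaki_dot(x, y))                   # 1.0: exact result
```

In a BLAS-based implementation, each slice-pair loop would map to a call to a standard DOT, GEMV, or GEMM routine, which is why the scheme can reuse optimized level-3 BLAS; the plain Python loops here only stand in for those calls.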