High Performance Implementation of Reproducible BLAS Routines with Tunable Accuracy Using Ozaki Scheme
SC18 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Computational Reproducibility at Exascale 2018 (CRE2018)

Authors: Daichi Mukunoki (Tokyo Woman's Christian University)

Abstract: This study presents a high-performance implementation of Basic Linear Algebra Subprograms (BLAS) routines that support reproducibility and tunable accuracy using the Ozaki scheme, an accurate matrix-multiplication method proposed by Ozaki et al. in 2011. The method eliminates the effect of rounding errors in the computation, which ensures reproducibility and makes the accuracy tunable, up to and including correctly rounded results. Its main advantage is that it can be built on level-3 BLAS routines. In this study, we present implementations of three routines, one from each BLAS level: inner product (DOT, level 1), matrix-vector multiplication (GEMV, level 2), and matrix-matrix multiplication (GEMM, level 3), together with several optimization techniques that reduce memory consumption and improve performance. A performance evaluation on a Titan V GPU shows that our implementation achieves more than 73% of the expected peak performance.
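
The core idea can be illustrated with a small sketch. The Python below is a hedged, minimal illustration of Ozaki-style error-free splitting, not the author's GPU implementation. The shift constant rho = ceil((53 + log2 n) / 2), the per-row and per-column scaling, the hypothetical helper names (split_rows, plain_matmul, ozaki_gemm, ozaki_dot, exact_dot), and the convention of passing B as a list of its columns are all assumptions made for illustration. Each input is split into slices whose pairwise products are computed exactly by an ordinary matrix multiplication (plain_matmul stands in for a level-3 BLAS GEMM call), so the order in which the GEMM reduces its sums cannot change the result. Here math.fsum stands in for the accurate final summation: it gives a correctly rounded result when the splitting runs to completion, and max_slices acts as the accuracy knob.

```python
import math
from fractions import Fraction

PREC = 53  # binary64 significand bits (unit roundoff u = 2**-53)


def split_rows(rows, n, max_slices):
    """Error-free split of each length-n row into at most max_slices slices.

    Assumption: each row gets its own shift sigma = 2**(rho + tau), where
    2**tau exceeds the row's largest remaining magnitude. With
    rho >= (53 + log2 n) / 2, every length-n inner product between a slice
    row and a slice column is exact in binary64.
    """
    rho = math.ceil((PREC + math.log2(n)) / 2)
    rest = [list(r) for r in rows]
    slices = []
    while len(slices) < max_slices:
        if all(v == 0.0 for r in rest for v in r):
            break  # every bit has been extracted
        current = []
        for r in rest:
            mu = max(abs(v) for v in r)
            if mu == 0.0:
                current.append([0.0] * n)
                continue
            tau = math.frexp(mu)[1]  # mu < 2**tau
            sigma = math.ldexp(1.0, rho + tau)
            hi = [(v + sigma) - sigma for v in r]  # leading bits, no error
            current.append(hi)
            r[:] = [v - h for v, h in zip(r, hi)]  # remainder, also exact
        slices.append(current)
    return slices


def plain_matmul(a_rows, b_cols):
    """Stand-in for an ordinary DGEMM: C[i][j] = sum_k a[i][k] * b_cols[j][k],
    evaluated with normal rounded binary64 arithmetic."""
    c = []
    for a in a_rows:
        c_row = []
        for b in b_cols:
            acc = 0.0
            for x, y in zip(a, b):
                acc += x * y
            c_row.append(acc)
        c.append(c_row)
    return c


def ozaki_gemm(a_rows, b_cols, max_slices=64):
    """C = A * B, with A given by rows and B given by columns (hypothetical layout)."""
    n = len(a_rows[0])
    a_slices = split_rows(a_rows, n, max_slices)
    b_slices = split_rows(b_cols, n, max_slices)
    parts = [[[] for _ in b_cols] for _ in a_rows]
    for a_s in a_slices:
        for b_s in b_slices:
            c = plain_matmul(a_s, b_s)  # exact: no rounding error occurs here
            for i, c_row in enumerate(c):
                for j, v in enumerate(c_row):
                    parts[i][j].append(v)
    # Correctly rounded sum of the exact slice products (independent of order).
    return [[math.fsum(p) for p in p_row] for p_row in parts]


def ozaki_dot(x, y, max_slices=64):
    """DOT as the 1 x n times n x 1 special case of ozaki_gemm."""
    return ozaki_gemm([x], [y], max_slices)[0][0]


def exact_dot(x, y):
    """Reference: exact rational dot product, rounded once to the nearest double."""
    return float(sum((Fraction(a) * Fraction(b) for a, b in zip(x, y)), Fraction(0)))


if __name__ == "__main__":
    x = [1e16, 1.0, -1e16]
    y = [1.0, 1.0, 1.0]
    print(plain_matmul([x], [y])[0][0])  # 0.0 (naive accumulation loses the 1.0)
    print(ozaki_dot(x, y), exact_dot(x, y))  # 1.0 1.0
```

In this ill-conditioned example, plain left-to-right accumulation loses the 1.0 entirely, while the split version recovers the correctly rounded value; setting max_slices=1 trades that accuracy for fewer slice products. The sketch assumes finite inputs far from overflow and underflow.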

Website: http://www.cs.fsu.edu/~cre