BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20181221T160726Z
LOCATION:D221
DTSTART;TZID=America/Chicago:20181111T145100
DTEND;TZID=America/Chicago:20181111T151100
UID:submissions.supercomputing.org_SC18_sess162_ws_cre104@linklings.com
SUMMARY:High Performance Implementation of Reproducible BLAS Routines with
Tunable Accuracy Using Ozaki Scheme
DESCRIPTION:Workshop\nExascale, Hot Topics, Reproducibility, Scientific Co
mputing, Workshop Reg Pass\n\nHigh Performance Implementation of Reproduci
ble BLAS Routines with Tunable Accuracy Using Ozaki Scheme\n\nMukunoki, Og
ita, Ozaki\n\nThis study presents a high performance implementation of Bas
ic Linear Algebra Subprograms (BLAS) routines supporting reproducibility a
nd tunable accuracy using the Ozaki scheme, which is an accurate matrix-mu
ltiplication method proposed by Ozaki et al. in 2011. The method ensures r
eproducibility and realizes tunable accuracy, including correct-rounding,
by eliminating the effect of rounding error in the computation. The main a
dvantage of the method is that it can be built on level-3 BLAS. In this st
udy, we show the implementation of three routines from
level 1-3 BLAS: inner-product (DOT), matrix-vector multiplication (GEMV),
and matrix-matrix multiplication (GEMM), with several optimization techni
ques for reducing memory consumption and improving performance. The perfor
mance evaluation on a Titan V GPU demonstrates that our implementation ach
ieves more than 73% of the expected peak performance.
URL:https://sc18.supercomputing.org/presentation/?id=ws_cre104&sess=sess16
2
END:VEVENT
END:VCALENDAR