BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20181221T160728Z
LOCATION:D165
DTSTART;TZID=America/Chicago:20181112T153000
DTEND;TZID=America/Chicago:20181112T160000
UID:submissions.supercomputing.org_SC18_sess161_ws_pmbsf107@linklings.com
SUMMARY:Is Data Placement Optimization Still Relevant on Newer GPUs?
DESCRIPTION:Workshop\nBenchmarks, Parallel Programming Languages, Librarie
 s, and Models, Performance, Simulation, Workshop Reg Pass\n\nIs Data Place
 ment Optimization Still Relevant on Newer GPUs?\n\nBari, Stoltzfus, Lin, L
 iao, Emani...\n\nModern supercomputers often use Graphics Processing Un
 its (GPUs) to meet the ever-growing demand for energy-efficient high-pe
 rformance computing. GPUs have a complex memory architecture with vario
 us types of memories and caches, such as global memory, shared memor
 y, constant memory, and texture memory. Data placement optimization, i.
 e., optimizing the placement of data among these different memories, h
 as a significant impact on the performance of HPC applications runnin
 g on early generations of GPUs. However, newer generations of GPUs int
 roduce new memory features and implement the same high-level memory hi
 erarchy differently.\n\nIn this paper, we design a set of experiments t
 o explore the relevance of data placement optimizations on several gen
 erations of NVIDIA GPUs, including Kepler, Maxwell, Pascal, and Volt
 a. Our experiments include a set of memory microbenchmarks, CUDA kerne
 ls, and a proxy application. The experiments vary the CUDA thread bloc
 k configuration, the input data size, and the data placement choice. T
 he results show that newer generations of GPUs are less sensitive to d
 ata placement optimization than older ones, mostly due to improvement
 s in global memory caching.
URL:https://sc18.supercomputing.org/presentation/?id=ws_pmbsf107&sess=sess
 161
END:VEVENT
END:VCALENDAR

