BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20181221T160904Z
LOCATION:C2/3/4 Ballroom
DTSTART;TZID=America/Chicago:20181113T083000
DTEND;TZID=America/Chicago:20181113T170000
UID:submissions.supercomputing.org_SC18_sess322_post255@linklings.com
SUMMARY:Optimization of Ultrasound Simulations on Multi-GPU Servers
DESCRIPTION:Poster\nTech Program Reg Pass, Exhibits Reg Pass\n\nOptimizati
 on of Ultrasound Simulations on Multi-GPU Servers\n\nVaverka, Spetko, Tree
 by, Jaros\n\nRealistic ultrasound simulations have found a broad area of a
 pplications in preoperative photoacoustic screening and non-invasive ultra
 sound treatment planning. However, the domains are typically thousands
  of w
 avelengths in size, leading to large-scale numerical models with billions 
 of unknowns. The current trend in accelerated computing is towards the use
  of fat nodes with multiple GPUs per node. The multi-GPU version of our k-
 Wave acoustic toolbox is based on the local Fourier basis domain decomposi
 tion where the 3D simulation domain is partitioned into rectangular cuboid
  blo
 cks assigned to particular GPUs. This paper investigates the benefits of u
 sing the CUDA-Aware MPI and CUDA peer-to-peer transfers on an 8-GPU server
  equipped with Nvidia P40 GPUs. The server has a total GPU memory of 192 G
 B and a single-precision performance of 96 Tflops. These techniques reduce
  the overall simulation time by a factor of 2-3.6.
URL:https://sc18.supercomputing.org/presentation/?id=post255&sess=sess322
END:VEVENT
END:VCALENDAR

