BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20181221T160731Z
LOCATION:D161
DTSTART;TZID=America/Chicago:20181116T103000
DTEND;TZID=America/Chicago:20181116T105000
UID:submissions.supercomputing.org_SC18_sess153_ws_espt103@linklings.com
SUMMARY:PARLOT: Efficient Whole-Program Call Tracing for HPC Applications
DESCRIPTION:Workshop\nPerformance\, Productivity\, Workshop Reg Pass\n\nP
 ARLOT: Efficient Whole-Program Call Tracing for HPC Applications\n\nTah
 eri\, Devale\, Gopalakrishnan\, Burtscher\n\nThe complexity of HPC sof
 tware and hardware is quickly increasing. As a consequence\, the need f
 or efficient execution tracing to gain insight into HPC application be
 havior is steadily growing. Unfortunately\, available tools either do n
 ot produce traces with enough detail or incur large overheads. An effi
 cient tracing method that overcomes the tradeoff between maximum infor
 mation and minimum overhead is therefore urgently needed. This paper p
 resents such a method and tool\, called ParLoT\, with the following ke
 y features. (1) It describes a technique that makes low-overhead on-th
 e-fly compression of whole-program call traces feasible. (2) It presen
 ts a new\, highly efficient\, incremental trace-compression approach t
 hat reduces the trace volume dynamically\, which lowers not only the n
 eeded bandwidth but also the tracing overhead. (3) It collects all cal
 ler/callee relations\, call frequencies\, call stacks\, as well as the f
 ull trace of all calls and returns executed by each thread\, includ
 ing in library code. (4) It works on top of existing dynamic binary in
 strumentation tools\, thus requiring neither source-code modificati
 ons nor recompilation. (5) It supports program analysis and debugging a
 t the thread\, thread-group\, and program level.\n\nThis paper establi
 shes that comparable capabilities are currently unavailable. Our exper
 iments with the NAS parallel benchmarks running on the Comet supercomp
 uter with up to 1\,024 cores show that ParLoT can collect whole-progra
 m function-call traces at an average tracing bandwidth of just 56 kB/s p
 er core.
URL:https://sc18.supercomputing.org/presentation/?id=ws_espt103&sess=sess1
 53
END:VEVENT
END:VCALENDAR

