BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20181221T160728Z
LOCATION:D161
DTSTART;TZID=America/Chicago:20181112T163000
DTEND;TZID=America/Chicago:20181112T165000
UID:submissions.supercomputing.org_SC18_sess158_ws_lasalss110@linklings.co
m
SUMMARY:A General-Purpose Hierarchical Mesh Partitioning Method with Node
Balancing Strategies for Large-Scale Numerical Simulations
DESCRIPTION:Workshop\nAlgorithms, Heterogeneous Systems, Resiliency, Works
hop Reg Pass\n\nA General-Purpose Hierarchical Mesh Partitioning Method wi
th Node Balancing Strategies for Large-Scale Numerical Simulations\n\nKong
, Stogner, Gaston, Peterson, Permann...\n\nLarge-scale parallel numerical
simulations are essential for a wide range of engineering problems that
involve complex, coupled physical processes interacting across a broad ra
nge of spatial and temporal scales. The data structures involved in suc
h simulations (meshes, sparse matrices, etc.) are frequently represented a
s graphs, and these graphs must be optimally partitioned across the availa
ble computational resources in order for the underlying calculations to sc
ale efficiently. Partitions which minimize the number of graph edges that
are cut (edge-cuts) while simultaneously maintaining a balance in the amou
nt of work (i.e. graph nodes) assigned to each processor core are desirabl
e, and the performance of most existing partitioning software begins to de
grade in this metric for partitions with more than $O(10^3)$ processo
r cores. In this work, we consider a general-purpose hierarchical partitio
ner which takes into account the existence of multiple processor cores and
shared memory in a compute node while partitioning a graph into an arbitr
ary number of subgraphs. We demonstrate that our algorithms significantly
improve the preconditioning efficiency and overall performance of realisti
c numerical simulations running on up to 32,768 processor cores with near
ly $10^9$ unknowns.
URL:https://sc18.supercomputing.org/presentation/?id=ws_lasalss110&sess=se
ss158
END:VEVENT
END:VCALENDAR