BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20181221T160727Z
LOCATION:D167/174
DTSTART;TZID=America/Chicago:20181112T110000
DTEND;TZID=America/Chicago:20181112T113000
UID:submissions.supercomputing.org_SC18_sess151_ws_mlhpce113@linklings.com
SUMMARY:Large-Scale Clustering Using MPI-Based Canopy
DESCRIPTION:Workshop\nDeep Learning, Machine Learning, Workshop Reg Pass\n
 \nLarge-Scale Clustering Using MPI-Based Canopy\n\nHeinis\n\nAnalyzing mas
 sive amounts of data and extracting value from it has become key across di
 fferent disciplines. Many approaches have been developed to extract insigh
 t from the plethora of data available. As the amount of data grows rapidly
 , however, current approaches for analysis struggle to scale. This is part
 icularly true for clustering algorithms which try to find patterns in the 
 data.\n\nA wide range of clustering approaches has been developed in rece
 nt years. What they all share is that they require parameters (number of c
 lusters, size of clusters etc.) to be set a priori. Typically these parame
 ters are determined through trial and error in several iterations or throu
 gh pre-clustering algorithms. Several pre-clustering algorithms have been 
 developed, but similarly to clustering algorithms, they do not scale well 
 for the rapidly growing amounts of data.\n\nIn this paper, we thus take on
 e such pre-clustering algorithm, Canopy, and develop a parallel version ba
 sed on MPI. As we show, doing so is not straightforward and without optimi
 zation, a considerable amount of time is spent waiting for synchronization
 , severely limiting scalability. We thus optimize our approach to spend as
  little time as possible with idle cores and synchronization barriers. As 
 our experiments show, our approach scales near-linearly with increasing da
 taset size.
URL:https://sc18.supercomputing.org/presentation/?id=ws_mlhpce113&sess=ses
 s151
END:VEVENT
END:VCALENDAR

