Large-Scale Hierarchical K-Means for Heterogeneous Many-Core Supercomputers | SC18 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis



Authors: Liandeng Li (Tsinghua University; National Supercomputing Center, Wuxi), Teng Yu (University of St Andrews), Wenlai Zhao (Tsinghua University; National Supercomputing Center, Wuxi), Haohuan Fu (Tsinghua University; National Supercomputing Center, Wuxi), Chenyu Wang (University of St Andrews; National Supercomputing Center, Wuxi), Li Tan (Beijing Technology and Business University), Guangwen Yang (Tsinghua University; National Supercomputing Center, Wuxi), John Thomson (University of St Andrews)

Abstract: This paper presents a novel design and implementation of the k-means clustering algorithm targeting the Sunway TaihuLight supercomputer. We introduce a multi-level parallel partition approach that partitions not only by dataflow and centroid, but also by dimension. Our multi-level (nkd) approach unlocks the potential of the hierarchical parallelism in the SW26010 heterogeneous many-core processor and the system architecture of the supercomputer.

Our design can process large-scale clustering problems with up to 196,608 dimensions and over 160,000 target centroids while maintaining high performance and high scalability, significantly improving the capability of k-means over previous approaches. The evaluation shows that our implementation takes less than 18 seconds per iteration for a large-scale clustering case with 196,608 data dimensions and 2,000 centroids when running on 4,096 nodes (1,064,496 cores) in parallel, making k-means a more feasible solution for complex scenarios.
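To make the three partition axes in the abstract concrete, the following is a minimal sequential sketch of the k-means assignment step with the work split by samples (n), centroids (k), and dimensions (d). It is not the authors' implementation: the function names `chunks` and `assign_nkd` and the partition counts are hypothetical, plain loops stand in for the parallel units, and no mapping onto the SW26010 hardware levels is implied.

```python
def chunks(total, parts):
    """Split range(total) into `parts` contiguous, near-equal slices."""
    base, extra = divmod(total, parts)
    out, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        out.append(range(start, start + size))
        start += size
    return out


def assign_nkd(points, centroids, n_parts, k_parts, d_parts):
    """Nearest-centroid assignment with work split along n, k and d.

    Each loop over blocks marks work that could run on a separate
    parallel unit; here the blocks are simply visited in order.
    """
    dims = len(points[0])
    labels = [None] * len(points)
    for n_block in chunks(len(points), n_parts):             # split by samples
        for i in n_block:
            best = (float("inf"), -1)                         # (distance, centroid index)
            for k_block in chunks(len(centroids), k_parts):   # split by centroids
                for j in k_block:
                    # Split by dimension: each slice yields a partial
                    # squared distance, and the partials are then summed.
                    partial = [
                        sum((points[i][t] - centroids[j][t]) ** 2 for t in d_block)
                        for d_block in chunks(dims, d_parts)
                    ]
                    best = min(best, (sum(partial), j))       # reduce over centroids
            labels[i] = best[1]
    return labels


points = [[0, 0, 0, 0], [10, 10, 10, 10], [1, 0, 1, 0], [9, 10, 9, 10]]
centroids = [[0, 0, 0, 0], [10, 10, 10, 10], [5, 5, 5, 5]]
print(assign_nkd(points, centroids, n_parts=2, k_parts=2, d_parts=2))  # [0, 1, 0, 1]
```

The dimension split is what distinguishes this from a plain sample-and-centroid partition: no single unit needs a whole point or a whole centroid, only a slice of each, at the cost of an extra reduction over the partial distances.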


