Scale-Free Graph Processing on a NUMA Machine
Authors: Tanuj K. Aasawat (University of British Columbia), Tahsin Reza (University of British Columbia), Matei Ripeanu (University of British Columbia)
Abstract: Modern shared-memory systems embrace the NUMA architecture, which has proven more scalable than the SMP architecture. In many ways, a NUMA system resembles a shared-nothing distributed system: it has physically distinct processing units and memory regions, and memory accesses to remote NUMA domains are more expensive than local accesses. This presents the opportunity to transfer the know-how and designs of distributed graph processing to shared-memory graph processing solutions optimized for NUMA systems. To this end, we explore whether a distributed-memory-like middleware that makes graph partitioning and inter-partition communication explicit can improve performance on a NUMA system. We design and implement a NUMA-aware graph processing framework that embraces the design philosophies of distributed graph processing systems, in particular explicit partitioning and inter-partition communication, while at the same time exploiting optimization opportunities specific to single-node systems. We demonstrate up to a 13.9x speedup over Polymer, a state-of-the-art NUMA-aware framework, and up to 3.7x scalability on a four-socket NUMA machine using graphs with tens of billions of edges.
Back to IA^3 2018: 8th Workshop on Irregular Applications: Architectures and Algorithms Archive Listing