Mitigating Performance and Progress Variability in Iterative Asynchronous Algorithms SC18 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Mitigating Performance and Progress Variability in Iterative Asynchronous Algorithms


Student: Justs Zarins (University of Edinburgh)
Supervisor: Michele Weiland (University of Edinburgh)

Abstract: Large HPC machines are susceptible to irregular performance. Factors such as chip manufacturing differences, heat management, and network congestion combine to produce varying execution times for the same code and input sets. Asynchronous algorithms offer a partial solution. In these algorithms, fast workers are not forced to synchronize with slow ones. Instead, they continue computing updates and moving toward the solution using the latest data available to them, which may have become stale (i.e., a number of iterations out of date compared to the most recent data). While this allows for high computational efficiency, the convergence rate of asynchronous algorithms tends to be lower than that of their synchronous counterparts.
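The effect of stale data described in the abstract can be illustrated with a small serial simulation. The sketch below is not the poster's implementation. It assumes a Jacobi-style solver on a toy diagonally dominant system, and the names `make_system`, `async_jacobi`, `max_error`, and the `slow_period` parameter are all hypothetical. Worker 1 only updates every `slow_period` sweeps, so worker 0 keeps iterating on out-of-date values instead of waiting.

```python
def make_system(n):
    """Build a strictly diagonally dominant tridiagonal system A x = b
    whose exact solution is the all-ones vector (toy problem, assumed)."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = 4.0
        if i > 0:
            A[i][i - 1] = -1.0
        if i < n - 1:
            A[i][i + 1] = -1.0
    b = [sum(row) for row in A]  # equals A multiplied by the all-ones vector
    return A, b


def async_jacobi(A, b, slow_period, sweeps):
    """Serially simulate two workers doing Jacobi updates with no barrier.

    Worker 0 owns the first half of x and updates on every sweep.
    Worker 1 owns the second half and updates only every `slow_period`
    sweeps, so worker 0 computes with stale values from worker 1.
    """
    n = len(b)
    x = [0.0] * n
    half = n // 2
    blocks = [range(0, half), range(half, n)]
    for sweep in range(sweeps):
        for worker, block in enumerate(blocks):
            if worker == 1 and sweep % slow_period != 0:
                continue  # slow worker skips this sweep; its data goes stale
            snapshot = list(x)  # latest data visible to this worker
            for i in block:
                off_diag = sum(A[i][j] * snapshot[j] for j in range(n) if j != i)
                x[i] = (b[i] - off_diag) / A[i][i]
    return x


def max_error(x):
    """Largest deviation from the known all-ones solution."""
    return max(abs(v - 1.0) for v in x)


if __name__ == "__main__":
    A, b = make_system(8)
    for period in (1, 4):
        err = max_error(async_jacobi(A, b, slow_period=period, sweeps=20))
        print(f"slow_period={period}: max error after 20 sweeps = {err:.3e}")
```

With `slow_period=1` both workers advance together. With a larger value the iteration still converges on this toy system, but it needs more sweeps to reach the same error, which is the trade-off the abstract refers to. The simulation does not model wall-clock time, so it does not show the efficiency gained by skipping synchronization.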

To address this problem, we are using the unique properties of asynchronous algorithms to develop load balancing strategies for iterative asynchronous algorithms in both shared and distributed memory. Our poster shows how our solution attenuates noise, resulting in a significant reduction in progress imbalance and time-to-solution variability.


ACM-SRC Semi-Finalist: no

Poster: PDF
Poster Summary: PDF
Reproducibility Description Appendix: PDF

