SC18 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

PDSW-DISCS: Joint International Workshop on Parallel Data Storage and Data Intensive Scalable Computing Systems


Pufferbench: Evaluating and Optimizing Malleability of Distributed Storage

Authors: Nathanael Cheriere (IRISA, ENS Rennes)

Abstract: Malleability is the ability of an application to be dynamically rescaled at run time. It requires that resources can be added to or removed from the infrastructure dynamically, without interruption. Yet many Big Data applications cannot benefit from their inherent malleability, since their colocated distributed storage system is not malleable in practice. Commissioning or decommissioning storage nodes is generally assumed to be slow, as such operations have typically been designed for maintenance only. New technologies, however, enable faster data transfers. Still, evaluating the performance of rescaling operations on a given platform is a challenge in itself: no tool currently exists for this purpose.

We introduce Pufferbench, a benchmark for evaluating how fast one can scale a distributed storage system up and down on a given infrastructure and, thereby, how viably one can implement storage malleability on it. It can also serve to quickly prototype and evaluate mechanisms for malleability in existing distributed storage systems. We validate Pufferbench against theoretical lower bounds for commission and decommission: it can achieve performance within 16% of them. We use Pufferbench to evaluate these operations in practice in HDFS: commission in HDFS could be accelerated by as much as 14 times! Our results show that (1) the lower bounds for commission and decommission times we previously established are sound and can be approached in practice; (2) HDFS could handle these operations much more efficiently; and, most importantly, (3) malleability in distributed storage systems is viable and should be further leveraged for Big Data applications.

