SC18 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

The 2nd Industry/University Joint International Workshop on Data Center Automation, Analytics, and Control (DAAC)


Tivan: A Scalable Data Collection and Analytics Cluster

Authors:

Abstract: Log analysis is a critical part of every data center, used to diagnose system issues and identify performance problems. As data centers grow, a dedicated, scalable log analysis system becomes necessary. At Los Alamos National Laboratory, our clusters and support infrastructure currently produce over seventy million logs a day. This volume of log data makes it extremely difficult for system administrators, application developers, and researchers to find relevant events and perform detailed analysis. To manage this data and run scalable analytics, we designed and deployed a data analytics cluster based on open source software. In this paper, we describe the design and its trade-offs, the current uses of the system, and the analytics built on it that aid in the discovery of system issues. We have found that this data analytics cluster helps us proactively identify cluster issues more efficiently.
