Tivan: A Scalable Data Collection and Analytics Cluster
HPC Center Planning and Operations
State of the Practice
Time: Monday, November 12th, 10:50am - 11:10am
Description: Log analysis is a critical part of every data center, used to diagnose system issues and identify performance problems. As data centers grow, a dedicated, scalable log analysis system becomes necessary. At Los Alamos National Laboratory, our clusters and support infrastructure currently produce over seventy million logs a day. This volume of log data makes it extremely difficult for system administrators, application developers, and researchers to find relevant events and perform detailed analysis. To manage this data and run scalable analytics, we designed and deployed a data analytics cluster based on open source software. In this paper, we describe the design and its trade-offs, the current uses of the system, and the analytics built on it that aid in discovering system issues. We have found that this data analytics cluster helps us identify cluster issues proactively and more efficiently.