Flowzilla: A Methodology for Detecting Data Transfer Anomalies in Research Networks
SC18 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Innovating the Network for Data Intensive Science (INDIS)


Flowzilla: A Methodology for Detecting Data Transfer Anomalies in Research Networks

Abstract: Research networks are designed to support high-volume scientific data transfers that span multiple network links. Like any other network, research networks experience anomalies: deviations from the normal profile of a science network’s traffic levels. Diagnosing these anomalies is critical for both network operators and scientists. In this paper, we present Flowzilla, a general framework for detecting and quantifying anomalies in scientific data transfers of arbitrary size. Flowzilla uses Random Forest Regression to predict the size of data transfers and an adaptive threshold mechanism to detect outliers. Our results show that the framework achieves up to 92.5% detection accuracy. Furthermore, we can predict data transfer sizes with accuracy above 90% for up to 10 weeks after training.
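The abstract names two stages: a Random Forest regressor that predicts transfer size, and an adaptive threshold that flags transfers whose actual size strays too far from the prediction. It does not give Flowzilla's features, hyperparameters, or exact threshold rule, so what follows is only a minimal Python sketch under stated assumptions. A small pure-Python random forest (bootstrap resampling, squared-error splits, averaged tree outputs) stands in for whatever implementation the authors used. The adaptive threshold is assumed to be the rolling mean plus or minus k standard deviations of recent prediction residuals. The names `TinyRandomForest` and `detect_anomalies`, the two input features (hour of day and file count), and the synthetic data are all hypothetical.

```python
import random
import statistics
from collections import deque


def _sse(values):
    """Sum of squared deviations from the mean (the split cost)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


class TinyRandomForest:
    """Hypothetical toy forest: bootstrap-trained regression trees, averaged."""

    def __init__(self, n_trees=20, max_depth=6, min_leaf=2, max_features=None, seed=0):
        self.n_trees, self.max_depth, self.min_leaf = n_trees, max_depth, min_leaf
        self.max_features = max_features  # None = try every feature at each split
        self.rng = random.Random(seed)
        self.trees = []

    def fit(self, X, y):
        n = len(X)
        self.trees = []
        for _ in range(self.n_trees):
            idx = [self.rng.randrange(n) for _ in range(n)]  # bootstrap resample
            self.trees.append(self._grow([X[i] for i in idx], [y[i] for i in idx], 0))
        return self

    def predict(self, x):
        total = 0.0
        for node in self.trees:
            while isinstance(node, tuple):  # internal node: (feature, threshold, left, right)
                f, thr, left, right = node
                node = left if x[f] <= thr else right
            total += node  # leaf value: mean target of its training rows
        return total / len(self.trees)

    def _grow(self, rows, ys, depth):
        mean = sum(ys) / len(ys)
        if depth >= self.max_depth or len(ys) < 2 * self.min_leaf:
            return mean
        n_feats = len(rows[0])
        k = min(self.max_features or n_feats, n_feats)
        best = None  # (cost, feature, threshold)
        for f in self.rng.sample(range(n_feats), k):  # random feature subset
            for thr in sorted({r[f] for r in rows}):
                left = [v for r, v in zip(rows, ys) if r[f] <= thr]
                right = [v for r, v in zip(rows, ys) if r[f] > thr]
                if len(left) < self.min_leaf or len(right) < self.min_leaf:
                    continue
                cost = _sse(left) + _sse(right)
                if best is None or cost < best[0]:
                    best = (cost, f, thr)
        if best is None:
            return mean  # no split keeps min_leaf rows on both sides
        _, f, thr = best
        lo = [(r, v) for r, v in zip(rows, ys) if r[f] <= thr]
        hi = [(r, v) for r, v in zip(rows, ys) if r[f] > thr]
        return (f, thr,
                self._grow([r for r, _ in lo], [v for _, v in lo], depth + 1),
                self._grow([r for r, _ in hi], [v for _, v in hi], depth + 1))


def detect_anomalies(model, samples, window=20, k=3.0, min_history=10):
    """Flag samples whose residual leaves a band of k standard deviations
    around the mean of the last `window` non-anomalous residuals (assumed rule)."""
    history = deque(maxlen=window)
    flags = []
    for features, actual in samples:
        residual = actual - model.predict(features)
        anomalous = False
        if len(history) >= min_history:
            recent = list(history)
            mu, sd = statistics.fmean(recent), statistics.stdev(recent)
            anomalous = abs(residual - mu) > k * sd
        flags.append(anomalous)
        if not anomalous:
            history.append(residual)  # band adapts to recent normal traffic only
    return flags


if __name__ == "__main__":
    gen = random.Random(42)
    # Hypothetical features [hour_of_day, file_count]; size (MB) ~ 10 * file_count + noise.
    X = [[gen.randrange(24), gen.randint(1, 50)] for _ in range(200)]
    y = [10.0 * x[1] + gen.gauss(0, 5) for x in X]
    model = TinyRandomForest(seed=1).fit(X, y)
    stream = []
    for _ in range(40):
        files = gen.randint(1, 50)
        stream.append(([gen.randrange(24), files], 10.0 * files + gen.gauss(0, 5)))
    stream[30] = (stream[30][0], stream[30][1] + 2000.0)  # inject an oversized transfer
    flags = detect_anomalies(model, stream)
    print("flagged transfers:", [i for i, hit in enumerate(flags) if hit])
```

Excluding flagged residuals from the rolling window keeps one large anomaly from widening the band for the transfers that follow, and the `min_history` warm-up avoids flagging points against a standard deviation estimated from only two or three residuals. A real deployment would more likely use an established library regressor than this toy forest.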
