Flowzilla: A Methodology for Detecting Data Transfer Anomalies in Research Networks
Abstract: Research networks are designed to support high-volume scientific data transfers that span multiple network links. Like any other network, research networks experience anomalies, which are deviations from profiles of normal traffic levels in a science network. Diagnosing anomalies is critical for both network operators and scientists. In this paper, we present Flowzilla, a general framework for detecting and quantifying anomalies in scientific data transfers of arbitrary size. Flowzilla uses Random Forest Regression to predict the size of data transfers and an adaptive threshold mechanism to detect outliers. Our results show that Flowzilla achieves up to 92.5% detection accuracy. Furthermore, it predicts data transfer sizes with accuracy above 90% up to 10 weeks after training.
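The abstract pairs a size predictor with an adaptive threshold on its errors. The sketch below illustrates only the thresholding half, assuming the predicted sizes already come from some regressor (for example, scikit-learn's `RandomForestRegressor`). The rolling "mean plus k standard deviations of recent absolute residuals" rule, the function name `detect_anomalies`, and the `window` and `k` parameters are all hypothetical choices made for illustration. They are not Flowzilla's published mechanism.

```python
from statistics import mean, pstdev


def detect_anomalies(actual, predicted, window=5, k=3.0):
    """Flag transfers whose prediction error exceeds an adaptive threshold.

    Hypothetical rule (not the paper's exact method): the threshold for
    point i is mean + k * population std of the absolute residuals of the
    previous `window` points, so it adapts as traffic behaviour changes.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")

    # Absolute prediction error for each observed transfer size.
    residuals = [abs(a - p) for a, p in zip(actual, predicted)]

    flags = []
    for i, r in enumerate(residuals):
        history = residuals[max(0, i - window):i]
        if len(history) < 2:
            # Too little history to estimate a spread; do not flag yet.
            flags.append(False)
            continue
        threshold = mean(history) + k * pstdev(history)
        flags.append(r > threshold)
    return flags


if __name__ == "__main__":
    observed = [100, 102, 98, 101, 99, 100, 500, 101]  # toy transfer sizes
    forecast = [100] * len(observed)                    # stand-in predictions
    print(detect_anomalies(observed, forecast))
```

On the toy series, only the 500-unit transfer is flagged. Because the threshold is recomputed from a sliding window, a sustained shift in traffic level eventually raises the threshold instead of triggering alarms indefinitely. The trade-off is that a large outlier temporarily inflates the threshold for the points that follow it.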
Workshop: Innovating the Network for Data Intensive Science (INDIS)