## Innovating the Network for Data Intensive Science (INDIS)

**Fast Detection of Elephant Flows with Dirichlet-Categorical Inference**

**Abstract:** The problem of elephant flow detection is a longstanding research area with the goal of quickly identifying flows in a network that are large enough to affect the quality of service of smaller flows. Past work in this field has largely been either domain-specific, based on thresholds for a specific flow size metric, or required several hyperparameters, reducing their ease of adaptation to the great variety of traffic distributions present in real-world networks. In this paper, we present an approach to elephant flow detection that avoids these limitations, utilizing the rigorous framework of Bayesian inference. By observing packets sampled from the network, we use Dirichlet-Categorical inference to calculate a posterior distribution that explicitly captures our uncertainty about the sizes of each flow. We then use this posterior distribution to find the most likely subset of elephant flows under this probabilistic model. Our algorithm rapidly converges to the optimal sampling rate at a speed O(1/n), where n is the number of packet samples received, and the only hyperparameter required is the targeted detection likelihood, defined as the probability of correctly inferring all the elephant flows. Compared to the state-of-the-art based on static sampling rate, we show a reduction in error rate by a factor of 20 times. The proposed method of Dirichlet-Categorical inference provides a novel, powerful framework to elephant flow detection that is both highly accurate and probabilistically meaningful.

**Archive Materials**

Back to Innovating the Network for Data Intensive Science (INDIS) Archive Listing

Back to Full Workshop Archive Listing