CS 8001 INF: Information Security Seminar
Fall 2007
October 30, 2007
3:00 Klaus 1116 E
Nina Taft
Intel Research Berkeley
Anomaly Detection in Large Networks using Approximation Techniques
Tremendous enthusiasm for amassing enormous amounts of network
measurement data has spurred the development of numerous applications
that incorporate data mining techniques. In this talk we question an
assumption hidden in these applications: that one needs to collect
"all the data all the time". We consider this question in the context
of an anomaly detection application. We study the popular "subspace
method" detector, which is based on principal component analysis
(PCA). This method normally collects data from many parts of the
network, centralizes it, and then analyzes it to uncover anomalies.
In our research, we ask whether we can throw away some of the data.
Can we still detect anomalies accurately without all of it?
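For readers unfamiliar with the subspace method, here is a minimal, stdlib-only sketch of the general idea, not the speaker's implementation. Rows of the measurement matrix are time bins and columns are links. The top-k principal components of the training window span the "normal" subspace, and a new measurement is flagged when the energy left in the residual subspace (its squared prediction error) exceeds a threshold. Published descriptions of the method derive that threshold from the Q-statistic. Here `THRESHOLD`, the number of components `k`, and the training data are hypothetical, and the eigenvectors come from simple power iteration with deflation rather than a full eigendecomposition.

```python
import math

def column_means(rows):
    """Per-link mean over the training window (rows = time bins, columns = links)."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def covariance(rows, means):
    """Sample covariance matrix of the mean-centered link measurements."""
    n, m = len(rows), len(means)
    c = [[r[j] - means[j] for j in range(m)] for r in rows]
    return [[sum(x[i] * x[j] for x in c) / (n - 1) for j in range(m)] for i in range(m)]

def top_components(cov, k, iters=200):
    """Top-k eigenvectors of a symmetric PSD matrix via power iteration plus deflation."""
    m = len(cov)
    a = [row[:] for row in cov]
    comps = []
    for _ in range(k):
        v = [1.0 / math.sqrt(m)] * m
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(m)) for i in range(m)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(a[i][j] * v[j] for j in range(m)) for i in range(m))
        # Deflate: remove the found direction so the next pass finds the next component.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(m)] for i in range(m)]
        comps.append(v)
    return comps

def residual_energy(x, means, comps):
    """Squared prediction error: energy of x left after projecting out the normal subspace."""
    r = [x[j] - means[j] for j in range(len(x))]
    for v in comps:
        proj = sum(r[j] * v[j] for j in range(len(r)))
        r = [r[j] - proj * v[j] for j in range(len(r))]
    return sum(e * e for e in r)

# Hypothetical training window: three links whose loads rise and fall together.
train = [[10 + 2 * (t % 5), 2 * (10 + 2 * (t % 5)), 10 + 2 * (t % 5) + 0.01 * ((t * 3) % 4)]
         for t in range(20)]
means = column_means(train)
comps = top_components(covariance(train, means), k=1)
THRESHOLD = 1.0  # hypothetical; not the Q-statistic threshold of the published method

print(residual_energy(train[3], means, comps) > THRESHOLD)       # normal bin: False
print(residual_energy([34, 28, 14], means, comps) > THRESHOLD)  # spike on link 0 only: True
```

The spike on a single link breaks the correlation the three links normally share, so most of its energy lands in the residual subspace even though each individual value is within the range seen during training.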
To avoid backhauling large amounts of data through the network, we present a framework that couples filtering at local monitors with centralized detectors that can operate on approximate views of the global data (i.e., the network state). We show that the errors the central detector makes because it uses approximate data can be upper-bounded using matrix perturbation theory. The challenge is to design the filtering parameters, which are determined by the bound on detection errors and by the criteria being tracked for detection. Our approximate anomaly detector can detect anomalies with 80 to 90% less data than the original method while incurring less than a 1% reduction in detection accuracy. Finally, we comment on issues and future directions for data reduction in the context of anomaly detection.
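The abstract does not describe the local filters themselves, so the following is only an illustrative stand-in: a simple deadband filter, assumed here for the sake of the example. Each hypothetical `FilteringMonitor` transmits a link measurement only when it has drifted more than `slack` from the value the coordinator last received, and the coordinator builds its approximate view from the last values it was sent. Every entry of that view is therefore within `slack` of the true value, so the perturbation matrix E has a Frobenius norm of at most slack times the square root of (bins × links). A bounded perturbation of this kind is what matrix perturbation theory turns into a bound on detection error. The class, function names, slack value and data are all hypothetical.

```python
import math

class FilteringMonitor:
    """Hypothetical local monitor with a deadband filter of width `slack`."""

    def __init__(self, slack):
        self.slack = slack
        self.last_sent = None  # value the coordinator currently holds
        self.sent = 0
        self.seen = 0

    def observe(self, value):
        """Return the value to transmit, or None if the update is filtered out."""
        self.seen += 1
        if self.last_sent is None or abs(value - self.last_sent) > self.slack:
            self.last_sent = value
            self.sent += 1
            return value
        return None

def approximate_view(series_per_link, slack):
    """Coordinator's approximate matrix (rows = time bins) plus the fraction of data sent."""
    monitors = [FilteringMonitor(slack) for _ in series_per_link]
    view = []
    for t in range(len(series_per_link[0])):
        row = []
        for mon, series in zip(monitors, series_per_link):
            mon.observe(series[t])
            row.append(mon.last_sent)  # unchanged entries reuse the last reported value
        view.append(row)
    sent = sum(m.sent for m in monitors)
    seen = sum(m.seen for m in monitors)
    return view, sent / seen

# Hypothetical data: three links, 100 time bins, slowly varying loads.
links = [[100 + 0.1 * (t % 10) for t in range(100)],
         [50 + 0.05 * (t % 20) for t in range(100)],
         [75.0 for _ in range(100)]]
SLACK = 0.5
view, fraction_sent = approximate_view(links, SLACK)

true_rows = [list(r) for r in zip(*links)]
err_fro = math.sqrt(sum((v - x) ** 2
                        for vr, tr in zip(view, true_rows) for v, x in zip(vr, tr)))
bound = SLACK * math.sqrt(len(view) * len(view[0]))
print(f"sent {fraction_sent:.0%} of measurements; ||E||_F = {err_fro:.2f} <= {bound:.2f}")
```

A larger slack saves more bandwidth but loosens the error bound. Choosing it from a target bound on detection error, rather than fixing it by hand as done here, is the design problem the abstract refers to.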

