Information Theoretic Novelty Detection
Maurizio Filippone (University of Glasgow)
Friday 12th November, 2010 15:00-16:00 325
In this talk, we present an approach to novelty (or outlier) detection when only limited data are available to reliably estimate the statistics of the underlying model. The proposed method quantifies the expected information content of a new observation under the null hypothesis that it was generated by the same distribution as the observed data. For the Gaussian distribution, this approach is analytically tractable and closely related to classical statistical tests, since the expected information content is independent of the statistics of the generating distribution. Such a test naturally accounts for the variability of the estimated statistics due to finite-sample effects, and therefore makes it possible to control the false positive rate even when only a small set of observations is available. We first extend this idea by proposing an approximation scheme to evaluate the information content of a new observation when the generating distribution is a mixture of Gaussians. We then present an extension to autoregressive time series with Gaussian noise. Experiments on synthetic and real data show that this method gives tight control over the false positive rate while retaining good accuracy in detecting important novelties.
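For the Gaussian case, the abstract links the approach to classical tests that account for finite-sample variability. As an illustration of that classical baseline only (this is not the information-theoretic method of the talk), the following minimal Python sketch implements the standard Student-t prediction test for a univariate Gaussian. With n reference points, sample mean x̄ and sample standard deviation s, a new point drawn from the same distribution satisfies (x - x̄) / (s * sqrt(1 + 1/n)) ~ t with n - 1 degrees of freedom. All function names are hypothetical, and the t tail probability uses the closed-form series for integer degrees of freedom so that only the standard library is needed.

```python
import math
import random
import statistics


def student_t_two_sided_pvalue(t: float, dof: int) -> float:
    """P(|T| >= |t|) for Student's t with a positive integer number of dof.

    Uses the closed-form finite series for A(t | dof) = P(|T| <= |t|),
    which holds for integer degrees of freedom.
    """
    if dof < 1:
        raise ValueError("dof must be a positive integer")
    theta = math.atan(abs(t) / math.sqrt(dof))
    s, c = math.sin(theta), math.cos(theta)
    if dof % 2 == 0:
        # Even dof: A = sin(theta) * (1 + 1/2 cos^2 + (1*3)/(2*4) cos^4 + ...)
        term, total = 1.0, 1.0
        for k in range(1, dof // 2):
            term *= (2 * k - 1) / (2 * k) * c * c
            total += term
        a = s * total
    else:
        # Odd dof: A = (2/pi) * (theta + sin(theta) * (cos + 2/3 cos^3 + ...))
        total = 0.0
        if dof > 1:
            term = c
            total = term
            for k in range(1, (dof - 1) // 2):
                term *= (2 * k) / (2 * k + 1) * c * c
                total += term
        a = (2.0 / math.pi) * (theta + s * total)
    return max(0.0, 1.0 - a)


def predictive_novelty_pvalue(sample: list[float], x_new: float) -> float:
    """p-value of x_new under the null that it comes from the sample's Gaussian.

    The sqrt(1 + 1/n) factor accounts for the uncertainty in the estimated
    mean, and the t distribution for the uncertainty in the estimated variance.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two reference observations")
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)  # uses the n - 1 denominator
    if sd == 0.0:
        raise ValueError("reference observations have zero variance")
    t = (x_new - mean) / (sd * math.sqrt(1.0 + 1.0 / n))
    return student_t_two_sided_pvalue(t, n - 1)


def is_novel(sample: list[float], x_new: float, alpha: float = 0.05) -> bool:
    """Flag x_new as a novelty when its p-value falls below alpha."""
    return predictive_novelty_pvalue(sample, x_new) < alpha


def simulated_false_positive_rate(n: int, trials: int = 20000,
                                  alpha: float = 0.05, seed: int = 0) -> float:
    """Fraction of genuinely non-novel points flagged with n reference points."""
    rng = random.Random(seed)
    flagged = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        x_new = rng.gauss(0.0, 1.0)  # same distribution: the null is true
        flagged += is_novel(sample, x_new, alpha)
    return flagged / trials
```

Because the threshold comes from the t distribution rather than the Normal, the simulated false positive rate stays close to alpha even with only five reference points; a plug-in z-score with the usual 1.96 cutoff would instead flag roughly 15% of null points at n = 5.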