Current Direction
Problem
It is often difficult to apply textbook machine learning statistics to practical problems. We hypothesize
that it is because most statistical techniques focus on probability density. However, interesting problems
are usually high dimensional, with patterns that only emerge when we consider multiple attributes
simultaneously. As volume increases exponentially with the number of dimensions, density is ill-defined in
high dimensions, causing the failure of traditional, density based techniques. Such failures are often
interpreted as high dimensions being cursed; however, we may simply be looking at the problem from the
wrong perspective.
Approach
We argue that in high dimensions, the sample space volume is so huge that coincidental similarity between
instances becomes almost impossible. We use this intuition to formulate our statistics in terms of
distance, rather than density; and show that it provides a tool for statistical analysis of many machine
learning problems.
Application
Formal statistical techniques are important because they allow us to design algorithms which can
accommodate unseen classes. This is necessary in problems like anomaly
detection, where anomalies are (by definition) instances of unknown classes.
These are important problems in computer vision and artificial intelligence; and are diffcult to address
within our current, data-driven learning techniques.
Recent Papers
"Shell Theory: A Statistical Model of Reality", PAMI 2021.
Shell theory attempts to formalize distance-based statistics and uses it to interpret normalization, a
widely used pre-preprocessing technique.
[paper],
[link],
[code].
"Locally Varying Distance Transform for Unsupervised Visual Anomaly Detection", ECCV 2022.
We use shell theory to develop a distance-based anomaly detector for high dimensions. It is remarkably
stable over a wide range of datasets and anomaly percentages.
[paper],
[link],
[code],
[video]
"Distance Based Image Classification:
A solution to generative classification’s conundrum?", IJCV, 2022.
This paper provides a statistical framework which
reduces the problem of image classification to a
modified nearest-neighbor algorithm that we term
the distance classification.
[paper],
[link],
[code].