Shell Theory:
A Statistical Model of Reality


Introduction

The foundational assumption of machine learning is that the data under consideration is separable into classes; while intuitively reasonable, separability constraints have proven remarkably difficult to formulate mathematically. We believe this problem is rooted in the mismatch between existing statistical techniques and commonly encountered data: object representations are typically high dimensional, but statistical techniques tend to treat high dimensions as a degenerate case. To address this problem, we develop a dedicated statistical framework for machine learning in high dimensions. The framework derives from the observation that object relations form a natural hierarchy; this leads us to model objects as instances of a high dimensional, hierarchical generative process. Using a distance based statistical technique, also developed in this paper, we show that in such generative processes, instances of each process in the hierarchy are almost-always encapsulated by a distinctive-shell that excludes almost-all other instances. The result is shell theory, a statistical machine learning framework in which separability constraints (distinctive-shells) are formally derived from the assumed generative process. Paper, Github (Python Code)

Idea

Today, a great deal of data can be considered high dimensional. Images, for example, are typically represented by deep-learned features and reside in high-dimensional spaces. In high dimensional spaces, a unique phenomenon takes place. At the heart of shell theory lies the quirky relation between the radii and volumes of high dimensional hyper-spheres. Consider two \(k\) dimensional hyper-spheres, one of radius \(r\) and another of radius \(r-\Delta r\), where \(\Delta r\) is small but positive. The ratio of their volumes is:

\[\frac{\mbox{volume of small sphere}}{\mbox{volume of big sphere}} = \left( \frac{r-\Delta r}{r}\right )^k = x^k\]

where \(x\) is less than one. If \(k \to \infty\), the above ratio tends to zero. This implies that the small change in radius has induced a huge change in volume. As a result, almost-all of a high dimensional hyper-sphere's volume will lie near its surface (outer-shell).
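
To make the collapse concrete, the ratio can be evaluated numerically for a fixed small \(\Delta r\). The snippet below is a minimal illustration of our own (not from the paper's code release):

    # Minimal illustration (ours, not from the paper's code release):
    # the inner/outer volume ratio ((r - dr) / r)^k collapses as k grows.
    r, dr = 1.0, 0.01
    for k in (2, 10, 100, 1000, 10000):
        ratio = ((r - dr) / r) ** k
        print(f"k = {k:>6}: volume ratio = {ratio:.3e}")

Even with a shrinkage of only 1%, the inner sphere's share of the volume is essentially zero once \(k\) reaches the thousands.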

If we conceptualize a high dimensional sample space as a hyper-sphere, data points (stochastically generated instances) will almost-surely lie on the hyper-sphere’s outer-shell, as this contains almost-all of the hyper-sphere’s volume. It is this phenomenon that gives rise to the shell based constraints from which shell theory derives its name.
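
The same effect can be seen from the sampling side: drawing points uniformly from a high dimensional unit ball and measuring their distance from the center shows that essentially every sample sits within one percent of the surface. The sketch below is our own illustration and assumes only NumPy:

    import numpy as np

    # Our own sketch (assumes only NumPy): sample points uniformly from a
    # k-dimensional unit ball and measure how far they lie from the center.
    rng = np.random.default_rng(0)
    k, n = 1000, 2000
    g = rng.standard_normal((n, k))
    directions = g / np.linalg.norm(g, axis=1, keepdims=True)  # uniform on the sphere
    radii = rng.random(n) ** (1.0 / k)                         # uniform in the ball
    norms = np.linalg.norm(directions * radii[:, None], axis=1)
    print("fraction of samples within 1% of the surface:", np.mean(norms > 0.99))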

This quirky relation between volume and radius explains why distinctive-shells are distinctive. If the objects in our reality were the result of a high dimensional hierarchical generative process, they would represent a very sparse sampling of the huge number of potential objects. The sparsity of this sampling would mean that sets of instances related by some common ancestor would be separated from other instances by wide gulfs of emptiness. This creates naturally occurring boundaries that can be exploited for machine learning.

Shell theory provides the (previously missing) bridge between machine learning algorithms and our implicit assumptions on the nature of reality. This brings a refreshingly new perspective to many old problems. Examples discussed in this paper include:

  1. Surprisingly simple manifold discovery: Distinctive-shells are likely the often-hypothesized semantic manifold. As distinctive-shells can be estimated directly from data statistics, complex manifold discovery algorithms can be replaced with four lines of Python code (see the sketch after this list).
  2. Shell-Normalization: We show how the sensitivity of distinctive-shells can be tailored to specific problems through a data pre-processor which we term shell-normalization. From this perspective, traditional normalization is a special case of shell-normalization that is tailored to the multi-class classification problem.
  3. From One Class to Multiple Classes: Shell theory provides a framework whereby independently trained one-class learners can be fused into a multi-class classifier. This unifies the two problems under a learning mechanism which is eerily similar to that employed by humans.
  4. Anomaly Detection: Shell theory shows how anomaly detection can be formulated as robust least-squares shell-fitting. This allows for the development of algorithms that are simple, fast and reliable.
  5. Statistical Maximum Distance: The geometric maximum distance between two unit-normalized data points is 2. Shell theory makes the quirky prediction that there exists a "statistical maximum distance" of \(\sqrt{2}\), which is much smaller than the geometric maximum.
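
As a rough illustration of point 1 (and of the shell-fitting view of anomaly detection in point 4), a distinctive-shell can be summarized by a center and a radius estimated from training features, with test points scored by their deviation from that radius. The sketch below is our own reading of the idea; the function names and the exact scoring rule are assumptions, not the authors' released code:

    import numpy as np

    # Hedged sketch of shell-based one-class learning; our reading of the
    # idea, not the authors' released code. The distinctive-shell is
    # summarized by a center and a radius estimated from training features,
    # and a test point is scored by its deviation from that shell.
    def fit_shell(train_features):
        center = train_features.mean(axis=0)
        radius = np.linalg.norm(train_features - center, axis=1).mean()
        return center, radius

    def shell_score(x, center, radius):
        # Larger score = farther from the fitted shell = more anomalous.
        return np.abs(np.linalg.norm(x - center, axis=-1) - radius)

    # Hypothetical usage: train_X and test_X are (n, d) feature matrices.
    # center, radius = fit_shell(train_X)
    # scores = shell_score(test_X, center, radius)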

Results

One-class Learning

AUROC scores of one-class learners. The best algorithm in each group is highlighted in bold. Observe that shell learning is competitive with far more sophisticated manifold fitting techniques. Suffix "I" denotes raw images, "F" denotes image features and "FN" denotes features normalized with the mean of test data.

Anomaly Detection

Anomaly-based ranking of airplane images crawled from the internet. Anomaly scores decrease from left to right, top to bottom. Shell-based anomaly detection provides rankings which are noticeably more consistent.
Anomaly detection using one-class SVM.
Anomaly detection using the proposed shell theory with re-normalization.

Links:

[Online Early Access] Shell Theory: A Statistical Model of Reality

The early access version of our PAMI paper on Shell Theory. Citations are appreciated if you find it useful.


Dimensionality’s Blessing: Clustering Images by Underlying Distribution

The first work that started us on the investigation into high-dimensional statistics.