**Author:** Robert Hecht-Nielsen (HNC Software, Inc., and Department
of Electrical and Computer Engineering, Institute for Neural Computation,
University of California, San Diego)

**Abstract:**
The a priori probability structure of any real-world data source can be
modeled as a distribution F on ℝ^m. For convenience, only memoryless
sources will be considered here (although many of the results can be
applied or extended to sources with memory). Real-world data sources
almost always have a highly constrained internal structure F. This paper
describes a new universal model for real-world data sources, the data
manifold. Data manifolds have several useful attributes, most important
of which is a special, essentially unique, coordinate system: natural
coordinates. Natural coordinates have three desirable properties in
the limit of large data manifold dimension (conversely, any coordinate
system possessing all three properties must be natural coordinates). First,
they conform to the intrinsic a priori probability structure of the data
in the sense that equal natural coordinate volumes have equal probability.
Second, when uniformly quantized, natural coordinates maximally preserve
information about the source data in comparison with all other possible
coordinate systems (i.e., they are Shannon-optimal source codes). Third,
at every point in the data space the tangent vectors of the individual
natural coordinate curves are mutually perpendicular (and thus
'statistically independent'). Although past attempts at defining coordinate
system representations for general data have achieved at most one of these
attributes at a time, natural coordinates are the first to combine all of
them. It is therefore argued that natural coordinates be adopted as the
proper curvilinear generalization, to arbitrary sources, of the
principal components coordinates of
Gaussian data sources. Replicator neural networks (a type of three hidden
layer multilayer perceptron) trained on randomly chosen data vectors from a
source will, in the limit of a specified refinement process, automatically
build a set of natural coordinates for that source. Being able to use natural
coordinates as a highly-reduced-dimensionality representation may someday prove
to be of value in process modeling, pattern recognition, complex system
control, etc. (although, at present, applications are limited because the
technology for training such large, deep networks is not well developed).
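The replicator idea described above (a multilayer perceptron with three hidden layers, trained to reproduce its own input so that the narrow middle layer carries a reduced-dimensionality coordinate for each data vector) can be sketched as a small numpy autoencoder. The layer sizes, tanh activations, learning rate, and toy curved data source below are illustrative assumptions, not the original construction:

```python
import numpy as np

# Minimal replicator sketch: an m -> h -> k -> h -> m multilayer
# perceptron (three hidden layers) trained to reproduce its input;
# the k-unit middle layer is the candidate coordinate representation.

rng = np.random.default_rng(0)

def init(sizes):
    """One (weights, bias) pair per layer, small random weights."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(a), (a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Return the activations of every layer (linear output layer)."""
    acts = [x]
    for i, (W, b) in enumerate(params):
        z = acts[-1] @ W + b
        acts.append(z if i == len(params) - 1 else np.tanh(z))
    return acts

def train_step(params, x, lr=0.1):
    """One full-batch gradient step on mean squared reconstruction error."""
    acts = forward(params, x)
    grad = (acts[-1] - x) / len(x)        # d(loss)/d(output pre-activation)
    for i in reversed(range(len(params))):
        W, b = params[i]
        gW, gb = acts[i].T @ grad, grad.sum(axis=0)
        if i > 0:                          # propagate through tanh layers
            grad = (grad @ W.T) * (1.0 - acts[i] ** 2)
        params[i] = (W - lr * gW, b - lr * gb)
    return ((acts[-1] - x) ** 2).mean()

# Toy source: 2-D vectors lying on a 1-D curve, replicated through a
# single-unit bottleneck (a one-dimensional coordinate on the curve).
t = rng.uniform(-1.0, 1.0, (512, 1))
X = np.hstack([t, t ** 2])
params = init([2, 8, 1, 8, 2])
init_loss = ((forward(params, X)[-1] - X) ** 2).mean()
for _ in range(3000):
    loss = train_step(params, X)
```

After training, the bottleneck activation serves as a single curvilinear coordinate along the data curve; the refinement process mentioned in the abstract is not modeled here.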
Finally, a new general (and widely-applicable) noise model is introduced
and it is noted that a replicator neural network trained on data vectors
contaminated with such removable noise (i.e., using each noise-contaminated
training vector as both the network's input and as its output target) will
(in the limit of a specified refinement process) automatically learn to
completely remove it. This may help explain a large body of anecdotal
observations about the 'data cleaning' capabilities of such networks.
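The training scheme just described (each noise-contaminated vector used as both the input and the output target) can be illustrated with a rank-one linear autoencoder, computed in closed form via the top principal component, standing in for a full replicator network. The signal direction and noise level are made-up numbers for the demonstration:

```python
import numpy as np

# Data lying on a line in 2-D, contaminated with isotropic noise.
# A rank-1 'autoencoder' fitted to the noisy vectors (as both input
# and target) cannot carry the off-line noise through its bottleneck,
# so its reconstructions land closer to the clean signal.

rng = np.random.default_rng(1)

t = rng.uniform(-1.0, 1.0, 1000)
direction = np.array([3.0, 4.0]) / 5.0        # unit signal direction
clean = np.outer(t, direction)                # signal on a line
noisy = clean + rng.normal(0.0, 0.1, clean.shape)

# Closed-form rank-1 fit: project onto the top principal component
# of the noisy samples.
centered = noisy - noisy.mean(axis=0)
_, _, Vt = np.linalg.svd(centered, full_matrices=False)
pc = Vt[0]
recon = noisy.mean(axis=0) + np.outer(centered @ pc, pc)

# Mean squared distance to the clean signal, before and after.
err_in = np.mean(np.sum((noisy - clean) ** 2, axis=1))
err_out = np.mean(np.sum((recon - clean) ** 2, axis=1))
```

In a full replicator the low-dimensional bottleneck plays the same role as the rank-one projection here: a representation too small to carry the noise forces it to be stripped during reconstruction.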
This talk will present the precise definitions and results and outline the
proofs.