**Author:** Robert Hecht-Nielsen (HNC Software, Inc., and Department
of Electrical and Computer Engineering, Institute for Neural Computation,
University of California, San Diego)

**Abstract:**
The a priori probability structure of any real-world data source can be
modeled as a distribution F on ℝ^m. For convenience, only memoryless
sources will be considered here (although many of the results can be
applied or extended to sources with memory). Real-world data sources
almost always have a highly constrained internal structure F. This paper
describes a new universal model for real-world data sources, the data
manifold. Data manifolds have several useful attributes, most important
of which is a special, essentially unique, coordinate system: natural
coordinates. Natural coordinates have three desirable properties in
the limit of large data manifold dimension (conversely, any coordinate
system possessing all three properties must be natural coordinates). First,
they conform to the intrinsic a priori probability structure of the data
in the sense that equal natural coordinate volumes have equal probability.
Second, when uniformly quantized, natural coordinates maximally preserve
information about the source data in comparison with all other possible
coordinate systems (i.e., they are Shannon-optimal source codes). Third,
at every point in the data space the tangent vectors of the individual
natural coordinate curves are mutually perpendicular (and thus
'statistically independent'). Although past attempts at defining coordinate
system representations for general data have achieved at most one of these
attributes at a time, natural coordinates are the first to combine all of
them. It is therefore argued that natural coordinates be adopted as the
proper curvilinear generalization, to arbitrary sources, of the
principal components coordinates of
Gaussian data sources. Replicator neural networks (a type of three hidden
layer multilayer perceptron) trained on randomly chosen data vectors from a
source will, in the limit of a specified refinement process, automatically
build a set of natural coordinates for that source. Being able to use natural
coordinates as a highly-reduced-dimensionality representation may someday prove
to be of value in process modeling, pattern recognition, complex system
control, etc. (although, at present, applications are limited because the
technology for training such large, deep networks is not well developed).
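The replicator idea described above (a multilayer perceptron with three hidden layers, trained to reproduce its own input so that the narrow middle layer carries a reduced-dimensionality coordinate for each data vector) can be sketched as a small numpy autoencoder. The layer sizes, tanh activations, learning rate, and toy curved data source below are illustrative assumptions, not the original construction:

```python
import numpy as np

# Minimal replicator sketch: an m -> h -> k -> h -> m multilayer
# perceptron (three hidden layers) trained to reproduce its input;
# the k-unit middle layer is the candidate coordinate representation.

rng = np.random.default_rng(0)

def init(sizes):
    """One (weights, bias) pair per layer, small random weights."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(a), (a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Return the activations of every layer (linear output layer)."""
    acts = [x]
    for i, (W, b) in enumerate(params):
        z = acts[-1] @ W + b
        acts.append(z if i == len(params) - 1 else np.tanh(z))
    return acts

def train_step(params, x, lr=0.1):
    """One full-batch gradient step on mean squared reconstruction error."""
    acts = forward(params, x)
    grad = (acts[-1] - x) / len(x)        # d(loss)/d(output pre-activation)
    for i in reversed(range(len(params))):
        W, b = params[i]
        gW, gb = acts[i].T @ grad, grad.sum(axis=0)
        if i > 0:                          # propagate through tanh layers
            grad = (grad @ W.T) * (1.0 - acts[i] ** 2)
        params[i] = (W - lr * gW, b - lr * gb)
    return ((acts[-1] - x) ** 2).mean()

# Toy source: 2-D vectors lying on a 1-D curve, replicated through a
# single-unit bottleneck (a one-dimensional coordinate on the curve).
t = rng.uniform(-1.0, 1.0, (512, 1))
X = np.hstack([t, t ** 2])
params = init([2, 8, 1, 8, 2])
init_loss = ((forward(params, X)[-1] - X) ** 2).mean()
for _ in range(3000):
    loss = train_step(params, X)
```

After training, the bottleneck activation serves as a single curvilinear coordinate along the data curve; the refinement process mentioned in the abstract is not modeled here.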
Finally, a new general (and widely-applicable) noise model is introduced
and it is noted that a replicator neural network trained on data vectors
contaminated with such removable noise (i.e., using each noise-contaminated
training vector as both the network's input and as its output target) will
(in the limit of a specified refinement process) automatically learn to
completely remove it. This may help explain a large body of anecdotal
observations about the 'data cleaning' capabilities of such networks.
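The training scheme just described (each noise-contaminated vector used as both the input and the output target) can be illustrated with a rank-one linear autoencoder, computed in closed form via the top principal component, standing in for a full replicator network. The signal direction and noise level are made-up numbers for the demonstration:

```python
import numpy as np

# Data lying on a line in 2-D, contaminated with isotropic noise.
# A rank-1 'autoencoder' fitted to the noisy vectors (as both input
# and target) cannot carry the off-line noise through its bottleneck,
# so its reconstructions land closer to the clean signal.

rng = np.random.default_rng(1)

t = rng.uniform(-1.0, 1.0, 1000)
direction = np.array([3.0, 4.0]) / 5.0        # unit signal direction
clean = np.outer(t, direction)                # signal on a line
noisy = clean + rng.normal(0.0, 0.1, clean.shape)

# Closed-form rank-1 fit: project onto the top principal component
# of the noisy samples.
centered = noisy - noisy.mean(axis=0)
_, _, Vt = np.linalg.svd(centered, full_matrices=False)
pc = Vt[0]
recon = noisy.mean(axis=0) + np.outer(centered @ pc, pc)

# Mean squared distance to the clean signal, before and after.
err_in = np.mean(np.sum((noisy - clean) ** 2, axis=1))
err_out = np.mean(np.sum((recon - clean) ** 2, axis=1))
```

In a full replicator the low-dimensional bottleneck plays the same role as the rank-one projection here: a representation too small to carry the noise forces it to be stripped during reconstruction.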
This talk will present the precise definitions and results and outline the
proofs.