Learning Probabilistic Generative Models using Autoencoders

Authors: Zoubin Ghahramani, Sam Roweis, and Geoffrey Hinton (University of Toronto)

Abstract: Since early in the modern history of neural networks we have known that principal component analysis (PCA) can be implemented using a linear autoencoder network (Baldi and Hornik, 1989). The data are fed to the network as both input and target, and the network parameters are learned using the squared error cost function. I will show that Factor Analysis and Mixtures of Gaussians can also be implemented in this manner, albeit with a different cost function: the usual squared error plus a regularizer, which has exactly the same form for Factor Analysis and for Mixtures of Gaussians. In general, autoencoders can be seen as a framework for implementing gradient versions of the EM algorithm for learning probabilistic models: the lower (recognition) portion of the network computes or approximates the posterior distribution of the hidden units given the inputs; using this distribution, the upper (generation) portion of the network is trained to maximize the likelihood of the data (Hinton and Zemel, 1994).
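The PCA case can be made concrete with a small numerical experiment. The following is a minimal NumPy sketch (not from the talk itself): a linear autoencoder is trained by gradient descent on the squared reconstruction error, and its decoder is then compared against the leading principal directions of the centred data. The synthetic data, variable names, and hyperparameters are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative synthetic data: 200 points in 5 dimensions whose variance
# is concentrated in a 2-dimensional subspace, plus a little isotropic noise.
n, d, k = 200, 5, 2
latent = rng.normal(size=(n, k))
mixing = rng.normal(size=(k, d))
X = latent @ mixing + 0.1 * rng.normal(size=(n, d))
X -= X.mean(axis=0)  # centre the data, as PCA assumes

# Linear autoencoder: recognition weights W_enc (d x k) map data to codes,
# generation weights W_dec (k x d) map codes back to reconstructions.
# Trained by gradient descent on the mean squared reconstruction error.
W_enc = 0.01 * rng.normal(size=(d, k))
W_dec = 0.01 * rng.normal(size=(k, d))
lr = 0.02
for step in range(3000):
    H = X @ W_enc            # recognition: hidden codes
    X_hat = H @ W_dec        # generation: reconstruction
    err = (X_hat - X) / n
    W_dec -= lr * (H.T @ err)            # gradient w.r.t. decoder
    W_enc -= lr * (X.T @ (err @ W_dec.T))  # gradient w.r.t. encoder

# Compare the subspace spanned by the decoder rows with the top-k
# principal directions of X (leading right singular vectors).
pc_subspace = np.linalg.svd(X, full_matrices=False)[2][:k]
learned_subspace = np.linalg.svd(W_dec, full_matrices=False)[2]
# Singular values of the overlap matrix are the cosines of the principal
# angles between the two subspaces; values near 1 mean the subspaces agree.
overlap = np.linalg.svd(pc_subspace @ learned_subspace.T)[1]
print("cosines of principal angles:", np.round(overlap, 3))
```

As Baldi and Hornik (1989) show, the squared error cost drives the linear autoencoder to the principal subspace (up to an invertible linear map of the codes), which is why the cosines printed above come out close to 1; the Factor Analysis and Mixture of Gaussians cases discussed in the abstract add the regularizer term to this same reconstruction cost.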

Baldi and Hornik (1989) Neural networks and principal component analysis: Learning from examples without local minima. Neural Networks, 2: 53-58.

Hinton and Zemel (1994) Autoencoders, Minimum Description Length and Helmholtz Free Energy. NIPS 6: 3-10.