** Author: **
Zoubin Ghahramani, Sam Roweis and Geoffrey Hinton
(University of Toronto)

** Abstract: **
Since early in the modern history of neural networks we have known
that principal component analysis (PCA) can be implemented using a
linear autoencoder network (Baldi and Hornik, 1989). The data are
presented as both the input and the target of the network, and the
network parameters are learned by minimizing the squared error. I will
show that Factor Analysis and Mixtures of Gaussians can also be
implemented in this manner, albeit with a different cost function. The
cost function is the usual squared error plus a regularizer, which has
exactly the same form for Factor Analysis and for Mixtures of
Gaussians. In general, autoencoders can be seen as a framework for
implementing gradient versions of the EM algorithm for learning
probabilistic models: the lower (recognition) portion of the network
computes or approximates the posterior distribution of the hidden
units given the inputs; using this distribution, the upper
(generation) portion of the network is trained to maximize the
likelihood of the data (Hinton and Zemel, 1994).
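
As a small numerical illustration of the Baldi and Hornik (1989) result
mentioned above, the following sketch trains a linear autoencoder by
gradient descent on squared error and compares the subspace it finds
with the one given by PCA. The synthetic data, layer sizes, learning
rate and all variable names are assumptions chosen for illustration;
they are not taken from the talk, and the regularized cost functions for
Factor Analysis and Mixtures of Gaussians are not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed synthetic data: 500 points in 5-D, most variance along two axes.
n, d, k = 500, 5, 2
basis, _ = np.linalg.qr(rng.normal(size=(d, d)))   # random orthonormal axes
scales = np.array([3.0, 2.0, 0.3, 0.2, 0.1])       # std. dev. along each axis
X = (rng.normal(size=(n, d)) * scales) @ basis.T
X -= X.mean(axis=0)                                # centre the data

# Linear autoencoder with k hidden units: x -> x @ W_enc -> (x @ W_enc) @ W_dec
W_enc = 0.1 * rng.normal(size=(d, k))
W_dec = 0.1 * rng.normal(size=(k, d))

lr = 0.01                                          # assumed step size
for _ in range(5000):
    H = X @ W_enc                                  # hidden activations
    R = H @ W_dec - X                              # reconstruction residual
    # Gradients of the mean squared reconstruction error (1/n) * ||R||^2
    grad_dec = (2.0 / n) * H.T @ R
    grad_enc = (2.0 / n) * X.T @ R @ W_dec.T
    W_enc -= lr * grad_enc
    W_dec -= lr * grad_dec

# PCA via SVD: projector onto the top-k principal subspace.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
P_pca = Vt[:k].T @ Vt[:k]

# Projector onto the subspace spanned by the decoder weights.
Q, _ = np.linalg.qr(W_dec.T)
P_ae = Q @ Q.T

subspace_gap = np.linalg.norm(P_ae - P_pca)        # Frobenius norm
ae_error = np.mean(np.sum((X @ W_enc @ W_dec - X) ** 2, axis=1))
pca_error = np.mean(np.sum((X @ P_pca - X) ** 2, axis=1))

print("subspace gap:", subspace_gap)
print("autoencoder error:", ae_error, "PCA error:", pca_error)
```

Only the subspace is expected to agree: the hidden units of the
autoencoder may be any invertible linear mixture of the principal
components, so the weights themselves need not equal the PCA loadings.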

Baldi and Hornik (1989) Neural networks and principal component
analysis: Learning from examples without local minima. Neural
Networks, 2: 53-58.

Hinton and Zemel (1994) Autoencoders, minimum description length and
Helmholtz free energy. NIPS, 6: 3-10.