Supervised versus Unsupervised Learning by Feedforward Networks: Accuracy and Efficiency Results

Author: Nathalie Japkowicz

Abstract: The purpose of this presentation is to compare the accuracy and efficiency of a supervised classifier and its unsupervised counterpart on several domains. The supervised system is a standard discrimination-based multi-layer perceptron trained on positive and negative instances of the problem. The unsupervised system instead learns each domain by training an autoassociator, on positive instances only, to recognize instances of the problem. Efficiency is tested on a two-dimensional, non-linear, multi-modal artificial domain that idealizes real-world domains, whereas accuracy is tested on this domain and several of its extensions. (Real-world domains have been used in earlier work to compare the two systems' accuracy.)
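The recognition-based approach can be illustrated with a small sketch. One common way to turn an autoassociator into a recognizer, assumed here, is to train it to reproduce positive instances only and then accept a new instance when its reconstruction error is small. Everything concrete below is a hypothetical choice for illustration, not the configuration used in these experiments: the toy data (positives lying near the diagonal x1 == x2), the 2-1-2 architecture with a single tanh bottleneck unit, the learning rate and epoch count, and the rule that sets the threshold at twice the largest training error.

```python
import math
import random

random.seed(0)  # fixed seed so the toy run is repeatable


class Autoassociator:
    """Hypothetical 2-1-2 autoassociator: one tanh bottleneck unit, linear outputs."""

    def __init__(self, n_in=2):
        self.w = [random.uniform(-0.5, 0.5) for _ in range(n_in)]  # input -> hidden
        self.b = 0.0
        self.v = [random.uniform(-0.5, 0.5) for _ in range(n_in)]  # hidden -> output
        self.c = [0.0] * n_in

    def forward(self, x):
        h = math.tanh(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)
        y = [vk * h + ck for vk, ck in zip(self.v, self.c)]
        return h, y

    def error(self, x):
        """Squared reconstruction error ||y - x||^2."""
        _, y = self.forward(x)
        return sum((yk - xk) ** 2 for yk, xk in zip(y, x))

    def train(self, data, lr=0.05, epochs=300):
        """Per-example gradient descent on 0.5 * ||y - x||^2 (target = input)."""
        for _ in range(epochs):
            for x in data:
                h, y = self.forward(x)
                dy = [yk - xk for yk, xk in zip(y, x)]  # dE/dy
                # Backpropagate through the output weights and the tanh unit.
                dh = sum(dk * vk for dk, vk in zip(dy, self.v)) * (1 - h * h)
                for k in range(len(x)):
                    self.v[k] -= lr * dy[k] * h
                    self.c[k] -= lr * dy[k]
                    self.w[k] -= lr * dh * x[k]
                self.b -= lr * dh


def make_positives(n=60, noise=0.02):
    """Hypothetical concept: points scattered closely around the diagonal x1 == x2."""
    pts = []
    for _ in range(n):
        t = random.uniform(-1.0, 1.0)
        pts.append((t + random.gauss(0, noise), t + random.gauss(0, noise)))
    return pts


net = Autoassociator()
positives = make_positives()
net.train(positives)  # no negative instances are used anywhere

# Assumed decision rule: accept x if its reconstruction error is at most
# twice the worst reconstruction error observed on the training positives.
threshold = 2 * max(net.error(x) for x in positives)


def is_positive(x):
    return net.error(x) <= threshold


print(is_positive((0.5, 0.5)))   # on the diagonal
print(is_positive((0.8, -0.8)))  # far from the diagonal
```

With this toy data the on-diagonal point should fall under the threshold and the off-diagonal point should not. The single-unit bottleneck is the key design choice: because every reconstruction lies on a one-dimensional curve, the network cannot learn the identity map, so instances unlike the training positives are reconstructed poorly.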

The results indicate that unsupervised learning can be much more efficient than supervised learning (up to 35 times more efficient) because the two systems spontaneously adopt different learning strategies. The unsupervised method uses a bottom-up strategy that is practically instantaneous, whereas the supervised method uses a top-down strategy that produces a long initial latency period. Furthermore, the two approaches exhibit different accuracy strengths and weaknesses depending on the amount and type of specialization the test domain requires. Five types of domain were identified in which the unsupervised network is more accurate than the supervised one, and one type of domain was found to be better suited to the supervised network than to the unsupervised one.

These results suggest the following two conclusions:

Although unsupervised learning is not the most natural approach to classification, it may offer several advantages over supervised approaches: unsupervised concept learning requires only positive instances of the concept, and it may be both more efficient and more accurate than concept learning by a supervised system.

It would be worthwhile to combine the autoassociator and the discrimination network within the 1991 mixture-of-experts framework of Jacobs, Jordan, Nowlan and Hinton. The framework's divide-and-conquer strategy could partition the input space into regions better suited to the autoassociator and regions better suited to the discrimination network, and assign the appropriate network to each region. Such a combined scheme should be more accurate than either the supervised or the unsupervised method alone.
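As a rough illustration, the combination step can be written in the softmax-gated form commonly used to present mixture-of-experts models: a gating network produces a score s_i(x) for each expert, the scores are normalized into weights g_i(x), and the experts' outputs y_i(x) are blended. The symbols are notation introduced here for the sketch. With n = 2, y_1 would be the discrimination network's output and y_2 a class score derived from the autoassociator; how the autoassociator's reconstruction error would be converted into a score comparable with the discrimination network's output is an open assumption of this sketch, not something specified above.

```latex
y(\mathbf{x}) = \sum_{i=1}^{n} g_i(\mathbf{x})\, y_i(\mathbf{x}),
\qquad
g_i(\mathbf{x}) = \frac{\exp\big(s_i(\mathbf{x})\big)}{\sum_{j=1}^{n} \exp\big(s_j(\mathbf{x})\big)}
```

Because the weights g_i(x) are non-negative and sum to one at every input x, the gating network can hand different regions of the input space to different experts, which is the partitioning behavior the proposal relies on.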

Note: If time constraints prevent me from presenting both the efficiency and the accuracy results at the workshop, only the accuracy results will be presented.