Using Objective Interestingness Measures in Classification Rules


Isis Peña


Data mining is concerned with the discovery of important data patterns that may in turn become knowledge. However, a data mining system can generate hundreds or even thousands of patterns, so an obvious question arises: are all of those patterns interesting?


Interestingness measures play an important role in data mining. Their goal is to select and rank the discovered patterns according to their potential interest to the user. For classification rules, which are used to predict the category to which a new example belongs, the most important role of interestingness measures is to act as heuristics for choosing the attribute-value pairs to include in the rule set.


We evaluate three popular algorithms that use interestingness measures in this way: C4.5, which uses entropy; CN2, which uses the Laplace measure; and CART, which uses the Gini index. We report experiments on six different datasets, evaluating the following criteria: predictive accuracy, the number of rules in the rule set, and the complexity of the generated rules. We present the experimental results and draw conclusions about the best trade-off between predictive accuracy and the number and complexity of the rules.
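To make the three measures concrete, the following Python sketch computes each of them for a class distribution covered by a candidate rule. The function names and the binary-class default are illustrative assumptions, not part of the evaluated systems; each algorithm embeds its measure in a larger search procedure not shown here.

```python
import math

def entropy(class_counts):
    """Shannon entropy of a class distribution, as used by C4.5.
    Lower values indicate a purer (more informative) split."""
    total = sum(class_counts)
    return -sum((c / total) * math.log2(c / total)
                for c in class_counts if c > 0)

def laplace_accuracy(covered_positive, covered_total, num_classes=2):
    """Laplace-corrected accuracy of a rule, as used by CN2.
    The +1 / +num_classes correction penalizes rules covering few examples."""
    return (covered_positive + 1) / (covered_total + num_classes)

def gini(class_counts):
    """Gini impurity of a class distribution, as used by CART.
    Lower values indicate a purer split."""
    total = sum(class_counts)
    return 1.0 - sum((c / total) ** 2 for c in class_counts)

# A perfectly mixed two-class distribution is maximally impure,
# while a pure one scores best under all three measures.
print(entropy([10, 10]))        # maximal entropy for two classes
print(entropy([20, 0]))         # pure split
print(gini([10, 10]))           # maximal Gini impurity for two classes
print(laplace_accuracy(18, 20)) # rule covering 18 positives of 20 examples
```

Note that entropy and Gini score partitions of the data (lower is better), while the Laplace measure scores an individual rule's accuracy (higher is better); this difference reflects how C4.5 and CART grow trees whereas CN2 searches for rules directly.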