Aminul Islam


Indirect Semantic PMI Method for Determining the Semantic Similarity of Words


 

Abstract:
 

This talk presents a corpus-based method for calculating the semantic similarity of two words. The method, which we call indirect semantic PMI, uses Pointwise Mutual Information (PMI) to sort lists of important neighbor words of the two target words. It then takes the words common to both lists and aggregates their PMI values to calculate the relative semantic similarity. Our method was empirically evaluated using Miller and Charles's 30-pair subset, Rubenstein and Goodenough's 65 noun pairs, 80 synonym test questions from the Test of English as a Foreign Language (TOEFL), and 50 synonym test questions from a collection of English as a Second Language (ESL) tests. For Miller and Charles's dataset, we obtained a correlation of 0.7435 with the human judges. For Rubenstein and Goodenough's dataset, we obtained a correlation of 0.7285. Without using the context, the method correctly answered 76.25% of the 80 TOEFL questions and 68% of the 50 ESL questions. The talk also discusses some potential applications of the new semantic similarity method, as well as its impact on measuring the semantic similarity of texts.
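The abstract does not specify the co-occurrence window, the length of the neighbor lists, or the exact aggregation formula, so the following is only a minimal Python sketch of the idea with those gaps filled by assumption. Co-occurrence is counted in a symmetric window of `window` tokens, PMI is estimated as log2(f(w, c) * N / (f(w) * f(c))), each word keeps its `beta` neighbors with the highest positive PMI, and similarity is the plain, unnormalized sum of both words' PMI values over the shared neighbors. The function names (`build_model`, `top_neighbors`, `indirect_pmi_similarity`, `answer_synonym_question`) and parameters are hypothetical; this is not the authors' implementation.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class CooccurrenceModel:
    unigrams: Counter  # word -> corpus frequency
    pairs: Counter     # (word, neighbor) -> co-occurrence count within the window
    total: int         # number of tokens in the corpus


def build_model(tokens, window=2):
    """Count word frequencies and symmetric co-occurrences within +/- `window` tokens."""
    pairs = Counter()
    for i, w in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs[(w, tokens[j])] += 1
    return CooccurrenceModel(Counter(tokens), pairs, len(tokens))


def pmi(model, w, c):
    """ASSUMPTION: simplified PMI estimate log2(f(w, c) * N / (f(w) * f(c)))."""
    joint = model.pairs[(w, c)]
    if joint == 0:
        return float("-inf")
    return math.log2(joint * model.total / (model.unigrams[w] * model.unigrams[c]))


def top_neighbors(model, word, beta):
    """Return {neighbor: PMI} for the `beta` neighbors of `word` with the highest positive PMI."""
    scores = {}
    for (w, c) in model.pairs:
        if w == word and c != word:
            score = pmi(model, w, c)
            if score > 0:
                scores[c] = score
    # Sort by descending PMI; break ties alphabetically so results are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:beta])


def indirect_pmi_similarity(model, w1, w2, beta=10):
    """Aggregate the PMI values of neighbors that appear in both top-`beta` lists."""
    n1 = top_neighbors(model, w1, beta)
    n2 = top_neighbors(model, w2, beta)
    common = sorted(n1.keys() & n2.keys())
    # ASSUMPTION: a plain sum of both lists' PMI values; the talk gives no exact formula.
    return sum(n1[c] + n2[c] for c in common)


def answer_synonym_question(model, target, choices, beta=10):
    """Pick the choice most similar to `target`, as in a TOEFL/ESL-style synonym question."""
    return max(choices, key=lambda choice: indirect_pmi_similarity(model, target, choice, beta))


if __name__ == "__main__":
    # Hypothetical toy corpus, for illustration only.
    tokens = ("the car drove fast on the road the automobile drove fast on the road "
              "the banana tasted sweet and ripe").split()
    model = build_model(tokens, window=1)
    print(answer_synonym_question(model, "car", ["banana", "road", "automobile"]))  # prints: automobile
```

The synonym-question helper mirrors the TOEFL and ESL evaluation described in the abstract: the candidate with the highest similarity to the target word is chosen. On a realistic corpus the model should be built once and reused, since `top_neighbors` scans every co-occurrence pair on each call.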