Fadi Abu Sheikha

Title: Analysis of Microblogs Data (Twitter) using Data Warehouse Techniques

Abstract
Twitter is an online social networking and microblogging service for short text messages, with a limit of 140 characters per message. Users around the world post hundreds of millions of tweets every day on many different topics. Thus, there is a need to analyze these Twitter data using natural language processing and data mining techniques, such as extracting important information, clustering tweets into relevant topics or events, classifying them into specific classes, and retrieving relevant information based on user queries. One important aspect is retrieving information from tweets based on free text (e.g., questions, full sentences, titles, sets of keywords). Therefore, we have built a model that retrieves information for free-text queries using database and data warehouse techniques. We built our model using the Twitter2011 corpus from the TREC 2011 Microblog track provided by the Text REtrieval Conference (TREC). We tested our model using the 50 queries provided by the TREC 2011 Microblog track. Our evaluation shows that our model is able to retrieve relevant information for the 50 queries, achieving a Mean Average Precision (MAP) of 44% and a Precision@30 (P@30) of 91%.
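For readers unfamiliar with the two reported measures, the following is a minimal sketch of how Precision@k and Mean Average Precision are commonly computed from a ranked result list and a set of relevance judgments. It is not the evaluation code used in this work: the function names, the tweet identifiers, and the choice to divide by k even when fewer than k results are returned are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def precision_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k ranked items that are relevant.

    Assumption: the denominator is always k, even if fewer than k
    items were retrieved.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / k


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Mean of the precision values at each rank where a relevant item appears,
    normalized by the total number of relevant items for the query."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Dict[str, List[str]], judgments: Dict[str, Set[str]]
) -> float:
    """Average of per-query average precision over all queries in `runs`."""
    if not runs:
        return 0.0
    total = sum(
        average_precision(ranked, judgments.get(query_id, set()))
        for query_id, ranked in runs.items()
    )
    return total / len(runs)


# Hypothetical toy data: one query, four retrieved tweets, two judged relevant.
example_runs = {"Q1": ["t1", "t2", "t3", "t4"]}
example_judgments = {"Q1": {"t1", "t3"}}

p_at_2 = precision_at_k(example_runs["Q1"], example_judgments["Q1"], k=2)
map_score = mean_average_precision(example_runs, example_judgments)
```

In this toy case one of the top two tweets is relevant, and the relevant tweets appear at ranks 1 and 3, so the average precision is the mean of 1/1 and 2/3. For the setting described in the abstract, k would be 30 and the mean would be taken over the 50 track queries.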