Title: Analysis of Microblogs Data (Twitter) using Data
Warehouse Techniques
Abstract
Twitter is an online social networking and microblogging service for
short text messages that imposes a limit of 140 characters for each
message. Users around the world post hundreds of millions of tweets
every day on many different topics. Thus, there is a need to analyze
Twitter data using natural language processing and data mining
techniques, such as extracting important information, clustering
tweets into relevant topics/events, classifying them into specific
classes, and retrieving relevant information based on user queries.
One of the important
aspects is retrieving information from tweets based on free-text
input (e.g., questions, full sentences, titles, sets of keywords).
Therefore, we have built a model that retrieves information for
such texts using database and data warehouse techniques. We built our
model using the Twitter2011 corpus from the TREC 2011 Microblog
track provided by the Text REtrieval Conference (TREC). We tested
our model using 50 queries that were provided by the TREC 2011
Microblog track. Our evaluation shows that our model is able to
retrieve relevant information for the 50 queries, achieving a Mean
Average Precision of 44% and a Precision@30 of 91%.
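
For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how Mean Average Precision and Precision@k are conventionally computed from a ranked result list and a set of relevance judgments. It uses the standard textbook definitions and is not the evaluation code used in this work; the function names and the toy tweet identifiers are hypothetical.

```python
from typing import Dict, List, Set


def precision_at_k(ranked: List[str], relevant: Set[str], k: int = 30) -> float:
    """Fraction of the top-k retrieved tweets that are judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Missing positions (fewer than k results) count as non-relevant.
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Mean of the precision values at each rank where a relevant tweet appears."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    # Relevant tweets that were never retrieved contribute zero.
    return total / len(relevant)


def mean_average_precision(runs: Dict[str, List[str]],
                           qrels: Dict[str, Set[str]]) -> float:
    """Average of the per-query average precision over all queries in the run."""
    if not runs:
        return 0.0
    return sum(average_precision(ranked, qrels.get(q, set()))
               for q, ranked in runs.items()) / len(runs)


# Toy data (hypothetical tweet IDs, not taken from the Twitter2011 corpus).
toy_runs = {"q1": ["t1", "t2", "t3", "t4"]}
toy_qrels = {"q1": {"t1", "t3"}}
toy_map = mean_average_precision(toy_runs, toy_qrels)
toy_p_at_2 = precision_at_k(toy_runs["q1"], toy_qrels["q1"], k=2)
```

In this toy case the relevant tweets appear at ranks 1 and 3, so the average precision is (1/1 + 2/3) / 2, and Precision@2 is 1/2. With 50 queries, the same functions would be applied to each query's ranked list and the per-query values averaged.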