Resources

Datasets developed at the University of Ottawa / NLP Lab

Emotion-stimulus data from the paper: Diman Ghazi, Diana Inkpen & Stan Szpakowicz (2015). Detecting Emotion Stimuli in Emotion-Bearing Sentences. Proceedings of the 16th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing 2015), Cairo, Egypt, pdf.
Hockey data from the paper: Josh Weissbock, Herna Viktor, and Diana Inkpen. Using Performance Metrics to Forecast Success in the National Hockey League. In Proceedings of the Machine Learning and Data Mining for Sports Analytics workshop at ECML/PKDD 2013, Sept 2013, Prague, Czech Republic, pdf.
Hockey games textual reports from the paper: Josh Weissbock and Diana Inkpen. Combining Textual Pre-game Reports and Statistical Data for Predicting Success in the National Hockey League. In Proceedings of the 27th Canadian Conference on Artificial Intelligence (AI 2014), Montreal, QC, Canada, May 2014, pp. 251-262, pdf.
Twitter data annotated with topic labels from the paper: Josh Weissbock, Ahmed A. A. Esmin, and Diana Inkpen: Using External Information for Classifying Tweets. BRACIS 2013, Fortaleza, Brazil, pdf.
Flame dictionary (offensive language expressions) from the paper: Amir H. Razavi, Diana Inkpen, Stan Matwin and Sasha Uritsky, "Offensive Language Detection Using Multi-level Classification", in Proceedings of the 23rd Canadian Conference on Artificial Intelligence (AI 2010), Ottawa, ON, Canada, May 2010, pp. 16-27, pdf draft. In the file, the expressions are marked with a flame level from 5 to 1, from highest to lowest. Here is the annotated dataset of sentences, some collected by us, some from the related work paper indicated in our paper.
Twitter data with annotated location expressions at city, state/province, and country level, from the paper: Diana Inkpen, Ji Liu, Atefeh Farzindar, Farzaneh Kazemi and Diman Ghazi. Location Detection and Disambiguation from Twitter Messages, Proceedings of the 16th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing 2015), LNCS 9042, Cairo, Egypt, pp. 321-332, 2015. pdf

Twitter data annotated with stance toward multiple targets from the paper: Parinaz Sobhani, Diana Inkpen & Xiaodan Zhu, A Dataset for Multi-Target Stance Classification, Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2017), Valencia, Spain, pp. 551-557, 2017. pdf

Metaphor annotation for poetry from the paper: Vaibhav Kesarwani, Diana Inkpen, Stan Szpakowicz, and Chris Tanasescu. Metaphor Detection in a Poetry Corpus. In Proceedings of the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2017), ACL 2017, Vancouver BC, Canada, August 2017, pdf.

Training and test data from the paper: Hanqing Zhou, Amal Zouaq, and Diana Inkpen. DBpedia Entity Type Detection using Entity Embeddings and N-Gram Models. In Proceedings of the International Conference on Knowledge Engineering and Semantic Web (KESW 2017), Szczecin, Poland, Nov 2017.

Twitter data annotated with depression levels for tweets and for users from the paper: Zunaira Jamil, Diana Inkpen, Prasadith Buddhitha, and Kenton White. Monitoring Tweets for Depression to Detect At-risk Users. In Proceedings of the Fourth Workshop on Computational Linguistics and Clinical Psychology - From Linguistic Signal to Clinical Reality (CLPsych 2017), at ACL 2017, Vancouver, BC, Aug 2017, pdf. We can make it available after you sign this data sharing agreement.