Datasets developed at the University of Ottawa / NLP Lab
- Emotion-stimulus data from the paper:
Diman Ghazi, Diana Inkpen & Stan Szpakowicz (2015). Detecting Emotion
Stimuli in Emotion-Bearing Sentences. Proceedings of the 16th
International Conference on Intelligent Text Processing and Computational
Linguistics (CICLing 2015), Cairo, Egypt.
- Hockey data from the paper:
Josh Weissbock, Herna Viktor, and Diana Inkpen. Using Performance Metrics
to Forecast Success in the National Hockey League. In Proceedings of the
Machine Learning and Data Mining for Sports Analytics workshop at
ECML/PKDD 2013, Sept 2013, Prague, Czech Republic.
- Hockey games textual reports from the paper:
Josh Weissbock and Diana Inkpen. Combining Textual Pre-game Reports and
Statistical Data for Predicting Success in the National Hockey League. In
Proceedings of the 25th Canadian Conference on Artificial Intelligence (AI
2014), Montreal, QC, Canada, May 2014, pp. 251-262.
- Twitter data annotated with topic labels from the paper:
Josh Weissbock, Ahmed A. A. Esmin, and Diana Inkpen. Using External
Information for Classifying Tweets. BRACIS 2013, Fortaleza, Brazil.
- Flame dictionary (offensive language expressions) from the paper:
Amir H. Razavi, Diana Inkpen, Stan Matwin and Sasha Uritsky,
"Offensive Language Detection Using Multi-level Classification", in
Proceedings of the 23rd Canadian Conference on Artificial Intelligence (AI
2010), Ottawa, ON, Canada, May 2010, pp. 16-27. In the file, the
expressions are marked with flame levels from 5 to 1, from highest to
lowest. An annotated dataset of sentences is also available; some of the
sentences were collected by us, and some come from the related work cited
in our paper.
- Twitter data with annotated location expressions at the city,
state/province, and country levels, from the paper: Diana Inkpen, Ji Liu,
Atefeh Farzindar, Farzaneh Kazemi and Diman Ghazi. Location Detection and
Disambiguation from Twitter Messages. Proceedings of the 16th
International Conference on Intelligent Text Processing and Computational
Linguistics (CICLing 2015), LNCS 9042, Cairo, Egypt, pp. 321-332, 2015.
- Twitter data annotated with stance toward multiple targets from the paper:
Parinaz Sobhani, Diana Inkpen & Xiaodan Zhu. A Dataset for Multi-Target
Stance Classification. Proceedings of the 15th Conference of the European
Chapter of the Association for Computational Linguistics (EACL 2017),
Valencia, Spain, pp. 551-557, 2017.
- Metaphor annotation for poetry from the paper:
Vaibhav Kesarwani, Diana Inkpen, Stan Szpakowicz, and Chris Tanasescu.
Metaphor Detection in a Poetry Corpus. In Proceedings of the Joint SIGHUM
Workshop on Computational Linguistics for Cultural Heritage, Social
Sciences, Humanities and Literature (LaTeCH-CLfL 2017), ACL 2017,
Vancouver BC, Canada, August 2017.
- Training and test data from the paper:
Hanqing Zhou, Amal Zouaq, and Diana Inkpen. DBpedia Entity Type Detection
using Entity Embeddings and N-Gram Models. In Proceedings of the
International Conference on Knowledge Engineering and Semantic Web
(KESW 2017), Szczecin, Poland, Nov 2017.
- Twitter data annotated with depression levels for tweets and for users,
from the paper: Zunaira Jamil, Diana Inkpen, Prasadith Buddhitha, and
Kenton White. Monitoring Tweets for Depression to Detect At-risk Users. In
Proceedings of the Fourth Workshop on Computational Linguistics and
Clinical Psychology - From Linguistic Signal to Clinical Reality (CLPsych
2017), at ACL 2017, Vancouver, BC, Aug 2017. We can make this data
available after you sign our data sharing agreement.