CSI4107: Information Retrieval and the Internet
Instructor: Diana Inkpen
Office: SITE 5015
E-mail: firstname.lastname@example.org
Telephone: 562-5800 ext. 6711
Meeting Times and Locations
Office Hours: TBA or by email appointment, in
Basic principles of Information Retrieval. Indexing methods. Query
processing. Linguistic aspects of Information Retrieval. Agents and
artificial intelligence approaches to Information Retrieval. Relation of
Information Retrieval to the World Wide Web. Search engines. Servers
and clients. Browser and server side programming for Information
Prerequisites: (CSI3103 or ELG3300), (CSI3125 or CSI2115 or SEG2101),
or permission from the instructor.
- The final marks are posted. If you want to see your exam, there
will be office hours Fri, May 6, 11:30am-1pm, in SITE 5015. Here is
the solution to the exam.
- Exam preparation page
- Assignment 2 deadline was extended until Fri, Apr 1, 10pm.
- The midterm marks are posted. Here is the solution.
- Midterm preparation page
- Assignment 1 deadline was extended until Thu, Feb 18, 10pm.
Evaluation
Students will be evaluated on:
- Two written and programming assignments / group project (15% each).
The programming language will be Java or any other.
The assignments will be submitted electronically
through Blackboard Learning. No late assignments are
- Midterm exam (15%)
- One in-class presentation (15%)
(See Presentations schedule)
- Final exam (40%)
- Bonus points for class participation
late assignments are considered)
Introduction to Information Retrieval, by Christopher D. Manning,
Prabhakar Raghavan and Hinrich Schutze, Cambridge University
Press, 2008 (online version available)
Information Retrieval, by D. Grossman and O. Frieder, Springer,
2004 (second edition).
Another online book
Information Retrieval, by C. J. van Rijsbergen (1979)
Modern Information Retrieval, by Ricardo Baeza-Yates and
Berthier Ribeiro-Neto, 1999.
Companion website to this book.
Course notes (additional reading, pdf; subject to minor modifications)
(The lecture slides will be in pdf format; you can read them with
Acrobat Reader.)
Credit: some of the lecture notes were originally designed by
Prof. Ray Mooney, University of Texas
Goals and history of IR. The impact of the web on IR. The
role of artificial intelligence (AI) in IR.
Internet and the WWW: History of Internet. TCP/IP. IP
addresses. WWW. HTTP. HTML. Web servers and clients.
Links: Top search engines in US in 2010; Search engine watch
Boolean and vector-space retrieval models; ranked retrieval;
text-similarity metrics; TF-IDF (term frequency/inverse document
frequency) weighting; cosine similarity.
on Implementation of Vector Space Model
slides on cosine measure
discussed in class Solution
to the example.
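As a rough illustration of the vector-space topics listed above, here is a minimal sketch of TF-IDF weighting and cosine similarity. It is not the course's reference implementation: the function names, the toy documents, and the weighting variant (raw term frequency times log(N/df)) are assumptions chosen for brevity, and other variants are equally common.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per tokenized document."""
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Assumed variant: raw term frequency times log(N / df).
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors


def cosine(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy collection, tokenized by whitespace.
docs = [
    "information retrieval on the web".split(),
    "web search engines rank documents".split(),
    "decision trees classify documents".split(),
]
vectors = tf_idf_vectors(docs)
print(cosine(vectors[0], vectors[1]))  # positive: the two documents share "web"
print(cosine(vectors[0], vectors[2]))  # 0.0: no terms in common
```

Sparse dictionaries keep the sketch short; a real system would typically compute these weights from an inverted index rather than from per-document vectors.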
Evaluation of IR: Performance metrics:
recall, precision, and F-measure; Evaluations on benchmark text
Example discussed in class Solution to example.
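The evaluation metrics named above can be sketched in a few lines for a single query with unranked results. The function name, the document identifiers, and the `beta` parameter are illustrative assumptions, not taken from the course material.

```python
def precision_recall_f(retrieved, relevant, beta=1.0):
    """Set-based precision, recall, and F-measure for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    # Weighted harmonic mean; beta = 1 gives the balanced F1 score.
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


# Hypothetical query: 4 documents retrieved, 3 relevant, 2 in common.
p, r, f = precision_recall_f(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
print(p, r, f)  # precision 2/4, recall 2/3, F1 4/7
```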
Query Operations and Languages:
feedback; Query expansion; Query languages.
Example discussed in class
Solution (do it by yourself first)
Dekang Lin's Demos
Image Information Retrieval
image retrieval ESP Game
for labeling images
Zipf's law; Porter stemmer; morphology; index term selection; using
thesauri. Metadata and markup languages (SGML, HTML, XML).
More slides on Web markup languages: HTML, XML,
XHTML, RDF, OWL Semantic Web and
Linked Data Links:
Semantic Web Linked Data video
term frequencies in Tom Sawyer
spidering; metacrawlers; directed spidering; link analysis (e.g. hubs
and authorities, Google PageRank); shopping agents.
Extra slides on Link Analysis: the
hubs and authorities algorithm, and the PageRank
Hubs and authorities example discussed in class
Solution (do it by yourself first)
PageRank examples
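For the link-analysis topic, the following is a minimal sketch of PageRank computed by power iteration. It assumes a damping factor of 0.85, a fixed number of iterations, and even redistribution of rank from pages with no out-links; the function name and the three-page graph are hypothetical and are not the examples discussed in class.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank for a graph given as {page: [pages it links to]}."""
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Every page receives the "random jump" share first.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new_rank[q] += share
            else:
                # Dangling page (assumption): spread its rank over all pages.
                for q in pages:
                    new_rank[q] += damping * rank[p] / n
        rank = new_rank
    return rank


# Hypothetical graph: A links to B and C, B links to C, C links to A.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(graph)
for page, score in sorted(ranks.items()):
    print(page, round(score, 3))
```

In this toy graph, C collects links from both A and B, so it ends up ranked above B, which is linked only from A.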
Links: Google Tech
- Parallel architecture
Slides about the Google 1998 paper
Week 6: Feb 15-19
(Reading Week, no classes)
Feb 24, Midterm revision; Feb 26, in class:
Categorization algorithms: decision trees; Rocchio; k-nearest neighbor,
Weka data mining tool
Extra slides on Naive Bayes
Clustering algorithms: agglomerative clustering; k-means.
Applications to information filtering and organization.
Examples of text classification and clustering discussed in class
Solution (do it by yourself first)
Advanced IR Models:
Latent Semantic Indexing (LSI); Language Models.
Extra slides on LSI.
Language Models for Information
Question Answering:
Retrieving precise short answers to
natural language queries.
System Demos. Slides about IBM's Watson.
Links to IBM's Watson
IR Learning to Rank
Deep Learning for Natural Language