CSI4107: Information Retrieval and the Internet
Instructor: Diana Inkpen
Office: SITE 5015
E-mail: email@example.com
Telephone: 562-5800 ext. 6711
Meeting Times and Locations
Office Hours: TBA or by email appointment, in
Basic principles of Information Retrieval. Indexing methods. Query
processing. Linguistic aspects of Information Retrieval. Agents and
artificial intelligence approaches to Information Retrieval. Relation of
Information Retrieval to the World Wide Web. Search engines. Servers
and clients. Browser-side and server-side programming for Information
Retrieval.
Prerequisites: (CSI3103 or ELG3300) and (CSI3125 or
CSI2115 or SEG2101), or permission from the instructor.
- The final marks are posted in the Blackboard Learn system. Here is the
solution to the exam. If you need to see
your exam, there will be office hours on Monday, April 29, 2-4 pm.
- Exam preparation page
- Assignment 2 is posted.
- Midterm preparation page
- The deadline for A1 was extended till Feb 15. A new version of the
corpus for Assignment 1 was made available here. Please use this version instead of
en.zip because it is more complete.
- Assignment 1 is posted.
Evaluation
Students will be evaluated on:
- Two written and programming assignments / group project (30%). The
programming language will be Java.
The assignments will be submitted electronically
through Virtual Campus (the new version is called Blackboard Learn). No
late assignments are accepted.
- Midterm exam (15%)
- One in-class presentation (15%)
- Final exam (40%)
- Bonus points for class participation
Important dates (no late assignments are considered):
- Assignment 1, due Mon, Feb 11,
extended till Fri, Feb 15, 22:00.
- In-class presentation.
- Midterm (Fri, March 1, 14:30, in class) Solutions
- Assignment 2, due Mon, April 1,
extended till April 8, 22:00.
- Final exam (during exam period)
Introduction to Information Retrieval, by Christopher D. Manning,
Prabhakar Raghavan and Hinrich Schütze, Cambridge University
Press, 2008 (online version available)
Information Retrieval, by D. Grossman and O. Frieder, Springer,
2004 (second edition).
Another online book
Information Retrieval, by C. J. van Rijsbergen (1979)
Modern Information Retrieval, by Ricardo Baeza-Yates and
Berthier Ribeiro-Neto, 1999.
Companion website to this book.
Course notes (additional reading, PDF; subject to minor modifications). The lecture
slides will be in PDF format; you can read them with Acrobat Reader.
Credit: some of the lecture notes were originally
designed by Prof. Ray Mooney, University of Texas.
Week 1: Jan 9, 11
Goals and history of IR. The impact of the web on IR. The
role of artificial intelligence (AI) in IR.
Internet and the WWW: History of the Internet. TCP/IP. IP
addresses. WWW. HTTP. HTML. Web servers and clients.
Links: Top search engines in the US in 2010; Search Engine Watch
Week 2: Jan 16, 18
Boolean and vector-space retrieval models; ranked retrieval;
text-similarity metrics; TF-IDF (term frequency/inverse document
frequency) weighting; cosine similarity.
Extra slides on the implementation of the vector space model. Slides on the cosine measure.
Example discussed in class. Solution to the example.
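As an illustration of the Week 2 topics, here is a minimal Java sketch (Java being the course's programming language) of TF-IDF weighting and cosine similarity. It assumes pre-tokenized documents, raw term counts for tf, and idf = log10(N/df); the class and method names are hypothetical, and the lecture slides may use a different tf normalization.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class: TF-IDF weights (raw tf x log10(N/df)) and cosine similarity.
class VectorSpaceSketch {

    // Sparse TF-IDF vector (term -> weight) for one tokenized text, weighted against a corpus.
    static Map<String, Double> tfIdf(List<String> tokens, List<List<String>> corpus) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : tokens) {
            tf.merge(t, 1, Integer::sum);   // raw term frequency
        }
        Map<String, Double> vec = new HashMap<>();
        int n = corpus.size();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            long df = corpus.stream().filter(d -> d.contains(e.getKey())).count();
            // Terms that occur in no corpus document get weight 0 (avoids division by zero).
            double idf = (df == 0) ? 0.0 : Math.log10((double) n / df);
            vec.put(e.getKey(), e.getValue() * idf);
        }
        return vec;
    }

    // Cosine similarity: dot(a, b) / (|a| * |b|); returns 0 if either vector is all zeros.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
        }
        double normA = Math.sqrt(a.values().stream().mapToDouble(x -> x * x).sum());
        double normB = Math.sqrt(b.values().stream().mapToDouble(x -> x * x).sum());
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / (normA * normB);
    }

    public static void main(String[] args) {
        List<List<String>> corpus = List.of(
                List.of("information", "retrieval", "retrieval"),
                List.of("web", "search", "engine"),
                List.of("information", "web"));
        Map<String, Double> query = tfIdf(List.of("retrieval", "web"), corpus);
        for (int i = 0; i < corpus.size(); i++) {
            double sim = cosine(query, tfIdf(corpus.get(i), corpus));
            System.out.printf("doc %d: %.3f%n", i, sim);
        }
    }
}
```

Running main scores three toy documents against the query "retrieval web". For brevity the sketch rescans the corpus to get each document frequency; a real index would compute these once.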
Week 3: Jan 23, 25
Evaluation of IR: performance metrics:
recall, precision, and F-measure; evaluations on benchmark text collections.
Example discussed in class. Solution to the example.
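For unranked result sets, precision is |Rel ∩ Ret| / |Ret|, recall is |Rel ∩ Ret| / |Rel|, and the balanced F-measure is their harmonic mean, F = 2PR / (P + R). A minimal Java sketch, assuming results and relevance judgments are given as sets of document IDs (EvalSketch and the IDs are hypothetical):

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: set-based precision, recall and balanced F-measure (F1).
class EvalSketch {

    // Fraction of retrieved documents that are relevant.
    static double precision(Set<String> retrieved, Set<String> relevant) {
        if (retrieved.isEmpty()) return 0.0;
        return (double) overlap(retrieved, relevant) / retrieved.size();
    }

    // Fraction of relevant documents that were retrieved.
    static double recall(Set<String> retrieved, Set<String> relevant) {
        if (relevant.isEmpty()) return 0.0;
        return (double) overlap(retrieved, relevant) / relevant.size();
    }

    // Harmonic mean of precision and recall: F = 2PR / (P + R).
    static double fMeasure(double p, double r) {
        return (p + r == 0.0) ? 0.0 : 2 * p * r / (p + r);
    }

    // Size of the intersection of two sets.
    private static int overlap(Set<String> a, Set<String> b) {
        Set<String> both = new HashSet<>(a);
        both.retainAll(b);
        return both.size();
    }

    public static void main(String[] args) {
        Set<String> retrieved = Set.of("d1", "d2", "d3", "d4");
        Set<String> relevant = Set.of("d1", "d3", "d5");
        double p = precision(retrieved, relevant);
        double r = recall(retrieved, relevant);
        System.out.printf("P=%.3f R=%.3f F=%.3f%n", p, r, fMeasure(p, r));
    }
}
```

For the toy sets in main, P = 2/4 and R = 2/3, so F = 4/7 (about 0.571).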
Week 4: Jan 30, Feb 1
Query Operations and Languages: relevance
feedback; query expansion; query languages.
Example discussed in class
Solution (do it by yourself first)
Dekang Lin's Demos
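A standard way to implement relevance feedback is the Rocchio formula, which moves the query vector toward the centroid of the documents judged relevant and away from the centroid of the non-relevant ones: q' = αq + β·centroid(D_r) − γ·centroid(D_nr). The Java sketch below works on dense term-weight vectors and clips negative weights to zero; the weights α = 1.0, β = 0.75, γ = 0.15 in main are values often suggested in the literature, not necessarily the ones used in class, and RocchioSketch is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: Rocchio relevance feedback on dense term-weight vectors.
class RocchioSketch {

    // q' = alpha*q + beta*centroid(relevant) - gamma*centroid(nonRelevant), negatives clipped to 0.
    static double[] rocchio(double[] query, List<double[]> relevant, List<double[]> nonRelevant,
                            double alpha, double beta, double gamma) {
        double[] updated = new double[query.length];
        for (int i = 0; i < query.length; i++) {
            double relSum = 0.0;
            for (double[] d : relevant) relSum += d[i];
            double nonRelSum = 0.0;
            for (double[] d : nonRelevant) nonRelSum += d[i];
            double w = alpha * query[i];
            if (!relevant.isEmpty()) w += beta * relSum / relevant.size();
            if (!nonRelevant.isEmpty()) w -= gamma * nonRelSum / nonRelevant.size();
            updated[i] = Math.max(0.0, w);   // negative term weights are dropped
        }
        return updated;
    }

    public static void main(String[] args) {
        // Toy vocabulary, term order: [retrieval, search, jaguar]
        double[] q = {1.0, 0.0, 0.0};
        List<double[]> rel = List.of(new double[] {0.8, 0.6, 0.0});
        List<double[]> nonRel = List.of(new double[] {0.0, 0.2, 0.9});
        System.out.println(Arrays.toString(rocchio(q, rel, nonRel, 1.0, 0.75, 0.15)));
    }
}
```

Terms that occur in relevant documents but not in the original query (here "search") gain weight, which is how relevance feedback also acts as a form of query expansion.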
Week 5: Feb 6, 8
Image Information Retrieval
Links: image retrieval; ESP Game for labeling images
Zipf's law; Porter stemmer; morphology; index term selection; using
thesauri. Metadata and markup languages (SGML, HTML, XML).
More slides on Web markup languages: HTML, XML,
XHTML, RDF, OWL.
Links: term frequencies in Tom Sawyer
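Zipf's law states that a term's frequency is roughly inversely proportional to its frequency rank, so rank × frequency stays roughly constant. A minimal Java sketch for checking this on a text, assuming a crude tokenizer that keeps only the letters a-z (ZipfSketch is a hypothetical name):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: rank terms by frequency to check Zipf's law.
class ZipfSketch {

    // Lower-cases the text, splits on non-letters, and returns (term, count) pairs
    // sorted by descending count; ties are broken alphabetically.
    static List<Map.Entry<String, Integer>> rankByFrequency(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String tok : text.toLowerCase().split("[^a-z]+")) {
            if (!tok.isEmpty()) counts.merge(tok, 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> ranked = new ArrayList<>(counts.entrySet());
        ranked.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return ranked;
    }

    public static void main(String[] args) {
        String text = "the cat sat on the mat and the dog sat on the log";
        List<Map.Entry<String, Integer>> ranked = rankByFrequency(text);
        for (int r = 1; r <= ranked.size(); r++) {
            Map.Entry<String, Integer> e = ranked.get(r - 1);
            // Zipf's law predicts r * f to be roughly constant on a large text.
            System.out.printf("%2d %-6s f=%d r*f=%d%n", r, e.getKey(), e.getValue(), r * e.getValue());
        }
    }
}
```

A one-sentence input is far too short to show the law; running the same code on a full novel such as Tom Sawyer is what makes the roughly constant r*f visible.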
Week 6: Feb 13, 15
spidering; metacrawlers; directed spidering; link analysis (e.g. hubs
and authorities, Google PageRank); shopping agents.
Extra slides on link analysis: the
hubs and authorities algorithm and the PageRank algorithm.
Hubs and authorities example discussed
in class. Solution (do it by yourself
first). PageRank examples
Links: Google Tech
- Parallel architecture
Slides about the Google 1998 paper
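PageRank can be computed by power iteration. The Java sketch below uses the common "random surfer" form PR(p) = (1 − d)/N + d · Σ PR(q)/out(q), summed over the pages q that link to p, with damping factor d = 0.85; the rank of pages without out-links is spread evenly over all pages. These choices are assumptions, and the slides may present a variant (for example, with an explicit rank-source vector). PageRankSketch is a hypothetical name.

```java
import java.util.Arrays;

// Hypothetical helper: PageRank by power iteration with damping and dangling-page handling.
class PageRankSketch {

    // links[i] lists the pages that page i links to (its out-links).
    static double[] pageRank(int[][] links, double d, int iterations) {
        int n = links.length;
        double[] pr = new double[n];
        Arrays.fill(pr, 1.0 / n);                  // start from the uniform distribution
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            Arrays.fill(next, (1.0 - d) / n);      // random-jump (teleport) share
            double dangling = 0.0;
            for (int i = 0; i < n; i++) {
                if (links[i].length == 0) {
                    dangling += pr[i];             // page with no out-links
                } else {
                    for (int j : links[i]) next[j] += d * pr[i] / links[i].length;
                }
            }
            // Spread the rank of dangling pages evenly over all pages.
            for (int j = 0; j < n; j++) next[j] += d * dangling / n;
            pr = next;
        }
        return pr;
    }

    public static void main(String[] args) {
        int[][] links = {{1, 2}, {2}, {0}};      // A->B, A->C, B->C, C->A
        System.out.println(Arrays.toString(pageRank(links, 0.85, 50)));
    }
}
```

The scores always sum to 1. In the toy graph in main, C is linked from both other pages and ends up with the highest score, followed by A and then B.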
Week 7: Feb 20, 22
(Reading Week, no classes)
Week 8: Feb 27, Mar 1
Wed, Feb 27: midterm review. Fri, Mar 1: midterm exam, in class.
Week 9: Mar 6, 8
Categorization algorithms: decision trees; Rocchio; k-nearest neighbor;
Naive Bayes.
Links: Weka data mining tool
Extra slides on Naive Bayes
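Of the categorization algorithms above, Naive Bayes is compact enough to sketch. The Java version below is a multinomial model with add-one (Laplace) smoothing, scored in log space to avoid underflow; it assumes pre-tokenized input and simply ignores words never seen in training. Treat it as an illustration under those assumptions, not as the exact variant in the extra slides; NaiveBayesSketch and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: multinomial Naive Bayes text classifier with add-one smoothing.
class NaiveBayesSketch {
    private final Map<String, Integer> docsPerClass = new HashMap<>();
    private final Map<String, Map<String, Integer>> termCounts = new HashMap<>();
    private final Map<String, Integer> tokensPerClass = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Adds one labeled, already-tokenized training document.
    void train(List<String> tokens, String label) {
        totalDocs++;
        docsPerClass.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = termCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String t : tokens) {
            counts.merge(t, 1, Integer::sum);
            tokensPerClass.merge(label, 1, Integer::sum);
            vocabulary.add(t);
        }
    }

    // Returns argmax_c [ log P(c) + sum_t log P(t | c) ].
    String classify(List<String> tokens) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String c : docsPerClass.keySet()) {
            double score = Math.log(docsPerClass.get(c) / (double) totalDocs);
            Map<String, Integer> counts = termCounts.get(c);
            int classTokens = tokensPerClass.getOrDefault(c, 0);
            for (String t : tokens) {
                if (!vocabulary.contains(t)) continue;   // skip words never seen in training
                double p = (counts.getOrDefault(t, 0) + 1.0) / (classTokens + vocabulary.size());
                score += Math.log(p);
            }
            if (score > bestScore) {
                bestScore = score;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        NaiveBayesSketch nb = new NaiveBayesSketch();
        nb.train(List.of("cheap", "pills", "buy", "now"), "spam");
        nb.train(List.of("project", "meeting", "agenda"), "ham");
        nb.train(List.of("buy", "cheap", "watches"), "spam");
        System.out.println(nb.classify(List.of("cheap", "meeting", "buy")));
    }
}
```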
Week 11: Mar 20, 22
Clustering algorithms: agglomerative clustering; k-means.
Applications to information filtering and organization.
Examples of text
classification and clustering discussed in class.
Solution (do it by yourself first).
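k-means alternates two steps until no point changes cluster: assign each point to its nearest centroid, then move each centroid to the mean of its points. A minimal Java sketch on dense vectors, assuming Euclidean distance and seeding with the first k points to keep it deterministic (random seeding is more common, and text clustering often uses cosine similarity instead); KMeansSketch is a hypothetical name.

```java
import java.util.Arrays;

// Hypothetical helper: k-means clustering (Euclidean distance) on dense vectors.
class KMeansSketch {

    // Returns the cluster index of each point. Assumes 1 <= k <= points.length.
    static int[] kMeans(double[][] points, int k, int maxIter) {
        int n = points.length;
        int dim = points[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) centroids[c] = points[c].clone();   // seed: first k points
        int[] assign = new int[n];
        Arrays.fill(assign, -1);
        for (int iter = 0; iter < maxIter; iter++) {
            // Assignment step: each point goes to its nearest centroid.
            boolean changed = false;
            for (int i = 0; i < n; i++) {
                int best = 0;
                double bestDist = Double.MAX_VALUE;
                for (int c = 0; c < k; c++) {
                    double dist = squaredDistance(points[i], centroids[c]);
                    if (dist < bestDist) {
                        bestDist = dist;
                        best = c;
                    }
                }
                if (assign[i] != best) {
                    assign[i] = best;
                    changed = true;
                }
            }
            if (!changed) break;   // no reassignment: converged
            // Update step: move each centroid to the mean of its assigned points.
            double[][] sums = new double[k][dim];
            int[] sizes = new int[k];
            for (int i = 0; i < n; i++) {
                sizes[assign[i]]++;
                for (int j = 0; j < dim; j++) sums[assign[i]][j] += points[i][j];
            }
            for (int c = 0; c < k; c++) {
                if (sizes[c] == 0) continue;   // an empty cluster keeps its old centroid
                for (int j = 0; j < dim; j++) centroids[c][j] = sums[c][j] / sizes[c];
            }
        }
        return assign;
    }

    // Squared Euclidean distance (the square root is unnecessary for comparisons).
    static double squaredDistance(double[] a, double[] b) {
        double s = 0.0;
        for (int j = 0; j < a.length; j++) {
            double diff = a[j] - b[j];
            s += diff * diff;
        }
        return s;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {10, 10}, {0, 1}, {10, 11}, {1, 0}};
        System.out.println(Arrays.toString(kMeans(points, 2, 100)));
    }
}
```

For the toy points in main, the result is [0, 1, 0, 1, 0]: the points near the origin form one cluster and the points near (10, 10) the other.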
Week 10: Mar 13, 15
Advanced IR Models:
Latent Semantic Indexing (LSI); Language Models.
Extra slides on LSI.
Language Models for Information Retrieval
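In the query-likelihood approach, documents are ranked by the probability that the document's language model generates the query. The Java sketch below uses Jelinek-Mercer smoothing, mixing the document model with a collection model: P(t|d) = λ·tf(t,d)/|d| + (1 − λ)·cf(t)/|C|. The smoothing method and the λ value are assumptions for illustration (the slides may use Dirichlet or another smoothing), and QueryLikelihoodSketch is a hypothetical name.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: query-likelihood scoring with Jelinek-Mercer smoothing.
class QueryLikelihoodSketch {

    // log P(q | d) = sum over query terms t of
    //   log( lambda * tf(t, d) / |d|  +  (1 - lambda) * cf(t) / |C| )
    // where cf(t) is t's count in the whole collection and |C| the collection length in tokens.
    static double score(List<String> query, List<String> doc, List<List<String>> collection, double lambda) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : doc) tf.merge(t, 1, Integer::sum);
        Map<String, Integer> cf = new HashMap<>();
        long collectionLength = 0;
        for (List<String> d : collection) {
            for (String t : d) cf.merge(t, 1, Integer::sum);
            collectionLength += d.size();
        }
        double logP = 0.0;
        for (String t : query) {
            double pDoc = doc.isEmpty() ? 0.0 : tf.getOrDefault(t, 0) / (double) doc.size();
            double pColl = collectionLength == 0 ? 0.0 : cf.getOrDefault(t, 0) / (double) collectionLength;
            double p = lambda * pDoc + (1 - lambda) * pColl;
            if (p == 0.0) return Double.NEGATIVE_INFINITY;   // term unseen in the whole collection
            logP += Math.log(p);
        }
        return logP;
    }

    public static void main(String[] args) {
        List<String> d1 = List.of("information", "retrieval", "retrieval");
        List<String> d2 = List.of("web", "search", "engine");
        List<List<String>> collection = List.of(d1, d2);
        List<String> query = List.of("retrieval", "search");
        System.out.printf("d1: %.3f  d2: %.3f%n",
                score(query, d1, collection, 0.5), score(query, d2, collection, 0.5));
    }
}
```

Smoothing is what lets d2 receive a finite score even though it lacks "retrieval": the collection model gives every term seen somewhere in the collection a non-zero probability.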
Week 12: Mar 27, 29
Question Answering:
retrieving precise short answers to
natural-language queries.
System Demos. Slides about IBM's Watson.
Links to IBM's Watson
Week 13: Apr 3, 5