Information Retrieval

Full year

Lecture time and place

Monday 11:00 A.M. – 12:50 P.M.; room G34, Owheo building.


Lecturers: Richard O'Keefe and Andrew Trotman
You'll find us in the Owheo building on the first floor.


If you have ever used a Web search engine, like Google or Yahoo, you will realise how helpful computers can be in finding information, and how frustrating. This paper will tell you how we can use computers to find information in unstructured or semi-structured text, and why doing better is as hard as it is important. We'll start from the basics of IR, such as "what the heck is a word, anyway?" and go on to cover some recent research.

Much of the presentation will be directed reading; some of the key papers are so opaque that we shall also have some lectures.


Prerequisites: second-year programming and data structures.


There will be an examination worth 60% and three practical assignments worth a total of 40%.

Lecture Schedule

Lectures are held on Mondays, 11:00 A.M. – 12:50 P.M., in room G34.
Week  Who       What
1     ASPT      Introduction / Problems in IR
2     RAOK      Properties of text (what is a word?)
3     RAOK      XML, lexical analysis, and parsing
4     RAOK      How mass storage works/random-access I/O
5     RAOK      Text compression (dictionary compression)
6     RAOK      Integer compression
                Mid Semester Break
7     RAOK      Basic statistics for classification and evaluation
8     ASPT      String Searching and ISAM and Inverted files
9     ASPT      Efficiency and Relevance ranking
10    ASPT      Evaluation (recall, precision, etc.)
11    ASPT      Term conflation (stemming, thesaurus, soundex)
12    ASPT      Relevance feedback
13    ASPT      Distributed Information Retrieval
                Between Semester Break
                COSC 480/490 Project Presentations
14    NR        Music IR (NEW TIME: 12:00 – 2:00 P.M.)
15    RAOK      Clustering + classification
16    RAOK      Query languages
17    ASPT      Phrase searching / structured information retrieval
18    ASPT      INEX (Focused Retrieval & Link Discovery)
19    ASPT      Ask IR Questions
                Mid Semester Break
20    Students  COSC 463 presentations


Student Administration have asked us to add this note on Plagiarism:
"Students should make sure that all submitted work is their own. Plagiarism is a form of dishonest practice. Plagiarism is defined as copying or paraphrasing another's work, whether intentionally or otherwise, and presenting it as one's own (approved University Council, December 2004). In practice this means plagiarism includes any attempt in any piece of submitted work (e.g. an assignment or test) to present as one's own work the work of another (whether of another student or a published authority). Any student found responsible for plagiarism in any piece of work submitted for assessment shall be subject to the University's dishonest practice regulations which may result in various penalties, including forfeiture of marks for the piece of work submitted, a zero grade for the paper, or in extreme cases exclusion from the University."

Resources for assignment 1

You'll find everything in the 2010 directory.