Computer Science



Information Retrieval

Semester 1

Concepts, principles, and algorithms in Information Retrieval and text processing.

This paper will deepen students' understanding of the concepts, principles, and algorithms in Information Retrieval and bring them to the frontiers of research in the field.

The objectives of the paper are:
(1) To examine the design and the internals of a search engine;
(2) To discuss Information Retrieval and Search Engine research issues;
(3) To build a search engine capable of searching gigabytes of data in a fraction of a second;
(4) To understand how techniques such as compression affect performance;
(5) To study parsing and lexical analysis as they apply to simple natural language processing.

This paper will cover those aspects of Information Retrieval necessary to understand and implement a simple relevance-ranking search engine. It will start with parsing and simple natural language processing as they apply to indexing, and then move on to the advanced data structures used in searching the index. Methods of improving the performance of the search engine will be introduced, including relevance feedback and link mining, among others.
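As a rough illustration of the indexing and ranking pipeline described above, the following sketch builds a small in-memory inverted index and ranks documents with a plain TF-IDF sum. It is not the course's reference implementation: the function names (`tokenize`, `build_index`, `search`), the toy documents, and the choice of TF-IDF as the ranking function are all assumptions made for the example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to a {doc_id: term_frequency} postings dictionary."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def search(index, num_docs, query):
    """Rank documents by a simple TF-IDF sum over the query terms."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur in the collection
        idf = math.log(num_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties broken by document id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical three-document collection, for illustration only.
docs = {
    1: "search engines index documents",
    2: "compression makes the index smaller",
    3: "ranking documents by relevance",
}
index = build_index(docs)
results = search(index, len(docs), "index compression")
```

Here `results` is a list of `(doc_id, score)` pairs. A search engine built for gigabytes of data would store the postings on disk in compressed form rather than in Python dictionaries, but the term-to-postings structure is the same idea.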

Issues in the quantitative analysis of search engines will be covered, including the statistics necessary to determine whether one search engine outperforms another. Statistics will also be taught as they apply to Language Modelling and probabilistic relevance ranking. Scalability will also be covered.
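One way to ask whether one search engine outperforms another is a paired significance test over per-query effectiveness scores. The sketch below uses an exact sign-flip randomization test, which is only one of several tests that could be applied here; the function name `paired_randomization_test` and the per-query scores are hypothetical and are not taken from the paper's materials.

```python
from itertools import product
from statistics import mean


def paired_randomization_test(scores_a, scores_b):
    """Exact two-sided sign-flip test on per-query score differences.

    Under the null hypothesis the two systems are interchangeable, so
    the sign of each per-query difference is equally likely to be
    positive or negative. Enumerates all 2**n sign assignments, so it
    is only practical for a small number of queries.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(mean(diffs))
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = mean([s * d for s, d in zip(signs, diffs)])
        # Small tolerance so that floating-point ties count as extreme.
        if abs(flipped) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical per-query scores (e.g. average precision) on 8 queries.
engine_a = [0.42, 0.55, 0.31, 0.60, 0.48, 0.70, 0.39, 0.52]
engine_b = [0.40, 0.50, 0.33, 0.51, 0.45, 0.62, 0.35, 0.47]
p_value = paired_randomization_test(engine_a, engine_b)
```

A small `p_value` suggests the observed mean difference is unlikely to arise from chance alone. With many queries the exact enumeration becomes too expensive, and random sampling of sign assignments is the usual substitute.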

By the end of the course, the student will understand how and why search engines work, will have implemented a simple, scalable search engine, and will be familiar with current research in the field.

For more information about this paper, contact Dr Andrew Trotman.