Introduction to the INEX 2005 Workshop on Element Retrieval Methodology

Andrew Trotman and Mounia Lalmas

Proceedings of the INEX 2005 Workshop on Element Retrieval Methodology pp. 1-3

 

ABSTRACT

With a wealth of documents originating in markup languages such as XML, it is appropriate to ask how this markup might be used in information retrieval.  One answer is to change the focus of retrieval from whole documents to document elements.

 

In document-centric IR the user searches whole documents and receives a ranked list of documents that match the query.  By contrast, in element retrieval, document elements are returned – perhaps a chapter of a book, or a section of an academic paper.

 

Since 2002 the annual INEX workshop [2] has been examining element ranking algorithms for XML documents, most specifically for the IEEE collection of 12,107 documents.  Arguably, progress has been made.

 

It is this “arguably” that has become the center of attention.  At the outset it would appear as though element retrieval is a simple derivation of document retrieval – but experience at INEX has shown this to be far from the truth.

 

A document-centric search engine makes a binary decision about the relevance of a given document – either it will appear in a result list or it will not.  It cannot “partly appear”.

An element-centric search engine, having decided that a piece of text is relevant, must then decide how to return that information.  Perhaps only a paragraph is relevant, or perhaps the sub-section, or the section, or it may be the entire document.  The same piece of text can be returned in many different ways.
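
To make the choice concrete, here is a minimal sketch in Python of how a single relevant paragraph gives rise to several candidate return units.  The sample document, its tag names (`article`, `sec`, `ss1`, `p`), and the helper `candidate_elements` are illustrative assumptions and are not taken from the INEX collection or from any particular system.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy document; tag names are illustrative assumptions.
DOC = (
    "<article>"
    "<sec><ss1><p>relevant text</p><p>other text</p></ss1></sec>"
    "<sec><p>unrelated text</p></sec>"
    "</article>"
)


def candidate_elements(root, target):
    """Return target followed by each of its ancestors up to the root."""
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    path = [target]
    while path[-1] in parent_of:
        path.append(parent_of[path[-1]])
    return path


root = ET.fromstring(DOC)
# Pretend the engine has judged this one paragraph to be relevant.
hit = next(p for p in root.iter("p") if "relevant" in (p.text or ""))
candidates = [e.tag for e in candidate_elements(root, hit)]
print(candidates)  # ['p', 'ss1', 'sec', 'article']
```

Each of the four elements contains the relevant text, so each is a plausible answer; the sketch only enumerates the options and says nothing about which one a system should prefer.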

 

When humans make relevance judgments, they too are faced with similar problems.  If a given paragraph is relevant, then surely a containing section is also relevant.  How much more so, or less so?

Combining these, how can the performance of a search engine be measured?

 

There are clearly methodological issues in element retrieval, and these need addressing.  It is these issues that are of interest at this workshop.

 

For many, the most pressing issue is this: when there is no community-accepted methodology, it is not possible to claim that any one system is better than any other.

 
