SIGIR 2007 Workshop on Focused Retrieval

(Question Answering, Passage Retrieval, Element Retrieval)

27 July 2007

MOTIVATION

Standard document retrieval finds atomic documents, and leaves it to the end-user to then locate the relevant information inside the document. Focused retrieval, in a broad sense, tries to remove this onus from the end-user by providing more direct access to relevant information. That is, focused retrieval is addressing information retrieval proper. Focused retrieval is becoming increasingly important in all areas of information retrieval. Question Answering has been examined by TREC, CLEF, and NTCIR for many years, and is arguably the ultimate goal of semantic web research for interrogative information needs. Passage retrieval has an even longer history and is currently examined by the genomics track at TREC, but is also important when searching long documents of any kind. Element retrieval (XML-IR) has been examined by INEX, where it has been used to extract relevant sections from academic documents; the application to textbook searching is obvious, and such commercial systems already exist.

Although on initial inspection these focused retrieval paradigms appear quite different, they have much in common. As with traditional document-centric information retrieval, the user need is loose, linguistic variations are frequent, and answers are a ranked list of relevant results. Furthermore, in focused retrieval the size of the unit of information retrieved is variable, and results within a single document may naturally overlap. These issues are unique to focused retrieval, and to date have not been examined as general problems. For example, the metrics used for passage retrieval at TREC and those for XML-IR at INEX were developed independently even though they arguably measure the same thing: an XML element is a passage.

This workshop will not address topics covered by the annual evaluation forums (TREC, CLEF, NTCIR and INEX), which examine the evaluation of systems against predefined performance criteria. This SIGIR workshop will focus on the theory, methodology, and practice of focused retrieval, independent of the evaluation forums' specifics.

This SIGIR workshop will provide an opportunity for IR researchers who have been working in different areas of focused retrieval to collaborate and to share ideas. It will also be an ideal forum for those who may not be aware of work or progress in the field but wish to collaborate. It will also allow researchers who have worked in the field to participate and to exchange ideas and experience with the research community.

PARTICIPATION

Papers discussing focused retrieval and related issues are sought. Topics may include, but are not limited to: Theory, Application, Interaction, and Experience. Note that relevance ranking algorithms and their evaluation are explicitly excluded, as they are a focus of the annual evaluation forums.

The workshop is open to all interested parties. Submission of an opinion paper is not required (but is encouraged). Places are limited and preference will be given to authors of accepted papers.

Theory

What is the perfect collection and what is the ideal way to interact with it? What makes a collection heterogeneous? How should queries of this collection be specified in either a formal or natural language? How are the various forms of focused retrieval inter-related?

Application

What applications of focused retrieval exist commercially or academically? Can user modeling of these applications be used to identify unaddressed problems or formally define existing problems? How is focused retrieval used, and what is it used for? What is relevant information in this context? Is markup useful in focused retrieval (e.g. HTML, XHTML, XML, SGML, other)? How could the performance of focused retrieval be measured? What would the baseline be?

Interaction

How practical / effective are end user interfaces? How can system performance be measured and compared (quantitatively)? Who are the users of focused retrieval? What kind of functionality should these systems offer beyond those that are offered in standard text retrieval?

Experience

What can the experiments of TREC, CLEF, NTCIR and INEX bring to focused retrieval? How are the various forms of focused retrieval interrelated? How are QA, passage retrieval and Element Retrieval interrelated? How does document structure affect the complexity of the question answering problem?

Others

Any other topic related to focused retrieval is welcome. Discussion of relevance ranking algorithms is not - that is the focus of the annual evaluation forums.

GOALS

At this workshop we will discuss the issues of focused retrieval in an open forum in which (just for a moment) the performance of runs can be ignored. This face-to-face discussion is invaluable when considering the future direction of focused retrieval, when discussing collaborative experiments (such as interactive experiments), and for agreeing on standard methodology. Unification of task definitions, performance measurement and so on is best made in a forum by those without an immediate vested interest in the performance of their own search engine.

WORKSHOP FORMAT

We expect the workshop to commence with a guest lecture. We are planning for someone with commercial experience to discuss how they use or might use focused retrieval in their products.

Then authors of selected papers will be asked to present their work and to lead a discussion on it. 45 minutes will be allotted to each author with half of the time for presentation and half for discussion. Each workshop session will be 90 minutes with 2 presentations.

Notes will be taken and made available.

SUBMISSION

Papers discussing any of the above topics, or any other related topic (excluding relevance ranking), are sought. Contributions should be in English, formatted according to the ACM SIG Proceedings guidelines, no longer than 8 (eight) pages, and submitted electronically as a PDF. Details of the submission procedure will be released shortly.

SCHEDULE

June 11, 2007	Deadline for Submissions
	Prepare your PDF using the ACM format. Submit using EasyChair

June 18, 2007	Notification of Acceptance
	Details of accepted papers published online

June 25, 2007	Proceedings Published

July 27, 2007	SIGIR 2007 Workshop on Focused Retrieval