SIGIR 2008 Workshop on Focused Retrieval
(Question Answering, Passage Retrieval, Element Retrieval)
Standard document retrieval finds atomic documents, and leaves it to the end-user to locate the relevant information inside the document. Focused retrieval removes this latter task from the end user by providing more direct access to relevant information. That is, focused retrieval addresses information retrieval and not simply document retrieval.
Focused retrieval is becoming increasingly important in all areas of information retrieval. Question answering has been examined by TREC, CLEF, and NTCIR for many years, and is arguably the ultimate goal of semantic web research for interrogative information needs. Passage retrieval has an even longer history, including INEX and the genomics track at TREC, and is also important when searching long documents of any kind. Element retrieval (XML-IR) has been examined by INEX, where it has been used to extract relevant sections from academic documents; the application to textbook searching is obvious, and such commercial systems already exist.
On initial inspection these focused retrieval paradigms appear quite different, but they have much in common. As with traditional document-centric information retrieval, in focused retrieval the user need is loosely specified, linguistic variations are frequent, and answers are returned as a ranked list of relevant results. Furthermore, in focused retrieval the size of the unit of information retrieved is variable, and results within a single document may naturally overlap.
With respect to the document collection, the Wikipedia has been used for XML-IR and passage retrieval experiments, and in 2008 QA@INEX will use it too. This convergence on a single collection makes it possible to ask how focused retrieval can be used to enhance the collection itself. New tasks such as automatic hypertext link identification (Link-the-Wiki) require NLP techniques such as those seen in QA to identify anchors, as well as passage (or element) techniques to identify link targets. Perhaps focused techniques could also be used for Wikipedia spam detection or document mining?
This workshop will not address topics covered by the annual evaluation forums (TREC, CLEF, NTCIR, and INEX), which examine the evaluation of systems against predefined performance criteria. Instead, the workshop will focus on the theory and methodology of focused retrieval, independent of the specifics of those forums.
Papers discussing focused retrieval and related issues are sought. Topics may include but are not limited to: Theory, Experimentation, Application, Interaction, and Experience. Note that relevance ranking algorithms and system evaluations are explicitly excluded and will be rejected outright (they are the focus of the annual evaluation forums).
What is the perfect collection and what is the ideal way to interact with it? How should queries of this collection be specified in either a formal or natural language? How are the various forms of focused retrieval interrelated?
Is markup useful in focused retrieval (e.g. HTML, XHTML, XML, and SGML)? Are there important differences in the utility of different kinds of markup (e.g., procedural versus semantic markup, original versus automatically enriched markup)? How could the performance of focused retrieval be measured? What would the baseline be?
What applications of focused retrieval exist commercially or academically? Can user modeling of these applications be used to identify unaddressed problems or to formally define existing ones? How is focused retrieval used, and what is it used for? What is relevant information in this context? Can focused retrieval techniques be deployed to solve other real-world problems (e.g., link detection in Wikipedia as explored at INEX)?
Who are the users of focused retrieval? What kind of functionality should these systems offer beyond that offered by standard text retrieval? How do users interact (search, browse, read) with sub-document retrieval systems? How important is contextual information in the hit-list and in result display? How practical and effective are end-user interfaces? How can interactive search be measured and compared quantitatively?
What can the experiments of TREC, CLEF, NTCIR, and INEX bring to focused retrieval? Is XML-IR a subset of passage retrieval? How does document structure affect the complexity of the question answering problem?
Any other topic related to focused retrieval is welcome. Discussion of relevance ranking algorithms is not; that is the focus of the annual evaluation forums.
At this workshop we will discuss the issues of focused retrieval in an open forum in which (just for a moment) the performance of runs can be ignored. This face-to-face discussion is invaluable when considering the future direction of focused retrieval, when planning collaborative experiments (such as interactive experiments), and for agreeing on standard methodology. Unification of task definitions, performance measures, and so on is best achieved in a forum by those without an immediate vested interest in the performance of their own search engine.
Authors of selected papers will present their work and lead a discussion on it. 30 minutes will be allotted to each author with half of the time for presentation and half for discussion. Each workshop session will be 90 minutes with 3 presentations.
We are also planning a panel discussion with panel members drawn from leaders in the fields of QA, Passage, and Element Retrieval.
Papers discussing any of the above topics, or any other related topic (excluding relevance ranking), are sought. Contributions must be in English, formatted according to the ACM SIG Proceedings guidelines, and must not exceed eight pages; they should be submitted electronically as a PDF. Details of the submission procedure will be released shortly.
May 16, 2008: Deadline for paper submissions. Prepare your PDF using the ACM format and submit online using EasyChair.
June 6, 2008: Notification of acceptance. Details of accepted papers will be published online.
June 20, 2008: Deadline for camera-ready copies.
July 24, 2008: SIGIR 2008 Workshop on Focused Retrieval.