SIGIR 2006 Workshop on XML Element Retrieval Methodology

10 August 2006


MOTIVATION

XML is rapidly becoming an accepted standard for the storage, communication, and exchange of information. Most of the information in typical XML documents is expressed as natural-language text.

As with traditional information retrieval, the user need is loosely defined, linguistic variation is frequent, and the answer is a ranked list of relevant elements. As with database querying, structure is important and a simple list of keywords may not be sufficient to define a query. Structured query languages have been developed, but appear to be difficult to use. Furthermore, the size of the retrieved unit of information is variable and XML elements naturally overlap in the document tree. XML-IR therefore requires its own innovative solutions.
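To make the overlap problem concrete, here is a minimal sketch in Python (the toy document and the query term "retrieval" are invented for illustration): a single keyword match is contained in several nested elements of very different sizes, and each of those elements is a candidate answer.

    import xml.etree.ElementTree as ET

    # A toy XML document; real collections (e.g. the INEX corpus) are far larger.
    doc = """<article>
      <sec>
        <title>Element retrieval</title>
        <p>XML element retrieval returns elements rather than whole documents.</p>
      </sec>
    </article>"""

    root = ET.fromstring(doc)

    query = "retrieval"
    # An element matches if its own text or any descendant's text contains the term.
    matches = [el.tag for el in root.iter() if query in "".join(el.itertext())]

    print(matches)  # ['article', 'sec', 'title', 'p'] -- overlapping answers of very different sizes

Deciding which of these overlapping elements to return, and at what granularity, is precisely the kind of methodological question this workshop addresses.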

The annual INEX workshop on element retrieval has shown improvements in retrieval precision, while at the same time raising new questions about element retrieval methodology, about potential applications and uses of the technology, and about other theoretical and practical aspects of XML-IR. The primary concern of the annual INEX workshop is the evaluation of systems against predefined performance criteria. This methodology workshop will address only issues that are not directly addressed by the INEX workshop: it will focus on the theory and methodology of XML-IR, both of which have substantial open problems.

This workshop will provide a forum for those involved in conducting collaborative experiments (interactive experiments, use case studies, etc.) to discuss their collaborations and to pave the way forward.

Conducting the workshop at SIGIR provides an opportunity to participate for IR researchers who may not be aware of work and progress in XML-IR. It will also allow researchers already active in XML-IR to exchange ideas and experience with other members of the community.

PARTICIPATION

Opinion papers discussing XML element retrieval, and related issues, are sought. Topics include but are not limited to Theory, Application, Interaction, Measurement, Judgment, and Experience. Note that relevance ranking algorithms are excluded (they are the focus of the annual INEX workshop).

This workshop is open to all interested parties. Submission of an opinion paper is not required (but is encouraged). Places are limited and preference will be given to authors of accepted papers.

Theory

What is the perfect collection and what is the ideal way to interact with it? Is a heterogeneous collection diverse in DTDs or diverse in content? How should queries of this collection be specified, in either a formal or a natural language? When a user provides feedback, are they providing information about the element itself (e.g. too large) or about its content?

Application

Is there an existing application of element retrieval, either commercial or academic? Can user modeling of this application be used to identify unaddressed problems or to formally define existing ones? How is element retrieval used, and what is it used for? What is a relevant element in a given context, and how can performance be measured within that context?

Interaction

This topic covers methodologies for quantitative and qualitative experiments measuring the effectiveness of end-user interfaces and interactive XML-IR. Who are the users of XML-IR? What functionality should XML-IR systems offer beyond that offered by standard text retrieval? How practical and effective are end-user interfaces to XML collections? How can system performance be measured and compared?

Measurement

What are current element retrieval metrics actually measuring? What would the ideal metric measure? A plethora of metrics already exists, so new metrics are not of interest; what is of interest is identifying what should be measured.

Judgment

What can be determined from the existing INEX relevance judgments? What is overlap and how should it be dealt with? Is the graded relevance scale understood by the judges? How and when does a relevant element have no relevant descendants and what are the implications of this? How much information is needed for an element to become relevant (can a reference make an element relevant)? Note that the existing INEX online assessment software is not under discussion.

Experience

What can the experiments of TREC, CLEF, and NTCIR bring to element retrieval and INEX? How (if at all) is XML element identification like SDR segment identification? Is element retrieval a form of topic distillation, and how might experience in topic distillation be used for element retrieval?

Others

Any other topic related to XML element retrieval methodology is welcome. Discussions of relevance ranking algorithms, metrics, and comparative system evaluations are not; they are the focus of the INEX Evaluation Workshop.

GOALS

At this workshop we will discuss the methodological issues of element retrieval in an open forum in which (just for a moment) the performance of runs can be ignored. This face-to-face discussion is invaluable when considering the future direction of element retrieval, when discussing collaborative experiments (such as interactive experiments), and for agreeing on standard methodology. Decisions on INEX tasks, tracks, performance measurement and so on are best made in a forum by those without an immediate vested interest in the performance of their own search engines.

Of course, we plan to produce high-quality proceedings comparable to those of the first workshop, and to publish a summary of the event in a journal such as SIGIR Forum (as was done for the first workshop).

WORKSHOP FORMAT

On-topic contributions will be combined into publicly available proceedings, which will form a discussion document for the workshop (so please read it). From this document the program committee will choose the most fiercely debated topics for discussion at the workshop.

The workshop will consist of 4 (four) sessions, each discussing one topic. Authors of selected papers will be asked to present their work and to lead a discussion of it. Each author will be allotted 45 minutes, half for presentation and half for discussion. Each 90-minute workshop session will include 2 presentations.

Notes will be taken and made available.

SUBMISSION

Opinion papers discussing any of the above topics, or any other topic related to XML element retrieval methodology (excluding relevance ranking), are sought. Contributions formatted according to the ACM SIG Proceedings guidelines and not exceeding 8 (eight) pages should be submitted by email to Andrew Trotman and Shlomo Geva. Submissions should be in PDF format.

SCHEDULE

June 23, 2006: Deadline for Submissions
    Please prepare your PDF using the ACM format and email it to both Andrew Trotman and Shlomo Geva.

June 30, 2006: Notification of Acceptance
    Details of accepted papers will be published online.

July 7, 2006: Proceedings Published
    Workshop proceedings must be sent to SIGIR for publishing.

August 10, 2006: SIGIR 2006 Workshop on Element Retrieval Methodology