Andrew Trotman, Nils Pharo, Dylan Jenkinson
During a session of the INEX 2006 workshop in Schloss Dagstuhl the first at-INEX experiment was run. Participants were asked to assess topics in order to increase the number of multiply assessed topics available for analysis (and to increase the number of assessors per topic). This contribution presents the experimental set-up, the experiment, and an analysis of the results.
When examining the agreement level across all assessors, it is shown that each additional assessor both brings new material and disagrees with the consensus reached so far. Extrapolation suggests that with 8 assessors there will be no content that they all agree is relevant, but they continue to agree on which documents are relevant until 19 assessors are present. This suggests that relevance is in the mind of the assessor and not a ground truth.
Additionally examined are several problems encountered in conducting the experiment. These are explained in detail, and recommendations for change in the INEX methodology are made.