Darren Wei Che Huang, Andrew Trotman, Shlomo Geva
Collaborative knowledge management systems such as Wikipedia are becoming ever
more popular, and these systems typically contain hypertext links between
documents. Wikipedia supports both manual and automated link creation; in fact,
several different systems that provide links for Wikipedia documents now exist.
The problem is that the quality of automatically generated links has never been
quantified, so an evaluation method for Wikipedia link discovery approaches is
essential.
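As a minimal sketch of how link quality could be quantified, assume that the links an orphaned document originally contained are treated as the ground truth. A system's proposed destination documents can then be scored with standard precision and recall, or, for a ranked list, with precision at a cutoff k. The function names `precision_recall` and `precision_at_k` are hypothetical illustrations, not the official measures of any track.

```python
def precision_recall(proposed, ground_truth):
    """Set-based precision and recall of proposed link destinations.

    Assumption: ground_truth holds the destinations the document linked to
    before it was orphaned.
    """
    proposed, truth = set(proposed), set(ground_truth)
    hits = len(proposed & truth)
    precision = hits / len(proposed) if proposed else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


def precision_at_k(ranked, ground_truth, k):
    """Fraction of the top-k ranked destinations that were originally linked."""
    truth = set(ground_truth)
    return sum(1 for dest in ranked[:k] if dest in truth) / k


# Hypothetical orphan that originally linked to documents A, C and E.
ranked = ["A", "B", "C", "D"]
original_links = ["A", "C", "E"]
print(precision_recall(ranked, original_links))   # (0.5, 0.666...)
print(precision_at_k(ranked, original_links, 2))  # 0.5
```

Scoring against the original links is cheap and fully automatic, but it treats every link the original authors did not make as wrong, even if a human assessor might judge it useful.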
We introduce the Link-the-Wiki task, launched at INEX in 2007. Ninety documents
were orphaned from the collection, and participants were required to build
systems that identify the missing links. The different automated link discovery
techniques used by participants are outlined, and details are given of two
successful techniques: one uses the titles of pre-existing documents to
identify anchors and destinations; the other uses pre-existing links between
documents to identify possible links in new documents. In this paper we mainly
focus on the analysis and assessment of Wikipedia link discovery and discuss
possible future evaluation techniques.
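The two technique families described above can be illustrated with a small sketch. It is not the participants' code: the function names, the longest-title-first rule, the "link only the first occurrence" rule, and the link-probability threshold are all assumptions made for illustration. `title_matching` proposes an anchor wherever the title of an existing document appears in the new text. `anchor_statistics` and `link_mining` instead learn from existing links: for each phrase used as an anchor, they estimate how often documents containing the phrase actually link it, then rank phrases in a new document by that ratio.

```python
import re
from collections import Counter, defaultdict


def phrase_regex(phrase):
    """Whole-word pattern for a phrase; callers match against lower-cased text."""
    return re.compile(r"\b" + re.escape(phrase.lower()) + r"\b")


def title_matching(text, titles):
    """Propose (anchor, destination) pairs wherever an existing title occurs."""
    lowered = text.lower()
    taken = []  # character spans already claimed by an anchor
    proposals = []
    # Longer titles first, so "New York City" wins over "New York".
    for title in sorted(titles, key=len, reverse=True):
        for m in phrase_regex(title).finditer(lowered):
            start, end = m.span()
            if any(s < end and start < e for s, e in taken):
                continue  # overlaps a longer anchor already chosen
            taken.append((start, end))
            proposals.append((text[start:end], title))
            break  # assumption: link only the first usable occurrence
    return proposals


def anchor_statistics(corpus):
    """corpus: list of (text, links), links being (anchor, destination) pairs.

    Returns {anchor: (link_probability, most_common_destination)} where
    link_probability = (#docs linking the phrase) / (#docs containing it).
    Anchors are lower-cased; anchor text is assumed to appear in its document.
    """
    linked = Counter()
    targets = defaultdict(Counter)
    for _, links in corpus:
        for anchor, dest in links:
            targets[anchor.lower()][dest] += 1
        for anchor in {a.lower() for a, _ in links}:
            linked[anchor] += 1
    occurs = Counter()
    for text, _ in corpus:
        lowered = text.lower()
        for anchor in linked:
            if phrase_regex(anchor).search(lowered):
                occurs[anchor] += 1
    return {a: (linked[a] / occurs[a], targets[a].most_common(1)[0][0])
            for a in linked if occurs[a]}


def link_mining(text, stats, threshold=0.5):
    """Rank phrases in text by how often they were linked in the corpus."""
    lowered = text.lower()
    ranked = sorted(stats.items(), key=lambda kv: kv[1][0], reverse=True)
    return [(anchor, dest, prob) for anchor, (prob, dest) in ranked
            if prob >= threshold and phrase_regex(anchor).search(lowered)]


titles = ["New York", "New York City", "Wikipedia"]
print(title_matching("Wikipedia has an article on New York City.", titles))

corpus = [
    ("Paris is the capital of France.", [("France", "France")]),
    ("France borders Spain.", [("France", "France"), ("Spain", "Spain")]),
    ("The word capital is common.", []),
    ("Spain and France are in Europe.", []),
]
print(link_mining("Trains run from France to Spain.", anchor_statistics(corpus)))
```

The title-based approach needs nothing but a list of titles, while the link-based approach exploits the editorial decisions already recorded in the collection; the threshold trades recall for precision.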
We examine one system in further detail and conduct a scalability experiment in
which 1% of all Wikipedia documents were used and the system's performance was
studied; link discovery in this system is shown to be scalable.
Finally, potential
research directions for link discovery, assessment and evaluation are
discussed.