Optimal Structure Weighted Retrieval

Andrew Trotman

Proceedings of the 9th Australasian Document Computing Symposium (ADCS 2004)

 

ABSTRACT

Improving ranking functions for structured information retrieval has received much attention since the inception of XML.  Weighting document structures is one method providing significant improvement – but how good can these improvements be?

 

Optimal structure weighted retrieval occurs when each query is processed using the optimal set of weights for that query.  Optimal retrieval for a set of queries occurs when a set of weights optimized for that set of queries is used.  Measuring mean average precision for each of these will give a performance upper bound for document structure weighted retrieval.

 

In this investigation a near optimal set of weights is learned for TREC WSJ collection topics 101-200 using a genetic algorithm.  Weights are learned for vector space inner product, naïve probability and BM25 ranking functions and a performance upper bound is calculated.

 

The upper bound using a different set of weights for each query, gives mean average precision improvements of about 15% for BM25 and naïve probability; about 30% for inner product.  This suggests structure weighting might be useful for relevance feedback.  Optimal weights for the set of queries shows improvements of about 5% for naïve probability and inner product, but of only about 1% for BM25; suggesting this technique is not as effective for ad hoc retrieval. 

[Return to Andrew’s Home]