Andrew Trotman
Information Retrieval
8(3):359-381
New ranking functions are discovered using genetic programming. The TREC WSJ collection was used as the training set. The baseline for comparison was chosen as the best of inner product, probability, cosine, and Okapi BM25. An elitist genetic algorithm with a population size of 100 was run 13 times for 100 generations, and the best-performing algorithms were chosen from these runs. The best learned function, when evaluated against the best baseline function (BM25), showed improvements in 82% of queries. Overall, a 20% improvement in mean average precision is demonstrated. When tested on the cystic fibrosis collection, the same function also outperformed BM25.
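The two figures reported above (the share of queries improved and the gain in mean average precision) can be illustrated with a minimal sketch. It assumes that a query counts as "improved" when its average precision is strictly higher than under the baseline, and that the overall gain is measured relative to the baseline's mean average precision; the abstract does not state either convention. The function names, document identifiers, and toy rankings are hypothetical and are not data from the paper.

```python
from typing import Dict, List, Set


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Average precision of one ranked list against the relevant document ids."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant document
    return precision_sum / len(relevant)


def compare_runs(
    baseline: Dict[str, List[str]],
    learned: Dict[str, List[str]],
    qrels: Dict[str, Set[str]],
) -> Dict[str, float]:
    """Compare two runs query by query (assumed conventions, see lead-in)."""
    ap_base = {q: average_precision(baseline[q], qrels[q]) for q in qrels}
    ap_new = {q: average_precision(learned[q], qrels[q]) for q in qrels}
    map_base = sum(ap_base.values()) / len(qrels)
    map_new = sum(ap_new.values()) / len(qrels)
    improved = sum(1 for q in qrels if ap_new[q] > ap_base[q])
    # Relative gain is undefined when the baseline MAP is zero.
    gain = (map_new - map_base) / map_base if map_base > 0 else float("nan")
    return {
        "map_baseline": map_base,
        "map_learned": map_new,
        "relative_map_gain": gain,
        "fraction_improved": improved / len(qrels),
    }


# Hypothetical toy data: two queries, three documents.
qrels = {"q1": {"d1", "d3"}, "q2": {"d2"}}
baseline_run = {"q1": ["d2", "d1", "d3"], "q2": ["d1", "d2"]}
learned_run = {"q1": ["d1", "d3", "d2"], "q2": ["d1", "d2"]}

result = compare_runs(baseline_run, learned_run, qrels)
print(result)
```

In this toy case the learned ranking improves one of the two queries and ties on the other, so the fraction of improved queries is 0.5 while the mean average precision rises from about 0.54 to 0.75.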