Andrew Trotman
Information Processing & Management 40(4):619-632
Structured document interchange formats such as XML and SGML are ubiquitous; however, information retrieval systems that support structured searching are not. Structured searching can increase precision. For example, a search for the author “Smith” in an unstructured corpus of documents specializing in iron-working could have lower precision than a structured search for “Smith as author” in the same corpus.
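The precision argument can be illustrated with a minimal sketch. The toy corpus, the field names, and the helper functions below are all hypothetical and are not taken from the paper; they only show how restricting a match to the author field can remove false hits on the common noun "smith".

```python
# Hypothetical toy corpus about iron-working; each document maps a field name to its text.
corpus = [
    {"author": "Smith", "body": "tempering steel blades"},
    {"author": "Jones", "body": "the smith heats iron in the forge"},
    {"author": "Brown", "body": "a smith quenches the iron bar"},
]


def unstructured_search(term, docs):
    """Return documents containing the term as a word in any field."""
    term = term.lower()
    return [d for d in docs
            if any(term in text.lower().split() for text in d.values())]


def structured_search(term, field, docs):
    """Return documents containing the term as a word in one named field."""
    term = term.lower()
    return [d for d in docs if term in d.get(field, "").lower().split()]


def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant (0.0 if none retrieved)."""
    if not retrieved:
        return 0.0
    return sum(1 for d in retrieved if d in relevant) / len(retrieved)


# Assumption: only the document actually written by Smith is relevant.
relevant = [corpus[0]]

unstructured = unstructured_search("Smith", corpus)
structured = structured_search("Smith", "author", corpus)

print(len(unstructured), precision(unstructured, relevant))
print(len(structured), precision(structured, relevant))
```

In this sketch the unstructured query retrieves all three documents, while the structured query retrieves only the one whose author field contains "Smith".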
Analysis of XML retrieval languages identifies additional functionality that must be supported, including searching at, and broken across, multiple nodes in the document tree. A data structure is developed to support structured document searching. Application of this structure to information retrieval is then demonstrated. Document ranking is examined and adapted specifically for structured searching.