Computational Linguistics Research Group

Statistical Parsing

Statistical parsing is an extension of syntactic parsing that associates each grammatical phrase structure rule with a probability, estimated from the relative frequency of that rule. This allows the parser to select the most likely rule where several apply, and addresses the common problem that increasing the size of a grammar increases ambiguity.
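As a rough illustration of relative-frequency estimation, the sketch below counts how often each rule is observed and divides by the total count for its left-hand side. The rule list, the function names and the toy counts are all hypothetical; they are not taken from our parser or from any treebank.

```python
from collections import Counter


def estimate_rule_probabilities(rule_occurrences):
    """Estimate P(rhs | lhs) as count(lhs -> rhs) / count(lhs).

    rule_occurrences is an iterable of (lhs, rhs) pairs, where rhs is a
    tuple of symbols. One pair is expected per observed use of a rule.
    """
    rule_counts = Counter(rule_occurrences)
    lhs_counts = Counter()
    for (lhs, _rhs), count in rule_counts.items():
        lhs_counts[lhs] += count
    return {
        rule: count / lhs_counts[rule[0]]
        for rule, count in rule_counts.items()
    }


def most_likely_expansion(lhs, rule_probs):
    """Return the highest-probability rule with the given left-hand side."""
    candidates = {r: p for r, p in rule_probs.items() if r[0] == lhs}
    return max(candidates, key=candidates.get)


# Hypothetical observations, made up for illustration only.
observed_rules = [
    ("NP", ("Det", "N")),
    ("NP", ("Det", "N")),
    ("NP", ("Det", "N")),
    ("NP", ("NP", "PP")),
    ("VP", ("V", "NP")),
    ("VP", ("V", "NP", "PP")),
]

rule_probs = estimate_rule_probabilities(observed_rules)
best_np_rule = most_likely_expansion("NP", rule_probs)
```

With these toy counts, the rule NP -> Det N receives probability 3/4 and NP -> NP PP receives 1/4, so the first is preferred when both could apply. A real parser scores whole analyses rather than single rules; this shows only the estimation step.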

Research at Otago has concentrated on how statistics can be used to increase the robustness of a parsing system. A robust parser is one that is able to work even when the input string contains errors, such as misspelled words, omitted words or grammatical mistakes. We believe that a sudden drop in the probability of a subphrase, relative to the phrase it is embedded in, might signify an error in the input string; we are currently investigating methods for automatically detecting and correcting such errors.
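One way the idea of a sudden drop could be made concrete is sketched below. It is an assumption-laden sketch and does not describe a method we have settled on. It assumes each phrase carries a log probability from the parser, normalises that by the number of words the phrase covers, and flags any subphrase whose per-word score falls below its parent's by more than a threshold. The PhraseNode class, the per-word normalisation, the max_drop value and all the numbers are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PhraseNode:
    label: str
    log_prob: float  # hypothetical log probability of the whole subtree
    num_words: int   # number of input words the phrase covers
    children: List["PhraseNode"] = field(default_factory=list)

    def per_word_log_prob(self) -> float:
        # Normalising by length is an assumption made for this sketch, so
        # that long and short phrases can be compared at all.
        return self.log_prob / self.num_words


def find_suspicious_subphrases(
    node: PhraseNode, max_drop: float = 2.0
) -> List[Tuple[str, float]]:
    """Return (label, drop) for each subphrase scoring far below its parent."""
    flagged = []
    for child in node.children:
        drop = node.per_word_log_prob() - child.per_word_log_prob()
        if drop > max_drop:
            flagged.append((child.label, drop))
        flagged.extend(find_suspicious_subphrases(child, max_drop))
    return flagged


# Made-up scores: the NP is much less probable per word than the sentence.
sentence = PhraseNode(
    "S", log_prob=-20.0, num_words=6,
    children=[
        PhraseNode("NP", log_prob=-11.0, num_words=2),
        PhraseNode("VP", log_prob=-8.5, num_words=4),
    ],
)

suspects = find_suspicious_subphrases(sentence)
```

In this toy tree only the NP is flagged, since its per-word score is more than 2.0 below that of the sentence containing it. Choosing a sensible threshold, and deciding how to correct a flagged phrase, are left open here.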

Research is still at a very early stage. Most of our work to date has been in designing a statistical parsing system that works on error-free text. To this end, we are engaged in reimplementing a well-known statistical parsing system built by Michael Collins.

Participating Members