Detection of Errors and Correction
in Corpus Annotation


Detmar Meurers is an associate professor in the Department of Linguistics at the Ohio State University. His research interests focus on the intersection of linguistic insight and computational linguistics. He became interested in the detection of errors in corpus annotation after teaching a seminar on "Corpora and Linguistic Knowledge" in Spring 2002. Together with Markus Dickinson, who had participated in the seminar, he started developing an automatic error detection method based on detecting variation in annotation across linguistically comparable instances. As reported in several papers since then (cf. publications), they have successfully applied this and related ideas to detect (and correct) errors in part-of-speech annotation as well as in continuous and discontinuous syntactic annotation, with current project work focusing on dependency annotation and data outside linguistics.

Markus Dickinson is an assistant professor at Indiana University in the Department of Linguistics. Much of his research has focused on the detection and correction of annotation errors across various levels of linguistic structure. Currently, his main interests lie in what annotation errors reveal about the design of annotation schemes and in how annotation can be optimized for data-driven natural language processing.

Adriane Boyd is a Ph.D. student in computational linguistics at the Ohio State University. She is interested in syntactic annotation and data-driven parsing for languages with freer word order.