Honors Thesis 2018 - Ethan Zhou

A Thesis on Character Identification

Ethan Zhou

Highest Honors in Computer Science


Abstract

Traditional coreference resolution systems use methods that are insufficient for fully resolving plural mentions, especially when conventional coreference concepts are applied to other tasks such as character identification. This paper gives a comprehensive view of plural mentions, one of the least examined yet most difficult parts of entity resolution, particularly in coreference resolution and entity linking. Since our approach to entity resolution focuses on its applicability to character identification, we use the character identification corpus from SemEval 2018 and expand the dataset to include plural mention annotations. We then show that conventional coreference concepts are inadequate for the character identification task and present a design that overcomes their shortcomings. Our design includes a new coreference resolution algorithm that selectively creates clusters to handle all types of mentions, singular and plural, and a new joint deep learning approach to entity linking that determines the entities for both singular and plural mentions. With this design, our coreference and entity linking models surpass more traditional models. To the best of our knowledge, we are the first to extensively investigate plural mentions in the context of entity resolution.

Department / School

Computer Science / Emory University

Degree / Year

BS / Spring 2018

Committee

Jinho D. Choi, Computer Science, Emory University (Chair)
Lee Cooper, Biomedical Informatics, Emory University
Susan Tamasi, Linguistics, Emory University
