Manually annotated datasets for relations such as causality have been of interest to computational linguists. This thesis introduces the annotated Constructions of CAUSE, ENABLE, and PREVENT (CCEP) corpus, which contributes to the field by systematizing the nuanced CAUSE, ENABLE, and PREVENT roles and by enabling annotation of a wide variety of causal construction types. The corpus treats constructions as the basic unit of causal language, following the linguistic paradigm of Construction Grammar (CxG) as realized in the surface construction labeling (SCL) approach. In this project, I adapt a pre-identified bank of causal connectives (the Constructicon) from Dunietz (2018), whose entries serve as triggers for annotation instances. Drawing on the high inter-annotator agreement achieved on a corpus of 150 doubly-annotated documents annotated under the CCEP guidelines, I (1) support the psychological reality of the causal aspectualization of Wolff et al. (2005), since annotators reliably distinguish its categories, (2) build upon previous annotation work that aims to embed this model of causation, and (3) provide a high-quality dataset for understanding textual causality.
Linguistics / Emory University
BS / Spring 2022
Jinho D. Choi, Computer Science and QTM, Emory University (Chair)
Marjorie Pak, Linguistics, Emory University
Yun Kim, Linguistics, Emory University
David Zureick-Brown, Mathematics, Emory University
Angela Cao, Jinho Choi, Gregor Williamson, David Zureick-Brown, Marjorie Pak, Yun Kim