Exploring a Multi-Layered Cross-Genre Corpus of Document-Level Semantic Relations

Gregor Williamson, Angela Cao, Yingying Chen, Yuxin Ji, Liyan Xu, Jinho D. Choi


Abstract

This paper introduces a multi-layered cross-genre corpus, annotated for coreference resolution, causal relations, and temporal relations, comprising a variety of genres, from news articles and children’s stories to Reddit posts. Our results reveal distinctive genre-specific characteristics at each layer of annotation, highlighting unique challenges for both annotators and machine learning models. Children’s stories feature linear temporal structures and clear causal relations. In contrast, news articles employ non-linear temporal sequences with minimal use of explicit causal or conditional language and few first-person pronouns. Lastly, Reddit posts are author-centered explanations of ongoing situations, with occasional meta-textual reference. Our annotation schemes are adapted from existing work to better suit a broader range of text types. We argue that our multi-layered cross-genre corpus not only reveals genre-specific semantic characteristics but also indicates a rich contextual interplay between the various layers of semantic information. Our MLCG corpus is shared under the open-source Apache 2.0 license.

Venue / Year

Information: Information Extraction and Language Discourse Processing / 2023
