This thesis expands a previously constructed corpus and presents a robust deep learning architecture for passage completion, a reading comprehension task on multiparty dialog. The input is a dialog in text and a passage of factual descriptions about that dialog, in which mentions of the characters are replaced by blanks. The task is to fill each blank with the character name that best reflects the context of the dialog. Previous work constructed a dataset by selecting transcripts from a TV show, generating passages for each dialog through crowdsourcing, and annotating mentions of characters in both the dialogs and the passages. This work expands that dataset following the same pipeline and fixes errors throughout the entire dataset. Using this dataset, a deep neural model is developed that integrates rich feature extraction from convolutional neural networks (CNN) into sequence modeling with recurrent neural networks (RNN), optimized by utterance-level and dialog-level attention. The model outperforms the previous state-of-the-art model, a bidirectional LSTM developed for this task in a different genre, with an improvement of more than 13.0% on longer dialogs. The analysis shows the effectiveness of the attention mechanisms and suggests a direction for machine comprehension on multiparty dialog.
Computer Science / Emory University
BS / Spring 2018
Jinho D. Choi, Computer Science and QTM, Emory University (Chair)
Ken Mandelberg, Computer Science, Emory University
Connie Roth, Physics, Emory University