Previous work on automatic personality prediction has focused on traditional classification models with linguistic features; neural networks with pre-trained word embeddings, which have achieved great success in text classification, have not yet been applied to the task. This research presents a novel approach to automatic personality prediction using convolutional neural networks (CNNs) and long short-term memory (LSTM) networks with an attention mechanism. We evaluate our models on both a monologue corpus, the Essays dataset, and a new multiparty dialogue corpus, the Friends dataset. We first create the Friends dataset by annotating, through crowdsourcing, multiparty dialogues from the TV show Friends with personality traits from the popular Big Five theory, and we provide a comprehensive analysis of the annotation. Our annotated corpus comprises four seasons, with an average inter-annotator agreement below 0.1. We also propose novel attention-based CNN and LSTM models that overcome the limitations of the basic CNN and LSTM by encoding long-term contextual information and providing a global view of the document. Our analysis shows that word embeddings and the attention mechanism can effectively improve the performance of our models on the Essays dataset by ignoring noise in the corpus. Our results also show that it is difficult for human annotators to agree on the task when only the text of the dialogues is provided, which explains why none of the models performs well on the Friends dataset.
Linguistics / Emory University
BA / Spring 2018
Jinho D. Choi, Computer Science and QTM, Emory University (Chair)
Marjorie Pak, Linguistics, Emory University
Roberto Franzosi, Sociology, Emory University
Shun Yan Cheung, Computer Science, Emory University