Date: 2021-11-19 / 3:00 ~ 4:00 PM
Location: https://emory.zoom.us/j/99364825782
Recently, advances such as BlenderBot 2.0 have been powered by a new form of dataset created through the intelligent use of computational processes. The BlenderBot data, for example, was derived from Reddit: millions of two-turn conversations were extracted, along with "persona profiles" for the associated speakers.
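As a rough illustration of this style of extraction (a sketch, not the actual BlenderBot pipeline), the snippet below pairs Reddit-style comments with their direct replies to form two-turn conversations, and collects first-person sentences per author as crude persona profiles. The record fields (`id`, `parent_id`, `author`, `body`) and the filtering rules are assumptions made for the example.

```python
# Hypothetical sketch of two-turn extraction from Reddit-style data.
# Field names and filters are illustrative assumptions, not the real pipeline.
import re
from collections import defaultdict

def extract_two_turn_pairs(comments):
    """Pair each comment with its direct reply: (parent_body, reply_body)."""
    by_id = {c["id"]: c for c in comments}
    pairs = []
    for c in comments:
        parent = by_id.get(c.get("parent_id"))
        # Keep only replies between two different speakers.
        if parent is not None and parent["author"] != c["author"]:
            pairs.append((parent["body"], c["body"]))
    return pairs

def extract_persona_sentences(comments, max_per_author=5):
    """Collect first-person sentences per author as a rough persona profile."""
    personas = defaultdict(list)
    for c in comments:
        for sent in re.split(r"(?<=[.!?])\s+", c["body"]):
            if re.match(r"^\s*I\b", sent) and len(personas[c["author"]]) < max_per_author:
                personas[c["author"]].append(sent.strip())
    return dict(personas)
```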
Such techniques open the door to more sophisticated computational approaches to dataset creation. We take advantage of numerous recent advances in the field of Natural Language Processing, focusing their power on constructing a large, high-quality, multi-turn conversational dialogue dataset.
Evaluating the models that we use to create this dataset is crucial to understanding their performance and what might lead to an even better result. We compare Dialogue Rankers, Language Models (base and fine-tuned), and global optimization strategies in an effort to maximize our models' performance.
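One of these comparison points, scoring candidate responses with a language model, might look roughly like the sketch below. The model choice (`gpt2`) and the scoring rule (negative mean per-token loss over the full sequence) are illustrative assumptions; a dialogue ranker would instead score (context, candidate) pairs with a trained classifier.

```python
# Minimal sketch of LM-based response scoring; model and scoring rule
# are assumptions for illustration, not the evaluation setup of the talk.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def lm_score(context: str, candidate: str) -> float:
    """Score a candidate by the LM's negative mean token loss (higher is better)."""
    ids = tokenizer.encode(context + " " + candidate, return_tensors="pt")
    with torch.no_grad():
        # Note: this averages loss over context tokens too; a finer-grained
        # version would mask the context and score only the candidate tokens.
        loss = model(ids, labels=ids).loss
    return -loss.item()

def rank_candidates(context, candidates):
    """Order candidate responses from most to least LM-preferred."""
    return sorted(candidates, key=lambda c: lm_score(context, c), reverse=True)

# Usage example:
# rank_candidates("How was your weekend?",
#                 ["Pretty good, I went hiking.", "Banana seventeen."])
```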