Date: 2022-02-11 / 4:00 ~ 5:00 PM
Location: MSC E306 (https://emory.zoom.us/j/99364825782)
Speaker diarization plays an important role in identifying the parties engaged in a conversation. Conventionally, this problem has been tackled with models that rely on audio features. In this presentation, we approach it from the text side: we pseudo-generate a larger corpus of text conversations that simulates the characteristics of the limited target dataset available, and use neural models to achieve better performance than the transcription service available on the market. We present a contextual joint model that boosts diarization performance through multi-task learning that integrates conversation context. The joint model consists of two tasks, utterance classification and token classification. We evaluate the model using F1 scores on both tasks.
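
For readers unfamiliar with the metric, the following is a minimal sketch of how an F1 score could be computed for each of the two tasks. It is not the evaluation code used in the talk: the function name `f1_score`, the label names (`"err"`, `"ok"`, `"A"`, `"B"`), and the toy data are all hypothetical and chosen only for illustration.

```python
from typing import Sequence


def f1_score(gold: Sequence[str], pred: Sequence[str], positive: str) -> float:
    """Binary F1 for one label, given aligned gold and predicted labels."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical utterance-level labels: one label per utterance.
utt_gold = ["ok", "err", "err", "ok"]
utt_pred = ["ok", "err", "ok", "ok"]

# Hypothetical token-level labels: one speaker label per token.
tok_gold = ["A", "A", "B", "B", "A"]
tok_pred = ["A", "B", "B", "B", "A"]

utt_f1 = f1_score(utt_gold, utt_pred, positive="err")
tok_f1 = f1_score(tok_gold, tok_pred, positive="B")
print(f"utterance F1 = {utt_f1:.3f}, token F1 = {tok_f1:.3f}")
```

Reporting the two scores separately, as in this sketch, shows how the model performs on each task rather than folding both into a single number.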