This project develops models for processing and analyzing audio data, with a focus on segmenting noisy recordings and identifying the distinct voices within them. The goal is to improve how audio content is understood and organized, with applications in transcription, archiving, and voice-interactive systems.
Director
- Jinho Choi - Associate Professor at Emory University
Publications
- Aligning Speakers: Evaluating and Visualizing Text-based Speaker Diarization Using Efficient Multiple Sequence Alignment. Gong, C.; Wu, P.; and Choi, J. D. Proceedings of the IEEE International Conference on Tools with Artificial Intelligence (ICTAI), 2023.
- Discriminative Speech Recognition Rescoring with BERT. Xu, L.; Gu, Y.; Kolehmainen, J.; Khan, H.; Gandhe, A.; Rastrow, A.; Stolcke, A.; and Bulyko, I. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022.