This paper presents two contributions: a synchronous multi-party dialogue system that interacts with multiple users simultaneously, and multi-group simulations with virtual user groups to evaluate the resilience of this system. Unlike most chatbots, which communicate with each user independently, our system gathers information from multiple users and adeptly executes 17 administrative tasks in response to group requests, leveraging a state machine-based framework for complete control over dialogue flow and a large language model (LLM) for robust context understanding. Assessing such a dialogue system poses challenges, as it requires many groups of users to interact with the system concurrently for an extended duration. To address this, we use an LLM to simulate virtual groups of 10-30 users, where a user may belong to multiple groups; each user is assigned a persona and interacts freely without scripts. Our system achieves average success rates of 87% for task completion and 89% for natural language understanding. We further compare the virtual simulation, which achieves an average success rate of 80%, with a group of 15 human users, observing similar task diversity and error trends. To our knowledge, this is the first work to demonstrate an LLM's potential for both task execution and user simulation in a synchronous dialogue system that fully automates administrative tasks.
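The abstract describes a hybrid design in which a state machine governs dialogue flow while an LLM interprets free-form user input. The sketch below is only a minimal illustration of that division of labor, not the authors' implementation; the state names, the transition table, and the `llm_extract_intent` stub are hypothetical stand-ins (a real system would issue an LLM call where the stub uses keywords).

```python
from enum import Enum, auto


class State(Enum):
    """Hypothetical dialogue states; the paper's actual states are not given."""
    IDLE = auto()
    COLLECTING = auto()   # gathering information from multiple group members
    CONFIRMING = auto()
    EXECUTING = auto()


def llm_extract_intent(utterance: str) -> str:
    """Stand-in for an LLM call mapping a free-form utterance to an intent.

    A real system would prompt an LLM here; this stub uses keywords so the
    sketch stays runnable without external services.
    """
    text = utterance.lower()
    if "book" in text or "schedule" in text:
        return "new_request"
    if "yes" in text or "confirm" in text:
        return "confirm"
    return "provide_info"


# State machine: (state, intent) -> next state, keeping full control of flow.
TRANSITIONS = {
    (State.IDLE, "new_request"): State.COLLECTING,
    (State.COLLECTING, "provide_info"): State.COLLECTING,
    (State.COLLECTING, "confirm"): State.CONFIRMING,
    (State.CONFIRMING, "confirm"): State.EXECUTING,
}


def step(state: State, utterance: str) -> State:
    """Advance one turn: the LLM interprets, the state machine decides."""
    intent = llm_extract_intent(utterance)
    return TRANSITIONS.get((state, intent), state)  # ignore out-of-flow intents


if __name__ == "__main__":
    state = State.IDLE
    for turn in ["Can you book a meeting room for our group?",
                 "We need it Friday at 10 am.",
                 "Yes, that works for everyone."]:
        state = step(state, turn)
        print(f"user: {turn!r} -> state: {state.name}")
```

The split illustrated here is why the approach is robust: the state machine guarantees a deterministic, fully controlled task flow, while the LLM absorbs variation in how users phrase their requests.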