Behaviorally Aware
Spoken Dialogue Generation

Vision and Learning Lab, Seoul National University
Under Review

*Indicates Equal Contribution

Abstract

Spoken dialogue systems often struggle with complex conversational behaviors such as turn-taking, interruptions, filler words, and backchannels, which are essential for natural and engaging human interaction. However, existing spoken language models and datasets rarely model these behavioral traits explicitly, leading to less natural and less effective communication. To address this challenge, we make two key contributions. First, we introduce Behavior-SD, a large-scale dataset of over 100K spoken dialogues (2,044 hours) annotated with conversational behaviors and synthesized via LLMs to model diverse full-duplex interactions. Second, we propose BeDLM, the first dialogue model capable of generating natural conversations conditioned on specific behavioral and narrative contexts while supporting simultaneous contributions from both speakers. Through human evaluations and behavior-adherence metrics, we demonstrate that BeDLM outperforms baseline models in generating natural, coherent, and behaviorally rich dialogues. Our work opens new possibilities for developing behaviorally aware dialogue systems that more closely mimic human conversational dynamics, enhancing user engagement and communication effectiveness.

Behavior-SD

Framework Visualization · Data Visualization

BeDLM

(Behavior-conditioned Dialogue Language Model)


We introduce BeDLM, a spoken dialogue language model designed to generate simultaneous two-channel speech conditioned on conversational behaviors and a narrative. BeDLM is trained on Behavior-SD, a large-scale dataset containing over 100K spoken dialogues annotated with conversational behaviors and narratives.
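To make the conditioning concrete, the sketch below shows what a Behavior-SD-style sample and its conditioning prompt might look like. The schema here (field names like `narrative`, `speakers`, `turns`, and the trait labels) is our own illustrative assumption, not the dataset's actual format; it only illustrates the idea of pairing per-speaker behavior annotations with a narrative, including an overlapping backchannel turn as found in full-duplex speech.

```python
# Hypothetical Behavior-SD-style sample; all field names are assumed
# for illustration, not the dataset's real schema.
sample = {
    "narrative": "Two friends plan a weekend hiking trip.",
    "speakers": {
        "A": {"backchannel": "high", "filler": "low", "interruption": "medium"},
        "B": {"backchannel": "low", "filler": "high", "interruption": "low"},
    },
    "turns": [
        {"speaker": "A", "start": 0.0, "end": 2.1,
         "text": "So, uh, are we still on for Saturday?"},
        # B's backchannel starts before A finishes: full-duplex overlap.
        {"speaker": "B", "start": 1.9, "end": 2.3, "text": "Mm-hmm."},
    ],
}

def behavior_prompt(s):
    """Flatten the narrative and per-speaker behavior labels into a
    single conditioning string (one plausible prompt format)."""
    parts = [f"narrative: {s['narrative']}"]
    for spk, traits in sorted(s["speakers"].items()):
        trait_str = ", ".join(f"{k}={v}" for k, v in sorted(traits.items()))
        parts.append(f"speaker {spk}: {trait_str}")
    return " | ".join(parts)

def has_overlap(turns):
    """True if any turn begins before the previous one ends,
    i.e. the dialogue contains simultaneous speech."""
    ordered = sorted(turns, key=lambda t: t["start"])
    return any(nxt["start"] < cur["end"]
               for cur, nxt in zip(ordered, ordered[1:]))

print(behavior_prompt(sample))
print(has_overlap(sample["turns"]))
```

Under this assumed format, the prompt string and the two speech channels (one waveform per speaker) would together form one training example for a behavior-conditioned model.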