Behaviorally Aware
Spoken Dialogue Generation

Vision and Learning Lab, Seoul National University
NAACL 2025


Abstract

Spoken dialogue involves behaviors such as turn-taking, interruptions, filler words, and backchannels, which make interactions more natural and engaging but are often overlooked in language models. These models struggle to explicitly represent such behavioral traits, and as a result fail to produce the natural, personalized communication style that aligns with user needs. To address this challenge, we make two key contributions. First, we introduce Behavior-SD, a large-scale dataset containing over 100K spoken dialogues (2,164 hours) annotated with various conversational behaviors, synthesized via LLMs to model diverse full-duplex interactions. Second, we propose BeDLM, the first dialogue model capable of generating natural conversations conditioned on specific behavioral and narrative contexts, supporting simultaneous contributions from both speakers. Through human evaluations and behavior-adherence metrics, we demonstrate that BeDLM outperforms baseline models in generating natural, coherent, and behaviorally rich dialogues. Our work opens new possibilities for developing behaviorally aware dialogue systems that more closely mimic human conversational dynamics, enhancing user engagement and communication effectiveness.

Behavior-SD

[Figures: Framework Visualization / Data Visualization]

BeDLM

(Behavior-conditioned Dialogue Language Model)


We introduce BeDLM, a spoken dialogue language model designed to generate simultaneous two-channel speech conditioned on conversational behaviors and a narrative. BeDLM is trained on Behavior-SD, a large-scale dataset containing over 100K spoken dialogues annotated with conversational behaviors and narratives.
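To make the behavior-and-narrative conditioning concrete, the sketch below shows one way a Behavior-SD-style sample could be represented and serialized into a conditioning prompt. This is a minimal illustration under assumptions: the class names (BehaviorProfile, DialogueSample), field names, and the build_conditioning_prompt helper are hypothetical and not the released Behavior-SD schema or the BeDLM interface; only the four behavior categories (turn-taking, interruptions, filler words, backchannels) come from the paper's description.

```python
# Hypothetical sketch of a behavior-annotated dialogue sample and a prompt
# builder for behavior-conditioned generation. All names and values here are
# illustrative assumptions, not the actual Behavior-SD schema or BeDLM API.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BehaviorProfile:
    # Per-speaker conversational behaviors named in the abstract.
    turn_taking: str = "balanced"      # e.g., "balanced", "dominant", "passive"
    interruptions: str = "rare"        # e.g., "rare", "frequent"
    filler_words: str = "few"          # e.g., "few", "many"
    backchannels: str = "occasional"   # e.g., "occasional", "frequent"


@dataclass
class DialogueSample:
    narrative: str                             # scenario / narrative context
    behaviors: Dict[str, BehaviorProfile]      # one profile per speaker channel
    utterances: List[Dict[str, str]] = field(default_factory=list)


def build_conditioning_prompt(sample: DialogueSample) -> str:
    """Serialize the narrative and per-speaker behavior profiles into a text
    prompt that a behavior-conditioned dialogue model could consume."""
    lines = [f"Narrative: {sample.narrative}"]
    for speaker, profile in sample.behaviors.items():
        lines.append(
            f"{speaker} behaviors: turn-taking={profile.turn_taking}, "
            f"interruptions={profile.interruptions}, "
            f"fillers={profile.filler_words}, "
            f"backchannels={profile.backchannels}"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    sample = DialogueSample(
        narrative="Two friends plan a weekend hiking trip.",
        behaviors={
            "Speaker A": BehaviorProfile(interruptions="frequent", filler_words="many"),
            "Speaker B": BehaviorProfile(backchannels="frequent"),
        },
    )
    print(build_conditioning_prompt(sample))
```

The resulting prompt string pairs a narrative with an explicit behavior profile per speaker, which is the kind of conditioning signal BeDLM consumes when generating simultaneous two-channel speech; the actual model operates on speech and its own tokenized representation rather than this toy text format.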