Stable Audio Open

Evans, Zach; Parker, Julian D.; Carr, CJ; Zukowski, Zack; Taylor, Josiah; Pons, Jordi

doi:10.1109/icassp49660.2025.10888461

Article | Mar 12, 2025 | Closed access

Indexed in Crossref

Abstract

Open generative models are vitally important for the community, allowing for fine-tunes and serving as baselines when presenting new models. However, most current text-to-audio models are private and not accessible for artists and researchers to build upon. Here we describe the architecture and training process of a new open-weights text-to-audio model trained with Creative Commons data. Our evaluation shows that the model’s performance is competitive with the state-of-the-art across various metrics. Notably, the reported FD_openl3 results (measuring the realism of the generations) showcase its potential for high-quality stereo sound synthesis at 44.1 kHz.

Citation impact

Total citations: 54

FWCI: 57.91
Percentile: 100%
References: 30

Authors: 6

Topics & keywords

Topics

Keywords

Computer science
