WaveNet: A Generative Model for Raw Audio

Oord, Aäron van den; Dieleman, Sander; Zen, Heiga; Simonyan, Karen; Vinyals, Oriol; Graves, Alexander; Kalchbrenner, Nal; Senior, Andrew; Kavukcuoglu, Koray

doi:10.48550/arxiv.1609.03499

Preprint · arXiv (Cornell University) · Sep 12, 2016 · Green OA

Indexed in: arXiv, DataCite

Abstract

This paper introduces WaveNet, a deep neural network for generating raw audio waveforms. The model is fully probabilistic and autoregressive, with the predictive distribution for each audio sample conditioned on all previous ones; nonetheless we show that it can be efficiently trained on data with tens of thousands of samples per second of audio. When applied to text-to-speech, it yields state-of-the-art performance, with human listeners rating it as significantly more natural sounding than the best parametric and concatenative systems for both English and Mandarin. A single WaveNet can capture the characteristics of many different speakers with equal fidelity, and can switch between them by conditioning on…
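The autoregressive structure described in the abstract, where each sample's predictive distribution is conditioned on all previous samples, can be written as a product of conditionals. The notation is introduced here for illustration only, since the abstract gives no symbols: x_t is assumed to denote the t-th audio sample and T the number of samples in the waveform.

```latex
p(\mathbf{x}) = \prod_{t=1}^{T} p\left(x_t \mid x_1, \ldots, x_{t-1}\right)
```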

Citation impact

3,609 total citations

FWCI: —
Percentile: —
References: 0

Authors: 9

Topics & keywords

Keywords

Generative grammar
Generative model
Computer science
Speech recognition
Artificial intelligence

UN Sustainable Development Goals

Reduced inequalities
