Preprint · arXiv (Cornell University) · Sep 12, 2016 · Green OA

WaveNet: A Generative Model for Raw Audio

Indexed in: arXiv, DataCite

Abstract

This paper introduces WaveNet, a deep neural network for generating raw audio waveforms. The model is fully probabilistic and autoregressive, with the predictive distribution for each audio sample conditioned on all previous ones; nonetheless we show that it can be efficiently trained on data with tens of thousands of samples per second of audio. When applied to text-to-speech, it yields state-of-the-art performance, with human listeners rating it as significantly more natural sounding than the best parametric and concatenative systems for both English and Mandarin. A single WaveNet can capture the characteristics of many different speakers with equal fidelity, and can switch between them by conditioning on…
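
The autoregressive structure described in the abstract, where each audio sample is drawn from a distribution conditioned on all previous samples, can be illustrated with a minimal sketch. This is not the paper's network: `toy_predict` is a hypothetical stand-in for the model, and the choice of 8 discrete amplitude levels and the function names are assumptions made only for illustration.

```python
import random
from typing import Callable, List, Sequence


def sample_autoregressive(
    predict: Callable[[Sequence[int]], List[float]],
    n_samples: int,
    rng: random.Random,
) -> List[int]:
    """Draw samples one at a time, each conditioned on all earlier ones."""
    history: List[int] = []
    for _ in range(n_samples):
        # Distribution over the next sample's value, given the full history.
        probs = predict(history)
        next_value = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        history.append(next_value)
    return history


def toy_predict(history: Sequence[int], n_levels: int = 8) -> List[float]:
    """Hypothetical predictor (assumption): favors values near the last sample."""
    last = history[-1] if history else n_levels // 2
    weights = [1.0 / (1 + abs(level - last)) for level in range(n_levels)]
    total = sum(weights)
    return [w / total for w in weights]


# Generate a short toy "waveform" of 16 quantized samples.
waveform = sample_autoregressive(toy_predict, n_samples=16, rng=random.Random(0))
```

Generation in this sketch is inherently sequential, since each step needs the values sampled before it. Training does not have that constraint when the whole waveform is already known, which is consistent with the abstract's claim that the model can be trained efficiently.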

Citation impact

3,609 total citations

Authors

9

Topics & keywords

Keywords
  • Generative grammar
  • Generative model
  • Computer science
  • Speech recognition
  • Artificial intelligence
UN Sustainable Development Goals
  • Reduced inequalities