WaveNet: A Generative Model for Raw Audio
Abstract
This paper introduces WaveNet, a deep neural network for generating raw audio waveforms. The model is fully probabilistic and autoregressive, with the predictive distribution for each audio sample conditioned on all previous ones; nonetheless we show that it can be efficiently trained on data with tens of thousands of samples per second of audio. When applied to text-to-speech, it yields state-of-the-art performance, with human listeners rating it as significantly more natural sounding than the best parametric and concatenative systems for both English and Mandarin. A single WaveNet can capture the characteristics of many different speakers with equal fidelity, and can switch between them by conditioning on…
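The autoregressive structure described in the abstract, in which each sample is drawn from a predictive distribution conditioned on all previous samples, can be illustrated with a minimal sketch. This is not the WaveNet network: `toy_predictive` is a hypothetical stand-in for the learned model, and the 8 discrete amplitude levels and the Gaussian-shaped weighting are assumptions chosen only to keep the example small and runnable.

```python
import math
import random


def toy_predictive(history, num_levels=8):
    """Hypothetical stand-in for a learned model: returns p(x_t | x_1..x_{t-1}).

    Assumption: probabilities peak near the mean of all previous samples.
    A real model would compute this distribution with a neural network.
    """
    if history:
        center = sum(history) / len(history)
    else:
        center = (num_levels - 1) / 2  # no context yet: centre of the range
    weights = [math.exp(-0.5 * (k - center) ** 2) for k in range(num_levels)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_autoregressive(num_samples, num_levels=8, seed=0):
    """Draw samples one at a time, each conditioned on everything drawn so far.

    Also accumulates the log of the joint probability, which factorises as
    the sum over t of log p(x_t | x_1..x_{t-1}).
    """
    rng = random.Random(seed)
    samples = []
    log_prob = 0.0
    for _ in range(num_samples):
        probs = toy_predictive(samples, num_levels)
        x_t = rng.choices(range(num_levels), weights=probs)[0]
        log_prob += math.log(probs[x_t])
        samples.append(x_t)  # the new sample becomes context for the next step
    return samples, log_prob


if __name__ == "__main__":
    waveform, joint_log_prob = sample_autoregressive(16)
    print(waveform, joint_log_prob)
```

Generation in this scheme is inherently sequential, since each step needs the previous outputs. During training, however, the whole waveform is already known, so every conditional can be evaluated without waiting for earlier samples to be generated.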
Citation impact
- Total citations: 2,472
- FWCI: not available
- Percentile: not available
- References: 45
Authors
- 9
Topics & keywords
Topics
Keywords
- Computer science
- Speech recognition
- Discriminative model
- Generative model
- Autoregressive model
- Parametric statistics
- Artificial neural network
- Probabilistic logic
UN Sustainable Development Goals
- Reduced inequalities