Statistical parametric speech synthesis using deep neural networks

Zen, Heiga; Senior, Andrew; Schuster, Mike

doi:10.1109/icassp.2013.6639215

Article | May 1, 2013 | Closed access

Heiga Zen, Andrew Senior, Mike Schuster

Google (United States)

Indexed in Crossref

Abstract

Conventional approaches to statistical parametric speech synthesis typically use decision tree-clustered context-dependent hidden Markov models (HMMs) to represent probability densities of speech parameters given texts. Speech parameters are generated from the probability densities to maximize their output probabilities, then a speech waveform is reconstructed from the generated parameters. This approach is reasonably effective but has a couple of limitations, e.g., decision trees are inefficient at modeling complex context dependencies. This paper examines an alternative scheme that is based on a deep neural network (DNN). The relationship between input texts and their acoustic realizations is modeled by a DNN.…
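As a rough illustration of the idea in the abstract, the sketch below shows a small feedforward network that maps one frame's text-derived feature vector to a vector of speech parameters. It is not the authors' model: the layer sizes, the tanh activation, the random untrained weights, and every name (`init_layer`, `forward`, `N_LINGUISTIC`, `N_ACOUSTIC`, `frame_features`) are assumptions made for illustration only.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Return (weights, biases); weights[j][i] connects input i to unit j."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def affine(weights, biases, x):
    """Compute W x + b for a single input vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def forward(layers, x):
    """Hidden layers use tanh (an assumption); the output layer is linear."""
    h = x
    for weights, biases in layers[:-1]:
        h = [math.tanh(v) for v in affine(weights, biases, h)]
    weights, biases = layers[-1]
    return affine(weights, biases, h)


# Hypothetical dimensions, chosen only to keep the example small.
N_LINGUISTIC = 8   # text-derived input features per frame
N_HIDDEN = 16      # units per hidden layer
N_ACOUSTIC = 4     # speech parameters predicted per frame

rng = random.Random(0)  # fixed seed so the sketch is reproducible
sizes = [N_LINGUISTIC, N_HIDDEN, N_HIDDEN, N_ACOUSTIC]
layers = [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]

# A made-up input frame (binary and numeric features mixed).
frame_features = [1.0, 0.0, 0.0, 1.0, 0.0, 0.5, 0.25, 0.0]
acoustic_frame = forward(layers, frame_features)
print(len(acoustic_frame))  # one value per predicted speech parameter
```

In a trained system the weights would be fitted to paired text and speech data; here they are random, so only the shape of the mapping is meaningful.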

Citation impact

Total citations: 831
FWCI: 89.95
Percentile: 100%
References: 49

Authors: 3

Topics & keywords

Keywords

Hidden Markov model
Computer science
Parametric statistics
Artificial neural network
Speech recognition
Context (archaeology)
Decision tree
Waveform

UN Sustainable Development Goals

Peace, Justice and Strong Institutions
