Neural Speech Synthesis with Transformer Network
University of Electronic Science and Technology of China · Microsoft Research Asia (China) · +1 more institution
Abstract
Although end-to-end neural text-to-speech (TTS) methods such as Tacotron2 achieve state-of-the-art performance, they still suffer from two problems: 1) low efficiency during training and inference; 2) difficulty modeling long-range dependencies with current recurrent neural networks (RNNs). Inspired by the success of the Transformer network in neural machine translation (NMT), in this paper we introduce and adapt the multi-head attention mechanism to replace both the RNN structures and the original attention mechanism in Tacotron2. With multi-head self-attention, the hidden states in the encoder and decoder are constructed in parallel, which improves training efficiency. Meanwhile, any two inputs…
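The abstract's claim that self-attention builds all hidden states in parallel can be illustrated with a minimal NumPy sketch of multi-head self-attention. This is a generic illustration, not the paper's implementation: the function name `multi_head_self_attention`, the weight matrices, and the sizes (`seq_len=6`, `d_model=16`, `num_heads=4`) are assumptions chosen for the example, and the weights are random rather than trained.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Generic multi-head self-attention over one sequence.

    x has shape (seq_len, d_model). All names and shapes here are
    illustrative assumptions, not taken from the paper.
    """
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    # One matrix product projects every position at once; there is
    # no step-by-step recurrence over time as in an RNN.
    q, k, v = x @ w_q, x @ w_k, x @ w_v

    def split_heads(t):
        # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split_heads(q), split_heads(k), split_heads(v)

    # Scaled dot-product scores between every pair of positions.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)   # (num_heads, seq_len, seq_len)
    context = weights @ v                # (num_heads, seq_len, d_head)

    # Concatenate the heads and apply the output projection.
    merged = context.transpose(1, 0, 2).reshape(seq_len, d_model)
    return merged @ w_o, weights


rng = np.random.default_rng(0)
seq_len, d_model, num_heads = 6, 16, 4
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v, w_o = (rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(4))

out, weights = multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads)
print(out.shape)      # (6, 16)
print(weights.shape)  # (4, 6, 6)
```

Each row of `weights[h]` is a distribution over all positions in the sequence, so every position attends to every other position in a single step. A decoder would additionally mask future positions, which this sketch omits.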
Citation impact
- FWCI: 56.52
- Percentile: 100%
- References: 36
Authors
- 5
Topics & keywords
- Computer science
- Transformer
- Inference
- Spectrogram
- Artificial neural network
- Encoder
- Machine translation
- Recurrent neural network
- Quality Education