Neural Speech Synthesis with Transformer Network

Li, Naihan; Liu, Shujie; Liu, Yanqing; Zhao, Sheng; Liu, Ming

doi:10.1609/aaai.v33i01.33016706

Article · Proceedings of the AAAI Conference on Artificial Intelligence · Jul 17, 2019 · Diamond OA

University of Electronic Science and Technology of China · Microsoft Research Asia (China) · +1 more institution

Indexed in Crossref

Abstract

Although end-to-end neural text-to-speech (TTS) methods (such as Tacotron2) have been proposed and achieve state-of-the-art performance, they still suffer from two problems: 1) low efficiency during training and inference; 2) difficulty modeling long-range dependencies with current recurrent neural networks (RNNs). Inspired by the success of the Transformer network in neural machine translation (NMT), in this paper we introduce and adapt the multi-head attention mechanism to replace the RNN structures and also the original attention mechanism in Tacotron2. With the help of multi-head self-attention, the hidden states in the encoder and decoder are constructed in parallel, which improves training efficiency. Meanwhile, any two inputs…
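
To illustrate why multi-head self-attention lets hidden states be built in parallel, the following is a minimal sketch in plain Python. It is not the paper's implementation. It assumes identity projections in place of the learned query, key, value, and output matrices, and the function names `softmax`, `attention`, and `multi_head_self_attention` are hypothetical. The point it shows is that each output position depends only on the input sequence, never on a previously computed output, so (unlike an RNN) all positions could be computed at once.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Scaled dot-product attention; q, k, v are lists of vectors."""
    scale = math.sqrt(len(q[0]))
    out = []
    # No iteration reads an earlier iteration's result, so this loop
    # could run for all positions at once (e.g. as a matrix product).
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / scale for kj in k]
        weights = softmax(scores)
        out.append([
            sum(w * vj[d] for w, vj in zip(weights, v))
            for d in range(len(v[0]))
        ])
    return out


def multi_head_self_attention(x, num_heads):
    """Split features into heads, attend per head, concatenate.

    Assumption: identity projections stand in for the learned
    query/key/value/output weight matrices of a real model.
    """
    d_model = len(x[0])
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads
    heads = []
    for h in range(num_heads):
        part = [row[h * d_head:(h + 1) * d_head] for row in x]
        heads.append(attention(part, part, part))
    # Concatenate the per-head outputs position by position.
    return [sum((head[t] for head in heads), []) for t in range(len(x))]
```

Calling `multi_head_self_attention(x, num_heads=2)` on a sequence of four-dimensional vectors returns a sequence of the same shape, where each output vector is a weighted average of the inputs within each head.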

Citation impact

Total citations: 732

FWCI: 56.52
Percentile: 100%
References: 36

Authors

5

Topics & keywords

Keywords

Computer science
Transformer
Inference
Spectrogram
Artificial neural network
Encoder
Machine translation
Recurrent neural network

UN Sustainable Development Goals

Quality Education
