HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
Indexed in: arXiv, DataCite
Abstract
Several recent works on speech synthesis have employed generative adversarial networks (GANs) to produce raw waveforms. Although such methods improve sampling efficiency and memory usage, their sample quality has not yet reached that of autoregressive and flow-based generative models. In this work, we propose HiFi-GAN, which achieves both efficient and high-fidelity speech synthesis. As speech audio consists of sinusoidal signals with various periods, we demonstrate that modeling the periodic patterns of audio is crucial for enhancing sample quality. A subjective human evaluation (mean opinion score, MOS) of a single-speaker dataset indicates that our proposed method demonstrates similarity to human quality…
Citation impact
740 total citations
- FWCI: n/a
- Percentile: n/a
- References: 23
Authors
3
Topics & keywords
Topics
Keywords
- Computer science
- Autoregressive model
- Mean opinion score
- High fidelity
- Speech recognition
- Spectrogram
- Fidelity
- Acoustics