Preprint · arXiv (Cornell University) · Oct 12, 2020 · Green OA

HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

Indexed in: arXiv, DataCite

Abstract

Several recent works on speech synthesis have employed generative adversarial networks (GANs) to produce raw waveforms. Although such methods improve sampling efficiency and memory usage, their sample quality has not yet reached that of autoregressive and flow-based generative models. In this work, we propose HiFi-GAN, which achieves both efficient and high-fidelity speech synthesis. Because speech audio consists of sinusoidal signals with various periods, we demonstrate that modeling the periodic patterns of audio is crucial for enhancing sample quality. A subjective human evaluation (mean opinion score, MOS) on a single-speaker dataset indicates that our proposed method produces quality close to that of human speech…
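The abstract's point that modeling periodic patterns matters can be illustrated with a toy sketch. In the spirit of the multi-period discriminator described in the full paper, a waveform of length T is folded into a 2-D grid of width p, so each column holds equally spaced samples x[j], x[j+p], x[j+2p], and so on. The sketch below only mimics that reshaping step on a synthetic sinusoid; it is not the discriminator itself. The function names `fold_by_period` and `mean_column_variance`, the reflect-padding choice, the toy period-5 signal, and the variance measure are all illustrative assumptions.

```python
import math
import statistics


def fold_by_period(x, p):
    """Reshape a 1-D signal of length T into a grid of height ceil(T/p), width p.

    Column j then holds the equally spaced samples x[j], x[j+p], x[j+2p], ...
    The tail is reflect-padded so the length becomes a multiple of p
    (the padding mode is an illustrative assumption).
    """
    t = len(x)
    n_pad = (-t) % p  # extra samples needed to reach a multiple of p
    if n_pad >= t:
        raise ValueError("signal too short to reflect-pad")
    # Reflect padding: mirror the samples preceding the last one.
    padded = list(x) + [x[t - 2 - i] for i in range(n_pad)]
    return [padded[r * p:(r + 1) * p] for r in range(len(padded) // p)]


def mean_column_variance(grid):
    """Average, over columns, of the population variance within each column."""
    columns = list(zip(*grid))
    return statistics.fmean(statistics.pvariance(col) for col in columns)


# Hypothetical toy signal with a period of exactly 5 samples.
signal = [math.sin(2 * math.pi * n / 5) for n in range(50)]

# Fold the same signal with several prime periods and compare column variance.
for p in (2, 3, 5, 7, 11):
    print(p, round(mean_column_variance(fold_by_period(signal, p)), 4))
```

Folding by the signal's own period (p = 5) makes every column constant, so its mean column variance is essentially zero, while the other periods mix different phases into each column. As we read it, this is the intuition for why views built from equally spaced samples help expose periodic structure in audio.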

Citation impact

Total citations: 740
References: 23

Authors: 3

Topics & keywords

Keywords
  • Computer science
  • Autoregressive model
  • Mean opinion score
  • High fidelity
  • Speech recognition
  • Spectrogram
  • Fidelity
  • Acoustics