HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

Kong, Jungil; Kim, Jaehyeon; Bae, Jaekyoung

doi:10.48550/arxiv.2010.05646

Preprint; arXiv (Cornell University); Oct 12, 2020; Green OA

Indexed in: arXiv, DataCite

Abstract

Several recent works on speech synthesis have employed generative adversarial networks (GANs) to produce raw waveforms. Although such methods improve sampling efficiency and memory usage, their sample quality has not yet reached that of autoregressive and flow-based generative models. In this work, we propose HiFi-GAN, which achieves both efficient and high-fidelity speech synthesis. As speech audio consists of sinusoidal signals with various periods, we demonstrate that modeling the periodic patterns of audio is crucial for enhancing sample quality. A subjective human evaluation (mean opinion score, MOS) on a single-speaker dataset indicates that our proposed method demonstrates similarity to human quality…

Citation impact

740 total citations

References: 23

Authors: 3

Topics & keywords

Keywords

Computer science
Autoregressive model
Mean opinion score
High fidelity
Speech recognition
Spectrogram
Fidelity
Acoustics
