MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis

Kumar, Kundan; Kumar, Rithesh; Boissière, T. de; Gestin, Lucas; Teoh, Wei Zhen; Sotelo, Jose; Brébisson, Alexandre de; Bengio, Yoshua; Courville, Aaron

doi:10.48550/arxiv.1910.06711

Preprint | arXiv (Cornell University) | Oct 8, 2019 | Green OA

Indexed in: arXiv, DataCite

Abstract

Previous works (Donahue et al., 2018a; Engel et al., 2019a) have found that generating coherent raw audio waveforms with GANs is challenging. In this paper, we show that it is possible to train GANs reliably to generate high-quality coherent waveforms by introducing a set of architectural changes and simple training techniques. A subjective evaluation metric (Mean Opinion Score, or MOS) shows the effectiveness of the proposed approach for high-quality mel-spectrogram inversion. To establish the generality of the proposed techniques, we show qualitative results of our model in speech synthesis, music domain translation, and unconditional music synthesis. We evaluate the various components of the model through…

Citation impact

Total citations: 598

References: 0

Authors: 9

Topics & keywords

Keywords

Computer science
Spectrogram
Autoregressive model
Generality
Generative grammar
Algorithm
Mean opinion score
Waveform

UN Sustainable Development Goals

Reduced inequalities
