High Fidelity Neural Audio Compression

Défossez, Alexandre; Copet, Jade; Synnaeve, Gabriel; Adi, Yossi

doi:10.48550/arxiv.2210.13438

Preprint, arXiv (Cornell University), Oct 24, 2022, Green OA

Indexed in: arXiv, DataCite

Abstract

We introduce a state-of-the-art, real-time, high-fidelity audio codec leveraging neural networks. It consists of a streaming encoder-decoder architecture with a quantized latent space, trained in an end-to-end fashion. We simplify and speed up training by using a single multiscale spectrogram adversary that efficiently reduces artifacts and produces high-quality samples. We introduce a novel loss balancer mechanism to stabilize training: the weight of a loss now defines the fraction of the overall gradient it should represent, thus decoupling the choice of this hyper-parameter from the typical scale of the loss. Finally, we study how lightweight Transformer models can be used to further compress the obtained…
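
One way to read the loss balancer sentence is sketched below. This is an illustration under stated assumptions, not the authors' implementation: the function name `balance_gradients`, the `total_norm` parameter, and the toy gradient values are all hypothetical, and each gradient is normalized by its instantaneous norm, whereas a practical balancer would more likely smooth the norms over training steps. Each loss's gradient with respect to the model output is rescaled to unit norm and then multiplied by its share of the total weight, so a loss with a very large raw gradient cannot dominate one with a very small raw gradient.

```python
import math


def balance_gradients(grads, weights, total_norm=1.0, eps=1e-12):
    """Combine per-loss gradients so each gets a fixed share of the total.

    grads:   dict mapping a loss name to its gradient (list of floats),
             taken with respect to the model output (assumption).
    weights: dict mapping a loss name to a non-negative weight.
    Returns the combined gradient as a list of floats.
    """
    weight_sum = sum(weights.values())
    size = len(next(iter(grads.values())))
    combined = [0.0] * size
    for name, grad in grads.items():
        # Euclidean norm of this loss's raw gradient.
        norm = math.sqrt(sum(x * x for x in grad))
        # Unit-normalize, then scale by this loss's fraction of the weights.
        scale = total_norm * weights[name] / weight_sum / (norm + eps)
        for i, x in enumerate(grad):
            combined[i] += scale * x
    return combined


# Toy example: raw gradient scales differ by about six orders of magnitude.
grads = {"reconstruction": [3000.0, 0.0], "adversarial": [0.0, 0.004]}
weights = {"reconstruction": 1.0, "adversarial": 3.0}
balanced = balance_gradients(grads, weights)
```

With weights 1 and 3, the two contributions to `balanced` have norms in a 1:3 ratio regardless of the raw gradient magnitudes, which is the decoupling from loss scale that the abstract describes.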

Citation impact

Total citations: 280
References: 0

Authors: 4

Topics & keywords

Keywords

Computer science
High fidelity
Encoder
Codec
Sound quality
Spectrogram
Fidelity
Speech recognition

UN Sustainable Development Goals

Sustainable cities and communities