Preprint, arXiv (Cornell University), Oct 24, 2022, Green OA

High Fidelity Neural Audio Compression

Indexed in: arXiv, DataCite

Abstract

We introduce a state-of-the-art real-time, high-fidelity audio codec leveraging neural networks. It consists of a streaming encoder-decoder architecture with a quantized latent space, trained in an end-to-end fashion. We simplify and speed up training by using a single multiscale spectrogram adversary that efficiently reduces artifacts and produces high-quality samples. We introduce a novel loss balancer mechanism to stabilize training: the weight of a loss now defines the fraction of the overall gradient it should represent, thus decoupling the choice of this hyper-parameter from the typical scale of the loss. Finally, we study how lightweight Transformer models can be used to further compress the obtained…
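The loss balancer idea in the abstract can be illustrated with a small sketch. The abstract only states that a loss weight defines the fraction of the overall gradient; everything else here is an assumption made for illustration: the function name `balance_gradients`, the `total_norm` and `eps` parameters, the loss names, and the choice to normalize by the current gradient norm rather than by any running statistic. It is not the authors' implementation.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def balance_gradients(grads, weights, total_norm=1.0, eps=1e-12):
    """Combine per-loss gradients so each contributes a fixed share.

    grads:   dict mapping a loss name to its gradient (list of floats),
             taken with respect to the same tensor, e.g. the model output.
    weights: dict mapping a loss name to a non-negative weight.
    Each gradient is rescaled to norm total_norm * w_i / sum(w), so the
    raw magnitude of a loss no longer affects its share (assumption:
    normalization uses the current norm only).
    """
    weight_sum = sum(weights[name] for name in grads)
    size = len(next(iter(grads.values())))
    combined = [0.0] * size
    for name, grad in grads.items():
        fraction = weights[name] / weight_sum
        scale = total_norm * fraction / (l2_norm(grad) + eps)
        for i, g in enumerate(grad):
            combined[i] += scale * g
    return combined


# Hypothetical losses with very different gradient scales.
grads = {"reconstruction": [3.0, 4.0], "adversarial": [0.0, 0.002]}
weights = {"reconstruction": 1.0, "adversarial": 3.0}
balanced = balance_gradients(grads, weights)

# Rescaling one loss by 1000x leaves the combined gradient unchanged.
rescaled = dict(grads, adversarial=[0.0, 2.0])
balanced_rescaled = balance_gradients(rescaled, weights)
```

In this sketch the adversarial term supplies three quarters of the combined gradient norm budget regardless of how small or large its raw gradient is, which is the decoupling of weight from loss scale that the abstract describes.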

Citation impact

280 total citations

Authors

4

Topics & keywords

Keywords
  • Computer science
  • High fidelity
  • Encoder
  • Codec
  • Sound quality
  • Spectrogram
  • Fidelity
  • Speech recognition
UN Sustainable Development Goals
  • Sustainable cities and communities