High Fidelity Neural Audio Compression
Indexed in arXiv, DataCite
Abstract
We introduce a state-of-the-art, real-time, high-fidelity audio codec leveraging neural networks. It consists of a streaming encoder-decoder architecture with a quantized latent space, trained end-to-end. We simplify and speed up training by using a single multiscale spectrogram adversary that efficiently reduces artifacts and produces high-quality samples. We introduce a novel loss balancer mechanism to stabilize training: the weight of a loss now defines the fraction of the overall gradient it should represent, thus decoupling the choice of this hyperparameter from the typical scale of the loss. Finally, we study how lightweight Transformer models can be used to further compress the obtained…
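The loss-balancer idea in the abstract can be illustrated with a small sketch. It assumes that each loss's gradient is rescaled to unit norm and then multiplied by that loss's share of the total weight, `weights[i] / sum(weights)`, so a loss whose raw gradients are 1000× larger gets no extra influence. Plain Python lists stand in for gradient tensors, and `balance_gradients`, `total_norm`, and `eps` are hypothetical names. The paper's exact formulation may differ (for example, it may track a moving average of gradient norms rather than the instantaneous norm), so treat this as an illustration rather than the authors' implementation.

```python
import math
from typing import Dict, List

Vector = List[float]


def l2_norm(v: Vector) -> float:
    """Euclidean norm of a gradient vector."""
    return math.sqrt(sum(x * x for x in v))


def balance_gradients(
    grads: Dict[str, Vector],
    weights: Dict[str, float],
    total_norm: float = 1.0,
    eps: float = 1e-12,
) -> Vector:
    """Hypothetical loss balancer (assumption, not the paper's code).

    Each per-loss gradient is normalized to unit length, then scaled so that
    its norm equals total_norm * weights[name] / sum(weights). The weight
    therefore sets the loss's share of the combined gradient, independent of
    the raw scale of that loss.
    """
    names = list(grads)
    dim = len(grads[names[0]])
    weight_sum = sum(weights[n] for n in names)
    combined = [0.0] * dim
    for n in names:
        g = grads[n]
        # Rescale: remove the loss's own magnitude, apply its weight share.
        scale = total_norm * (weights[n] / weight_sum) / (l2_norm(g) + eps)
        for k in range(dim):
            combined[k] += scale * g[k]
    return combined


if __name__ == "__main__":
    # A large-magnitude reconstruction gradient and a tiny adversarial one.
    grads = {"l1": [3.0, 4.0], "adv": [0.0, 0.001]}
    weights = {"l1": 1.0, "adv": 3.0}
    print(balance_gradients(grads, weights))
```

With weights 1 and 3, the reconstruction term contributes a vector of norm 0.25 and the adversarial term a vector of norm 0.75, even though their raw gradients differ in scale by a factor of 5000. Multiplying the `"l1"` gradient by 1000 leaves the result unchanged, which is the decoupling of weight from loss scale that the abstract describes.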
Citation impact
- Total citations: 280
- References: 0
Authors: 4
Topics & keywords
Topics
Keywords
- Computer science
- High fidelity
- Encoder
- Codec
- Sound quality
- Spectrogram
- Fidelity
- Speech recognition
UN Sustainable Development Goals
- Sustainable cities and communities