Generative Adversarial Networks

Goodfellow, Ian; Pouget-Abadie, Jean; Mirza, Mehdi; Xu, Bing; Warde-Farley, David; Ozair, Sherjil; Courville, Aaron; Bengio, Yoshua

doi:10.48550/arxiv.1406.2661

Preprint; arXiv (Cornell University); Jun 10, 2014; Green OA

Indexed in: arXiv, DataCite

Abstract

Large Language Models (LLMs) rely on Key-Value (KV) caches to store attention context during autoregressive decoding. In long-sequence settings, the KV cache can consume large amounts of VRAM and become a practical bottleneck for throughput. We introduce KVHALO, an auxiliary reconstruction model that restores higher-fidelity KV tensors from a compressed cache state when required, reducing persistent memory footprint during inference. In our evaluation, KVHALO achieves up to 91.85% directional cosine alignment at convergence and reduces long-context degradation relative to a low-bit baseline under our stress-test workloads. We used HRM instead of other architectures, which allowed for higher-quality results in…

Citation impact

Total citations: 4,550

FWCI: —
Percentile: —
References: 0

Authors: 8

Topics & keywords

Keywords

Discriminative model
Minimax
Computer science
Inference
Artificial intelligence
Perceptron
Generative grammar
Machine learning

UN Sustainable Development Goals

Reduced inequalities
