The Embedding Hypothesis: From Fourier Circuits to No-Q Attention

Rigoni, Nathan

doi:10.5281/zenodo.19273886

Preprint, Zenodo (CERN European Organization for Nuclear Research), Mar 28, 2026, Green OA

Indexed in DataCite

Abstract

The token embedding layer is the geometric foundation of transformer attention. We develop this claim through four stages. First, we show that prescribing near-Nyquist frequency modes in the embedding gradient, a method we call Prescribed Fourier Frequency Training (PFFT), achieves a 92.7% reduction in epochs-to-grokking (57 vs. 782 epochs) on modular arithmetic, with a 97.9% reduction in the memorization phase. PFFT works by simultaneously preserving the embedding's geometric authority and reducing gradient noise. Second, the Sounding Hammer diagnostic reveals that gradient-domain Fourier steering cannot safely transfer to language model embeddings: BPE vocabulary gradients are spectrally flat (ρ = 0.42), causing catastrophic BPC…
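The abstract does not spell out how frequency modes are prescribed in the embedding gradient, so the following is only a minimal sketch of one possible reading: project the gradient onto a small set of high-frequency Fourier modes along the token axis before the optimizer step. The function names, the choice of modes just below `n // 2`, and the hard zeroing of all other modes are illustrative assumptions, not the paper's implementation.

```python
import cmath
import math


def dft(column):
    """Plain O(n^2) discrete Fourier transform of a list of numbers."""
    n = len(column)
    return [
        sum(column[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of dft (returns complex values)."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def near_nyquist_modes(n, count):
    """Hypothetical helper: the `count` frequency indices ending at n // 2."""
    top = n // 2
    return list(range(top - count + 1, top + 1))


def project_gradient(grad, modes):
    """Keep only the chosen modes (and their conjugates) along the token axis.

    grad: n rows (one per token), each a list of d floats.
    Assumption: all other frequency components are set to zero.
    """
    n, d = len(grad), len(grad[0])
    keep = set()
    for k in modes:
        keep.add(k % n)
        keep.add((-k) % n)  # conjugate partner keeps the output real
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        spectrum = dft([grad[t][j] for t in range(n)])
        filtered = [spectrum[k] if k in keep else 0 for k in range(n)]
        column = idft(filtered)
        for t in range(n):
            out[t][j] = column[t].real
    return out
```

In a training loop, a projection like this would be applied to the embedding gradient after the backward pass and before the parameter update. For a modular-arithmetic vocabulary of size `n`, the token axis has a natural cyclic order, which is what makes a Fourier basis over tokens meaningful in this sketch.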

Citation impact

Total citations: 325
FWCI: —
Percentile: —
References: 49

Authors

Rigoni, Nathan (corresponding author)

Topics & keywords

Keywords

Artificial neural network
Computer science
Artificial intelligence
Psychology

UN Sustainable Development Goals

Decent work and economic growth
