The Representational Budget: Scale, RL, and Multimodal Alignment Compete for Geometric Potential in Transformers

Indexed in DataCite

Abstract

We introduce the spectral slope S(ℓ)—the log-linear decay rate of PCA eigenvalues computed from hidden-state representations at layer ℓ—as a cheap, per-layer diagnostic scalar for Transformer geometry. Across four rounds of experiments on 13 models from 5 architecture families (0.6B–30B parameters, dense and MoE, with varying RL intensity and modality count), we find that (1) per-layer spectral expansion ΔS/L decays monotonically with log N within the Qwen3 family (r=−0.968, p=0.007); (2) output-layer participation ratio PR tracks RL training intensity from 13.3 (base) to 4.3 (extreme RL); (3) chain-of-thought reasoning reverses RL-induced compression at runtime; (4) MoE routing increases aggregate spectral…
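The two scalars named in the abstract can be illustrated with a minimal sketch. It assumes that "log-linear decay rate" means the least-squares slope of log-eigenvalue against component rank, and that the participation ratio takes its common form (Σλ)² / Σλ². The visible abstract states neither the fit window nor the PR formula, so both are assumptions here, and the function names are hypothetical.

```python
import math


def spectral_slope(eigenvalues):
    """Least-squares slope of log(eigenvalue) versus component rank 1, 2, ...

    Assumption: a straight-line fit over all positive eigenvalues; the
    source does not say which ranks enter the fit.
    """
    lam = sorted((v for v in eigenvalues if v > 0), reverse=True)
    n = len(lam)
    if n < 2:
        raise ValueError("need at least two positive eigenvalues")
    xs = list(range(1, n + 1))
    ys = [math.log(v) for v in lam]
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    var = sum((x - x_mean) ** 2 for x in xs)
    return cov / var


def participation_ratio(eigenvalues):
    """(sum of eigenvalues)^2 / (sum of squared eigenvalues).

    Assumption: this common definition is the one the source intends.
    """
    lam = [v for v in eigenvalues if v > 0]
    if not lam:
        raise ValueError("need at least one positive eigenvalue")
    total = sum(lam)
    return total * total / sum(v * v for v in lam)
```

In practice the eigenvalues would be those of the covariance matrix of centered hidden states collected at layer ℓ. Under this definition, a spectrum with k equal eigenvalues has a participation ratio of exactly k, and a spectrum dominated by one direction has a ratio close to 1, which is consistent with the abstract reading a lower PR as compression.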

Topics & keywords

Keywords
  • Transformer
  • Eigenvalues and eigenvectors
  • Spectral shape analysis
  • Monotonic function
  • Scalar (mathematics)
  • Topology (electrical circuits)
  • Pattern recognition (psychology)