The Representational Budget: Scale, RL, and Multimodal Alignment Compete for Geometric Potential in Transformers
Indexed in DataCite
Abstract
We introduce the spectral slope S(ℓ)—the log-linear decay rate of PCA eigenvalues computed from hidden-state representations at layer ℓ—as a cheap, per-layer diagnostic scalar for Transformer geometry. Across four rounds of experiments on 13 models from 5 architecture families (0.6B–30B parameters, dense and MoE, with varying RL intensity and modality count), we find that (1) per-layer spectral expansion ΔS/L decays monotonically with log N within the Qwen3 family (r=−0.968, p=0.007); (2) output-layer participation ratio PR tracks RL training intensity from 13.3 (base) to 4.3 (extreme RL); (3) chain-of-thought reasoning reverses RL-induced compression at runtime; (4) MoE routing increases aggregate spectral…
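The two scalars named in the abstract can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not specify the fitting procedure, so the slope here is taken to be the least-squares fit of log eigenvalue against rank over the leading components. The participation ratio uses the common definition (Σλ)² / Σλ², which the abstract does not spell out. The function names and the `top_k` cutoff are hypothetical.

```python
import numpy as np


def pca_eigenvalues(hidden: np.ndarray) -> np.ndarray:
    """Covariance eigenvalues of one layer's hidden states, descending.

    hidden: array of shape (n_tokens, d_model).
    """
    centered = hidden - hidden.mean(axis=0, keepdims=True)
    # Squared singular values of the centered data, divided by (n - 1),
    # equal the eigenvalues of the sample covariance matrix.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    return singular_values**2 / (hidden.shape[0] - 1)


def spectral_slope(eigvals: np.ndarray, top_k: int = 50, eps: float = 1e-12) -> float:
    """Slope of log(eigenvalue) versus rank (assumed fitting procedure)."""
    lam = eigvals[:top_k]
    lam = lam[lam > eps]  # drop numerically zero eigenvalues before the log
    ranks = np.arange(1, lam.size + 1)
    slope, _intercept = np.polyfit(ranks, np.log(lam), deg=1)
    return float(slope)


def participation_ratio(eigvals: np.ndarray) -> float:
    """(sum of eigenvalues)^2 / (sum of squared eigenvalues) (assumed definition)."""
    return float(eigvals.sum() ** 2 / np.square(eigvals).sum())
```

Applied layer by layer to an array of hidden states, `spectral_slope(pca_eigenvalues(h))` yields one scalar per layer. Under this definition the participation ratio ranges from 1 (variance concentrated in one direction) to the number of dimensions (variance spread evenly).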
Citation impact
- Total citations: 6
- FWCI: not available
- Percentile: not available
- References: 0
Authors
1
Topics & keywords
Topics
Keywords
- Transformer
- Eigenvalues and eigenvectors
- Spectral shape analysis
- Monotonic function
- Scalar (mathematics)
- Topology (electrical circuits)
- Pattern recognition (psychology)