Systematic Mapping of LLM Knowledge Boundaries Across 67 Scientific Domains

Sanchez, Bryan

doi:10.5281/zenodo.19055582

Preprint · Zenodo (CERN European Organization for Nuclear Research) · Mar 16, 2026 · Green OA

Indexed in DataCite

Abstract

I map the knowledge boundaries of a 4-billion-parameter language model across 67 scientific domains using a log-probability oracle that compares P(truth) against P(best distractor) for each of 1,038 verified facts. The baseline model answers correctly on only 22.9% of facts. Failures concentrate in mathematical physics (NS regularity: mean margin −46.7), computational domains (0% accuracy in 15 domains), and facts involving specific quantitative relationships. Three systematic failure patterns account for most errors: token-length bias, frozen priors, and domain-specific wrong beliefs. Orthogonal adapter routing repairs every failure, bringing accuracy to 100%. The verified 1,038-fact dataset spanning 67 domains is released as an artifact.
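The oracle described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: that the margin is log P(truth) minus the log-probability of the highest-scoring distractor, that a fact counts as correct when this margin is positive, and that a candidate's score is the sum of its per-token log-probabilities. The function names and the optional length normalization are hypothetical and are included only to show how a token-length bias could arise under summed scoring.

```python
from typing import Sequence


def sequence_logprob(token_logprobs: Sequence[float], normalize: bool = False) -> float:
    """Score one candidate answer from its per-token log-probabilities.

    Assumption: the raw score is the sum of token log-probs. With
    normalize=True the score is the per-token mean instead (a hypothetical
    variant, not something the abstract describes).
    """
    if not token_logprobs:
        raise ValueError("candidate must contain at least one token")
    total = sum(token_logprobs)
    return total / len(token_logprobs) if normalize else total


def oracle_margin(
    truth: Sequence[float],
    distractors: Sequence[Sequence[float]],
    normalize: bool = False,
) -> float:
    """Assumed margin: score(truth) minus the score of the best distractor."""
    truth_score = sequence_logprob(truth, normalize)
    best_distractor = max(sequence_logprob(d, normalize) for d in distractors)
    return truth_score - best_distractor


def accuracy(margins: Sequence[float]) -> float:
    """Fraction of facts with a positive margin (assumed correctness rule)."""
    return sum(m > 0 for m in margins) / len(margins)


# Made-up numbers for illustration: a three-token true answer competing
# against a one-token distractor and a two-token distractor.
truth = [-0.5, -0.7, -0.6]
distractors = [[-1.0], [-2.5, -0.3]]

raw = oracle_margin(truth, distractors)                   # summed scoring
norm = oracle_margin(truth, distractors, normalize=True)  # per-token mean
print(f"raw margin: {raw:.2f}, normalized margin: {norm:.2f}")
```

In this toy case the true answer loses under summed scoring only because it has more tokens than the best distractor, and it wins once scores are averaged per token. Whether the paper's token-length bias works this way is not stated in the abstract.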

Citation impact

Total citations: 6

FWCI: —
Percentile: —
References: 0

Too recent for citation history.

Authors

Bryan Sanchez (corresponding author)

Topics & keywords

Keywords

Margin (machine learning)
Oracle
Baseline (sea)
Adapter (computing)
Bridging (networking)

UN Sustainable Development Goals

Peace, Justice and Strong Institutions
