Assessing the (In)Ability of LLMs to Reason in Interval Temporal Logic

Pietro Bellodi; Pietro Casavecchia; Alberto Paparella; Guido Sciavicco; Ionel Eduard Stan

doi:10.4230/lipics.time.2025.4

Preprint · Dagstuhl Research Online Publication Server · Jan 1, 2025 · Green OA

Bellodi, Pietro; Casavecchia, Pietro; Paparella, Alberto; Sciavicco, Guido; Stan, Ionel Eduard

University of Ferrara · University of Milano-Bicocca

Indexed in DataCite

Abstract

The logical reasoning skills of Large Language Models (LLMs) are poorly understood and often overstated. Current evaluation suites rely on algebraic or commonsense puzzles that mix reasoning with symbolic manipulation and/or provide static datasets that quickly saturate or leak into pretraining corpora. In purely logical terms, the most relevant reasoning skill is the meta-mathematical task of valid formula recognition, which underlies higher-level reasoning tasks (including deduction and minimization of assertions, to name just a few). In the current landscape of LLM benchmarking, puzzles are most often stated in propositional or first-order logic, with a few exceptions for point-based temporal…

Citation impact

102 total citations

FWCI: 33.38
Percentile: 100%
References: 0

Authors

5 authors

Bellodi, Pietro (corresponding author) · University of Ferrara
Casavecchia, Pietro · University of Ferrara
Paparella, Alberto · University of Ferrara
Sciavicco, Guido · University of Ferrara
Stan, Ionel Eduard · University of Milano-Bicocca

Topics & keywords

Keywords

Computer science
Benchmark (surveying)
Generalization
Benchmarking
Set (abstract data type)
Artificial intelligence
Human intelligence
Task (project management)

UN Sustainable Development Goals

Quality Education
