Assessing the (In)Ability of LLMs to Reason in Interval Temporal Logic
University of Ferrara · University of Milano-Bicocca
Abstract
The logical reasoning skills of Large Language Models (LLMs) are poorly understood and often overstated. Current evaluation suites rely on algebraic or commonsense puzzles that mix reasoning with symbolic manipulation and/or provide static datasets that quickly saturate or leak into pretraining corpora. In purely logical terms, the most relevant reasoning skill is the meta-mathematical task of valid formula recognition, which is at the foundation of higher-level reasoning tasks (including deduction and minimization of assertions, to name just a few). In the current landscape of LLM benchmarking, puzzles are most often stated in propositional or first-order logic, with a few exceptions for point-based temporal…
Citation impact
- FWCI: 33.38
- Percentile: 100%
- References: 0
Authors
- Bellodi, Pietro (corresponding author)
University of Ferrara
- Casavecchia, Pietro
University of Ferrara
- Paparella, Alberto
University of Ferrara
- Sciavicco, Guido
University of Ferrara
- Stan, Ionel Eduard
University of Milano-Bicocca
Topics & keywords
- Computer science
- Benchmark (surveying)
- Generalization
- Benchmarking
- Set (abstract data type)
- Artificial intelligence
- Human intelligence
- Task (project management)
- Quality Education