Preprint · Dagstuhl Research Online Publication Server · Jan 1, 2025 · Green OA

Assessing the (In)Ability of LLMs to Reason in Interval Temporal Logic

Bellodi, Pietro · Casavecchia, Pietro · Paparella, Alberto · Sciavicco, Guido · Stan, Ionel Eduard

University of Ferrara · University of Milano-Bicocca

Indexed in DataCite

Abstract

The logical reasoning skills of Large Language Models (LLMs) are poorly understood and often overstated. Current evaluation suites rely on algebraic or commonsense puzzles that mix reasoning with symbolic manipulation and/or provide static datasets that quickly saturate or leak into pretraining corpora. In purely logical terms, the most relevant reasoning skill is the meta-mathematical task of valid formula recognition, which is at the foundation of higher-level reasoning tasks (including deduction and minimization of assertions, to name just a few). In the current landscape of LLM benchmarking, puzzles are most often stated in propositional or first-order logic, with a few exceptions for point-based temporal…
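The abstract singles out valid formula recognition as the basic logical reasoning task. As a minimal illustration in the simplest setting it mentions, propositional logic, the sketch below decides validity by exhaustive truth-table enumeration: a formula is valid if and only if it is true under all 2^n assignments to its n variables. The tuple-based formula encoding and the names `variables`, `evaluate` and `is_valid` are hypothetical choices made for this sketch; it is not the authors' benchmark or method, and it does not handle interval temporal logic.

```python
from itertools import product
from typing import Dict, Union

# Hypothetical formula encoding used only in this sketch:
#   "p"                                        -> propositional variable
#   ("not", f)                                 -> negation
#   ("and", f, g), ("or", f, g), ("implies", f, g) -> binary connectives
Formula = Union[str, tuple]

def variables(f: Formula) -> set:
    """Collect the propositional variables occurring in f."""
    if isinstance(f, str):
        return {f}
    return set().union(*(variables(sub) for sub in f[1:]))

def evaluate(f: Formula, v: Dict[str, bool]) -> bool:
    """Truth value of f under the assignment v."""
    if isinstance(f, str):
        return v[f]
    op = f[0]
    if op == "not":
        return not evaluate(f[1], v)
    left, right = evaluate(f[1], v), evaluate(f[2], v)
    if op == "and":
        return left and right
    if op == "or":
        return left or right
    if op == "implies":
        return (not left) or right
    raise ValueError(f"unknown connective: {op}")

def is_valid(f: Formula) -> bool:
    """f is valid iff it is true under every one of the 2^n assignments."""
    names = sorted(variables(f))
    return all(
        evaluate(f, dict(zip(names, row)))
        for row in product([False, True], repeat=len(names))
    )

# Peirce's law ((p -> q) -> p) -> p is a classical tautology.
peirce = ("implies", ("implies", ("implies", "p", "q"), "p"), "p")
print(is_valid(peirce))                 # True
print(is_valid(("implies", "p", "q")))  # False (fails for p=True, q=False)
```

Brute-force enumeration is exponential in the number of variables, so it only serves to make the notion of validity concrete; it says nothing about how the paper builds or scores its tasks.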

Citation impact

Total citations: 102
FWCI (Field-Weighted Citation Impact): 33.38
Percentile: 100%
References: 0

Authors (5)

  • Bellodi, Pietro (corresponding author) · University of Ferrara
  • Casavecchia, Pietro · University of Ferrara
  • Paparella, Alberto · University of Ferrara
  • Sciavicco, Guido · University of Ferrara
  • Stan, Ionel Eduard · University of Milano-Bicocca

Topics & keywords

Keywords
  • Computer science
  • Benchmark (surveying)
  • Generalization
  • Benchmarking
  • Set (abstract data type)
  • Artificial intelligence
  • Human intelligence
  • Task (project management)
UN Sustainable Development Goals
  • Quality Education