How Is ChatGPT’s Behavior Changing Over Time?

Chen, Lingjiao; Zaharia, Matei; Zou, James

doi:10.1162/99608f92.5317da47

Article · Harvard Data Science Review · Mar 12, 2024 · Hybrid OA

Lingjiao Chen · Matei Zaharia · James Zou

Stanford University · University of California, Berkeley

Indexed in: Crossref, DOAJ

Abstract

GPT-3.5 and GPT-4 are the two most widely used large language model (LLM) services. However, when and how these models are updated over time is opaque. Here, we evaluate the March 2023 and June 2023 versions of GPT-3.5 and GPT-4 on several diverse tasks: 1) math problems, 2) sensitive/dangerous questions, 3) opinion surveys, 4) multi-hop knowledge-intensive questions, 5) generating code, 6) US Medical License tests, and 7) visual reasoning. We find that the performance and behavior of both GPT-3.5 and GPT-4 can vary greatly over time. For example, GPT-4 (March 2023) was reasonable at identifying prime vs. composite numbers (84% accuracy) but GPT-4 (June 2023) was poor on these same questions (51% accuracy).…

Citation impact

Total citations: 274

FWCI: 26.39
Percentile: 100%
References: 0

Topics & keywords

Keywords

Computer science
Task (project management)
Psychology
Engineering
