Preprint · arXiv (Cornell University) · Feb 8, 2023 · Green OA

A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity

Indexed in: arXiv, DataCite

Abstract

This paper proposes a framework for quantitatively evaluating interactive LLMs such as ChatGPT using publicly available data sets. We carry out an extensive technical evaluation of ChatGPT using 23 data sets covering 8 different common NLP application tasks. We evaluate the multitask, multilingual and multi-modal aspects of ChatGPT based on these data sets and a newly designed multimodal dataset. We find that ChatGPT outperforms LLMs with zero-shot learning on most tasks and even outperforms fine-tuned models on some tasks. We find that it is better at understanding non-Latin script languages than generating them. It is able to generate multimodal content from textual prompts, via an intermediate code…

Citation impact

353 total citations
References
0

Authors

13

Topics & keywords

Keywords
  • Computer science
  • Semantic reasoner
  • Artificial intelligence
  • Natural language processing
  • Set (abstract data type)
  • Automatic summarization
  • Machine learning
  • Programming language