A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity

Bang, Yejin; Cahyawijaya, Samuel; Lee, Nayeon; Dai, Wenliang; Su, Dan; Wilie, Bryan; Lovenia, Holy; Ji, Ziwei; Yu, Tiezheng; Chung, Willy; Do, Quyet V.; Xu, Yan; Fung, Pascale

doi:10.48550/arxiv.2302.04023

Preprint | arXiv (Cornell University) | Feb 8, 2023 | Green OA

Indexed in: arXiv, DataCite

Abstract

This paper proposes a framework for quantitatively evaluating interactive LLMs such as ChatGPT using publicly available data sets. We carry out an extensive technical evaluation of ChatGPT using 23 data sets covering 8 different common NLP application tasks. We evaluate the multitask, multilingual and multi-modal aspects of ChatGPT based on these data sets and a newly designed multimodal dataset. We find that ChatGPT outperforms LLMs with zero-shot learning on most tasks and even outperforms fine-tuned models on some tasks. We find that it is better at understanding non-Latin script languages than generating them. It is able to generate multimodal content from textual prompts, via an intermediate code…

Citation impact

353 total citations

References: 0

Authors: 13

Topics & keywords

Computer science
Semantic reasoner
Artificial intelligence
Natural language processing
Set (abstract data type)
Automatic summarization
Machine learning
Programming language
