Preprint | arXiv (Cornell University) | Oct 25, 2024 | Green OA

GPT-4o System Card

OpenAI: A. M. Hurst, Adam Lerer, Adam P. Goucher
Indexed in: arXiv, DataCite

Abstract

GPT-4o is an autoregressive omni model that accepts as input any combination of text, audio, image, and video, and generates any combination of text, audio, and image outputs. It's trained end-to-end across text, vision, and audio, meaning all inputs and outputs are processed by the same neural network. GPT-4o can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in conversation. It matches GPT-4 Turbo performance on text in English and code, with significant improvement on text in non-English languages, while also being much faster and 50% cheaper in the API. GPT-4o is especially better at vision and audio understanding…

Citation impact

120 total citations

Authors

420 authors

Topics & keywords

Keywords
  • Computer science