GPT-4o System Card
Indexed in: arXiv, DataCite
Abstract
GPT-4o is an autoregressive omni model that accepts as input any combination of text, audio, image, and video, and generates any combination of text, audio, and image outputs. It's trained end-to-end across text, vision, and audio, meaning all inputs and outputs are processed by the same neural network. GPT-4o can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in conversation. It matches GPT-4 Turbo performance on text in English and code, with significant improvement on text in non-English languages, while also being much faster and 50% cheaper in the API. GPT-4o is especially better at vision and audio understanding…
Citation impact
- Total citations: 120
- FWCI: —
- Percentile: —
- References: 0
Authors (420)
- OpenAI (corresponding)
- A. M. Hurst
- Adam Lerer
- Adam P. Goucher
Topics & keywords
Keywords
- Computer science