Preprint · arXiv (Cornell University) · Oct 16, 2022 · Green open access

LAION-5B: An open large-scale dataset for training next generation image-text models

Indexed in: arXiv, DataCite

Abstract

Groundbreaking language-vision architectures like CLIP and DALL-E proved the utility of training on large amounts of noisy image-text data, without relying on expensive accurate labels used in standard vision unimodal supervised learning. The resulting models showed capabilities of strong text-guided image generation and transfer to downstream tasks, while performing remarkably at zero-shot classification with noteworthy out-of-distribution robustness. Since then, large-scale language-vision models like ALIGN, BASIC, GLIDE, Flamingo and Imagen made further improvements. Studying the training and capabilities of such models requires datasets containing billions of image-text pairs. Until now, no datasets of…

Citation impact

1,036 total citations
References: 0

Authors

16 authors

Topics & keywords

Keywords
  • Computer science
  • Robustness (evolution)
  • Artificial intelligence
  • Image (mathematics)
  • Scale (ratio)
  • Modal
  • Machine learning
  • Data mining
UN Sustainable Development Goals
  • Quality Education