LAION-5B: An open large-scale dataset for training next generation image-text models
Indexed in: arXiv, DataCite
Abstract
Groundbreaking language-vision architectures like CLIP and DALL-E proved the utility of training on large amounts of noisy image-text data, without relying on expensive accurate labels used in standard vision unimodal supervised learning. The resulting models showed capabilities of strong text-guided image generation and transfer to downstream tasks, while performing remarkably at zero-shot classification with noteworthy out-of-distribution robustness. Since then, large-scale language-vision models like ALIGN, BASIC, GLIDE, Flamingo and Imagen made further improvements. Studying the training and capabilities of such models requires datasets containing billions of image-text pairs. Until now, no datasets of…
Citation impact
- Total citations: 1,036
- FWCI: not available
- Percentile: not available
- References: 0
Authors: 16
Topics & keywords
Topics
Keywords
- Computer science
- Robustness (evolution)
- Artificial intelligence
- Image (mathematics)
- Scale (ratio)
- Modal
- Machine learning
- Data mining
UN Sustainable Development Goals
- Quality Education