Article · Jun 1, 2023 · Closed access

ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding

Salesforce (United States) · The University of Texas at Austin · +1 more institution

Indexed in Crossref

Abstract

The recognition capabilities of current state-of-the-art 3D models are limited by datasets with a small number of annotated data and a pre-defined set of categories. In its 2D counterpart, recent advances have shown that similar problems can be significantly alleviated by employing knowledge from other modalities, such as language. Inspired by this, leveraging multimodal information for 3D modality could be promising to improve 3D understanding under the restricted data regime, but this line of research is not well studied. Therefore, we introduce ULIP to learn a unified representation of image, text, and 3D point cloud by pre-training with object triplets from the three modalities. To overcome the shortage of…

Citation impact

Total citations: 202
FWCI: 23.11
Percentile: 100%
References: 84

Authors

9

Topics & keywords

Keywords
  • Computer science
  • Point cloud
  • Representation (politics)
  • Artificial intelligence
  • Contextual image classification
  • Point (geometry)
  • Modality (human–computer interaction)
  • Modalities
UN Sustainable Development Goals
  • Quality Education