ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding

Xue, Le; Gao, Mingfei; Chen, Xing; Martín-Martín, Roberto; Wu, Jiajun; Xiong, Caiming; Xu, Ran; Niebles, Juan Carlos; Savarese, Silvio

doi:10.1109/cvpr52729.2023.00120

Article · Jun 1, 2023 · Closed access

Salesforce (United States) · The University of Texas at Austin · +1 more institution

Indexed in Crossref

Abstract

The recognition capabilities of current state-of-the-art 3D models are limited by datasets with a small amount of annotated data and a pre-defined set of categories. In the 2D domain, recent advances have shown that similar problems can be significantly alleviated by employing knowledge from other modalities, such as language. Inspired by this, leveraging multimodal information for the 3D modality could be promising for improving 3D understanding under a restricted data regime, but this line of research is not well studied. Therefore, we introduce ULIP to learn a unified representation of images, text, and 3D point clouds by pre-training with object triplets from the three modalities. To overcome the shortage of…

Citation impact

Total citations: 202

FWCI: 23.11
Percentile: 100%
References: 84

Authors: 9

Topics & keywords

Keywords

Computer science
Point cloud
Representation (politics)
Artificial intelligence
Contextual image classification
Point (geometry)
Modality (human–computer interaction)
Modalities

UN Sustainable Development Goals

Quality Education
