Cross-view Transformers for real-time Map-view Semantic Segmentation

Zhou, Brady; Krähenbühl, Philipp

doi:10.1109/cvpr52688.2022.01339

Article | 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) | Jun 1, 2022 | Closed access

The University of Texas at Austin

Indexed in Crossref

Abstract

We present cross-view transformers, an efficient attention-based model for map-view semantic segmentation from multiple cameras. Our architecture implicitly learns a mapping from individual camera views into a canonical map-view representation using a camera-aware cross-view attention mechanism. Each camera uses positional embeddings that depend on its intrinsic and extrinsic calibration. These embeddings allow a transformer to learn the mapping across different views without ever explicitly modeling it geometrically. The architecture consists of a convolutional image encoder for each view and cross-view transformer layers to infer a map-view semantic segmentation. Our model is simple, easily parallelizable,…
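The mechanism the abstract describes can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; it only assumes what the abstract states, namely that each camera's positional embedding depends on its intrinsic and extrinsic calibration and that map-view queries attend across all camera views. The function names (`ray_directions`, `cross_view_attention`), the random `lift` matrix standing in for a learned linear layer, and the use of per-pixel ray directions as the calibration-dependent signal are all assumptions made for illustration.

```python
import numpy as np

def ray_directions(K, R, height, width):
    """Unit viewing direction of every pixel centre, expressed in a shared frame.

    K: (3, 3) camera intrinsics. R: (3, 3) camera-to-shared-frame rotation.
    Both are hypothetical inputs; the direction is R @ inv(K) @ [u, v, 1].
    """
    u, v = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)  # (H*W, 3)
    d = pix @ np.linalg.inv(K).T @ R.T
    return d / np.linalg.norm(d, axis=-1, keepdims=True)

def cross_view_attention(map_queries, image_feats, cam_embed):
    """Single-head cross-attention from map-view queries to image features.

    map_queries: (M, D), image_feats: (N, D), cam_embed: (N, D).
    Keys carry the camera-aware embedding; values are the raw image features.
    """
    keys = image_feats + cam_embed
    logits = map_queries @ keys.T / np.sqrt(keys.shape[-1])
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ image_feats, weights

# Toy setup: two cameras with the same intrinsics, one facing forward and one
# rotated 90 degrees about the vertical axis. All values are made up.
rng = np.random.default_rng(0)
H, W, D, M = 2, 4, 8, 5
K = np.array([[2.0, 0.0, 2.0],
              [0.0, 2.0, 1.0],
              [0.0, 0.0, 1.0]])
yaw = np.pi / 2
R_side = np.array([[np.cos(yaw), 0.0, np.sin(yaw)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(yaw), 0.0, np.cos(yaw)]])
rotations = [np.eye(3), R_side]

lift = rng.normal(size=(3, D))  # stand-in for a learned linear layer
cam_embed = np.concatenate(
    [ray_directions(K, R, H, W) @ lift for R in rotations])  # (2*H*W, D)
image_feats = rng.normal(size=(2 * H * W, D))  # stand-in for encoder output
map_queries = rng.normal(size=(M, D))          # stand-in for map-view queries

map_feats, weights = cross_view_attention(map_queries, image_feats, cam_embed)
print(map_feats.shape, weights.shape)
```

In this sketch the geometry enters only through the embedding added to the keys, so the attention weights are free to learn which pixels in which camera correspond to each map location, without an explicit projection step.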

Citation impact

Total citations: 283

FWCI: 15.81
Percentile: 100%
References: 63

Authors: 2

Topics & keywords

Keywords

Computer science
Encoder
Segmentation
Transformer
Inference
Artificial intelligence
Computer vision
Architecture

UN Sustainable Development Goals

Sustainable cities and communities

Funding

National Science Foundation
Award: IIS-1845485, IIS-2006820