BEVFormer: Learning Bird’s-Eye-View Representation From LiDAR-Camera via Spatiotemporal Transformers

Nanjing University · Shanghai Artificial Intelligence Laboratory · +2 more institutions

Indexed in Crossref, PubMed

Abstract

Multi-modality fusion is currently the de facto most competitive strategy for 3D perception tasks. In this work, we present a new framework termed BEVFormer, which learns unified BEV representations from multi-modality data with spatiotemporal transformers to support multiple autonomous driving perception tasks. In a nutshell, BEVFormer exploits both spatial and temporal information by interacting with spatial and temporal space through predefined grid-shaped BEV queries. To aggregate spatial information, we design a spatial cross-attention in which each BEV query extracts spatial features from both point cloud and camera input, thus completing multi-modality information fusion in BEV space. For…
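
The idea of grid-shaped BEV queries attending to features from two modalities can be illustrated with a toy sketch. This is not the authors' implementation: it assumes plain dense scaled dot-product attention, treats camera and point cloud features as flat token lists that already share one feature dimension, and uses fixed sinusoidal values in place of learned query parameters. The names `make_bev_queries`, `cross_attention`, and `fuse_into_bev` are hypothetical.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_attention(queries, keys, values):
    # Dense scaled dot-product cross-attention: every query attends to every key.
    scale = 1.0 / math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        weights = softmax([dot(q, k) * scale for k in keys])
        out.append([sum(w * v[d] for w, v in zip(weights, values))
                    for d in range(len(values[0]))])
    return out

def make_bev_queries(height, width, dim):
    # One query vector per BEV grid cell (assumption: fixed values
    # stand in for parameters that would normally be learned).
    return [[math.sin(0.1 * (cell + 1) * (d + 1)) for d in range(dim)]
            for cell in range(height * width)]

def fuse_into_bev(bev_queries, camera_feats, lidar_feats):
    # Concatenate tokens from both modalities so each BEV query
    # can draw on camera and point cloud features in one pass.
    tokens = camera_feats + lidar_feats
    return cross_attention(bev_queries, tokens, tokens)

# Toy usage: a 2x2 BEV grid, two camera tokens, one point cloud token.
bev = make_bev_queries(2, 2, 4)
camera = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
lidar = [[0.0, 0.0, 1.0, 0.0]]
fused = fuse_into_bev(bev, camera, lidar)
```

Each row of `fused` is a weighted average of the camera and point cloud tokens, with one row per BEV grid cell. The temporal side mentioned in the abstract is not covered by this sketch.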

Citation impact

  • Total citations: 162
  • FWCI: 209.84
  • Percentile: 100%
  • References: 103

Authors

8

Topics & keywords

Keywords
  • Lidar
  • Artificial intelligence
  • Computer vision
  • Computer science
  • Transformer
  • Representation (politics)
  • Pattern recognition (psychology)
  • Remote sensing

Funding