BEVFormer: Learning Bird’s-Eye-View Representation From LiDAR-Camera via Spatiotemporal Transformers

Li, Zhiqi; Wang, Wenhai; Li, Hongyang; Xie, Enze; Sima, Chonghao; Lü, Tong; Qiao, Yu; Dai, Jifeng

doi:10.1109/tpami.2024.3515454

Article · IEEE Transactions on Pattern Analysis and Machine Intelligence · Dec 11, 2024 · Green OA

Zhiqi Li · Wenhai Wang · Hongyang Li · Enze Xie · Chonghao Sima

Nanjing University · Shanghai Artificial Intelligence Laboratory · +2 more institutions

Indexed in: Crossref, PubMed

Abstract

Multi-modality fusion is currently the de facto most competitive strategy for 3D perception tasks. In this work, we present a new framework termed BEVFormer, which learns unified BEV representations from multi-modality data with spatiotemporal transformers to support multiple autonomous driving perception tasks. In a nutshell, BEVFormer exploits both spatial and temporal information by interacting with spatial and temporal space through predefined grid-shaped BEV queries. To aggregate spatial information, we design a spatial cross-attention in which each BEV query extracts spatial features from both point cloud and camera inputs, thereby completing multi-modality information fusion in BEV space. For…

Citation impact

Total citations: 162

FWCI: 209.84
Percentile: 100%
References: 103

Authors: 8

Topics & keywords

Keywords

Lidar
Artificial intelligence
Computer vision
Computer science
Transformer
Representation (politics)
Pattern recognition (psychology)
Remote sensing

Funding

National Natural Science Foundation of China
Awards: 62372223, 62376134, 62206172, U24A20330