BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation
Massachusetts Institute of Technology
Abstract
Multi-sensor fusion is essential for an accurate and reliable autonomous driving system. Recent approaches are based on point-level fusion: augmenting the LiDAR point cloud with camera features. However, the camera-to-LiDAR projection throws away the semantic density of camera features, hindering the effectiveness of such methods, especially for semantic-oriented tasks (such as 3D scene segmentation). In this paper, we propose BEVFusion, an efficient and generic multi-task multi-sensor fusion framework. It unifies multi-modal features in the shared bird's-eye view (BEV) representation space, which nicely preserves both geometric and semantic information. To achieve this, we diagnose and lift the key efficiency…
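The idea of fusing modalities in one shared BEV grid can be illustrated with a minimal sketch. This is not the paper's implementation: BEVFusion operates on learned features, and the camera-to-BEV transform is omitted here. The function names `points_to_bev` and `fuse_bev`, the two hand-crafted LiDAR channels (point count and maximum height), and the assumption that camera features are already laid out on the same grid are all hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple

Grid = List[List[List[float]]]  # indexed as [row][col][channel]


def points_to_bev(
    points: Sequence[Tuple[float, float, float]],
    x_range: Tuple[float, float],
    y_range: Tuple[float, float],
    cell_size: float,
) -> Grid:
    """Bin (x, y, z) points into a BEV grid with two channels per cell:
    point count and maximum height. Empty cells hold [0.0, 0.0]."""
    n_rows = int(round((x_range[1] - x_range[0]) / cell_size))
    n_cols = int(round((y_range[1] - y_range[0]) / cell_size))
    grid: Grid = [[[0.0, 0.0] for _ in range(n_cols)] for _ in range(n_rows)]
    for x, y, z in points:
        row = int((x - x_range[0]) // cell_size)
        col = int((y - y_range[0]) // cell_size)
        if 0 <= row < n_rows and 0 <= col < n_cols:
            cell = grid[row][col]
            # The first point in a cell sets the height; later points take the max.
            cell[1] = z if cell[0] == 0.0 else max(cell[1], z)
            cell[0] += 1.0
    return grid


def fuse_bev(lidar_bev: Grid, camera_bev: Grid) -> Grid:
    """Concatenate per-cell feature vectors from two spatially aligned BEV grids."""
    if [len(r) for r in lidar_bev] != [len(r) for r in camera_bev]:
        raise ValueError("BEV grids must share the same spatial shape")
    return [
        [l_cell + c_cell for l_cell, c_cell in zip(l_row, c_row)]
        for l_row, c_row in zip(lidar_bev, camera_bev)
    ]


# Toy input: four LiDAR points, one of which falls outside the 2 x 2 grid.
points = [(0.5, 0.5, 1.0), (0.2, 0.7, 2.0), (1.5, 0.5, -1.0), (5.0, 5.0, 0.0)]
lidar_bev = points_to_bev(points, (0.0, 2.0), (0.0, 2.0), 1.0)
# Placeholder one-channel camera features, assumed to be already in BEV.
camera_bev = [[[0.1], [0.2]], [[0.3], [0.4]]]
fused = fuse_bev(lidar_bev, camera_bev)
```

Because both modalities live on the same grid, every cell keeps a camera feature even where no LiDAR point landed, which is the density that point-level fusion loses.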
Citation impact
- FWCI: 112.08
- Percentile: 100%
- References: 79
Authors
- 7
Topics & keywords
- Computer science
- Artificial intelligence
- Point cloud
- Computer vision
- Segmentation
- Lidar
- Object detection
- Pooling