Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction
National Engineering Research Center for Information Technology in Agriculture · Tsinghua University
Abstract
Modern methods for vision-centric autonomous driving perception widely adopt the bird's-eye-view (BEV) representation to describe a 3D scene. Despite being more efficient than the voxel representation, BEV has difficulty describing the fine-grained 3D structure of a scene with a single plane. To address this, we propose a tri-perspective view (TPV) representation which accompanies BEV with two additional perpendicular planes. We model each point in the 3D space by summing its projected features on the three planes. To lift image features to the 3D TPV space, we further propose a transformer-based TPV encoder (TPVFormer) to obtain the TPV features effectively. We employ the attention mechanism to aggregate the…
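The point-query rule in the abstract (a 3D point's feature is the sum of its projected features on the three planes) can be sketched in NumPy. This is a minimal illustration, not the authors' code. The plane names `t_hw`, `t_dh`, `t_wd`, the `(C, rows, cols)` memory layout, the axis assignment (x along H, y along W, z along D), and the use of bilinear interpolation for off-grid points are all assumptions made for the example. The helper `cell_counts` compares how many grid cells a dense voxel grid, a single BEV plane, and the three TPV planes must store; the 200×200×16 grid in the usage note is illustrative only.

```python
import numpy as np

def bilinear_sample(plane, u, v):
    """Bilinearly sample a (C, R, K) feature plane at continuous coords.

    u indexes the row axis (size R), v the column axis (size K).
    Returns an array of shape (C, N) for N query coordinates.
    """
    C, R, K = plane.shape
    u = np.clip(u, 0, R - 1)
    v = np.clip(v, 0, K - 1)
    u0 = np.floor(u).astype(int)
    v0 = np.floor(v).astype(int)
    u1 = np.minimum(u0 + 1, R - 1)
    v1 = np.minimum(v0 + 1, K - 1)
    du, dv = u - u0, v - v0
    # Weighted sum of the four neighbouring cells (weights broadcast over C).
    return ((1 - du) * (1 - dv) * plane[:, u0, v0]
            + (1 - du) * dv * plane[:, u0, v1]
            + du * (1 - dv) * plane[:, u1, v0]
            + du * dv * plane[:, u1, v1])

def tpv_point_features(t_hw, t_dh, t_wd, pts):
    """Feature of each 3D point = sum of its features on the three planes.

    Hypothetical layout: t_hw is (C, H, W) (the BEV plane), t_dh is
    (C, D, H), t_wd is (C, W, D); pts is (N, 3) grid coords (x, y, z).
    """
    x, y, z = pts[:, 0], pts[:, 1], pts[:, 2]
    f_hw = bilinear_sample(t_hw, x, y)  # project onto HW plane (drop z)
    f_dh = bilinear_sample(t_dh, z, x)  # project onto DH plane (drop y)
    f_wd = bilinear_sample(t_wd, y, z)  # project onto WD plane (drop x)
    return (f_hw + f_dh + f_wd).T       # shape (N, C)

def cell_counts(H, W, D):
    """Cells stored by a voxel grid, one BEV plane, and the three TPV planes."""
    return H * W * D, H * W, H * W + D * H + W * D
```

For an illustrative 200×200×16 grid, `cell_counts(200, 200, 16)` gives 640,000 voxel cells, 40,000 BEV cells, and 46,400 TPV cells: the two extra planes add only a small overhead over BEV while keeping a height-aware description of every point, which is the trade-off the abstract motivates.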
Citation impact
- FWCI: 30.95
- Percentile: 100%
- References: 77
Authors (5)
- Yuanhui Huang (corresponding author)
National Engineering Research Center for Information Technology in Agriculture
- Wenzhao Zheng
National Engineering Research Center for Information Technology in Agriculture
- Yunpeng Zhang
National Engineering Research Center for Information Technology in Agriculture
- Jie Zhou
National Engineering Research Center for Information Technology in Agriculture
- Jiwen Lu
Tsinghua University
Topics & keywords
- Computer science
- Artificial intelligence
- Computer vision
- Voxel
- Lidar
- Perspective (graphical)
- Encoder
- Segmentation