LASA
Instance Reconstruction from Real Scans using A Large-scale Aligned Shape Annotation Dataset

CVPR 2024


Haolin Liu1,2*, Chongjie Ye1,2*, Yinyu Nie3, Yingfan He1,2, Xiaoguang Han2,1†
1 FNii, CUHKSZ 2 SSE, CUHKSZ
3 Technical University of Munich
* Equal contribution † Corresponding author

Paper Code Dataset (Coming Soon)




Abstract




Instance shape reconstruction from a 3D scene involves recovering the full geometries of multiple objects at the semantic instance level. Many methods leverage data-driven learning due to the intricacies of scene complexity and significant indoor occlusions. Training these methods often requires a large-scale, high-quality dataset with shape annotations aligned and paired with real-world scans. Existing datasets are either synthetic or misaligned, restricting the performance of data-driven methods on real data. To this end, we introduce LASA, a Large-scale Aligned Shape Annotation Dataset comprising 10,412 high-quality CAD annotations aligned with 920 real-world scene scans from ARKitScenes, created manually by professional artists. On top of this, we propose a novel Diffusion-based Cross-Modal Shape Reconstruction (DisCo) method. It is empowered by a hybrid feature aggregation design that fuses multi-modal inputs and recovers high-fidelity object geometries. In addition, we present an Occupancy-Guided 3D Object Detection (OccGOD) method and demonstrate that our shape annotations provide scene occupancy clues that can further improve 3D object detection. Supported by LASA, extensive experiments show that our methods achieve state-of-the-art performance in both instance-level scene reconstruction and 3D object detection tasks.


Dataset


LASA is a Large-scale Aligned Shape Annotation Dataset containing 10,412 unique object CAD models aligned with 920 real-world scene scans.

[GIF gallery]


Aligned Annotations




Method


Pipeline of our DisCo. First, a triplane VAE model is trained to encode shapes into a triplane latent space (top-left). Subsequently, a triplane diffusion model is trained in this latent space for conditional shape reconstruction (top-right). A novel Hybrid Feature Aggregation Layer is proposed to effectively aggregate and align local features from both the partial point cloud and multi-view images (bottom).
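To make the triplane latent space concrete, the sketch below shows how a 3D query point reads features from three axis-aligned feature planes (XY, XZ, YZ) via bilinear sampling, with the per-plane features summed. This is a minimal illustrative sketch in NumPy, not the paper's implementation; all shapes, names, and the sum-aggregation choice are assumptions.

```python
import numpy as np

def bilinear_sample(plane, uv):
    """Bilinearly sample a (H, W, C) feature plane at normalized coords uv in [0, 1]."""
    H, W, _ = plane.shape
    x = uv[0] * (W - 1)
    y = uv[1] * (H - 1)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    wx, wy = x - x0, y - y0
    return ((1 - wx) * (1 - wy) * plane[y0, x0]
            + wx * (1 - wy) * plane[y0, x1]
            + (1 - wx) * wy * plane[y1, x0]
            + wx * wy * plane[y1, x1])

def query_triplane(planes, point):
    """Sum features sampled from the XY, XZ, and YZ planes at a 3D point in [0, 1]^3."""
    x, y, z = point
    return (bilinear_sample(planes["xy"], (x, y))
            + bilinear_sample(planes["xz"], (x, z))
            + bilinear_sample(planes["yz"], (y, z)))

# Toy triplane latent: three 8x8 planes with 4 feature channels each (sizes are illustrative).
rng = np.random.default_rng(0)
planes = {k: rng.standard_normal((8, 8, 4)) for k in ("xy", "xz", "yz")}
feat = query_triplane(planes, (0.3, 0.7, 0.5))
print(feat.shape)  # (4,)
```

In a full pipeline, the sampled per-point feature would be fed to a small decoder (e.g. an MLP) predicting occupancy, and the planes themselves would be the latent produced by the VAE encoder and denoised by the diffusion model.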



Diffusion-based Reconstruction


Examples of DisCo's instance-level scene reconstruction results. For each object, both its partial point cloud and multi-view images are used as inputs.

[GIF gallery]