Neural Implicit Representation for Building Digital Twins of
Unknown Articulated Objects


CVPR 2024

Abstract


We tackle the problem of building digital twins of unknown articulated objects from two RGB-D scans of the object at different articulation states. We decompose the problem into two stages, each addressing a distinct aspect. Our method first reconstructs object-level shape at each state, then recovers the underlying articulation model, including the part segmentation and joint articulations that associate the two states. By explicitly modeling point-level correspondences and exploiting cues from images, 3D reconstructions, and kinematics, our method yields more accurate and stable results than prior work. It also handles more than one movable part and does not rely on any object shape or structure priors.

Demo: Interacting with the Multi-Part Digital Twin in Simulation




Our reconstructed digital twin can be readily imported into simulation environments and interacted with. The video above shows an example interaction sequence in Isaac Gym, played at 4x speed.

Qualitative Results on PARIS Two-Part Object Dataset




Visualization of reconstruction results on the PARIS Two-Part Object Dataset from PARIS, PARIS* (PARIS augmented with depth supervision), and our method. We run each method 10 times with different random initializations and show typical trials, i.e., those whose performance is closest to the average.

Qualitative Results on SAPIEN Multi-Part Objects




Visualization of reconstruction results on the SAPIEN Multi-Part Objects from PARIS*-m (PARIS augmented with depth supervision and extended to handle multiple movable parts) and our method. We run each method 10 times with different random initializations and show typical trials, i.e., those whose performance is closest to the average.

Method Overview

Given multi-view RGB-D observations of an articulated object at two different joint configurations, we aim to reconstruct its per-part geometry and articulation model.
We decompose this problem into two stages with distinct focuses. Our method performs per-state object-level reconstruction in the first stage, then recovers the articulation model in the second stage. We parameterize the articulation model with a part segmentation field and per-part rigid transformations, from which we explicitly derive a point correspondence field that associates the two reconstructions. The point correspondence field can be effectively supervised with a set of losses, including a consistency loss, a matching loss, and a collision loss, leveraging cues from 3D reconstruction, image feature matching, and kinematics. A sketch of this parameterization is given below.
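To make the parameterization concrete, below is a minimal PyTorch sketch of the articulation model: a soft part segmentation field plus per-part rigid transforms, from which the point correspondence field is derived in closed form. All identifiers (ArticulationModel, seg_field, sdf_state1, etc.) are hypothetical illustrations rather than the authors' implementation, and the consistency loss shown is one plausible form of the loss named above: correspondences of state-0 surface points should land on the state-1 surface.

```python
# A minimal PyTorch sketch of the articulation model described above.
# All identifiers are hypothetical illustrations, not the authors' code.

import torch
import torch.nn as nn


def axis_angle_to_matrix(axis_angle: torch.Tensor) -> torch.Tensor:
    """Rodrigues' formula: (K, 3) axis-angle vectors -> (K, 3, 3) rotations."""
    theta = axis_angle.norm(dim=-1, keepdim=True).clamp(min=1e-8)  # (K, 1)
    k = axis_angle / theta                                         # unit axes
    S = torch.zeros(axis_angle.shape[0], 3, 3, device=axis_angle.device)
    S[:, 0, 1], S[:, 0, 2] = -k[:, 2], k[:, 1]   # skew-symmetric cross-product
    S[:, 1, 0], S[:, 1, 2] = k[:, 2], -k[:, 0]   # matrices of the unit axes
    S[:, 2, 0], S[:, 2, 1] = -k[:, 1], k[:, 0]
    eye = torch.eye(3, device=axis_angle.device).expand_as(S)
    theta = theta.unsqueeze(-1)                                    # (K, 1, 1)
    return eye + torch.sin(theta) * S + (1 - torch.cos(theta)) * (S @ S)


class ArticulationModel(nn.Module):
    """Part segmentation field + per-part rigid transforms, from which the
    point correspondence field between the two states is derived explicitly."""

    def __init__(self, num_parts: int, hidden: int = 128):
        super().__init__()
        # Soft part segmentation field: 3D point -> per-part logits.
        self.seg_field = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_parts),
        )
        # Per-part rigid transform parameters (axis-angle + translation),
        # initialized to the identity motion.
        self.rot = nn.Parameter(torch.zeros(num_parts, 3))
        self.trans = nn.Parameter(torch.zeros(num_parts, 3))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Map points x (N, 3) at state 0 to corresponding points at state 1."""
        probs = self.seg_field(x).softmax(dim=-1)           # (N, K) soft labels
        R = axis_angle_to_matrix(self.rot)                  # (K, 3, 3)
        # Apply every part's rigid motion to every point: (N, K, 3).
        x_k = torch.einsum('kij,nj->nki', R, x) + self.trans
        # Correspondence = segmentation-weighted blend of the K candidates.
        return (probs.unsqueeze(-1) * x_k).sum(dim=1)       # (N, 3)


def consistency_loss(model, sdf_state1, surface_pts_state0):
    """One plausible consistency loss: correspondences of state-0 surface
    samples should lie on the state-1 surface (zero signed distance)."""
    corr = model(surface_pts_state0)
    return sdf_state1(corr).abs().mean()
```

Because the correspondence field is an explicit, differentiable function of the segmentation and transform parameters, losses of this kind can backpropagate directly into both the part assignment and the per-part motion.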

Acknowledgements


The website template was borrowed from Michaël Gharbi.