Rendering articulated objects while controlling their poses is critical to applications such as virtual reality or animation for movies. Manipulating the pose of an object, however, requires an understanding of its underlying structure, that is, of its joints and how they interact with each other. Unfortunately, assuming the structure to be known, as existing methods do, precludes working on new object categories. We propose to learn both the appearance and the structure of previously unseen articulated objects by observing them move from multiple views, with no additional supervision such as joint annotations or information about the structure. Our insight is that adjacent parts that move relative to each other must be connected by a joint. To leverage this observation, we model the object parts in 3D as ellipsoids, which allows us to identify joints. We combine this explicit representation with an implicit one that compensates for the approximation it introduces. We show that our method works for different structures, from quadrupeds to single-arm robots to humans.
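The key insight above, that two parts connected by a joint must share a point that both parts' rigid motions keep coincident, can be sketched as a small least-squares problem. The snippet below is a minimal illustration of that idea, not the paper's actual implementation: given per-frame rigid transforms of two parts, it recovers a candidate joint location (the function name and interface are hypothetical).

```python
import numpy as np

def estimate_joint(RA, tA, RB, tB):
    """Estimate a joint location between two rigidly moving parts.

    RA, RB: (T, 3, 3) per-frame rotations of parts A and B.
    tA, tB: (T, 3) per-frame translations of parts A and B.

    A joint is a point x that both parts carry to (nearly) the same
    place in every frame: R_A x + t_A ≈ R_B x + t_B. Stacking these
    constraints over all frames gives a linear least-squares problem.
    """
    A = (RA - RB).reshape(-1, 3)          # (3T, 3) stacked (R_A - R_B)
    b = (tB - tA).reshape(-1)             # (3T,)  stacked (t_B - t_A)
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    residual = np.linalg.norm(A @ x - b) / len(RA)  # per-frame error
    return x, residual
```

A low residual suggests the two parts really are connected by a joint near `x`; a high residual suggests they move independently, which is one way the "parts that move relative to each other" criterion could be turned into a concrete test.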
Here we show novel view synthesis results. For a GT frame (left), we reconstruct the appearance (middle) and the part segmentation (right) from novel viewpoints.
Here we show re-posing results. We manually manipulate the pose from a GT frame (left) and render it from novel viewpoints (middle). These poses were never seen during training. We also show the corresponding part segmentation (right).
@article{noguchi2021watch,
title={Watch It Move: {U}nsupervised Discovery of {3D} Joints for Re-Posing of Articulated Objects},
author={Noguchi, Atsuhiro and Iqbal, Umar and Tremblay, Jonathan and Harada, Tatsuya and Gallo, Orazio},
journal={arXiv preprint arXiv:2112.11347},
year={2021}
}
We adapted the template for this website from StyleGAN3.