NVIDIA Research Toronto AI Lab
Extracting Triangular 3D Models, Materials, and Lighting From Images


University of Toronto
Vector Institute
CVPR 2022 (Oral)

We learn topology, materials, and environment map lighting jointly from 2D supervision. We directly optimize topology of a triangle mesh, learn materials through volumetric texturing, and leverage differentiable split sum environment lighting. Our output representation is a triangle mesh with spatially varying 2D textures and a high dynamic range environment map, which can be used unmodified in standard game engines. Spot model by Keenan Crane.
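For readers unfamiliar with the split sum approximation mentioned above, it is commonly written as follows. This is a sketch in standard rendering notation, not taken from this page: $L_i$ is incident radiance, $f$ the BSDF, $D$ the normal distribution term of the microfacet model, $\mathbf{n}$ the surface normal, and $\Omega$ the hemisphere around $\mathbf{n}$; all symbol names are assumptions.

```latex
L(\omega_o)
  = \int_{\Omega} L_i(\omega_i)\, f(\omega_i, \omega_o)\, (\omega_i \cdot \mathbf{n})\, d\omega_i
  \approx
    \underbrace{\int_{\Omega} f(\omega_i, \omega_o)\, (\omega_i \cdot \mathbf{n})\, d\omega_i}_{\text{BSDF term, precomputed lookup}}
    \;\cdot\;
    \underbrace{\int_{\Omega} L_i(\omega_i)\, D(\omega_i, \omega_o)\, (\omega_i \cdot \mathbf{n})\, d\omega_i}_{\text{prefiltered environment map}}
```

The first factor depends only on the material (typically roughness and the viewing angle), while the second depends on the lighting filtered at a given roughness, which is what makes the environment map itself a natural target for gradient-based optimization.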


We present an efficient method for joint optimization of topology, materials, and lighting from multi-view image observations. Unlike recent multi-view reconstruction approaches, which typically produce entangled 3D representations encoded in neural networks, we output triangle meshes with spatially varying materials and environment lighting that can be deployed unmodified in any traditional graphics engine. We build on recent work in differentiable rendering, use coordinate-based networks to compactly represent volumetric texturing, and use differentiable marching tetrahedrons to enable gradient-based optimization directly on the surface mesh. Finally, we introduce a differentiable formulation of the split sum approximation of environment lighting to efficiently recover all-frequency lighting. Experiments show that our extracted models can be used for advanced scene editing, material decomposition, and high-quality view interpolation, all running at interactive rates in triangle-based renderers (rasterizers and path tracers).
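To illustrate why marching tetrahedrons admits gradient-based optimization, the sketch below places a surface vertex on each tetrahedron edge whose endpoints have signed distance values of opposite sign, using linear interpolation. The vertex position is a smooth function of both the grid positions and the signed distance values, so gradients can flow to either. This is a minimal NumPy illustration, not the authors' implementation; the function names `edge_crossing` and `tet_crossings` are hypothetical, and the triangulation lookup that connects the crossings into faces is omitted.

```python
import numpy as np

# The six edges of a tetrahedron, as pairs of local vertex indices.
TET_EDGES = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]


def edge_crossing(v_a, v_b, s_a, s_b):
    """Linearly interpolated zero crossing of a signed distance field on an edge.

    v_a, v_b: endpoint positions; s_a, s_b: signed distances with opposite signs.
    """
    if s_a * s_b >= 0:
        raise ValueError("edge endpoints must have opposite signs")
    return (v_a * s_b - v_b * s_a) / (s_b - s_a)


def tet_crossings(verts, sdf):
    """Return the surface vertices for one tetrahedron as a (K, 3) array.

    K is 0 (no surface), 3 (one triangle), or 4 (a quad, i.e. two triangles).
    """
    verts = np.asarray(verts, dtype=float)
    sdf = np.asarray(sdf, dtype=float)
    points = [
        edge_crossing(verts[a], verts[b], sdf[a], sdf[b])
        for a, b in TET_EDGES
        if sdf[a] * sdf[b] < 0
    ]
    return np.array(points).reshape(-1, 3)


if __name__ == "__main__":
    unit_tet = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
    # Vertex 0 is inside the surface, the other three are outside.
    print(tet_crossings(unit_tet, [-1.0, 1.0, 1.0, 1.0]))
```

In an optimization setting the same interpolation would be expressed in an automatic differentiation framework so that image-space losses can update the signed distance values and, through them, the mesh topology.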

Video illustrating our training progress, scene editing examples, and automatic LOD. All examples are enabled by our explicit decomposition into a triangle mesh, PBR materials, and HDR environment light, which is directly compatible with traditional graphics engines. The video can be downloaded at its native resolution of 1024x1024 pixels.

3D model reconstruction and intrinsic decomposition from images

Our reconstruction from 100 images. We reconstruct a triangle mesh, PBR materials stored in 2D textures, and an HDR environment map. Materials scene from the NeRF synthetic dataset.

Scene manipulation with the reconstructed models

We reconstruct a triangular mesh with unknown topology, spatially-varying materials, and lighting from a set of multi-view images. We show examples of scene manipulation using off-the-shelf modeling tools, enabled by our reconstructed 3D model. Dataset from NeRD: Neural Reflectance Decomposition from Image Collections.

All-frequency environment lighting

Environment lighting approximated with Spherical Gaussians using 128 lobes vs. Split Sum. The training set consists of 256 path-traced images with Monte Carlo sampled environment lighting using a high-resolution HDR probe. We assume known geometry and optimize materials and lighting using identical settings for both methods. Reported image metrics are the arithmetic mean over the 16 novel views in the test set. Note that the split sum approximation is able to capture high-frequency lighting. Probe from Poly Haven.
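As background for the comparison above, a Spherical Gaussian environment light is a sum of smooth lobes on the sphere. The sketch below evaluates such a mixture using one common parameterization, amplitude times exp(sharpness * (dot(direction, axis) - 1)). The page does not state which parameterization was used in the experiment, so the formula, the function name `eval_spherical_gaussians`, and the array layouts are all assumptions made for illustration.

```python
import numpy as np


def eval_spherical_gaussians(directions, axes, sharpness, amplitudes):
    """Evaluate a sum of Spherical Gaussian lobes for a batch of directions.

    directions: (N, 3) unit query directions
    axes:       (K, 3) unit lobe axes
    sharpness:  (K,)   non-negative lobe sharpness values
    amplitudes: (K, 3) RGB lobe amplitudes
    Returns an (N, 3) array of RGB radiance values.
    """
    directions = np.asarray(directions, dtype=float)
    axes = np.asarray(axes, dtype=float)
    sharpness = np.asarray(sharpness, dtype=float)
    amplitudes = np.asarray(amplitudes, dtype=float)

    cosines = directions @ axes.T                           # (N, K)
    weights = np.exp(sharpness[None, :] * (cosines - 1.0))  # (N, K), peaks at 1 on the axis
    return weights @ amplitudes                             # (N, 3)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    num_lobes = 128  # lobe count used in the comparison above
    axes = rng.normal(size=(num_lobes, 3))
    axes /= np.linalg.norm(axes, axis=1, keepdims=True)
    radiance = eval_spherical_gaussians(
        directions=[[0.0, 0.0, 1.0]],
        axes=axes,
        sharpness=np.full(num_lobes, 20.0),
        amplitudes=np.ones((num_lobes, 3)),
    )
    print(radiance.shape)
```

Because every lobe is smooth, a fixed lobe budget limits how sharp the represented lighting detail can be, which is consistent with the observation that the split sum representation captures higher-frequency lighting in this comparison.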


    author    = {Munkberg, Jacob and Hasselgren, Jon and Shen, Tianchang and Gao, Jun and Chen, Wenzheng
                 and Evans, Alex and M\"uller, Thomas and Fidler, Sanja},
    title     = "{Extracting Triangular 3D Models, Materials, and Lighting From Images}",
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {8280-8290}


Extracting Triangular 3D Models, Materials, and Lighting From Images

Jacob Munkberg, Jon Hasselgren, Tianchang Shen, Jun Gao, Wenzheng Chen, Alex Evans, Thomas Müller, Sanja Fidler
