Monocular Identity-Conditioned Facial Reflectance Reconstruction

MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University

CVPR 2024

Abstract

Recent 3D face reconstruction methods have advanced remarkably, yet reconstructing high-quality facial reflectance from a single image remains highly challenging. Existing methods rely on large amounts of light-stage captured data to learn facial reflectance models. However, the limited subject diversity of such captures makes it difficult to generalize well and to apply these models widely.

In this paper, we learn the reflectance prior in image space rather than UV space and present a framework named ID2Reflectance. Our framework directly estimates the reflectance maps from a single image while using only limited reflectance data for training. Our key insight is that reflectance data shares facial structures with RGB faces, which makes it possible to obtain an expressive facial prior from inexpensive RGB data and thus reduces the dependency on reflectance data. We first learn a high-quality prior for facial reflectance. Specifically, we pretrain multi-domain facial feature codebooks and design a codebook fusion method to align the reflectance and RGB domains. Then, we propose an identity-conditioned swapping module that injects the facial identity of the target image into the pre-trained autoencoder to modify the identity of the source reflectance image. Finally, we stitch multi-view swapped reflectance images to obtain renderable assets. Extensive experiments demonstrate that our method generalizes well and achieves state-of-the-art facial reflectance reconstruction results for in-the-wild faces.

Pipeline

Overview of the proposed method. Our core insight is to build a facial reflectance prior in image space from limited captures and to recover the reflectance maps of any unconstrained face. We first train multi-domain facial codebooks using a large amount of RGB data and limited reflectance data. Then, given an unconstrained input face, we extract its identity feature with the pre-trained ArcFace model. This feature is fed into the swapper module, which guides the decoder to perform identity injection for all domains. Finally, we stitch three-view identity-conditioned reflectance images to obtain high-quality rendering assets and renderable 3D faces.
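As a rough illustration of two steps in this overview, the sketch below shows a codebook lookup and a multi-view stitch in plain Python. It is not the ID2Reflectance implementation: the page does not describe the quantizer or the blending rule, so nearest-neighbor vector quantization and a visibility-weighted average are assumptions, and the names `quantize` and `stitch_views` are hypothetical.

```python
from math import fsum


def quantize(features, codebook):
    """Assumed nearest-neighbor lookup: map each feature vector to the
    closest codebook entry under squared L2 distance."""
    indices = []
    for f in features:
        dists = [fsum((a - b) ** 2 for a, b in zip(f, code)) for code in codebook]
        indices.append(min(range(len(codebook)), key=dists.__getitem__))
    return indices, [codebook[i] for i in indices]


def stitch_views(views, weights, eps=1e-8):
    """Assumed blending rule: per-texel weighted average over views,
    where each weight encodes how well that view sees the texel."""
    out = []
    for t in range(len(views[0])):
        total = fsum(w[t] for w in weights)
        if total < eps:
            out.append(0.0)  # texel is not visible in any view
        else:
            out.append(fsum(v[t] * w[t] for v, w in zip(views, weights)) / total)
    return out


# Toy data: two 2-D features against a two-entry codebook.
idx, quantized = quantize([[0.9, 0.1], [0.0, 1.2]], [[1.0, 0.0], [0.0, 1.0]])

# Toy data: three views (e.g. left, frontal, right) of a three-texel map.
views = [[1.0, 0.0, 5.0], [3.0, 2.0, 5.0], [5.0, 4.0, 5.0]]
weights = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
stitched = stitch_views(views, weights)
```

In a real pipeline the features would come from the encoder, the codebook would be learned, and the weights would be derived from the fitted geometry; those parts are outside the scope of this sketch.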

BibTeX

@InProceedings{Ren_2024_CVPR,
    author    = {Ren, Xingyu and Deng, Jiankang and Cheng, Yuhao and Guo, Jia and Ma, Chao and Yan, Yichao and Zhu, Wenhan and Yang, Xiaokang},
    title     = {Monocular Identity-Conditioned Facial Reflectance Reconstruction},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
}
