Research Article | Open Access

Hybrid mesh-neural representation for 3D transparent object reconstruction

College of Computer Science, Hangzhou Dianzi University, Hangzhou 310018, China
State Key Lab of CAD&CG, Zhejiang University, Hangzhou 310058, China
College of Computer Science, ETH Zürich, Zürich 8092, Switzerland


Abstract

In this study, we propose a novel method to reconstruct the 3D shapes of transparent objects from images captured by a handheld camera under natural lighting. Our hybrid representation combines the advantages of an explicit mesh and multi-layer perceptron (MLP) networks, which allows us to simplify the capture setups used in recent studies. After obtaining an initial shape from multi-view silhouettes, we introduce surface-based local MLPs that encode a vertex displacement field (VDF) to reconstruct surface details. This design represents the VDF in a piecewise manner with two-layer MLP networks, which supports the optimization algorithm; defining the local MLPs on the surface rather than over the volume also reduces the search space. The hybrid representation further lets us relax the ray–pixel correspondences of the conventional light-path constraint to our proposed ray–cell correspondences, which significantly simplifies the single-image environment-matting algorithm. We evaluated our representation and reconstruction algorithm on several transparent objects with ground-truth models. The experimental results show that our method produces high-quality reconstructions superior to those of state-of-the-art methods, while using a simpler data-acquisition setup.
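The core idea of the hybrid representation can be sketched in a few lines: an explicit mesh carries the coarse shape, while small two-layer MLPs, one per surface patch, encode a displacement applied to each vertex along its normal. The sketch below is a minimal illustration under assumed names and shapes (`LocalVDF`, `displace`, patch assignments), not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

class LocalVDF:
    """Hypothetical two-layer MLP: vertex position -> signed displacement."""
    def __init__(self, in_dim=3, hidden=16):
        self.W1 = rng.normal(scale=0.1, size=(in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(scale=0.1, size=(hidden,))
        self.b2 = 0.0

    def __call__(self, x):
        h = np.tanh(x @ self.W1 + self.b1)  # hidden layer
        return h @ self.W2 + self.b2        # scalar displacement

def displace(vertices, normals, patch_ids, mlps):
    """Move each vertex along its normal by the MLP of its surface patch."""
    out = vertices.copy()
    for i, (v, n, p) in enumerate(zip(vertices, normals, patch_ids)):
        out[i] = v + mlps[p](v) * n
    return out

# Toy example: 4 vertices split across 2 surface patches,
# all normals pointing along +z for illustration.
verts = rng.normal(size=(4, 3))
norms = np.tile([0.0, 0.0, 1.0], (4, 1))
patch = np.array([0, 0, 1, 1])
mlps = [LocalVDF(), LocalVDF()]

new_verts = displace(verts, norms, patch, mlps)
assert new_verts.shape == verts.shape
# Displacement acts only along the normal direction (here, z).
assert np.allclose(new_verts[:, :2], verts[:, :2])
```

Because each patch owns its own tiny network, the displacement field is piecewise and each optimization step only touches the parameters of the patches intersected by the current rays, which is the kind of locality the abstract attributes to the surface-based design.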

Computational Visual Media
Pages 123-140
Cite this article:
Xu J, Zhu Z, Bao H, et al. Hybrid mesh-neural representation for 3D transparent object reconstruction. Computational Visual Media, 2025, 11(1): 123-140. https://doi.org/10.26599/CVM.2025.9450328