Research Article | Open Access

Hybrid mesh-neural representation for 3D transparent object reconstruction

College of Computer Science, Hangzhou Dianzi University, Hangzhou 310018, China
State Key Lab of CAD&CG, Zhejiang University, Hangzhou 310058, China
College of Computer Science, ETH Zürich, Zürich 8092, Switzerland


Abstract

In this study, we propose a novel method to reconstruct the 3D shapes of transparent objects from images captured by a handheld camera under natural lighting. Our hybrid representation combines the advantages of an explicit mesh and multi-layer perceptron (MLP) networks, which allows us to simplify the capture setups used in recent studies. After obtaining an initial shape from multi-view silhouettes, we introduce surface-based local MLPs that encode a vertex displacement field (VDF) to reconstruct surface details. This design represents the VDF in a piecewise manner with two-layer MLP networks, which supports the optimization algorithm; defining the local MLPs on the surface rather than over the volume also reduces the search space. The hybrid representation further lets us relax the ray–pixel correspondences of the conventional light-path constraint to our proposed ray–cell correspondences, which significantly simplifies the single-image environment-matting algorithm. We evaluated our representation and reconstruction algorithm on several transparent objects with ground-truth models. The experimental results show that our method produces high-quality reconstructions superior to those of state-of-the-art methods, while using a simpler data-acquisition setup.
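The core idea of the hybrid representation can be sketched in a few lines: an explicit mesh carries the coarse shape, while small two-layer MLPs, one per surface patch, encode a displacement applied to each vertex along its normal. The sketch below is a minimal illustration under assumed names and shapes (`LocalVDF`, `displace`, patch assignments), not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

class LocalVDF:
    """Hypothetical two-layer MLP: vertex position -> signed displacement."""
    def __init__(self, in_dim=3, hidden=16):
        self.W1 = rng.normal(scale=0.1, size=(in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(scale=0.1, size=(hidden,))
        self.b2 = 0.0

    def __call__(self, x):
        h = np.tanh(x @ self.W1 + self.b1)  # hidden layer
        return h @ self.W2 + self.b2        # scalar displacement

def displace(vertices, normals, patch_ids, mlps):
    """Move each vertex along its normal by the MLP of its surface patch."""
    out = vertices.copy()
    for i, (v, n, p) in enumerate(zip(vertices, normals, patch_ids)):
        out[i] = v + mlps[p](v) * n
    return out

# Toy example: 4 vertices split across 2 surface patches,
# all normals pointing along +z for illustration.
verts = rng.normal(size=(4, 3))
norms = np.tile([0.0, 0.0, 1.0], (4, 1))
patch = np.array([0, 0, 1, 1])
mlps = [LocalVDF(), LocalVDF()]

new_verts = displace(verts, norms, patch, mlps)
assert new_verts.shape == verts.shape
# Displacement acts only along the normal direction (here, z).
assert np.allclose(new_verts[:, :2], verts[:, :2])
```

Because each patch owns its own tiny network, the displacement field is piecewise and each optimization step only touches the parameters of the patches intersected by the current rays, which is the kind of locality the abstract attributes to the surface-based design.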

Computational Visual Media
Pages 123-140
Cite this article:
Xu J, Zhu Z, Bao H, et al. Hybrid mesh-neural representation for 3D transparent object reconstruction. Computational Visual Media, 2025, 11(1): 123-140. https://doi.org/10.26599/CVM.2025.9450328