Research Article | Open Access

Multi-scale hash encoding based neural geometry representation

School of Mathematical Sciences, University of Science and Technology of China, Hefei 230026, China
Alibaba Artificial Intelligence Governance Laboratory, Alibaba Group, Hangzhou 310017, China

Abstract

Recently, neural implicit function-based representations have attracted increasing attention and are widely used to represent surfaces with differentiable neural networks. However, surface reconstruction from point clouds or multi-view images using existing neural geometry representations still suffers from slow computation and limited accuracy. To alleviate these issues, we propose a multi-scale hash encoding-based neural geometry representation that effectively and efficiently represents the surface as a signed distance field. Our novel network structure carefully combines low-frequency Fourier position encoding with multi-scale hash encoding; the initialization of the geometry network and the geometry features of the rendering module are redesigned accordingly. Our experiments demonstrate that the proposed representation is at least 10 times faster when reconstructing point clouds with millions of points, and that it significantly improves the speed and accuracy of multi-view reconstruction. Our code and models are available at https://github.com/Dengzhi-USTC/Neural-Geometry-Reconstruction.
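To make the abstract's pipeline concrete, below is a minimal PyTorch sketch of the encoding scheme it describes: a low-frequency Fourier positional encoding concatenated with a multi-resolution hash encoding, decoded by a small MLP into a signed distance. This is an illustration under stated assumptions, not the authors' implementation: the layer widths, level count, table size, and the nearest-vertex hash lookup (a full hash encoding interpolates the eight surrounding grid-vertex features trilinearly) are simplifications, and the paper's redesigned geometry-network initialization is omitted. The reference implementation is in the repository linked above.

```python
import math
import torch
import torch.nn as nn


class FourierEncoding(nn.Module):
    """Low-frequency sin/cos positional encoding of 3D points."""
    def __init__(self, num_freqs: int = 4):
        super().__init__()
        self.register_buffer("freqs", 2.0 ** torch.arange(num_freqs) * math.pi)

    def forward(self, x):                         # x: (N, 3)
        xb = x[..., None] * self.freqs            # (N, 3, F)
        return torch.cat([torch.sin(xb), torch.cos(xb)], dim=-1).flatten(-2)


class HashEncoding(nn.Module):
    """Simplified multi-resolution hash encoding: one learnable feature table
    per level, indexed by a spatial hash of the nearest grid vertex."""
    PRIMES = (1, 2654435761, 805459861)           # hash primes as in Mueller et al. 2022

    def __init__(self, num_levels=8, table_size=2**19, feat_dim=2,
                 base_res=16, growth=1.5):
        super().__init__()
        self.tables = nn.ParameterList(
            nn.Parameter(1e-4 * torch.randn(table_size, feat_dim))
            for _ in range(num_levels))
        self.res = [int(base_res * growth ** l) for l in range(num_levels)]
        self.table_size = table_size

    def forward(self, x):                         # x in [0, 1]^3, shape (N, 3)
        feats = []
        for table, res in zip(self.tables, self.res):
            idx = (x * res).long()                # nearest grid vertex at this level
            h = torch.zeros_like(idx[:, 0])
            for d, p in enumerate(self.PRIMES):   # XOR spatial hash of the vertex
                h = h ^ (idx[:, d] * p)
            feats.append(table[h % self.table_size])
        return torch.cat(feats, dim=-1)           # (N, num_levels * feat_dim)


class SDFNetwork(nn.Module):
    """Decodes [point, Fourier features, hash features] to a signed distance."""
    def __init__(self):
        super().__init__()
        self.fourier = FourierEncoding(num_freqs=4)    # low frequencies only
        self.hash = HashEncoding()
        in_dim = 3 + 3 * 2 * 4 + 8 * 2                 # 43 input channels
        self.mlp = nn.Sequential(                      # Softplus activation, as is
            nn.Linear(in_dim, 64), nn.Softplus(beta=100),  # common in SDF networks
            nn.Linear(64, 64), nn.Softplus(beta=100),
            nn.Linear(64, 1))

    def forward(self, x):
        return self.mlp(torch.cat([x, self.fourier(x), self.hash(x)], dim=-1))


if __name__ == "__main__":
    pts = torch.rand(1024, 3)                     # query points in the unit cube
    print(SDFNetwork()(pts).shape)                # torch.Size([1024, 1])
```

In this split, the multi-scale hash tables supply locally learned detail while the low-frequency Fourier terms give the MLP a smooth global signal, consistent with the combination the abstract describes; the surface itself is then extracted by querying the network on a dense grid and running Marching Cubes.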

Computational Visual Media
Pages 453–470
Cite this article:
Deng Z, Xiao H, Lang Y, et al. Multi-scale hash encoding based neural geometry representation. Computational Visual Media, 2024, 10(3): 453–470. https://doi.org/10.1007/s41095-023-0340-x


Received: 02 January 2023
Accepted: 28 February 2023
Published: 22 March 2024
© The Author(s) 2024.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.
