Research Article | Open Access

Pyramid-VAE-GAN: Transferring hierarchical latent variables for image inpainting

College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
Advanced Technology Research Institute, Zhejiang University, Hangzhou 310027, China

Abstract

Significant progress has been made in image inpainting in recent years. However, existing methods struggle to produce results that simultaneously exhibit reasonable structure, rich detail, and sharpness. In this paper, we propose the Pyramid-VAE-GAN network for image inpainting to address this limitation. Our network is built on a variational autoencoder (VAE) backbone that encodes high-level latent variables to represent the complicated high-dimensional prior distributions of images. This prior assists in reconstructing reasonable structures during inpainting. We also adopt a pyramid structure in our model to preserve rich detail in low-level latent variables. To reconcile the usual trade-off between reasonable structure and rich detail, we propose a novel cross-layer latent variable transfer module. It transfers information about long-range structures, contained in high-level latent variables, to the low-level latent variables that represent more detailed information. We further use adversarial training to select the most reasonable results and to improve the sharpness of the images. Extensive experimental results on multiple datasets demonstrate the superiority of our method. Our code is available at https://github.com/thy960112/Pyramid-VAE-GAN.
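The cross-layer transfer idea described above can be sketched in a few lines. The following is a minimal, hypothetical NumPy illustration (not the authors' implementation, which is available at the repository linked above): a coarse high-level latent is sampled via the standard VAE reparameterization trick, upsampled, and used to condition the mean of a finer low-level latent before sampling. The function names, the nearest-neighbour upsampling, and the mixing weight `w` are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, logvar, rng):
    """Standard VAE reparameterization: z = mu + sigma * eps, eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def transfer(z_high, mu_low, logvar_low, w):
    """Hypothetical cross-layer transfer: condition the low-level latent's
    mean on the upsampled high-level latent, so long-range structure from
    the coarse level informs the detailed level before sampling."""
    mu_cond = mu_low + w * upsample2x(z_high)
    return reparameterize(mu_cond, logvar_low, rng)

# Toy two-level pyramid: a coarse 4x4 latent and a finer 8x8 latent.
mu_h, logvar_h = rng.standard_normal((2, 1, 4, 4))
mu_l, logvar_l = rng.standard_normal((2, 1, 8, 8))

z_high = reparameterize(mu_h, logvar_h, rng)   # structure-level sample
z_low = transfer(z_high, mu_l, logvar_l, w=0.5)  # detail-level sample
print(z_low.shape)  # (1, 8, 8)
```

In the paper's full model these latents come from convolutional encoder features at different pyramid levels and the transfer is a learned module; the sketch only shows the flow of information from coarse to fine latents.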

Computational Visual Media
Pages 827-841
Cite this article:
Tian H, Zhang L, Li S, et al. Pyramid-VAE-GAN: Transferring hierarchical latent variables for image inpainting. Computational Visual Media, 2023, 9(4): 827-841. https://doi.org/10.1007/s41095-022-0331-3


Received: 02 May 2022
Accepted: 18 December 2022
Published: 27 July 2023
© The Author(s) 2023.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
