TNList and the Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China.
Department of Computer Science, University of Bath, Bath, UK.
Abstract
This paper presents a survey of image synthesis and editing with Generative Adversarial Networks (GANs). A GAN consists of two deep networks, a generator and a discriminator, trained in competition with each other. Owing to the power of deep networks and this adversarial training scheme, GANs can produce plausible, realistic images, and they have proven effective in many image synthesis and editing applications. This paper surveys recent GAN papers on topics including, but not limited to, texture synthesis, image inpainting, image-to-image translation, and image editing.
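The two-network competition the abstract describes can be made concrete with a short training loop. The following PyTorch sketch is purely illustrative: the toy uniform "real" data, layer sizes, and hyperparameters are assumptions made for the example, not details of any method surveyed in the paper.

```python
# A minimal sketch of adversarial (GAN) training, assuming toy data.
import torch
import torch.nn as nn

latent_dim, data_dim, batch_size = 16, 64, 32

# Generator: maps a latent noise vector to a synthetic sample.
G = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                  nn.Linear(128, data_dim), nn.Tanh())
# Discriminator: scores how likely a sample is to be real (logit output).
D = nn.Sequential(nn.Linear(data_dim, 128), nn.LeakyReLU(0.2),
                  nn.Linear(128, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

for step in range(1000):
    # Train the discriminator: push real samples toward 1, fakes toward 0.
    real = torch.rand(batch_size, data_dim) * 2 - 1  # stand-in "real" data
    fake = G(torch.randn(batch_size, latent_dim)).detach()
    loss_d = (bce(D(real), torch.ones(batch_size, 1)) +
              bce(D(fake), torch.zeros(batch_size, 1)))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # Train the generator: make the discriminator label fakes as real.
    fake = G(torch.randn(batch_size, latent_dim))
    loss_g = bce(D(fake), torch.ones(batch_size, 1))
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
```

Each step pushes the discriminator to separate real from generated samples while pushing the generator to make that separation fail; this is the competitive training the abstract refers to.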
Wu X, Xu K, Hall P. A Survey of Image Synthesis and Editing with Generative Adversarial Networks. Tsinghua Science and Technology, 2017, 22(6): 660-674. https://doi.org/10.23919/TST.2017.8195348
Table 1: Comparison between different GAN-based methods. Each row describes one method; from left to right, the columns give the method name, its input format, its output format, its main characteristic, the composition of its loss function, the maximum image/video resolution it supports, and the framework of its released code. The loss function column indicates which of the following terms each method uses: adversarial loss, L1 distance, L2 distance, feature loss, texture loss, total variation (TV) loss, segmentation loss, identity-preserving loss, symmetry loss, cycle-consistency loss, classification loss, and KL divergence. In the code column, T, Th, TF, C, PT, and Ch denote Torch, Theano, TensorFlow, Caffe, PyTorch, and Chainer, respectively.
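As the caption notes, most methods optimize a weighted sum of several of these terms rather than the adversarial loss alone. Below is a minimal PyTorch sketch of one common combination, adversarial loss plus a weighted L1 distance, as used in paired image-to-image translation (pix2pix-style); the weight lam = 100.0 and the function name generator_loss are illustrative assumptions, not values taken from any method in the table.

```python
# Illustrative only: combining an adversarial term with an L1 reconstruction
# term, as many methods in Table 1 do. Weights and names are assumptions.
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()
l1 = nn.L1Loss()

def generator_loss(d_fake_logits, fake, target, lam=100.0):
    """Adversarial loss plus a weighted L1 distance between the generated
    image and the ground-truth target."""
    adv = bce(d_fake_logits, torch.ones_like(d_fake_logits))  # fool D
    rec = l1(fake, target)                                    # stay near GT
    return adv + lam * rec
```

The adversarial term rewards outputs the discriminator accepts as real, while the L1 term keeps them anchored to the paired ground truth; the weight trades realism against fidelity.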