TNList and the Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China.
Department of Computer Science, University of Bath, Bath, UK.
Abstract
This paper presents a survey of image synthesis and editing with Generative Adversarial Networks (GANs). A GAN consists of two deep networks, a generator and a discriminator, trained in competition with each other. Owing to the power of deep networks and this adversarial training scheme, GANs can produce plausible, realistic images, and they have proven effective in many image synthesis and editing applications. This paper surveys recent GAN papers on topics including, but not limited to, texture synthesis, image inpainting, image-to-image translation, and image editing.
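The two-network competition the abstract describes can be made concrete with a short training loop. The following PyTorch sketch is purely illustrative: the toy uniform "real" data, layer sizes, and hyperparameters are assumptions made for the example, not details of any method surveyed in the paper.

```python
# A minimal sketch of adversarial (GAN) training, assuming toy data.
import torch
import torch.nn as nn

latent_dim, data_dim, batch_size = 16, 64, 32

# Generator: maps a latent noise vector to a synthetic sample.
G = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                  nn.Linear(128, data_dim), nn.Tanh())
# Discriminator: scores how likely a sample is to be real (logit output).
D = nn.Sequential(nn.Linear(data_dim, 128), nn.LeakyReLU(0.2),
                  nn.Linear(128, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

for step in range(1000):
    # Train the discriminator: push real samples toward 1, fakes toward 0.
    real = torch.rand(batch_size, data_dim) * 2 - 1  # stand-in "real" data
    fake = G(torch.randn(batch_size, latent_dim)).detach()
    loss_d = (bce(D(real), torch.ones(batch_size, 1)) +
              bce(D(fake), torch.zeros(batch_size, 1)))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # Train the generator: make the discriminator label fakes as real.
    fake = G(torch.randn(batch_size, latent_dim))
    loss_g = bce(D(fake), torch.ones(batch_size, 1))
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
```

Each step pushes the discriminator to separate real from generated samples while pushing the generator to make that separation fail; this is the competitive training the abstract refers to.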
Wu X, Xu K, Hall P. A Survey of Image Synthesis and Editing with Generative Adversarial Networks. Tsinghua Science and Technology, 2017, 22(6): 660-674. https://doi.org/10.23919/TST.2017.8195348
Table 1: Comparison between different GAN-based methods. Each row describes one method; from left to right, the columns give the method name, its input format, its output format, its main characteristic, the composition of its loss function, the maximum image/video resolution it supports, and the framework of its released code. The loss function column indicates which of the following terms each method uses: adversarial loss, L1 distance, L2 distance, feature loss, texture loss, total variation (TV) loss, segmentation loss, identity-preserving loss, symmetry loss, cycle-consistency loss, classification loss, and KL divergence. In the code column, T, Th, TF, C, PT, and Ch denote Torch, Theano, TensorFlow, Caffe, PyTorch, and Chainer, respectively.
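As the caption notes, most methods optimize a weighted sum of several of these terms rather than the adversarial loss alone. Below is a minimal PyTorch sketch of one common combination, adversarial loss plus a weighted L1 distance, as used in paired image-to-image translation (pix2pix-style); the weight lam = 100.0 and the function name generator_loss are illustrative assumptions, not values taken from any method in the table.

```python
# Illustrative only: combining an adversarial term with an L1 reconstruction
# term, as many methods in Table 1 do. Weights and names are assumptions.
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()
l1 = nn.L1Loss()

def generator_loss(d_fake_logits, fake, target, lam=100.0):
    """Adversarial loss plus a weighted L1 distance between the generated
    image and the ground-truth target."""
    adv = bce(d_fake_logits, torch.ones_like(d_fake_logits))  # fool D
    rec = l1(fake, target)                                    # stay near GT
    return adv + lam * rec
```

The adversarial term rewards outputs the discriminator accepts as real, while the L1 term keeps them anchored to the paired ground truth; the weight trades realism against fidelity.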