Research Article | Open Access

Object removal from complex videos using a few annotations

LTCI, Télécom ParisTech, Université Paris-Saclay, 75013 Paris, France.
MAP5, CNRS & Université Paris Descartes, 75006 Paris, France.
Univ Lyon, Université Claude Bernard Lyon 1, CNRS UMR 5208, Institut Camille Jordan, 69622 Villeurbanne, France.

Abstract

We present a system for the removal of objects from videos. As input, the system only needs a user to draw a few strokes on the first frame, roughly delimiting the objects to be removed. To the best of our knowledge, this is the first system allowing the semi-automatic removal of objects from videos with complex backgrounds. The key steps of our system are the following: after initialization, segmentation masks are first refined and then automatically propagated through the video. Missing regions are then synthesized using video inpainting techniques. Our system can deal with multiple, possibly crossing objects, with complex motions, and with dynamic textures. This results in a computational tool that can alleviate tedious manual operations for editing high-quality videos.
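The order of the steps named in the abstract (refine the user's annotation, propagate the mask through the video, then synthesize the missing region) can be illustrated with a toy sketch. This is not the authors' implementation. Every function name below is hypothetical, and each stage is a deliberately naive stand-in: dilation in place of mask refinement, a known per-frame translation in place of automatic propagation, and a per-pixel temporal median in place of video inpainting. Frames are nested lists of grey values and masks are sets of (row, column) pixels.

```python
from statistics import median_low


def dilate(mask, height, width, radius=1):
    """Toy refinement: grow a set of (row, col) pixels by `radius`."""
    grown = set()
    for r, c in mask:
        for dr in range(-radius, radius + 1):
            for dc in range(-radius, radius + 1):
                rr, cc = r + dr, c + dc
                if 0 <= rr < height and 0 <= cc < width:
                    grown.add((rr, cc))
    return grown


def propagate(mask, shifts, height, width):
    """Toy propagation: translate the first-frame mask by an assumed,
    externally supplied (d_row, d_col) shift for each frame."""
    return [
        {(r + dr, c + dc) for r, c in mask
         if 0 <= r + dr < height and 0 <= c + dc < width}
        for dr, dc in shifts
    ]


def inpaint(frames, masks):
    """Toy inpainting: fill each masked pixel with the temporal median of
    that pixel over the frames in which it is not masked."""
    out = [[row[:] for row in frame] for frame in frames]
    for t, mask in enumerate(masks):
        for r, c in mask:
            visible = [frames[s][r][c] for s in range(len(frames))
                       if (r, c) not in masks[s]]
            if visible:  # leave the pixel untouched if it is never visible
                out[t][r][c] = median_low(visible)
    return out


def remove_object(frames, strokes, shifts):
    """Run the three toy stages in the order given in the abstract."""
    height, width = len(frames[0]), len(frames[0][0])
    first_mask = dilate(set(strokes), height, width)
    masks = propagate(first_mask, shifts, height, width)
    return inpaint(frames, masks), masks


# Demo: a 1x8 video with background value 10 and a 3-pixel object (99)
# that starts at column 0, 3, and 5 in frames 0, 1, and 2 respectively.
demo_frames = [
    [[99 if start <= c < start + 3 else 10 for c in range(8)]]
    for start in (0, 3, 5)
]
restored, masks = remove_object(
    demo_frames, strokes=[(0, 1)], shifts=[(0, 0), (0, 3), (0, 5)]
)
```

In the demo, one stroke pixel on the first frame is enough because the dilation step expands it to cover the whole object. The temporal median works only because the background is static and every pixel is visible in at least one frame, which is exactly the assumption that complex motions and dynamic textures break.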

Computational Visual Media
Pages 267-291
Cite this article:
Le TT, Almansa A, Gousseau Y, et al. Object removal from complex videos using a few annotations. Computational Visual Media, 2019, 5(3): 267-291. https://doi.org/10.1007/s41095-019-0145-0


Revised: 20 March 2019
Accepted: 18 May 2019
Published: 22 August 2019
© The author(s) 2019

This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.
