Temporally consistent video colorization with deep feature propagation and self-regularization learning

Yihao Liu; Hengyuan Zhao; Kelvin C. K. Chan; Xintao Wang; Chen Change Loy; Yu Qiao; Chao Dong

doi:10.1007/s41095-023-0342-8

Research Article | Open Access

Yihao Liu^{1,2,*}, Hengyuan Zhao^{1,*}, Kelvin C. K. Chan^{3}, Xintao Wang^{4}, Chen Change Loy^{3}, Yu Qiao^{1}, Chao Dong^{1}

^{1} Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China

^{2} University of Chinese Academy of Sciences, Beijing 100049, China

^{3} Department of Electrical & Computer Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798

^{4} Applied Research Center, Tencent PCG, Shenzhen, China

* Yihao Liu and Hengyuan Zhao contributed equally to this work.

Abstract

Video colorization is a challenging and highly ill-posed problem. Although recent years have witnessed remarkable progress in single image colorization, relatively little research effort has been devoted to video colorization, and existing methods often suffer from severe flickering artifacts (temporal inconsistency) or unsatisfactory colorization. We address this problem from a new perspective, by jointly considering colorization and temporal consistency in a unified framework. Specifically, we propose a novel temporally consistent video colorization (TCVC) framework. TCVC effectively propagates frame-level deep features in a bidirectional way to enhance the temporal consistency of colorization. Furthermore, TCVC introduces a self-regularization learning (SRL) scheme to minimize the differences in predictions obtained using different time steps. SRL does not require any ground-truth color videos for training and can further improve temporal consistency. Experiments demonstrate that our method not only produces visually pleasing colorized video, but also achieves clearly better temporal consistency than state-of-the-art methods. A video demo is provided at https://www.youtube.com/watch?v=c7dczMs-olE, and code is available at https://github.com/lyh-18/TCVC-Temporally-Consistent-Video-Colorization.
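The two ideas named in the abstract, bidirectional feature propagation and a self-regularization term that compares predictions made with different time steps, can be illustrated with a toy sketch. This is not the TCVC implementation: the blending rule, the averaging of the two passes, the stride-based subsampling, and every name below (`blend`, `propagate_bidirectional`, `self_regularization_loss`, `alpha`, `stride`) are assumptions made purely for illustration, and plain lists of floats stand in for deep feature maps.

```python
from typing import List

Vector = List[float]


def blend(current: Vector, carried: Vector, alpha: float) -> Vector:
    """Mix the current frame's features with features carried from a neighbouring frame."""
    return [alpha * c + (1.0 - alpha) * p for c, p in zip(current, carried)]


def propagate_bidirectional(features: List[Vector], alpha: float = 0.5) -> List[Vector]:
    """Toy bidirectional propagation: a forward pass, a backward pass, then an average.

    Hypothetical stand-in for deep feature propagation; no learned weights are involved.
    """
    n = len(features)

    # Forward pass: each frame receives information from the frame before it.
    forward: List[Vector] = []
    for t in range(n):
        carried = forward[t - 1] if t > 0 else features[t]
        forward.append(blend(features[t], carried, alpha))

    # Backward pass: each frame receives information from the frame after it.
    backward: List[Vector] = [[] for _ in range(n)]
    for t in range(n - 1, -1, -1):
        carried = backward[t + 1] if t < n - 1 else features[t]
        backward[t] = blend(features[t], carried, alpha)

    # Fuse both directions by simple averaging (an assumption of this sketch).
    return [[(f + b) / 2.0 for f, b in zip(fw, bw)] for fw, bw in zip(forward, backward)]


def self_regularization_loss(features: List[Vector], stride: int = 2) -> float:
    """Mean absolute difference between predictions made with time step 1 and time step `stride`.

    Only the input sequence is used, so no ground-truth colors are needed.
    """
    dense = propagate_bidirectional(features)
    sparse = propagate_bidirectional(features[::stride])

    total, count = 0.0, 0
    for i, sparse_vec in enumerate(sparse):
        dense_vec = dense[i * stride]  # frame i of the subsampled run is frame i * stride
        for d, s in zip(dense_vec, sparse_vec):
            total += abs(d - s)
            count += 1
    return total / count if count else 0.0
```

In this sketch a sequence whose frames are all identical gives a loss of zero, because both runs reproduce the input exactly, while a sequence that alternates between frames yields a positive penalty on the frames shared by the two runs.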

Keywords

video colorization; temporal consistency; feature propagation; self-regularization

Computational Visual Media

Volume 10, Issue 2, April 2024

Pages 375-395

DOI: 10.1007/s41095-023-0342-8

Cite this article:

Liu Y, Zhao H, Chan KCK, et al. Temporally consistent video colorization with deep feature propagation and self-regularization learning. Computational Visual Media, 2024, 10(2): 375-395. https://doi.org/10.1007/s41095-023-0342-8