SRNET: A Shallow Skip Connection Based Convolutional Neural Network Design for Resolving Singularities

Robail Yasrab

doi:10.1007/s11390-019-1950-8


Regular Paper


Computer Vision Laboratory, School of Computer Science, University of Nottingham, Nottingham, NG8 1BB, U.K.


Abstract

Convolutional neural networks (CNNs) have shown tremendous progress and performance in recent years. Since their emergence, CNNs have exhibited excellent performance in most classification and segmentation tasks. Currently, the CNN family includes various architectures that dominate major vision-based recognition tasks. However, building a neural network (NN) by simply stacking convolution blocks inevitably limits its optimization ability and introduces overfitting and vanishing gradient problems. One of the key reasons for these issues is network singularities, which have recently been linked to degenerate manifolds in the loss landscape. This leads to slow learning and lower performance. In this context, skip connections have turned out to be an essential unit of CNN design for mitigating network singularities. This research proposes introducing skip connections into the NN architecture to augment information flow, mitigate singularities, and improve performance. It experiments with different levels of skip connections and proposes a placement strategy for these links that can be applied to any CNN. To validate this hypothesis, we designed an experimental CNN architecture named Shallow Wide ResNet (SRNet), which uses a wide residual network as its base design. We performed numerous experiments to assess the validity of the proposed idea, using two well-known datasets, CIFAR-10 and CIFAR-100, for training and testing the CNNs. The empirical results show many promising outcomes in terms of performance, efficiency, and reduction of network singularity issues.
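The abstract's core argument, that identity skip connections keep gradients from vanishing and let training escape singular points, can be illustrated with a toy scalar model. The Python sketch below is a hypothetical illustration under simplifying assumptions (every layer is one scalar weight and all activations are linear); it is not the SRNet architecture or the author's code, and the function names `input_gradient`, `two_layer_grads`, and `train` are introduced here only for illustration. The singular point it uses is an elimination singularity, where a hidden unit's incoming and outgoing weights are both zero.

```python
"""Toy scalar illustration of how identity skip connections help training.

Hypothetical example for intuition only; it is not the SRNet architecture
or the paper's experimental code. Every "layer" is a single scalar weight
and all activations are linear.
"""


def input_gradient(weights, skip):
    """Return d(output)/d(input) through a stack of scalar linear layers.

    Plain layer k:  h <- w_k * h        (local derivative w_k)
    Skip layer k:   h <- h + w_k * h    (local derivative 1 + w_k)
    By the chain rule the end-to-end gradient is the product of the factors.
    """
    grad = 1.0
    for w in weights:
        grad *= (1.0 + w) if skip else w
    return grad


def two_layer_grads(w_in, w_out, x, target, skip):
    """Return (dL/dw_in, dL/dw_out) for L = 0.5 * (y - target) ** 2.

    Plain:      h = w_in * x        y = w_out * h
    With skip:  h = w_in * x + x    y = w_out * h
    """
    h = w_in * x + (x if skip else 0.0)
    err = w_out * h - target
    return err * w_out * x, err * h


def train(skip, steps=100, lr=0.1, x=1.0, target=1.0):
    """Run gradient descent from w_in = w_out = 0 and return the final loss.

    At that point the hidden unit is eliminated (zero incoming weight), and in
    the plain network both gradients are exactly zero, so training never moves.
    """
    w_in, w_out = 0.0, 0.0
    for _ in range(steps):
        g_in, g_out = two_layer_grads(w_in, w_out, x, target, skip)
        w_in -= lr * g_in
        w_out -= lr * g_out
    h = w_in * x + (x if skip else 0.0)
    return 0.5 * (w_out * h - target) ** 2


if __name__ == "__main__":
    small = [0.01] * 20  # 20 layers, each with a small weight
    print(input_gradient(small, skip=False))  # about 1e-40: gradient vanishes
    print(input_gradient(small, skip=True))   # about 1.22: gradient survives
    print(train(skip=False))  # 0.5: stuck at the singular point
    print(train(skip=True))   # close to 0: the skip path lets w_out move first
```

The identity path adds a constant 1 to each layer's local derivative, which is why the product does not collapse toward zero, and it keeps the hidden unit's activation nonzero even when its incoming weight is zero. Real residual blocks add convolutions, batch normalization, ReLU, and optionally dropout (as in wide residual networks), so this toy shows only the mechanism, not SRNet's measured behavior.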

Keywords

convolutional neural network (CNN), wide residual network (WRN), dropout, skip connection, deep neural network (DNN)

Electronic Supplementary Material


jcst-34-4-924-Highlights.pdf (654.9 KB)


Journal of Computer Science and Technology, Volume 34, Issue 4, July 2019, Pages 924-938


Cite this article:

Yasrab R. SRNET: A Shallow Skip Connection Based Convolutional Neural Network Design for Resolving Singularities. Journal of Computer Science and Technology, 2019, 34(4): 924-938. https://doi.org/10.1007/s11390-019-1950-8