National Innovation Institute of Defense Technology, Academy of Military Sciences PLA China, Beijing 100071, China.
Department of Automation, Tsinghua University, Beijing 100084, China.
Abstract
The Histogram of Oriented Gradients (HOG) can produce good results in image target recognition, but it requires all input target images to have the same size for classification. To address this shortcoming, this paper performs spatial pyramid segmentation on target images of any size, obtains the pixel size of each image block dynamically, and then calculates and normalizes the oriented gradient histogram of each block region in each pyramid layer. The new feature is called the Histogram of Spatial Pyramid Oriented Gradients (HSPOG). This approach obtains stable feature vectors for images of any size and significantly increases the target detection rate in the image recognition process. Finally, the algorithm is verified on the VOC2012 image data set and compared with HOG.
References
[2] B. Liang and L. Zheng, Diffractive phase elements based on two-dimensional artificial dielectrics, presented at the 22nd International Conference on Pattern Recognition, Stockholm, Sweden, 2014.
[3] Q. Liu, Z. G. Wu, and J. M. Guo, The conversion of histograms of oriented gradient in different vision-angle and rotation-angle, Control Theory & Applications, vol. 27, no. 9, pp. 1269-1272, 2010.
[4] S. A. Iamsa and P. Horata, Handwritten character recognition using histograms of oriented gradient features in deep learning of artificial neural network, presented at the 3rd International Conference on IT Convergence and Security, Macao, China, 2013.
[5] Y. W. Pang, Y. Yuan, X. L. Li, and J. Pan, Efficient HOG human detection, Signal Processing, vol. 91, no. 4, pp. 773-781, 2011.
[6] Y. E. Lina, Y. L. Chen, and J. L. Lin, Pedestrian fast detection based on histograms of oriented gradient, Computer Engineering, vol. 36, no. 22, pp. 206-207, 2010.
[7] K. Grauman and T. Darrell, The pyramid match kernel: Discriminative classification with sets of image features, presented at the 10th IEEE International Conference on Computer Vision (ICCV), Beijing, China, 2005.
[8] N. V. Tavari and A. V. Deorankar, Indian sign language recognition based on histograms of oriented gradient, International Journal of Computer Science & Information Technology, vol. 5, no. 3, pp. 3657-3660, 2014.
[9] H. X. Jia and Y. J. Zhang, Fast human detection by boosting histograms of oriented gradients, presented at the 8th International Conference on Image and Graphics, Tianjin, China, 2007.
[10] A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems, vol. 25, no. 2, pp. 1-8, 2012.
[11] M. D. Zeiler and R. Fergus, Visualizing and understanding convolutional networks, presented at the 13th European Conference on Computer Vision (ECCV), Zurich, Switzerland, 2014.
[12] J. Donahue, Y. Jia, and O. Vinyals, DeCAF: A deep convolutional activation feature for generic visual recognition, https://arxiv.org/abs/1310.1531, 2013.
[13] R. Girshick, J. Donahue, T. Darrell, and J. Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, presented at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 2014.
[14] K. He, X. Zhang, S. Ren, and J. Sun, Spatial pyramid pooling in deep convolutional networks for visual recognition, IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 37, no. 9, pp. 1904-1916, 2015.
[15] P. C. Hung, Colorimetric calibration in electronic imaging devices using a look-up-table model and interpolations, Journal of Electronic Imaging, vol. 2, no. 1, p. 53, 1993.
[16] P. Felzenszwalb, D. McAllester, and D. Ramanan, A discriminatively trained, multiscale, deformable part model, presented at the 25th Conference on Computer Vision and Pattern Recognition (CVPR), Anchorage, AK, USA, 2008.
[17] J. P. Dong and C. Kim, A hybrid bags-of-feature model for sports scene classification, Journal of Signal Processing Systems, vol. 81, no. 2, pp. 249-263, 2014.
2 HSPOG
HSPOG is an improvement of HOG. For the same target appearing in two images of different sizes, both target regions are divided into the same number of cells, and the relative characteristics between corresponding cells of the two regions remain unchanged even though the cells have different pixel sizes. The goal of HSPOG is to extract stable target features from image targets of different sizes and to remove HOG's requirement of a fixed target size and aspect ratio. HOG always scales an image target to a fixed size and uses a fixed cell size; we argue instead that the cell size should be determined dynamically by the cell number. Therefore, when calculating HSPOG features, we scale a target image dynamically without distorting it (see Fig. 1). The scale ratio is calculated by Eq. (1), which involves an adjustment scale parameter.
10.26599/TST.2020.9010011.F001
Cropping or warping to fit a fixed size.
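As a concrete illustration, the minimal Python sketch below shows one plausible form of this distortion-free scaling step. Since Eq. (1) is not reproduced above, the cell size of 8 pixels and the 8 cells along the small edge are illustrative assumptions rather than the paper's exact parameters.

```python
def zoom_scale(height, width, cells_small_edge=8, cell_size=8):
    """One plausible form of the distortion-free scaling rule (assumed, since
    Eq. (1) is not reproduced here): scale the image so that its small edge
    spans `cells_small_edge` cells of `cell_size` pixels, and apply the same
    ratio to both edges so the aspect ratio is preserved."""
    target_small_edge = cells_small_edge * cell_size   # e.g., 8 cells * 8 px = 64 px
    ratio = target_small_edge / min(height, width)     # one ratio for both edges: no distortion
    new_h = int(round(height * ratio))
    new_w = int(round(width * ratio))
    return ratio, (new_h, new_w)

# Example: a 300 x 480 target keeps its 1.6 aspect ratio after zooming.
ratio, (h, w) = zoom_scale(300, 480)
print(ratio, h, w)   # approx. 0.213, 64, 102
```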
In principle, the larger the value of n, the more pyramid scales and target features are obtained and the higher the recognition accuracy. However, each increment of n by 1 increases the computation exponentially, so we set n = 3 empirically. There is one feature vector for each scale; concatenating all of them forms a long descriptor that includes coarse features at large spatial scales and fine features at small spatial scales. HSPOG is described in Algorithm 1. For example, if the small edge of an input image after zooming is 64 pixels and the cell number along the small edge is specified as 8, then the parameter n is 3. If the large edge of an image cannot be divided exactly by 2, the image must first be padded with zeros until it can be, as shown in Fig. 2.
10.26599/TST.2020.9010011.F002
Zooming method used in HSPOG. If the large edge of an image cannot be divided by 2, it is padded with zeros along the large edge until it can be divided by 2 exactly.
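The padding and dynamic-cell steps of Algorithm 1 can be sketched as follows. The grayscale input, the doubling of the cell count at each level, and the default of three levels are assumptions used only for illustration.

```python
import numpy as np

def pad_and_cell_sizes(img, cells_small_edge=8, n_levels=3):
    """Sketch of the zooming/padding step described above (assumed details:
    grayscale input, zero padding only along the large edge as in Fig. 2,
    and a cell count that doubles at each pyramid level)."""
    h, w = img.shape[:2]
    small, large = (h, w) if h <= w else (w, h)

    # Pad the large edge with zeros until it is exactly divisible by 2.
    pad = (2 - large % 2) % 2
    if h <= w:
        img = np.pad(img, ((0, 0), (0, pad)), mode="constant")
    else:
        img = np.pad(img, ((0, pad), (0, 0)), mode="constant")

    # Dynamic cells: the cell count, not the cell size, is fixed, so the
    # pixel size of a cell follows the image and halves at each level.
    cell_sizes = []
    for level in range(n_levels):
        cells = cells_small_edge * (2 ** level)   # 8, 16, 32, ... cells along the small edge
        cell_sizes.append(small / cells)          # pixels per cell at this level
    return img, cell_sizes

# Example: a 64 x 101 image is padded to 64 x 102 and uses
# cells of 8, 4, and 2 pixels along the small edge.
padded, sizes = pad_and_cell_sizes(np.zeros((64, 101), dtype=np.float32))
print(padded.shape, sizes)   # (64, 102), [8.0, 4.0, 2.0]
```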
The flow chart of HSPOG is shown in Fig. 3. HSPOG avoids image distortion and information loss: the size of an image cell is calculated dynamically, whereas in HOG it is constant, and dynamic cell determination keeps the cell proportions consistent across images of different sizes. HSPOG computes features at multiple spatial pyramid scales, which enables it to integrate coarse features at large spatial scales and fine features at small spatial scales while preserving the integrity of the image information.
10.26599/TST.2020.9010011.F003
Flow chart of HSPOG (f is the feature vector of HSPOG).
3 Accelerating HSPOG
Rapid detection and recognition of image targets is key to engineering applications. This article improves the HSPOG algorithm to accelerate the calculation of features.
When calculating a feature map, HOG commonly divides 180 degrees into 9 orientation bins, and gradient directions between 180 and 360 degrees are treated as the negatives of those bins. HSPOG not only divides 180 degrees into 9 bins, but also divides 360 degrees into 18 bins to obtain richer oriented gradient features. In this process, trilinear interpolation [15] is no longer used to distribute a gradient between adjacent bins.
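To make the two bin layouts concrete, the sketch below assigns a gradient direction to an unsigned (9 bins over 180 degrees) and a signed (18 bins over 360 degrees) bin. The indexing convention is an assumption for illustration, not necessarily the paper's exact implementation.

```python
import numpy as np

def orientation_bins(dx, dy):
    """Two bin layouts: 9 "unsigned" bins over 180 degrees (opposite
    directions share a bin) and 18 "signed" bins over 360 degrees.
    dx, dy are per-pixel gradient components."""
    angle = np.degrees(np.arctan2(dy, dx)) % 360.0        # direction in [0, 360)
    signed_bin = (angle // 20).astype(int) % 18           # 18 bins of 20 degrees over 360
    unsigned_bin = (angle % 180.0 // 20).astype(int) % 9  # 9 bins of 20 degrees over 180
    return unsigned_bin, signed_bin

# Example: a gradient pointing at about 265 degrees falls in unsigned bin 4
# (it folds to 85 degrees in 0-180) and signed bin 13.
u, s = orientation_bins(np.array([-0.087]), np.array([-0.996]))
print(u, s)   # [4] [13]
```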
In HSPOG, we use bilinear interpolation instead of trilinear interpolation to calculate the weights of each pixel with respect to its neighboring cells, which is a fast calculation method [16]. The gradient orientation is instead assigned to its nearest bin: for example, the similarity between 85 and 80 degrees is 0.75 and the similarity between 85 and 100 degrees is 0.25, so 85 degrees is assigned to the fifth bin. We illustrate this processing in Fig. 4. The feature map of each cell is normalized four times because each cell may be contained in four blocks. When all pixel features of an image target are accumulated into HSPOG, we discard the features at the boundary because they cannot be normalized four times. For an image target, assuming r is the number of cells in a row and c is the number of cells in a column, the length of the image target HSPOG is 32 × r × c. The 32 features per cell consist of 9 bins (see Fig. 4a), 18 bins (see Fig. 4b), 4 texture features, and 1 truncation feature; fast HSPOG is shown in Algorithm 2. The acceleration algorithm is an approximate simplification of the original algorithm, but the resulting loss is negligible.
10.26599/TST.2020.9010011.F004
Two types of bins.
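The sketch below illustrates the bilinear (spatial-only) weighting and the 32-feature-per-cell layout described above. The cell-center convention is an assumption, and descriptor_length simply restates the 32 × r × c count.

```python
import numpy as np

def bilinear_cell_weights(px, py, cell_size):
    """Bilinear spatial weighting used in place of trilinear interpolation:
    each pixel votes into its 4 neighbouring cells with weights given by its
    distance to their centres, while the orientation itself is hard-assigned
    to the nearest bin (e.g., 85 degrees goes entirely to the fifth bin)."""
    # Continuous cell coordinates of the pixel, with cell centres at 0.5 offsets
    # (this centre convention is an assumption).
    cx = px / cell_size - 0.5
    cy = py / cell_size - 0.5
    x0, y0 = int(np.floor(cx)), int(np.floor(cy))
    fx, fy = cx - x0, cy - y0
    # Weights of the four neighbouring cells; they sum to 1.
    return [((x0,     y0),     (1 - fx) * (1 - fy)),
            ((x0 + 1, y0),     fx * (1 - fy)),
            ((x0,     y0 + 1), (1 - fx) * fy),
            ((x0 + 1, y0 + 1), fx * fy)]

def descriptor_length(rows, cols):
    """32 features per cell: 9 unsigned bins + 18 signed bins
    + 4 texture (normalization) features + 1 truncation feature
    (boundary handling omitted in this sketch)."""
    return 32 * rows * cols

print(descriptor_length(8, 13))   # an 8 x 13 cell grid -> 3328 features
```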
The spatial pyramid scales shown in Fig. 5 are used when computing the HSPOG of an image. Such scales play important roles in conventional methods; e.g., Scale-Invariant Feature Transform (SIFT) vectors are also collected at multiple scales [7, 17]. Therefore, we also compute the HSPOG at multiple pyramid scales. All of these pyramid scales make HSPOG more effective in object recognition tasks.
10.26599/TST.2020.9010011.F005
Spatial pyramid scales and the feature map of an image.
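A minimal sketch of how the per-scale feature vectors could be concatenated into one HSPOG descriptor follows; the three scales in the example are hypothetical, since the exact scale list is not reproduced above.

```python
import numpy as np

def hspog_descriptor(per_scale_features):
    """The per-scale feature vectors (one per pyramid scale, computed as
    above) are concatenated into a single long HSPOG descriptor, so coarse
    and fine spatial information coexist in one vector."""
    return np.concatenate([np.asarray(f, dtype=np.float32).ravel()
                           for f in per_scale_features])

# Example with three hypothetical cell grids (8x8, 16x16, 32x32 cells).
coarse = np.zeros(32 * 8 * 8)
medium = np.zeros(32 * 16 * 16)
fine   = np.zeros(32 * 32 * 32)
print(hspog_descriptor([coarse, medium, fine]).shape)   # (43008,)
```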
10.26599/TST.2020.9010011.F006
Curves of recognition rate and false alarm for experiments on (a) airplane, (b) bicycle, (c) bird, (d) boat, (e) bottle, and (f) bus.
10.26599/TST.2020.9010011.F007
Curves of recognition rate and false alarm for experiments on (a) car, (b) chair, (c) cow, (d) dog, (e) horse, and (f) motorbike.
10.26599/TST.2020.9010011.F008
Curves of false alarm and recognition rate for experiments on (a) person, (b) potted plant, (c) sheep, (d) sofa, (e) cat, and (f) dining table.
10.26599/TST.2020.9010011.F009
Part of the positive and negative training samples. (a), (b), and (c) are samples of positive images, and (d), (e), and (f) are samples of negative images.
10.26599/TST.2020.9010011.F010
Comparison of HSPOG and HOG after increasing the feature quantity (each point represents 50 sample images; the total number of testing images is 1684).