Publishing Language: Chinese

Estimation algorithm of driver's gaze zone based on lightweight spatial feature encoding network

Mingfang ZHANG1, Guilin LI1, Chuna WU2, Li WANG1, Lianghao TONG1
1. Beijing Key Laboratory of Urban Road Intelligent Traffic Control Technology, North China University of Technology, Beijing 100144, China
2. Key Laboratory of Operation Safety Technology on Transport Vehicles, Research Institute of Highway, Ministry of Transport, Beijing 100088, China

Abstract

Objective

Real-time monitoring of a driver's gaze region is essential for human-machine shared driving vehicles to understand and predict the driver's intentions. However, because of the limited computational resources and storage capacity of in-vehicle platforms, existing gaze region estimation algorithms struggle to balance accuracy with real-time performance, and most ignore temporal information.

Methods

Therefore, this paper proposes a lightweight spatial feature encoding network (LSFENet) for driver gaze region estimation. First, an RGB camera captures an image sequence of the driver's upper body. To handle challenges such as cluttered backgrounds and facial occlusions, preprocessing steps, including face alignment and glasses removal, extract left- and right-eye images and facial keypoint coordinates: face alignment uses the multi-task cascaded convolutional network (MTCNN) algorithm, and glasses are removed with the cycle-consistent adversarial network (CycleGAN) algorithm. Second, we build the LSFENet feature extraction network by improving the MobileNetV2 architecture with GCSbottleneck modules, because the inverted residual structure in MobileNetV2 requires substantial memory and floating-point operations and ignores the redundancy and correlation among feature maps; we embed a ghost module to reduce memory consumption and integrate channel and spatial attention modules to extract cross-channel and spatial information from the feature maps. Next, the Kronecker product fuses the eye features with the facial keypoint features to reduce the impact of the imbalance in information complexity between the two. Then, the fused features from consecutive frames are fed into a recurrent neural network to estimate the gaze zone of the image sequence. Finally, the proposed network is evaluated on the public driver gaze in the wild (DGW) dataset and a self-collected dataset, using the number of parameters, the number of floating-point operations (FLOPs), frames per second (FPS), and the F1 score as evaluation metrics.
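
As a concrete illustration of the preprocessing stage, the minimal sketch below uses the facenet-pytorch implementation of MTCNN [21] to detect the face and its five landmarks and to crop the two eye patches. The crop geometry, file name, and helper function are illustrative assumptions, not the authors' pipeline, and the CycleGAN-based glasses removal [22] is omitted.

```python
# Hypothetical preprocessing sketch: face alignment with facenet-pytorch's
# MTCNN, then eye-patch cropping around the detected eye landmarks.
# Crop sizes, the input file name, and crop_patch() are assumptions.
from PIL import Image
from facenet_pytorch import MTCNN

def crop_patch(img, center, w, h):
    """Crop a w-by-h patch centered on a landmark coordinate."""
    cx, cy = int(center[0]), int(center[1])
    return img.crop((cx - w // 2, cy - h // 2, cx + w // 2, cy + h // 2))

mtcnn = MTCNN(keep_all=False)           # keep only the most likely face
frame = Image.open('driver_frame.jpg')  # one frame of the captured sequence

# Detect the face box and five landmarks (eyes, nose, mouth corners).
boxes, probs, points = mtcnn.detect(frame, landmarks=True)
if boxes is not None:
    landmarks = points[0]               # shape (5, 2)
    left_eye = crop_patch(frame, landmarks[0], 60, 36)
    right_eye = crop_patch(frame, landmarks[1], 60, 36)
```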
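
The abstract does not give the exact GCSbottleneck layout, so the following PyTorch sketch only assembles the ingredients it names: a ghost module in the spirit of GhostNet [24], channel and spatial attention in the spirit of CBAM [25], and an inverted-residual-style shortcut. All layer sizes and the module composition are assumptions, not the authors' released code.

```python
# Assumed building block: ghost expansion -> depthwise conv -> channel and
# spatial attention -> ghost projection, with a shortcut when shapes match.
import math
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    """Produce part of the output with a cheap depthwise convolution."""
    def __init__(self, inp, oup, ratio=2, dw_size=3):
        super().__init__()
        init_ch = math.ceil(oup / ratio)     # "intrinsic" feature maps
        cheap_ch = init_ch * (ratio - 1)     # "ghost" feature maps
        self.primary = nn.Sequential(
            nn.Conv2d(inp, init_ch, 1, bias=False),
            nn.BatchNorm2d(init_ch), nn.ReLU(inplace=True))
        self.cheap = nn.Sequential(
            nn.Conv2d(init_ch, cheap_ch, dw_size, padding=dw_size // 2,
                      groups=init_ch, bias=False),
            nn.BatchNorm2d(cheap_ch), nn.ReLU(inplace=True))
        self.oup = oup

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)[:, :self.oup]

class ChannelSpatialAttention(nn.Module):
    """CBAM-style channel attention followed by a 7x7 spatial attention map."""
    def __init__(self, ch, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(ch, ch // reduction), nn.ReLU(inplace=True),
            nn.Linear(ch // reduction, ch))
        self.spatial = nn.Conv2d(2, 1, 7, padding=3, bias=False)

    def forward(self, x):
        b, c, _, _ = x.shape
        w = torch.sigmoid(self.mlp(x.mean(dim=(2, 3))) +
                          self.mlp(x.amax(dim=(2, 3)))).view(b, c, 1, 1)
        x = x * w                            # channel reweighting
        s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], 1)
        return x * torch.sigmoid(self.spatial(s))

class GCSBottleneck(nn.Module):
    """Assumed composition of the block named in the abstract."""
    def __init__(self, inp, oup, stride=1, expand=2):
        super().__init__()
        mid = inp * expand
        self.block = nn.Sequential(
            GhostModule(inp, mid),
            nn.Conv2d(mid, mid, 3, stride, 1, groups=mid, bias=False),
            nn.BatchNorm2d(mid),
            ChannelSpatialAttention(mid),
            GhostModule(mid, oup))
        self.use_res = stride == 1 and inp == oup

    def forward(self, x):
        return x + self.block(x) if self.use_res else self.block(x)
```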
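
Finally, a hedged sketch of the fusion and temporal stages: per-frame eye features and facial keypoint features are fused with a Kronecker (outer) product, and a recurrent network aggregates consecutive frames into one gaze-zone label. The feature sizes, the choice of a GRU, and the 9-zone output are assumptions for illustration only.

```python
# Assumed fusion-and-sequence head: Kronecker-product fusion per frame,
# then a GRU over the frame sequence, then a gaze-zone classifier.
import torch
import torch.nn as nn

class GazeZoneHead(nn.Module):
    def __init__(self, eye_dim=128, kp_dim=16, hidden=256, num_zones=9):
        super().__init__()
        self.rnn = nn.GRU(eye_dim * kp_dim, hidden, batch_first=True)
        self.cls = nn.Linear(hidden, num_zones)

    def forward(self, eye_feat, kp_feat):
        # eye_feat: (B, T, eye_dim); kp_feat: (B, T, kp_dim).
        # The Kronecker product of two vectors is their outer product,
        # flattened: every eye feature is scaled by every keypoint feature,
        # so the lower-dimensional branch is not drowned out by the other.
        fused = torch.einsum('bti,btj->btij', eye_feat, kp_feat).flatten(2)
        out, _ = self.rnn(fused)        # (B, T, hidden)
        return self.cls(out[:, -1])     # gaze-zone logits for the sequence

# Dummy usage: a batch of 4 sequences of 8 frames each.
logits = GazeZoneHead()(torch.randn(4, 8, 128), torch.randn(4, 8, 16))
print(logits.shape)                     # torch.Size([4, 9])
```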

Results

The experimental results showed the following: (1) The gaze region estimation accuracy of the proposed algorithm was 97.08%, approximately 7% higher than that of the original MobileNetV2, while the number of parameters and the FLOPs were both reduced by 22.5% and the FPS improved by 36.43%; at approximately 103 FPS, the proposed network satisfied the computational efficiency and accuracy requirements of in-vehicle environments. (2) The estimation accuracies for gaze regions 1, 2, 3, 4, and 9 exceeded 85%, and the macro-average and micro-average precisions on the DGW dataset reached 74.32% and 76.01%, respectively. (3) The proposed algorithm achieved high classification accuracy on fine-grained eye images with small intra-class differences. (4) Class activation mapping visualizations demonstrated that the proposed algorithm adapts well to various lighting conditions and eyeglass occlusion.
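
For readers unfamiliar with the two averaging schemes reported above, the toy example below shows how macro- and micro-average precision are computed; the numbers are illustrative and are not the paper's data.

```python
# Macro averages the per-class precisions; micro pools all decisions.
from sklearn.metrics import precision_score

y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0, 2]
print(precision_score(y_true, y_pred, average='macro'))  # (1/2 + 2/3 + 1)/3 ~ 0.722
print(precision_score(y_true, y_pred, average='micro'))  # 5 correct / 7 ~ 0.714
```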

Conclusions

These results are of great significance for recognizing a driver's visual distraction state.

CLC number: U495; Document code: A; Article ID: 1000-0054(2024)01-0044-11

References

[1] WANG T H, LUO Y G, LIU J X, et al. End-to-end self-driving policy based on the deep deterministic policy gradient algorithm considering the state distribution[J]. Journal of Tsinghua University (Science and Technology), 2021, 61(9): 881-888. (in Chinese)
[2] ZONG C F, DAI C H, ZHANG D. Human-machine interaction technology of intelligent vehicles: Current development trends and future directions[J]. China Journal of Highway and Transport, 2021, 34(6): 214-237. (in Chinese)
[3] CHANG W J, CHEN L B, CHIOU Y Z. Design and implementation of a drowsiness-fatigue-detection system based on wearable smart glasses to increase road safety[J]. IEEE Transactions on Consumer Electronics, 2018, 64(4): 461-469.
[4] PLOPSKI A, HIRZLE T, NOROUZI N, et al. The eye in extended reality: A survey on gaze interaction and eye tracking in head-worn extended reality[J]. ACM Computing Surveys, 2023, 55(3): 53.
[5] SHI H L, CHEN L F, WANG X Y, et al. A nonintrusive and real-time classification method for driver's gaze region using an RGB camera[J]. Sustainability, 2022, 14(1): 508.
[6] YUAN G L, WANG Y F, YAN H Z, et al. Self-calibrated driver gaze estimation via gaze pattern learning[J]. Knowledge-Based Systems, 2022, 235: 107630.
[7] LIU M H, DAI H H. Driver gaze zone estimation based on RGB camera[J]. Modern Computer, 2019, 25(36): 69-75. (in Chinese)
[8] LUNDGREN M, HAMMARSTRAND L, MCKELVEY T. Driver-gaze zone estimation using Bayesian filtering and Gaussian processes[J]. IEEE Transactions on Intelligent Transportation Systems, 2016, 17(10): 2739-2750.
[9] LU F, SUGANO Y, OKABE T, et al. Adaptive linear regression for appearance-based gaze estimation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(10): 2033-2046.
[10] AUNSRI N, RATTAROM S. Novel eye-based features for head pose-free gaze estimation with web camera: New model and low-cost device[J]. Ain Shams Engineering Journal, 2022, 13(5): 101731.
[11] YAN Q N, ZHANG W W. Estimation of driver's gaze area based on multi-modal feature fusion[J]. Computer and Digital Engineering, 2022, 50(10): 2217-2222. (in Chinese)
[12] WANG Y F, YUAN G L, MI Z T, et al. Continuous driver's gaze zone estimation using RGB-D camera[J]. Sensors, 2019, 19(6): 1287.
[13] HAN K, PAN H W, ZHANG W, et al. Alzheimer's disease classification method based on multi-modal medical images[J]. Journal of Tsinghua University (Science and Technology), 2020, 60(8): 664-671, 682. (in Chinese)
[14] RIBEIRO R F, COSTA P D P. Driver gaze zone dataset with depth data[C]//14th International Conference on Automatic Face & Gesture Recognition. Lille, France: IEEE, 2019: 1-5.
[15] GHOSH S, DHALL A, SHARMA G, et al. Speak2Label: Using domain knowledge for creating a large scale driver gaze zone estimation dataset[C]//IEEE/CVF International Conference on Computer Vision Workshops. Montreal, Canada: IEEE, 2021: 2896-2905.
[16] SANDLER M, HOWARD A, ZHU M L, et al. MobileNetV2: Inverted residuals and linear bottlenecks[C]//IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018: 4510-4520.
[17] RANGESH A, ZHANG B W, TRIVEDI M M. Gaze preserving CycleGANs for eyeglass removal and persistent gaze estimation[J]. IEEE Transactions on Intelligent Vehicles, 2022, 7(2): 377-386.
[18] YANG Y R, LIU C S, CHANG F L, et al. Driver gaze zone estimation via head pose fusion assisted supervision and eye region weighted encoding[J]. IEEE Transactions on Consumer Electronics, 2021, 67(4): 275-284.
[19] KRAFKA K, KHOSLA A, KELLNHOFER P, et al. Eye tracking for everyone[C]//IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016: 2176-2184.
[20] ASSI L, CHAMSEDDINE F, IBRAHIM P, et al. A global assessment of eye health and quality of life: A systematic review of systematic reviews[J]. JAMA Ophthalmology, 2021, 139(5): 526-541.
[21] ZHANG K P, ZHANG Z P, LI Z F, et al. Joint face detection and alignment using multitask cascaded convolutional networks[J]. IEEE Signal Processing Letters, 2016, 23(10): 1499-1503.
[22] ZHU J Y, PARK T, ISOLA P, et al. Unpaired image-to-image translation using cycle-consistent adversarial networks[C]//IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017: 2242-2251.
[23] NAN Y H, JU J G, HUA Q Y, et al. A-MobileNet: An approach of facial expression recognition[J]. Alexandria Engineering Journal, 2022, 61(6): 4435-4444.
[24] HAN K, WANG Y H, TIAN Q, et al. GhostNet: More features from cheap operations[C]//IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE, 2020: 1577-1586.
[25] WOO S, PARK J, LEE J Y, et al. CBAM: Convolutional block attention module[C]//15th European Conference on Computer Vision. Munich, Germany: Springer, 2018: 3-19.
[26] HOWARD A, SANDLER M, CHEN B, et al. Searching for MobileNetV3[C]//IEEE/CVF International Conference on Computer Vision. Seoul, Republic of Korea: IEEE, 2019: 1314-1324.
Journal of Tsinghua University (Science and Technology)
Pages 44-54
Cite this article:
ZHANG M, LI G, WU C, et al. Estimation algorithm of driver's gaze zone based on lightweight spatial feature encoding network. Journal of Tsinghua University (Science and Technology), 2024, 64(1): 44-54. https://doi.org/10.16511/j.cnki.qhdxxb.2023.26.045


Received: 03 March 2023
Published: 15 January 2024
© Journal of Tsinghua University (Science and Technology). All rights reserved.