Publishing Language: Chinese

Estimation algorithm of driver's gaze zone based on lightweight spatial feature encoding network

Mingfang ZHANG1, Guilin LI1, Chuna WU2, Li WANG1, Lianghao TONG1
1. Beijing Key Laboratory of Urban Road Intelligent Traffic Control Technology, North China University of Technology, Beijing 100144, China
2. Key Laboratory of Operation Safety Technology on Transport Vehicles, Research Institute of Highway, Ministry of Transport, Beijing 100088, China

Abstract

Objective

Real-time monitoring of a driver's gaze region is essential for human-machine shared driving vehicles to understand and predict the driver's intentions. However, because of the limited computational resources and storage capacity of in-vehicle platforms, existing gaze region estimation algorithms struggle to balance accuracy with real-time performance, and most ignore temporal information.

Methods

Therefore, this paper proposes a lightweight spatial feature encoding network (LSFENet) for driver gaze region estimation. First, an RGB camera captures an image sequence of the driver's upper body. To handle challenges such as cluttered backgrounds and facial occlusions, preprocessing steps, including face alignment and glasses removal, extract left- and right-eye images and facial keypoint coordinates: face alignment uses the multi-task cascaded convolutional network (MTCNN) algorithm, and glasses are removed with the cycle-consistent adversarial network (CycleGAN) algorithm. Second, we build the LSFENet feature extraction network by improving the MobileNetV2 architecture with GCSbottleneck modules, because the inverted residual structure in MobileNetV2 requires substantial memory and floating-point operations and ignores the redundancy and correlation among feature maps; we embed a ghost module to reduce memory consumption and integrate channel and spatial attention modules to extract cross-channel and spatial information from the feature maps. Next, the Kronecker product fuses the eye features with the facial keypoint features to reduce the impact of the imbalance in information complexity between the two. Then, the fused features from consecutive frames are fed into a recurrent neural network to estimate the gaze zone of the image sequence. Finally, the proposed network is evaluated on the public driver gaze in the wild (DGW) dataset and a self-collected dataset, using the number of parameters, the number of floating-point operations (FLOPs), frames per second (FPS), and the F1 score as evaluation metrics.
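
As a concrete illustration of the preprocessing stage, the minimal sketch below uses the facenet-pytorch implementation of MTCNN [21] to detect the face and its five landmarks and to crop the two eye patches. The crop geometry, file name, and helper function are illustrative assumptions, not the authors' pipeline, and the CycleGAN-based glasses removal [22] is omitted.

```python
# Hypothetical preprocessing sketch: face alignment with facenet-pytorch's
# MTCNN, then eye-patch cropping around the detected eye landmarks.
# Crop sizes, the input file name, and crop_patch() are assumptions.
from PIL import Image
from facenet_pytorch import MTCNN

def crop_patch(img, center, w, h):
    """Crop a w-by-h patch centered on a landmark coordinate."""
    cx, cy = int(center[0]), int(center[1])
    return img.crop((cx - w // 2, cy - h // 2, cx + w // 2, cy + h // 2))

mtcnn = MTCNN(keep_all=False)           # keep only the most likely face
frame = Image.open('driver_frame.jpg')  # one frame of the captured sequence

# Detect the face box and five landmarks (eyes, nose, mouth corners).
boxes, probs, points = mtcnn.detect(frame, landmarks=True)
if boxes is not None:
    landmarks = points[0]               # shape (5, 2)
    left_eye = crop_patch(frame, landmarks[0], 60, 36)
    right_eye = crop_patch(frame, landmarks[1], 60, 36)
```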
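
The abstract does not give the exact GCSbottleneck layout, so the following PyTorch sketch only assembles the ingredients it names: a ghost module in the spirit of GhostNet [24], channel and spatial attention in the spirit of CBAM [25], and an inverted-residual-style shortcut. All layer sizes and the module composition are assumptions, not the authors' released code.

```python
# Assumed building block: ghost expansion -> depthwise conv -> channel and
# spatial attention -> ghost projection, with a shortcut when shapes match.
import math
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    """Produce part of the output with a cheap depthwise convolution."""
    def __init__(self, inp, oup, ratio=2, dw_size=3):
        super().__init__()
        init_ch = math.ceil(oup / ratio)     # "intrinsic" feature maps
        cheap_ch = init_ch * (ratio - 1)     # "ghost" feature maps
        self.primary = nn.Sequential(
            nn.Conv2d(inp, init_ch, 1, bias=False),
            nn.BatchNorm2d(init_ch), nn.ReLU(inplace=True))
        self.cheap = nn.Sequential(
            nn.Conv2d(init_ch, cheap_ch, dw_size, padding=dw_size // 2,
                      groups=init_ch, bias=False),
            nn.BatchNorm2d(cheap_ch), nn.ReLU(inplace=True))
        self.oup = oup

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)[:, :self.oup]

class ChannelSpatialAttention(nn.Module):
    """CBAM-style channel attention followed by a 7x7 spatial attention map."""
    def __init__(self, ch, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(ch, ch // reduction), nn.ReLU(inplace=True),
            nn.Linear(ch // reduction, ch))
        self.spatial = nn.Conv2d(2, 1, 7, padding=3, bias=False)

    def forward(self, x):
        b, c, _, _ = x.shape
        w = torch.sigmoid(self.mlp(x.mean(dim=(2, 3))) +
                          self.mlp(x.amax(dim=(2, 3)))).view(b, c, 1, 1)
        x = x * w                            # channel reweighting
        s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], 1)
        return x * torch.sigmoid(self.spatial(s))

class GCSBottleneck(nn.Module):
    """Assumed composition of the block named in the abstract."""
    def __init__(self, inp, oup, stride=1, expand=2):
        super().__init__()
        mid = inp * expand
        self.block = nn.Sequential(
            GhostModule(inp, mid),
            nn.Conv2d(mid, mid, 3, stride, 1, groups=mid, bias=False),
            nn.BatchNorm2d(mid),
            ChannelSpatialAttention(mid),
            GhostModule(mid, oup))
        self.use_res = stride == 1 and inp == oup

    def forward(self, x):
        return x + self.block(x) if self.use_res else self.block(x)
```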
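
Finally, a hedged sketch of the fusion and temporal stages: per-frame eye features and facial keypoint features are fused with a Kronecker (outer) product, and a recurrent network aggregates consecutive frames into one gaze-zone label. The feature sizes, the choice of a GRU, and the 9-zone output are assumptions for illustration only.

```python
# Assumed fusion-and-sequence head: Kronecker-product fusion per frame,
# then a GRU over the frame sequence, then a gaze-zone classifier.
import torch
import torch.nn as nn

class GazeZoneHead(nn.Module):
    def __init__(self, eye_dim=128, kp_dim=16, hidden=256, num_zones=9):
        super().__init__()
        self.rnn = nn.GRU(eye_dim * kp_dim, hidden, batch_first=True)
        self.cls = nn.Linear(hidden, num_zones)

    def forward(self, eye_feat, kp_feat):
        # eye_feat: (B, T, eye_dim); kp_feat: (B, T, kp_dim).
        # The Kronecker product of two vectors is their outer product,
        # flattened: every eye feature is scaled by every keypoint feature,
        # so the lower-dimensional branch is not drowned out by the other.
        fused = torch.einsum('bti,btj->btij', eye_feat, kp_feat).flatten(2)
        out, _ = self.rnn(fused)        # (B, T, hidden)
        return self.cls(out[:, -1])     # gaze-zone logits for the sequence

# Dummy usage: a batch of 4 sequences of 8 frames each.
logits = GazeZoneHead()(torch.randn(4, 8, 128), torch.randn(4, 8, 16))
print(logits.shape)                     # torch.Size([4, 9])
```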

Results

The experimental results showed the following: (1) The gaze region estimation accuracy of the proposed algorithm was 97.08%, approximately 7% higher than that of the original MobileNetV2, while the number of parameters and the FLOPs were both reduced by 22.5% and the FPS improved by 36.43%; at approximately 103 FPS, the proposed network satisfied the computational efficiency and accuracy requirements of in-vehicle environments. (2) The estimation accuracies for gaze regions 1, 2, 3, 4, and 9 exceeded 85%, and the macro-average and micro-average precisions on the DGW dataset reached 74.32% and 76.01%, respectively. (3) The proposed algorithm achieved high classification accuracy on fine-grained eye images with small intra-class differences. (4) Class activation mapping visualizations demonstrated that the proposed algorithm adapts well to various lighting conditions and eyeglass occlusion.
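
For readers unfamiliar with the two averaging schemes reported above, the toy example below shows how macro- and micro-average precision are computed; the numbers are illustrative and are not the paper's data.

```python
# Macro averages the per-class precisions; micro pools all decisions.
from sklearn.metrics import precision_score

y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0, 2]
print(precision_score(y_true, y_pred, average='macro'))  # (1/2 + 2/3 + 1)/3 ~ 0.722
print(precision_score(y_true, y_pred, average='micro'))  # 5 correct / 7 ~ 0.714
```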

Conclusions

These results are of great significance for recognizing a driver's visual distraction state.

CLC number: U495; Document code: A; Article ID: 1000-0054(2024)01-0044-11

References

[1] WANG T H, LUO Y G, LIU J X, et al. End-to-end self-driving policy based on the deep deterministic policy gradient algorithm considering the state distribution[J]. Journal of Tsinghua University (Science and Technology), 2021, 61(9): 881-888. (in Chinese)
[2] ZONG C F, DAI C H, ZHANG D. Human-machine interaction technology of intelligent vehicles: Current development trends and future directions[J]. China Journal of Highway and Transport, 2021, 34(6): 214-237. (in Chinese)
[3] CHANG W J, CHEN L B, CHIOU Y Z. Design and implementation of a drowsiness-fatigue-detection system based on wearable smart glasses to increase road safety[J]. IEEE Transactions on Consumer Electronics, 2018, 64(4): 461-469.
[4] PLOPSKI A, HIRZLE T, NOROUZI N, et al. The eye in extended reality: A survey on gaze interaction and eye tracking in head-worn extended reality[J]. ACM Computing Surveys, 2023, 55(3): 53.
[5] SHI H L, CHEN L F, WANG X Y, et al. A nonintrusive and real-time classification method for driver's gaze region using an RGB camera[J]. Sustainability, 2022, 14(1): 508.
[6] YUAN G L, WANG Y F, YAN H Z, et al. Self-calibrated driver gaze estimation via gaze pattern learning[J]. Knowledge-Based Systems, 2022, 235: 107630.
[7] LIU M H, DAI H H. Driver gaze zone estimation based on RGB camera[J]. Modern Computer, 2019, 25(36): 69-75. (in Chinese)
[8] LUNDGREN M, HAMMARSTRAND L, MCKELVEY T. Driver-gaze zone estimation using Bayesian filtering and Gaussian processes[J]. IEEE Transactions on Intelligent Transportation Systems, 2016, 17(10): 2739-2750.
[9] LU F, SUGANO Y, OKABE T, et al. Adaptive linear regression for appearance-based gaze estimation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(10): 2033-2046.
[10] AUNSRI N, RATTAROM S. Novel eye-based features for head pose-free gaze estimation with web camera: New model and low-cost device[J]. Ain Shams Engineering Journal, 2022, 13(5): 101731.
[11] YAN Q N, ZHANG W W. Estimation of driver's gaze area based on multi-modal feature fusion[J]. Computer and Digital Engineering, 2022, 50(10): 2217-2222. (in Chinese)
[12] WANG Y F, YUAN G L, MI Z T, et al. Continuous driver's gaze zone estimation using RGB-D camera[J]. Sensors, 2019, 19(6): 1287.
[13] HAN K, PAN H W, ZHANG W, et al. Alzheimer's disease classification method based on multi-modal medical images[J]. Journal of Tsinghua University (Science and Technology), 2020, 60(8): 664-671, 682. (in Chinese)
[14] RIBEIRO R F, COSTA P D P. Driver gaze zone dataset with depth data[C]//14th International Conference on Automatic Face & Gesture Recognition. Lille, France: IEEE, 2019: 1-5.
[15] GHOSH S, DHALL A, SHARMA G, et al. Speak2Label: Using domain knowledge for creating a large scale driver gaze zone estimation dataset[C]//IEEE/CVF International Conference on Computer Vision Workshops. Montreal, Canada: IEEE, 2021: 2896-2905.
[16] SANDLER M, HOWARD A, ZHU M L, et al. MobileNetV2: Inverted residuals and linear bottlenecks[C]//IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018: 4510-4520.
[17] RANGESH A, ZHANG B W, TRIVEDI M M. Gaze preserving CycleGANs for eyeglass removal and persistent gaze estimation[J]. IEEE Transactions on Intelligent Vehicles, 2022, 7(2): 377-386.
[18] YANG Y R, LIU C S, CHANG F L, et al. Driver gaze zone estimation via head pose fusion assisted supervision and eye region weighted encoding[J]. IEEE Transactions on Consumer Electronics, 2021, 67(4): 275-284.
[19] KRAFKA K, KHOSLA A, KELLNHOFER P, et al. Eye tracking for everyone[C]//IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016: 2176-2184.
[20] ASSI L, CHAMSEDDINE F, IBRAHIM P, et al. A global assessment of eye health and quality of life: A systematic review of systematic reviews[J]. JAMA Ophthalmology, 2021, 139(5): 526-541.
[21] ZHANG K P, ZHANG Z P, LI Z F, et al. Joint face detection and alignment using multitask cascaded convolutional networks[J]. IEEE Signal Processing Letters, 2016, 23(10): 1499-1503.
[22] ZHU J Y, PARK T, ISOLA P, et al. Unpaired image-to-image translation using cycle-consistent adversarial networks[C]//IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017: 2242-2251.
[23] NAN Y H, JU J G, HUA Q Y, et al. A-MobileNet: An approach of facial expression recognition[J]. Alexandria Engineering Journal, 2022, 61(6): 4435-4444.
[24] HAN K, WANG Y H, TIAN Q, et al. GhostNet: More features from cheap operations[C]//IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE, 2020: 1577-1586.
[25] WOO S, PARK J, LEE J Y, et al. CBAM: Convolutional block attention module[C]//15th European Conference on Computer Vision. Munich, Germany: Springer, 2018: 3-19.
[26] HOWARD A, SANDLER M, CHEN B, et al. Searching for MobileNetV3[C]//IEEE/CVF International Conference on Computer Vision. Seoul, Republic of Korea: IEEE, 2019: 1314-1324.
Journal of Tsinghua University (Science and Technology)
Pages 44-54
Cite this article:
ZHANG M, LI G, WU C, et al. Estimation algorithm of driver's gaze zone based on lightweight spatial feature encoding network. Journal of Tsinghua University (Science and Technology), 2024, 64(1): 44-54. https://doi.org/10.16511/j.cnki.qhdxxb.2023.26.045


Received: 03 March 2023
Published: 15 January 2024
© Journal of Tsinghua University (Science and Technology). All rights reserved.