Accuracy Improvement of Real-Time Driver Drowsiness Detection Using Transformer Model

Askari, Havva; Rastgoo, Razieh; Kiani, Kourosh

doi:10.22044/jadm.2025.16191.2743

Document Type: Original/Review Paper

Authors

Electrical and Computer Engineering Department, Semnan University, Semnan, Iran.

Abstract

Drowsiness remains a significant challenge for drivers, often resulting from extended working hours, inadequate sleep, and accumulated fatigue. This condition not only impairs reaction time and decision-making but also contributes to a substantial number of road accidents globally. Therefore, reliable and timely detection of driver drowsiness is essential for enhancing transportation safety and reducing the risk of traffic-related fatalities. With the rapid progress in deep learning, numerous models have been developed to detect driver drowsiness with high accuracy. However, the real-world performance of these models can deteriorate under varying environmental conditions, such as changes in cabin illumination, facial occlusions, and dynamic shadows on the driver’s face. To address these limitations, this paper proposes a robust, real-time driver drowsiness detection model that leverages facial behavioral features and a Transformer-based neural network architecture. The MediaPipe framework is used to extract a comprehensive set of facial keypoints, capturing subtle facial movements and expressions indicative of drowsiness. These keypoints are then encoded into feature vectors that serve as input to the Transformer network, enabling effective temporal modeling of facial dynamics. The proposed model is trained and evaluated on the National Tsing Hua University (NTHU) Driver Drowsiness Detection dataset, where it achieves a state-of-the-art accuracy of 99.71%, which demonstrates its potential for deployment in real-world in-vehicle systems.
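
The abstract does not specify how the facial keypoints are encoded, so the following is only an illustrative sketch of one plausible way to turn per-frame keypoints into fixed-length sequences for a temporal model. It assumes the keypoints have already been extracted (for example by a face landmark detector) as (x, y) pairs. The function names `encode_frame` and `make_windows`, the centroid-and-scale normalization, and the `window` and `stride` parameters are hypothetical choices made for illustration and are not taken from the paper.

```python
from math import hypot
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def encode_frame(keypoints: Sequence[Point]) -> List[float]:
    """Flatten one frame's (x, y) keypoints into a normalized feature vector.

    Assumption: coordinates are centered on the keypoint centroid and divided
    by the largest centroid distance, so the vector does not depend on where
    the face sits in the image or how large it appears.
    """
    n = len(keypoints)
    cx = sum(x for x, _ in keypoints) / n
    cy = sum(y for _, y in keypoints) / n
    # Fall back to 1.0 when all keypoints coincide, to avoid dividing by zero.
    scale = max(hypot(x - cx, y - cy) for x, y in keypoints) or 1.0
    vector: List[float] = []
    for x, y in keypoints:
        vector.extend(((x - cx) / scale, (y - cy) / scale))
    return vector


def make_windows(
    frames: Sequence[Sequence[Point]], window: int, stride: int
) -> List[List[List[float]]]:
    """Group encoded frames into overlapping sequences of `window` frames.

    Each returned sequence is a list of per-frame feature vectors, i.e. the
    kind of token sequence a temporal model could consume.
    """
    encoded = [encode_frame(frame) for frame in frames]
    return [
        encoded[start:start + window]
        for start in range(0, len(encoded) - window + 1, stride)
    ]
```

With this layout, each frame becomes one token of length 2 × (number of keypoints), and a clip shorter than `window` frames yields no sequences. In a real pipeline the window length and stride would be tuned against the frame rate and the latency budget of the in-vehicle system.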

Main Subjects

H.6.5.2. Computer vision

Volume 13, Issue 4
October 2025
Pages 481-490