Original/Review Paper
H.6.5.2. Computer vision
Mahdi Davari; Razieh Rastgoo
Abstract
Detecting driver distraction is critically important, as it remains a major contributor to road accidents and traffic-related injuries worldwide. This study introduces a novel hybrid deep learning model that integrates Spatio-Temporal Graph Convolutional Networks (ST-GCN) with a Transformer Encoder and Attention mechanisms to effectively detect distracted driving behaviors. The ST-GCN component captures spatial and temporal dependencies in 3D skeletal motion data, modeling the dynamic body movements of the driver. Following this, a Transformer Encoder is employed to further refine temporal representations by leveraging global attention, allowing the model to understand long-range dependencies and subtle behavioral patterns over time. In addition, an Attention mechanism is applied to emphasize the most informative joints and time frames. To address class imbalance in the dataset, the model uses a focal loss function, which helps focus training on more difficult-to-classify examples. The proposed approach is validated on the 3D skeletal Drive&Act dataset, where it achieves a high accuracy of 97.47%, outperforming existing models, particularly under challenging conditions such as poor lighting and complex driving environments. The system demonstrates strong potential for real-time driver monitoring, offering an intelligent solution to enhance road safety and reduce accident risks through early detection of driver distraction.
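The focal loss used to counter class imbalance down-weights well-classified examples so training concentrates on harder ones. A minimal sketch of the binary form (following Lin et al.'s formulation; the function name and default `alpha`/`gamma` values here are illustrative, not taken from the paper):

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss for a single prediction.

    p: predicted probability of the positive class (0 < p < 1)
    y: ground-truth label (0 or 1)
    The (1 - p_t)**gamma modulating factor shrinks the loss of easy,
    confident examples, so hard examples dominate the gradient.
    """
    p_t = p if y == 1 else 1.0 - p          # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# A confident correct prediction contributes far less loss than a hard,
# misclassified one:
easy = focal_loss(0.95, 1)  # well classified
hard = focal_loss(0.30, 1)  # misclassified
```

In the multi-class setting the same modulating factor is applied to the softmax probability of the true class; frameworks such as PyTorch expose a ready-made variant (`torchvision.ops.sigmoid_focal_loss`) for the binary/multi-label case.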
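The attention mechanism that emphasizes informative joints and time frames amounts to a softmax-weighted pooling of per-joint or per-frame features. A hedged, framework-free sketch of that idea (the function names and the use of plain lists are illustrative; the paper's actual layers operate on learned tensors inside the network):

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of relevance scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(features, scores):
    """Pool a sequence of feature vectors into one vector.

    features: list of equal-length feature vectors (one per frame or joint)
    scores:   one learned relevance score per vector
    Vectors with higher scores dominate the pooled representation, which is
    how informative frames/joints are emphasized over uninformative ones.
    """
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features))
            for d in range(dim)]

# With equal scores the result is a plain average; a large score on one
# frame pulls the pooled vector toward that frame's features.
pooled = attention_pool([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

In the full model the scores themselves are produced by a small learned network over the ST-GCN/Transformer features rather than supplied by hand.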