SqueezeNet and Fusion Network-Based Accurate Fast Fully Convolutional Network for Hand Detection and Gesture Recognition

Abstract:

Accurate and fast hand detection and gesture recognition in color images remain challenging because of the diversity of hands and the complexity of scenes. To address this problem, we propose a novel SqueezeNet and fusion network-based fully convolutional network (SF-FCNet) that performs hand detection and gesture recognition in color images both accurately and quickly. First, we adopt the first 17 layers of the lightweight SqueezeNet as the hand feature extraction network, which greatly reduces the number of network parameters and thereby accelerates detection and recognition. Second, we design a precise hand prediction fusion network by adding a residual structure to the deconvolutional network so that high- and low-level hand features are integrated; hand detection and gesture recognition are then performed by a single convolutional layer at multiple scales, which improves precision and reduces computational cost. Validation results on the Oxford hand dataset show that SF-FCNet reaches a precision of 84.1% at a speed of 32 FPS. Experimental results on three benchmark datasets show that SF-FCNet substantially improves both the precision and the speed of hand detection and gesture recognition, and that it generalizes well to a homemade test set.
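To illustrate why a SqueezeNet backbone compresses parameters, the following minimal sketch counts the weights of one SqueezeNet Fire module (a 1x1 squeeze convolution followed by parallel 1x1 and 3x3 expand convolutions) and compares it with a plain 3x3 convolution that produces the same number of output channels. The function names are hypothetical, biases are ignored, and the channel sizes follow the fire2 configuration of the original SqueezeNet v1.0; whether SF-FCNet uses exactly these sizes is an assumption, since the abstract does not state them.

```python
def fire_params(in_ch: int, squeeze: int, expand1x1: int, expand3x3: int) -> int:
    """Weight count of a Fire module (biases ignored).

    squeeze:   number of 1x1 filters in the squeeze layer
    expand1x1: number of 1x1 filters in the expand layer
    expand3x3: number of 3x3 filters in the expand layer
    """
    squeeze_w = in_ch * squeeze * 1 * 1          # 1x1 squeeze convolution
    expand1_w = squeeze * expand1x1 * 1 * 1      # 1x1 expand branch
    expand3_w = squeeze * expand3x3 * 3 * 3      # 3x3 expand branch
    return squeeze_w + expand1_w + expand3_w


def conv3x3_params(in_ch: int, out_ch: int) -> int:
    """Weight count of a plain 3x3 convolution (biases ignored)."""
    return in_ch * out_ch * 3 * 3


# Assumed sizes: fire2 of SqueezeNet v1.0 (96 input channels, 16 squeeze
# filters, 64 + 64 expand filters, i.e. 128 output channels).
fire = fire_params(in_ch=96, squeeze=16, expand1x1=64, expand3x3=64)
plain = conv3x3_params(in_ch=96, out_ch=128)

print(f"Fire module weights:   {fire}")    # 11776
print(f"Plain 3x3 conv weights: {plain}")  # 110592
print(f"Reduction factor:      {plain / fire:.1f}x")
```

Under these assumed sizes, the Fire module needs roughly an order of magnitude fewer weights than the plain convolution, which is the kind of compression the backbone choice relies on. The actual savings in SF-FCNet depend on its layer configuration.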