Abstract: Traditional CNN models extract features from gray image sequences or from the separate channels of color images. This ignores the interdependency among the channels and destroys the color features of real-world objects, thereby reducing the accuracy of human body action recognition. To solve this problem, a human body action recognition method based on a quaternion spatial-temporal convolutional neural network (QSTCNN) is proposed. First, the codebook algorithm is applied to all images in the sample set to extract the key regions of human body motion. Then, the quaternion matrix representation of each color image is taken as the input of the QSTCNN. The spatial convolutional layer of the CNN is extended to a quaternion spatial convolutional layer, in which the values of the red, green, and blue channels are treated as a single whole when extracting the spatial features of an action, avoiding the loss of spatial relationships. The dynamic information of adjacent frames is extracted in a temporal convolutional layer. Finally, experiments were conducted to compare QSTCNN with a gray single-channel CNN (GrayCNN) and an RGB three-channel CNN (3 ChannelCNN). The experimental results demonstrate that QSTCNN improves action recognition performance. The proposed method is superior to other popular methods and achieves recognition rates of 85.34% and 80.2% on the Weizmann and UCF Sports datasets, respectively.
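To make the idea of treating the three color channels as a whole more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that each pixel is encoded as a pure quaternion 0 + R·i + G·j + B·k, and that a quaternion spatial convolution multiplies each kernel entry with the pixel on the left using the Hamilton product; the abstract does not specify either choice. The function names `rgb_to_quaternion`, `hamilton`, and `quaternion_conv2d` are hypothetical, and bias terms, activations, and the temporal layer are omitted.

```python
import numpy as np

def rgb_to_quaternion(img):
    """Encode an H x W x 3 RGB array as an H x W x 4 pure-quaternion array.

    Assumption: the real part is zero and (R, G, B) fill the i, j, k parts.
    """
    img = np.asarray(img, dtype=np.float64)
    real = np.zeros(img.shape[:2] + (1,))
    return np.concatenate([real, img], axis=-1)

def hamilton(p, q):
    """Hamilton product p * q; the last axis holds the (1, i, j, k) parts."""
    a1, b1, c1, d1 = np.moveaxis(np.asarray(p, dtype=np.float64), -1, 0)
    a2, b2, c2, d2 = np.moveaxis(np.asarray(q, dtype=np.float64), -1, 0)
    return np.stack([
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,  # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,  # i part
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,  # j part
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,  # k part
    ], axis=-1)

def quaternion_conv2d(qimg, kernel):
    """'Valid' sliding-window sum of kernel[u, v] * qimg[y + u, x + v].

    qimg:   H x W x 4 quaternion image
    kernel: kh x kw x 4 quaternion kernel (hypothetical layout)
    Because every product mixes all four parts, R, G, and B are
    processed jointly rather than channel by channel.
    """
    H, W, _ = qimg.shape
    kh, kw, _ = kernel.shape
    out_h, out_w = H - kh + 1, W - kw + 1
    out = np.zeros((out_h, out_w, 4))
    for u in range(kh):
        for v in range(kw):
            patch = qimg[u:u + out_h, v:v + out_w]
            out += hamilton(kernel[u, v], patch)
    return out

# Usage: a random 5 x 5 color image and a random 3 x 3 quaternion kernel.
rng = np.random.default_rng(0)
q_image = rgb_to_quaternion(rng.random((5, 5, 3)))
q_kernel = rng.standard_normal((3, 3, 4))
features = quaternion_conv2d(q_image, q_kernel)  # shape (3, 3, 4)
```

In this sketch a single quaternion kernel entry couples all color components of a pixel through the cross terms of the Hamilton product, which is one way the interdependency among channels can be retained; a GrayCNN or per-channel CNN has no such cross terms.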