Turkish Journal of Electrical Engineering and Computer Sciences




A motion history image (MHI) is a temporal template that collapses the motion information of a frame sequence into a single image in which intensity is a function of the recency of motion. The recent popularity of deep learning architectures for human activity recognition motivated us to explore how effectively they can be combined with MHIs. Two new methods are introduced in this paper. In the first, called the basic method, each video is split into N groups of consecutive frames, and an MHI is calculated for each group. Transfer learning with fine-tuning is used to classify these temporal templates. The experimental results show that some misclassification errors arise from similarities between the temporal templates; these errors can be corrected by detecting specific objects in the scenes. The second method, called the proposed method, therefore also adds spatial information in the form of a single frame. Because the proposed method converts the video classification problem into an image classification problem, it needs less memory and greatly reduces time complexity. Both methods are implemented and compared with state-of-the-art approaches on two data sets. The results show that the proposed method significantly outperforms the others. It achieves recognition accuracies of 92% and 92.4% for the UCF Sport and UCF-11 action data sets, respectively.
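The MHI construction and the split into N groups can be illustrated with a minimal sketch. It assumes grayscale frames, motion detection by thresholded frame differencing, and a linear decay of one step per frame; the abstract does not state the update rule or parameters used in the paper, so the function names, `diff_threshold`, and `duration` below are hypothetical choices made for illustration only.

```python
import numpy as np


def motion_history_image(frames, diff_threshold=30, duration=None):
    """Compute a normalized MHI from grayscale frames of shape (T, H, W).

    Assumed update rule (not taken from the paper): a pixel whose absolute
    frame difference reaches diff_threshold is set to `duration`; every
    other pixel decays by 1 per frame, with a floor of 0.
    """
    frames = np.asarray(frames, dtype=np.int16)  # signed type avoids uint8 wrap-around
    num_steps = frames.shape[0] - 1
    if num_steps < 1:
        raise ValueError("at least two frames are required")
    tau = float(duration if duration is not None else num_steps)

    mhi = np.zeros(frames.shape[1:], dtype=np.float32)
    for t in range(1, frames.shape[0]):
        moving = np.abs(frames[t] - frames[t - 1]) >= diff_threshold
        mhi = np.where(moving, tau, np.maximum(mhi - 1.0, 0.0))

    # Scale to [0, 1]: brighter pixels moved more recently.
    return (mhi / tau).astype(np.float32)


def group_mhis(frames, num_groups):
    """Split a video into num_groups runs of consecutive frames, one MHI per run."""
    groups = np.array_split(np.asarray(frames), num_groups, axis=0)
    return [motion_history_image(group) for group in groups]


# Toy video: one bright pixel moving left to right along a 1x5 row.
video = np.zeros((5, 1, 5), dtype=np.uint8)
for t in range(5):
    video[t, 0, t] = 255

mhi = motion_history_image(video)
```

In this toy video the intensity along the row does not decrease from left to right, because the rightmost positions were visited most recently. Each group passed to `group_mhis` must contain at least two frames, so `num_groups` should be at most half the frame count. The resulting images could then be passed to a fine-tuned pretrained image classifier, which is the step the abstract describes but this sketch does not cover.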


Motion history image, pretrained network, spatial stream, temporal stream
