The most challenging objective in machine translation of sign language has been the machine's inability to learn interoccluding finger movements during an action process. This work addresses the problem of teaching a deep learning model to recognize differently oriented skeletal data. The multi-view 2D skeletal sign language video data is obtained using a 3D motion-capture system. A total of 9 signer views were used for training the proposed network and 6 for testing and validation. To obtain multi-view deep features for recognition, we proposed an end-to-end trainable multistream convolutional neural network (CNN) with late feature fusion. The fused multiview features are then fed to two dense layers and a decision-making softmax layer. The proposed CNN employs numerous layers to characterize view correspondence and generate maximally discriminative features. This study is important for understanding how CNNs process multiview data for sign language recognition by decoding joint spatial information. Further, deeper perspectives were developed into multiview processing by CNNs through the application of skeletal action data.
Keywords: skeletal sign language, multiview learning, deep learning, pattern recognition
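The late-fusion architecture summarized above (one convolutional stream per signer view, concatenated features, two dense layers, and a softmax decision layer) can be sketched as follows. This is a minimal illustrative sketch assuming PyTorch; the layer counts, channel widths, and feature dimension are hypothetical placeholders, not the configuration reported in the paper.

```python
# Hypothetical sketch of a multistream CNN with late feature fusion,
# assuming PyTorch. Layer sizes are illustrative, not from the paper.
import torch
import torch.nn as nn

class StreamCNN(nn.Module):
    """One convolutional stream encoding a single 2D skeletal view."""
    def __init__(self, in_channels: int = 3, feat_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),   # pool to a fixed-size descriptor
            nn.Flatten(),
            nn.Linear(32, feat_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

class MultiViewCNN(nn.Module):
    """Late fusion: concatenate per-view features, then two dense
    layers followed by a softmax decision layer."""
    def __init__(self, n_views: int = 9, n_classes: int = 10, feat_dim: int = 64):
        super().__init__()
        self.streams = nn.ModuleList(
            StreamCNN(feat_dim=feat_dim) for _ in range(n_views)
        )
        self.classifier = nn.Sequential(
            nn.Linear(n_views * feat_dim, 128), nn.ReLU(),
            nn.Linear(128, n_classes),
        )

    def forward(self, views: list[torch.Tensor]) -> torch.Tensor:
        # views: one (batch, channels, H, W) tensor per camera view.
        # Late fusion = concatenate the per-stream feature vectors.
        fused = torch.cat([s(v) for s, v in zip(self.streams, views)], dim=1)
        return torch.softmax(self.classifier(fused), dim=1)

# Toy forward pass: 9 views of a batch of 2 skeletal "images".
model = MultiViewCNN(n_views=9, n_classes=10)
views = [torch.randn(2, 3, 32, 32) for _ in range(9)]
probs = model(views)  # shape (2, 10); each row sums to 1
```

Because fusion happens only after each stream has produced its own feature vector, each stream can specialize on the joint geometry of its particular viewpoint before the dense layers model cross-view correspondence.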
SHAIK, ASHRAF ALI; MAREEDU, VENKATA DURGA PRASAD; and POLURIE, VENKATA VIJAYA KISHORE
"Learning multiview deep features from skeletal sign language videos for recognition,"
Turkish Journal of Electrical Engineering and Computer Sciences: Vol. 29: No. 2, Article 37.
Available at: https://journals.tubitak.gov.tr/elektrik/vol29/iss2/37