Authors: CEMİL GÜNDÜZ, HÜSEYİN POLAT
Abstract: Sign languages are nonverbal, visual languages that hearing- or speech-impaired people use for communication. In addition to the hands, other communication channels such as body posture and facial expressions are also important in sign languages. Because the gestures of sign languages vary across countries, the significance of each communication channel also differs from one sign language to another. In this study, a total of 8 different data streams (4 RGB, 3 pose, and 1 optical flow) representing the communication channels used in Turkish sign language were analyzed. Inception3D was used for the RGB and optical flow streams, and an LSTM-RNN was used for the pose streams. Experiments were conducted by merging the data streams in different combinations, and a sign language recognition system that merges the most suitable streams through a multistream late fusion mechanism was then proposed. Considered individually, the RGB streams achieved accuracies between 28% and 79%, the pose streams between 9% and 50%, and the optical flow stream 78.5%. When the data streams were combined, sign language recognition performance was higher than that of any single data stream. The proposed system, which uses a multistream data fusion mechanism, achieves an accuracy of 89.3% on the BosphorusSign General dataset. These results suggest that multistream data fusion mechanisms have great potential for improving sign language recognition.
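The abstract does not state how the multistream late fusion combines the per-stream outputs. The sketch below assumes one common form of late fusion: each stream's classifier (Inception3D or LSTM-RNN) produces class scores independently, the scores are converted to probabilities with a softmax, and the probabilities are combined by a weighted average before taking the most likely class. The function names, the optional `weights` parameter, and the example scores are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import Optional, Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(stream_logits: Sequence[Sequence[float]],
                weights: Optional[Sequence[float]] = None) -> list[float]:
    """Hypothetical late fusion: weighted average of per-stream softmax probabilities.

    stream_logits[s][c] is the raw score of stream s for sign class c.
    """
    n_streams = len(stream_logits)
    if weights is None:
        weights = [1.0] * n_streams  # assumption: equal weight per stream by default
    if len(weights) != n_streams:
        raise ValueError("one weight per stream is required")
    weight_sum = sum(weights)
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive value")

    n_classes = len(stream_logits[0])
    fused = [0.0] * n_classes
    for logits, w in zip(stream_logits, weights):
        if len(logits) != n_classes:
            raise ValueError("all streams must score the same set of classes")
        for c, p in enumerate(softmax(logits)):
            fused[c] += w * p / weight_sum  # normalized so the fused vector sums to 1
    return fused


def predict(stream_logits: Sequence[Sequence[float]],
            weights: Optional[Sequence[float]] = None) -> int:
    """Return the index of the class with the highest fused probability."""
    fused = late_fusion(stream_logits, weights)
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical scores from three streams (e.g. RGB, pose, optical flow) over 3 classes.
rgb = [2.0, 1.0, 0.1]
pose = [0.5, 2.5, 0.2]
flow = [3.0, 0.2, 0.1]
print(predict([rgb, pose, flow]))             # equal weights
print(predict([rgb, pose, flow], [0, 1, 0]))  # pose stream only
```

Because fusion happens after each stream has been classified on its own, streams can be added or dropped without retraining the others, which is what makes it practical to test many stream combinations as the study describes. In this toy input the equal-weight fusion picks class 0, while relying on the pose stream alone picks class 1.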
Keywords: Deep learning, sign language recognition, 3D convolutional neural networks, long short-term memory, recurrent neural networks