Turkish Journal of Electrical Engineering and Computer Sciences




Recent developments in man-machine interaction have motivated researchers to recognize human emotion from speech signals. In this study, we propose using nonlinear dynamics features (NLDs) for emotion recognition. NLDs are extracted from the geometrical properties of the reconstructed phase space of speech signals. Traditional prosodic and spectral features are also used as a benchmark. The Fisher discriminant ratio serves as a filter that quickly removes irrelevant features. A wrapper method based on a genetic algorithm and a support vector machine is then employed to find the feature subset that yields the maximum recognition rate. The classification accuracy of the proposed system is evaluated using 10-fold cross-validation on the Berlin database. Our results show that combining the proposed features with prosodic and spectral features notably reduces the classification ambiguity between joy and anger, two emotions that are frequently confused with each other. When used to augment prosodic and spectral features, the NLDs further improve recognition performance by 3.32% for females and 7.27% for males. Finally, by using all types of features to classify 7 emotion categories, overall recognition rates of 82.72% and 85.90% are obtained for females and males, respectively.
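The first two stages described above, phase space reconstruction and Fisher-ratio filtering, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the phase space is reconstructed by time-delay embedding with a hypothetical dimension `dim` and lag `tau`, and that the multi-class Fisher discriminant ratio of a feature is the sum of the two-class ratios over all pairs of emotion classes. The function names `delay_embed`, `fisher_ratio`, and `rank_features` are illustrative only.

```python
from itertools import combinations
from statistics import fmean, pvariance


def delay_embed(signal, dim, tau):
    """Time-delay embedding (assumed reconstruction method).

    Returns the vectors (x[n], x[n+tau], ..., x[n+(dim-1)*tau]).
    """
    span = (dim - 1) * tau
    if dim < 1 or tau < 1 or len(signal) <= span:
        raise ValueError("signal too short for the requested dim and tau")
    return [
        tuple(signal[n + k * tau] for k in range(dim))
        for n in range(len(signal) - span)
    ]


def fisher_ratio(values, labels):
    """Sum of pairwise two-class Fisher ratios for a single feature.

    Each pair contributes (mean_a - mean_b)^2 / (var_a + var_b).
    Summing over pairs is an assumption made for this sketch.
    """
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    score = 0.0
    for a, b in combinations(sorted(groups), 2):
        mean_gap = fmean(groups[a]) - fmean(groups[b])
        spread = pvariance(groups[a]) + pvariance(groups[b])
        if spread > 0:
            score += mean_gap ** 2 / spread
        elif mean_gap != 0:
            # Zero within-class variance with distinct means: perfectly separable.
            score += float("inf")
    return score


def rank_features(rows, labels):
    """Return feature indices ordered from most to least discriminative."""
    n_features = len(rows[0])
    scores = [
        fisher_ratio([row[j] for row in rows], labels) for j in range(n_features)
    ]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)
```

In a pipeline of this kind, the top-ranked features returned by `rank_features` would be handed to the wrapper stage, where a genetic algorithm proposes subsets and a support vector machine scores each one under cross-validation.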


Nonlinear dynamics features, phase space reconstruction, speech emotion recognition, Fisher discriminant ratio
