Abstract

Feature extraction is a fundamental part of speech processing. Mel frequency cepstral coefficients (MFCCs) are the most commonly used feature type in the speech/speaker recognition literature. However, the MFCC framework may face numerical issues or dynamic range problems, which decrease its performance. A practical solution to these problems is to add a constant to the filter-bank magnitudes before log compression, but doing so violates the scale-invariance property. In this work, a magnitude normalization and a multiplication constant are introduced to make the MFCCs scale-invariant and to avoid dynamic range expansion of nonspeech frames. Speaker verification experiments are conducted to show the effectiveness of the proposed scheme.
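The scale-invariance issue described above can be illustrated with a small numerical sketch. It is not the authors' formulation: the abstract does not give the exact normalization or the value of the multiplication constant, so the sum-based normalization, the floor value of 1.0, the unnormalized DCT-II, and the helper names `dct_ii` and `cepstra` are all assumptions made here for illustration. The multiplication constant is not modeled at all.

```python
import numpy as np

def dct_ii(x):
    """Unnormalized DCT-II of a 1-D vector (row 0 is the plain sum)."""
    n = len(x)
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    return basis @ x

def cepstra(fbank, floor=0.0, normalize=False):
    """Log-compress filter-bank magnitudes and apply the DCT.

    floor:     constant added before the log (the 'practical solution').
    normalize: divide by the sum of the magnitudes first. This is an
               assumed normalization, not necessarily the paper's.
    """
    fbank = np.asarray(fbank, dtype=float)
    if normalize:
        fbank = fbank / fbank.sum()
    return dct_ii(np.log(fbank + floor))

# Hypothetical filter-bank magnitudes for one frame, and a gain change.
rng = np.random.default_rng(0)
fbank = rng.uniform(0.1, 10.0, size=20)
gain = 100.0

# Plain log: scaling only shifts coefficient 0, the rest are unchanged.
plain_diff = cepstra(gain * fbank) - cepstra(fbank)

# Log with an added constant: the higher coefficients now depend on gain.
floored_diff = (cepstra(gain * fbank, floor=1.0)
                - cepstra(fbank, floor=1.0))

# Normalizing before adding the constant removes the gain dependence.
normalized_diff = (cepstra(gain * fbank, floor=1.0, normalize=True)
                   - cepstra(fbank, floor=1.0, normalize=True))

print("plain, c1..:     ", np.max(np.abs(plain_diff[1:])))
print("floored, c1..:   ", np.max(np.abs(floored_diff[1:])))
print("normalized, all: ", np.max(np.abs(normalized_diff)))
```

With a plain log, a gain change adds the same offset to every log filter-bank value, and the DCT maps that offset entirely onto coefficient 0. Once a constant is added before the log, the offset is no longer uniform across filters, so every coefficient shifts with the gain. Normalizing the magnitudes before adding the constant cancels the gain in this sketch.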

DOI

10.3906/elk-1901-231

Keywords

Feature extraction, speaker recognition, speech recognition

First Page

3758

Last Page

3762

Recommended Citation

TÜFEKCİ, Z, & DİŞKEN, G (2019). Scale-invariant MFCCs for speech/speaker recognition. Turkish Journal of Electrical Engineering and Computer Sciences 27 (5): 3758-3762. https://doi.org/10.3906/elk-1901-231

Included in

Computer Engineering Commons, Computer Sciences Commons, Electrical and Computer Engineering Commons

Turkish Journal of Electrical Engineering and Computer Sciences

Scale-invariant MFCCs for speech/speaker recognition
