DOI

10.3906/elk-1901-231

Abstract

The feature extraction process is a fundamental part of speech processing. Mel-frequency cepstral coefficients (MFCCs) are the most commonly used features in the speech/speaker recognition literature. However, the MFCC framework may face numerical issues or dynamic range problems, which decrease its performance. A practical solution to these problems is to add a constant to the filter-bank magnitudes before log compression, but this violates the scale-invariance property. In this work, a magnitude normalization and a multiplication constant are introduced to make the MFCCs scale-invariant and to avoid dynamic range expansion of nonspeech frames. Speaker verification experiments are conducted to show the effectiveness of the proposed scheme.
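The scale-invariance issue described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give the exact normalization, so dividing by the utterance-level mean filter-bank magnitude is an assumption made here, and the names `offset_log_compress`, `normalized_log_compress`, `offset`, and `gain` are hypothetical.

```python
import math

def offset_log_compress(frames, offset=1.0):
    """Log compression with an additive constant: log(m + offset).

    The constant keeps the logarithm bounded for near-zero magnitudes,
    but the result now depends on the overall signal scale.
    """
    return [[math.log(m + offset) for m in frame] for frame in frames]

def normalized_log_compress(frames, offset=1.0, gain=1.0):
    """Normalize magnitudes before adding the constant.

    ASSUMPTION: the normalizer is the mean filter-bank magnitude over the
    whole utterance; the paper may use a different quantity. `gain` stands
    in for the multiplication constant mentioned in the abstract.
    """
    values = [m for frame in frames for m in frame]
    mean_mag = sum(values) / len(values)
    if mean_mag <= 0.0:
        raise ValueError("utterance has no energy to normalize by")
    return [[math.log(gain * m / mean_mag + offset) for m in frame]
            for frame in frames]

# Toy filter-bank magnitudes: one speech-like frame, one low-energy frame.
frames = [[0.5, 2.0, 1.0], [0.01, 0.02, 0.03]]
# The same utterance recorded at ten times the amplitude.
louder = [[10.0 * m for m in frame] for frame in frames]

plain_diff = max(
    abs(a - b)
    for fa, fb in zip(offset_log_compress(frames), offset_log_compress(louder))
    for a, b in zip(fa, fb)
)
norm_diff = max(
    abs(a - b)
    for fa, fb in zip(normalized_log_compress(frames), normalized_log_compress(louder))
    for a, b in zip(fa, fb)
)
print("max change with plain offset:", plain_diff)
print("max change with normalization:", norm_diff)
```

Under these assumptions, rescaling the input changes the offset-only features but leaves the normalized features unchanged up to floating-point rounding, because the scale factor cancels in the ratio of each magnitude to the utterance mean.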

Keywords

Feature extraction, speaker recognition, speech recognition

First Page

3758

Last Page

3762

Recommended Citation

TÜFEKCİ, ZEKERİYA and DİŞKEN, GÖKAY (2019) "Scale-invariant MFCCs for speech/speaker recognition," Turkish Journal of Electrical Engineering and Computer Sciences: Vol. 27: No. 5, Article 34. https://doi.org/10.3906/elk-1901-231
Available at: https://journals.tubitak.gov.tr/elektrik/vol27/iss5/34

Included in

Computer Engineering Commons, Computer Sciences Commons, Electrical and Computer Engineering Commons

Turkish Journal of Electrical Engineering and Computer Sciences

Scale-invariant MFCCs for speech/speaker recognition
