Turkish Journal of Electrical Engineering and Computer Sciences
Author ORCID Identifier
AWAIS AHMED: 0000-0002-1514-4643
Abstract
Spoken digit recognition (SDR), a type of supervised automatic speech recognition, is essential for various human-machine interaction applications, including banking operations, dialing systems, price extraction, and airline reservation systems. However, designing an effective SDR system presents several challenges, such as building labeled audio datasets, selecting appropriate feature extraction methods, and creating high-performance models. To address these challenges, we propose a novel approach for robust spoken digit recognition using an integrated log spectrogram convolutional neural network (ILS-CNN). The method places a log spectrogram layer directly within the neural network to enhance frequency resolution and improve feature extraction. Embedding the spectrogram calculation within the network streamlines the preprocessing pipeline and mitigates the discrepancies often introduced by external feature computation. The ILS-CNN architecture shows significant improvements in recognition accuracy and robustness, particularly in noisy environments, which is crucial for real-world applications. Simulation results show that the proposed method achieves an overall accuracy of 99.3% on the noise-free FSDD dataset. It also remains robust in noisy scenarios, achieving an accuracy of 88.5% even when the signal-to-noise ratio (SNR) is as low as 0 dB.
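The two quantities the abstract relies on, a log spectrogram and a target SNR, can be illustrated with a minimal NumPy sketch. This is not the authors' ILS-CNN layer, which computes the spectrogram inside the network. The function names `log_spectrogram` and `add_noise_at_snr`, the frame length, the hop size, the Hann window, the 8 kHz test tone, and the use of white Gaussian noise are all assumptions chosen for illustration; the abstract does not specify them.

```python
import numpy as np

def log_spectrogram(signal, frame_len=256, hop=128, eps=1e-10):
    """Log-power spectrogram from a Hann-windowed short-time FFT.

    frame_len, hop, and eps are illustrative values, not taken from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames and apply the window.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # eps avoids log(0) on silent bins.
    return np.log(power + eps)  # shape: (n_frames, frame_len // 2 + 1)

def add_noise_at_snr(signal, snr_db, rng):
    """Add white Gaussian noise (an assumed noise model) at a target SNR in dB."""
    signal_power = np.mean(signal ** 2)
    # SNR_dB = 10 * log10(P_signal / P_noise), solved for P_noise.
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

# Hypothetical input: one second of a 440 Hz tone sampled at 8 kHz.
rng = np.random.default_rng(0)
t = np.arange(8000) / 8000.0
clean = np.sin(2 * np.pi * 440 * t)
noisy = add_noise_at_snr(clean, snr_db=0.0, rng=rng)
spec = log_spectrogram(noisy)
```

At 0 dB the noise power equals the signal power, which is the hardest condition the abstract reports. In an integrated design, a transform of this kind would run as the first layer of the model rather than as a separate preprocessing script.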
DOI
10.55730/1300-0632.4153
Keywords
Spoken digit recognition, automatic speech recognition, convolutional neural network, English spoken digits recognition, speech feature extraction
First Page
706
Last Page
724
Publisher
The Scientific and Technological Research Council of Türkiye (TÜBİTAK)
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Recommended Citation
AHMED, A (2025). Integrated log spectrogram convolutional neural network (ILS-CNN) for robust spoken digit recognition. Turkish Journal of Electrical Engineering and Computer Sciences 33 (6): 706-724. https://doi.org/10.55730/1300-0632.4153
Included in
Computer Engineering Commons, Computer Sciences Commons, Electrical and Computer Engineering Commons