[1] A. S. Dhanjal and W. Singh, “A comprehensive survey on automatic speech recognition using neural networks,” Multimedia Tools and Applications, vol. 83, no. 8, pp. 23367–23412, Mar. 2024.
[2] H. Veisi and A. H. Mani, “Persian speech recognition using deep learning,” International Journal of Speech Technology, vol. 23, no. 4, pp. 893–905, Dec. 2020.
[3] M. S. Zandi and R. Rajabi, “Deep learning based framework for Iranian license plate detection and recognition,” Multimedia Tools and Applications, vol. 81, no. 11, pp. 15841–15858, May 2022.
[4] A. Kavand and M. Bekrani, “Speckle noise removal in medical ultrasonic image using spatial filters and DnCNN,” Multimedia Tools and Applications, vol. 83, pp. 45903–45920, May 2024.
[5] D. Yu and L. Deng, Automatic Speech Recognition: A Deep Learning Approach, Springer, 2016.
[6] M. H. Rahimi Pour, N. Rastin, and M. M. Kermani, “Persian automatic speech recognition by the use of Whisper model,” in 20th CSI International Symposium on Artificial Intelligence and Signal Processing (AISP), Iran, Feb. 2024.
[7] M. M. Homayounpour, J. Kabudian, H. Bashiri, and Z. Ahmadpour, “Recognition of Farsi number over telephone: A comparison of statistical, neural, and hybrid approaches,” Amirkabir, vol. 14, no. 56, pp. 1045–1065, Jan. 2003.
[8] M. M. Homayounpour, “FarsDigits database,” Technical Report, Laboratory for Intelligent Sound and Speech Processing, Amirkabir University of Technology, 2005.
[9] J. van Doremalen and L. Boves, “Spoken digit recognition using a hierarchical temporal memory,” in Interspeech, 2008, pp. 2566–2569.
[10] N. Hammami, M. Bedda, N. Farah, and R. O. Lakehal-Ayat, “Spoken Arabic digits recognition based on (GMM) for e-Quran voice browsing: Application for blind category,” in International Conference on Advances in Information Technology for the Holy Quran and Its Sciences, 2013, pp. 123–127.
[11] D. Dhanashri and S. B. Dhonde, “Isolated word speech recognition system using deep neural networks,” in International Conference on Data Engineering and Communication Technology (ICDECT 2016), vol. 1, 2017, pp. 9–17.
[12] R. G. Leonard and G. Doddington, “TIDIGITS dataset,” Linguistic Data Consortium, Philadelphia, 1993.
[13] B. Zada and R. Ullah, “Pashto isolated digits recognition using deep convolutional neural network,” Heliyon, vol. 6, no. 2, Feb. 2020.
[14] S. Tabibian, “Robust Persian isolated digit recognition based on LSTM and speech spectral features,” Iranian Journal of Electrical and Computer Engineering, vol. 86, no. 19, pp. 1–17, Oct. 2021.
[15] S. M. Hoseini, “Recognition of Persian digits from zero to nine using acoustic images based on Mel Cepstrum coefficients and neural network,” International Journal of Mechatronics, Electrical and Computer Technology, vol. 11, no. 42, pp. 5059–5064, 2020.
[16] J. Oruh, S. Viriri, and A. Adegun, “Long short-term memory recurrent neural network for automatic speech recognition,” IEEE Access, vol. 10, pp. 30069–30079, 2022.
[17] C. Amadeus, I. Syafalni, N. Sutisna, and T. Adiono, “Digit number speech recognition using spectrogram-based convolutional neural network,” in International Symposium on Electronics and Smart Devices (ISESD), 2022, pp. 1–6.
[18] B. Paul and S. Phadikar, “A hybrid feature-extracted deep CNN with reduced parameters substitutes an end-to-end CNN for the recognition of spoken Bengali digits,” Multimedia Tools and Applications, vol. 83, no. 1, pp. 1669–1692, Jan. 2024.
[19] A. A. Ramadan and K. M. Ezzat, “Spoken digit recognition using machine and deep learning-based approaches,” in International Telecommunications Conference (ITC-Egypt), Alexandria, Egypt, 2023, pp. 592–596.
[21] K. Lounnas, M. Lichouri, and M. Abbas, “Analysis of the effect of audio data augmentation techniques on phone digit recognition for Algerian Arabic dialect,” in International Conference on Advanced Aspects of Software Engineering (ICAASE), 2022, pp. 1–5.
[22] T. Ko, V. Peddinti, D. Povey, M. L. Seltzer, and S. Khudanpur, “A study on data augmentation of reverberant speech for robust speech recognition,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017, pp. 5220–5224.
[23] F. Mahdavi, H. Zayyani, and R. Rajabi, “RSS localization using an optimized fusion of two deep neural networks,” IEEE Sensors Letters, vol. 5, no. 12, pp. 1–4, Dec. 2021.
[24] W. Hartmann, T. Ng, R. Hsiao, S. Tsakalidis, and R. M. Schwartz, “Two-stage data augmentation for low-resourced speech recognition,” in Interspeech, 2016, pp. 2378–2382.
[25] D. S. Park, W. Chan, Y. Zhang, C.-C. Chiu, B. Zoph, E. D. Cubuk, and Q. V. Le, “SpecAugment: A simple data augmentation method for automatic speech recognition,” arXiv preprint arXiv:1904.08779, 2019.
[27] N. Dave, “Feature extraction methods LPC, PLP and MFCC in speech recognition,” International Journal for Advance Research in Engineering and Technology, vol. 1, no. 6, pp. 1–4, July 2013.
[28] D. Amodei et al., “Deep Speech 2: End-to-end speech recognition in English and Mandarin,” in International Conference on Machine Learning, 2016, pp. 173–182.
[29] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[30] S. H. S. Basha, S. R. Dubey, V. Pulabaigari, and S. Mukherjee, “Impact of fully connected layers on performance of convolutional neural networks for image classification,” Neurocomputing, vol. 378, pp. 112–119, Feb. 2020.
[31] Q. Tao, F. Liu, Y. Li, and D. Sidorov, “Air pollution forecasting using a deep learning model based on 1D convnets and bidirectional GRU,” IEEE Access, vol. 7, pp. 76690–76698, June 2019.
[32] A. Zakir et al., “Database development and automatic speech recognition of isolated Pashto spoken digits using MFCC and K-NN,” International Journal of Speech Technology, vol. 18, pp. 271–275, June 2015.