CONCEPTUAL REGIONAL ORIGIN RECOGNITION USING CNN CONVOLUTION NEURAL NETWORK ON BANDUNG, BOGOR AND CIREBON REGIONAL ACCENTS

Authors

  • Adam Huda Nugraha, Gunadarma University
  • Achmad Benny Mutiara, Gunadarma University
  • Dewi Agushinta Rahayu, Gunadarma University

DOI:

https://doi.org/10.56127/ijml.v2i2.696

Keywords:

Sound, MFCC, CNN architecture

Abstract

Sound detection is a challenging machine learning task because audio signals are noisy and labeled data is usually scarce. In Indonesia the task matters because many community organizations form around their members' region of origin, especially in big cities, where people from various ethnic groups gather and exchange cultures. This mixing has a drawback for these groups: the original culture of particular regions can be lost. This research studies Sundanese speakers from three regions: Bandung, Bogor and Cirebon. The voice data is divided by speaker gender: each region contributes 50 respondents, 25 male and 25 female, and each recording lasts at most 1 minute. The method is a CNN architecture trained with supervised learning, with MFCC (Mel Frequency Cepstral Coefficients) preprocessing to extract features from the voice data. The CNN applies three convolution stages, each followed by max pooling and dropout.
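The MFCC preprocessing step mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the sample rate, frame length, hop size, FFT size, filter count, coefficient count, the 0.97 pre-emphasis factor, and the helper names (mel_filterbank, mfcc, trace_three_blocks) are all assumptions chosen for illustration. The last helper only traces how a feature map would shrink through three convolution stages with max pooling, assuming "same"-padded convolutions and 2x2 pooling, neither of which the abstract specifies.

```python
import numpy as np


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale (hypothetical helper)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):        # rising edge of the triangle
            bank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):       # falling edge of the triangle
            bank[i - 1, k] = (right - k) / (right - center)
    return bank


def mfcc(signal, sample_rate=16000, frame_len=400, hop=160, n_fft=512,
         n_filters=26, n_coeffs=13):
    """Return an (n_frames, n_coeffs) MFCC matrix. All defaults are assumed values."""
    # Pre-emphasis boosts high frequencies (0.97 is a common choice, assumed here).
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Slice the signal into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    # Power spectrum, mel filterbank energies, then log compression.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(energies + 1e-10)
    # Unnormalized DCT-II over the filter axis; keep the first n_coeffs coefficients.
    basis = np.cos(np.pi * np.outer(np.arange(n_coeffs), np.arange(n_filters) + 0.5) / n_filters)
    return log_energies @ basis.T


def trace_three_blocks(height, width, pool=2):
    """Feature-map sizes after three 'same'-padded convolutions, each followed by
    max pooling. Dropout does not change shapes. Padding and pool size are assumptions."""
    shapes = []
    for _ in range(3):
        height, width = height // pool, width // pool
        shapes.append((height, width))
    return shapes


# Usage on a synthetic 1-second, 440 Hz tone standing in for a voice recording.
t = np.arange(16000) / 16000.0
features = mfcc(np.sin(2.0 * np.pi * 440.0 * t))
print(features.shape)                          # (98, 13)
print(trace_three_blocks(*features.shape))     # [(49, 6), (24, 3), (12, 1)]
```

With these assumed settings the narrow coefficient axis shrinks to 1 after three pooling steps, which suggests that a real model would need pooling sizes or input dimensions chosen with the feature shape in mind.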

Published

2023-06-17