Sound, MFCC, CNN architecture


Sound detection is a challenge in machine learning because signals are noisy and the amount of labeled data available is usually small. Sound detection is particularly relevant in Indonesia, where many community organizations form groups according to their region of origin, especially in big cities, where people from various ethnic groups gather and exchange cultures. This mixing has a drawback for these groups, however: the original culture of certain regions can be lost. This research focuses on Sundanese speech from three regions: Bandung, Bogor, and Cirebon. The voice data is divided into two types, male and female. Each region is represented by 50 respondents, 25 male and 25 female, with a maximum recording time of 1 minute per voice. The method used is a CNN architecture trained with supervised learning, with MFCC (Mel Frequency Cepstral Coefficients) preprocessing used to extract features from the voice data. The CNN applies three convolution stages, each followed by max pooling and dropout.
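To make the three-stage structure concrete, the following minimal sketch traces how an MFCC feature map would shrink as it passes through three convolution, max pooling, and dropout stages. The input size (40 coefficients by 128 frames), the 3x3 kernels, the 2x2 pooling windows, the unpadded convolutions, and the filter counts are all illustrative assumptions; none of these values are stated above, and the function names are hypothetical.

```python
def conv2d_shape(height, width, kernel=3, stride=1):
    """Output size of an unpadded ('valid') 2-D convolution. Kernel size is an assumption."""
    return (height - kernel) // stride + 1, (width - kernel) // stride + 1


def max_pool_shape(height, width, pool=2):
    """Output size of non-overlapping max pooling. Pool size is an assumption."""
    return height // pool, width // pool


def trace_cnn(n_mfcc=40, n_frames=128, filters=(32, 64, 128)):
    """Return the (height, width, channels) shape after each of the three stages.

    n_mfcc, n_frames, and filters are assumed values chosen for illustration only.
    """
    height, width = n_mfcc, n_frames
    shapes = []
    for n_filters in filters:
        height, width = conv2d_shape(height, width)    # convolution
        height, width = max_pool_shape(height, width)  # max pooling
        # Dropout zeroes activations at random during training
        # but leaves the tensor shape unchanged.
        shapes.append((height, width, n_filters))
    return shapes


if __name__ == "__main__":
    for stage, shape in enumerate(trace_cnn(), start=1):
        print(f"after stage {stage}: {shape}")
```

Under these assumptions the feature map is reduced from 40x128 to 3x14 with 128 channels, which would then be flattened before any classification layers.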


