Home / Blog / Data Science / Classification of Music Genre Using KNN

Classification of Music Genre Using KNN

February 18, 2024
66

Meet the Author : Mr. Bharani Kumar

Bharani Kumar Depuru is a well known IT personality from Hyderabad. He is the Founder and Director of AiSPRY and 360DigiTMG. Bharani Kumar is an IIT and ISB alumni with more than 18+ years of experience, he held prominent positions in the IT elites like HSBC, ITC Infotech, Infosys, and Deloitte. He is a prevalent IT consultant specializing in Industrial Revolution 4.0 implementation, Data Analytics practice setup, Artificial Intelligence, Big Data Analytics, Industrial IoT, Business Intelligence and Business Management. Bharani Kumar is also the chief trainer at 360DigiTMG with more than Ten years of experience and has been making the IT transition journey easy for his students. 360DigiTMG is at the forefront of delivering quality education, thereby bridging the gap between academia and industry.

Table of Content

Required Libraries
Create a function to detect neighbors and measure the distance between feature vectors.
Determine the category of nearest neighbors
Model Evaluation
Feature Extraction
Train-test split the dataset
Calculate the distance between two instance
Training the Model and making predictions
Use the new Audio File to test the Classifier

Different sounds are classified into distinct categories using machine learning. Nearly all data science enthusiasts desire tasks that are appealing and stand out on their resumes, and audio processing is one of these areas. In this project, we will use the K-Nearest Neighbors classification method to develop a full-fledged music genre categorization project from scratch.

Compared to image processing and other classification methods, audio processing is one of the most challenging data science projects. One such use is the classification of music genres, which seeks to place audio files in the appropriate sound groups to which they belong. Because classifying music manually requires listening to each track for the entirety, the application is crucial and needs automation to cut down on manual error and time. Therefore, machine learning and deep learning algorithms are what we will employ in this essay to automate the procedure.

In a nutshell, the issue statement for our project may be stated as follows: Given several audio files, the job is to classify each audio file into a specific genre, such as disco, hip-hop, etc. The top 4 methodologies that are most frequently utilized to construct the classification of musical genres are shown below.

Multiclass support vector machine
K-Nearest Neighbors
K-means clustering algorithm
Convolutional neural network

We will use K-Nearest Neighbors algorithm because various types of research prove it is one of the best algorithms to give good

A machine learning approach called K-Nearest Neighbor (KNN) is used for classification and regression. The lazy learner algorithm is another name for it. It merely applies a distance-based algorithm to identify the K number of comparable neighbors to fresh data, and it outputs the class where the majority of neighbors are located. Let's now prepare our system for project implementation.

The dataset we will use is named the GTZAN genre collection dataset which is a very popular audio collection dataset. It contains approximately 1000 audio files that belong to 10 different classes. Each audio file is in .wav format (extension). The classes to which audio files belong are Blues, Hip-hop, classical, pop, Disco, Country, Metal, Jazz, Reggae, and Rock. You can easily find the dataset on Kaggle and can download it from Kaggle

Required Libraries:

It's crucial to install specific libraries before moving on to load datasets and model construction. To extract features and experience something new, we will use the Python Speech Feature Library. We will use the scipy library as well as load the dataset in the WAV format, thus we must install these two libraries.

Import required libraries

!pip install python_speech_features

!pip install scipy

Make the necessary imports to play with the data in your freshly generated Kaggle notebook or Jupyter notebook first.

Create a function to detect neighbors and measure the distance between feature vectors:

KNN operates by determining the K number of neighbors and computing distance. We shall develop various functions to accomplish this exclusively for each capability. The first thing we'll do is put together a function that takes training data, existing instances, and the necessary number of neighbors. It will measure the separation between each point in the training set and every other point, locate the nearest K neighbors, and then return all neighbors. To make a project workflow clear and straightforward, we will build a function to determine the distance between two points.

Determine the category of nearest neighbors:

We now have a list of neighbors, and we need to identify the class with the maximum number of neighbors. To store the class and its associated count of neighbors, we declare a dictionary. We create a frequency map, sort it based on the number of neighbors, and then return the first class.

Model Evaluation:

To verify the precision and accuracy of the algorithm we develop, we also need a function that assesses a model. To calculate accuracy, we will create a function that is quite basic and that returns the total number of accurate forecasts divided by the total number of predictions.

Feature Extraction:

Therefore, we are applying the KNN Classifier, and we have just constructed the algorithm from scratch to give you a sense of how the project would operate at this point. Now that the data has been loaded from all folders of the appropriate categories, we will extract features from each audio file and store them in binary form with the DAT extension.

Mel Frequency Cepstral Coefficients

The technique of extracting significant features from data is known as feature extraction. It involves locating linguistic information and avoiding all noise. Three categories—high-level, mid-level, and low-level audio features—are used to categorize audio features.

⦁ High-level features are related to music lyrics like chords, rhythm, melody, etc.

⦁ Mid-level features include beat-level attributes, pitch-like fluctuation patterns, and MFCCs.

⦁ Low-level features include energy and a zero-crossing rate which are statistical measures that get extracted from audio during feature extraction.

To produce these features, we follow a series of procedures that are collectively known as MFCC, which aids in the extraction of mid-level and low-level audio features. The steps for how MFCCs function in feature extraction are discussed below.

Audio files can range in length from a few seconds to many minutes. And the pitch or frequency is always changing, therefore to comprehend this, we first split the audio stream into little, 20–40 ms long frames.

We attempt to recognize and extract various frequencies from each frame after breaking it into frames. Assume that one frame breaks down into a single frequency when we divide in such a little frame.

remove the noise from the language frequencies

Take a discrete cosine transform (DCT) of the frequencies to eliminate all noise. Students with engineering backgrounds may be familiar with and have studied the cosine transform in discrete mathematics courses.

Now that we have imported everything from the Python speech feature library, we no longer need to implement each of these stages individually because MFCC does it for us. Each category folder will be iterated through, the audio file read, the MFCC feature extracted, and the feature dumped in a binary file using the pickle module. To understand and manage any exceptions that may arise while importing large datasets, I usually advise using try-catch.

Train-test split the dataset:

From the audio file, which is stored in binary format and used as the filename for my dataset, we have retrieved some features. We will now put into practice a function that takes a filename and copies all the data as a data frame. The data will then be randomly divided into train and test sets based on a predetermined threshold because we want to include a variety of genres in both sets. There are various ways to conduct a train-test split. Here, I'm using a random module and looping till the length of the dataset to generate a random fractional integer between 0 and 1; if it's less than 66, a specific row is added to the train test set; otherwise, the test set is used.

Calculate the distance between two instance:

To compute the distance between two points, we must implement this function at the start. However, to fully describe the project's workflow and technique, I will discuss the supporting function after the primary phases. However, you must layer the function on top. To determine the real distance between the two data points, the function requires two inputs (X and Y coordinates). We employ the low-level implementation of conventional linear algebra provided by the Numpy linear algebra module. To determine the real distance, we must first calculate the dot product between the X-X and Y-Y coordinates of both points. Then, we may extract the determinant from the resultant array of both points and determine the distance.

Training the Model and making predictions:

You were all waiting for the stage where you fed the data into the KNN algorithms, made all your predictions, and got accurate results on the test dataset. This step code appears to be a lot, but it is extremely small because we are using a stepwise functional programming method, thus all we need to do is call the functions. The initial step is to gather neighbors, extract classes, and assess the model's correctness.

Use the new Audio File to test the Classifier:

Now that the model has been deployed and trained, it is time to check for fresh data to see how well our model predicts fresh audio files. Since we have all of the labels (classes) in numerical form, we will first create a dictionary with the key being the numerical label and the value being the category name.