
Market Segment Analysis of Ed-Tech Market using K-means Clustering Algorithm

  • September 24, 2022

Meet the Author: Mr. Bharani Kumar

Bharani Kumar Depuru is a well-known IT personality from Hyderabad. He is the Founder and Director of Innodatatics Pvt Ltd and 360DigiTMG. An alumnus of IIT and ISB with more than 17 years of experience, he has held prominent positions at organizations such as HSBC, ITC Infotech, Infosys, and Deloitte. He is an IT consultant specializing in Industrial Revolution 4.0 implementation, Data Analytics practice setup, Artificial Intelligence, Big Data Analytics, Industrial IoT, Business Intelligence, and Business Management. Bharani Kumar is also the chief trainer at 360DigiTMG, with more than ten years of experience, and has been making the IT transition journey easy for his students. 360DigiTMG is at the forefront of delivering quality education, thereby bridging the gap between academia and industry.


The worldwide Edtech market is a varied, quickly expanding sector that spans the whole company lifecycle, from early-stage startups to middle-market businesses to publicly traded enterprises. Technology is integrated throughout a learner's life in all three key education categories (Pre-K–12, post-secondary, and corporate training), and the sector attracts not only seasoned industry specialists but also generalist investors. As experienced investors are aware, each category functions as a separate sub-segment of the Edtech industry, with its own end users, purchasers, and funding channels. Many publications have issued investment figures; although they vary in methodology and categorization, together they help us triangulate the size, scope, and growth of the Edtech business.

Data Source and Data preprocessing:

Data are pieces of information that can be studied to help formulate business plans. As technology advances, more kinds of data are constantly being gathered. With such an abundance of data at our disposal, new tactics are being developed, along with concerns over privacy and new ethical codes. The goal of data preprocessing is to transform raw data into a format that can be used and understood. The workflow uses Python libraries such as NumPy, Pandas, scikit-learn, and SciPy, which helps keep the results reproducible.


The analysis started by reviewing the columns provided in the dataset and checking for null values. No missing values were found in the dataset.
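A minimal sketch of that null-value check with Pandas. The article gives neither the file name nor the exact column headers, so the small DataFrame below uses hypothetical columns (`state`, `district`, `total_population`) in place of the real data.

```python
import pandas as pd

# Hypothetical stand-in for the district-level dataset: the article does not
# give the file name or exact column headers, so these are placeholders.
df = pd.DataFrame(
    {
        "state": ["State A", "State A", "State B"],
        "district": ["District 1", "District 2", "District 3"],
        "total_population": [1_200_000, 850_000, 400_000],
    }
)

# Number of missing (NaN/None) values in each column.
null_counts = df.isnull().sum()
print(null_counts)

# Single True/False answer for the whole frame.
has_missing = bool(df.isnull().values.any())
print("Any missing values:", has_missing)
```

With the real data, the DataFrame would come from `pd.read_csv` on the actual file instead of being built by hand.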


Seaborn, a high-level interface for statistical graphics built on Matplotlib, was used alongside Matplotlib, since visualization is central to exploring and understanding the dataset used in this study. A bar plot of total population by state shows that Uttar Pradesh (UP) has the highest population, and that more than 10 states have populations of less than 10-20% of UP's.
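The comparison behind that bar plot can be sketched in Pandas: aggregate district rows to state totals, then express each state as a share of the largest one. The state names and numbers are placeholders, not the article's data, and the 20% cut-off simply mirrors the 10-20% range mentioned above.

```python
import pandas as pd

# Hypothetical district-level rows; names and numbers are placeholders,
# not the article's data.
districts = pd.DataFrame(
    {
        "state": ["State A", "State A", "State B", "State C", "State D"],
        "total_population": [120, 80, 30, 15, 120],
    }
)

# Aggregate district populations to state totals (the bar heights).
state_pop = districts.groupby("state")["total_population"].sum()

# Express each state's total as a share of the most populous state.
share_of_max = (state_pop / state_pop.max()).sort_values(ascending=False)
print(share_of_max)

# States with at most 20% of the largest state's population.
small_states = share_of_max[share_of_max <= 0.20].index.tolist()
print(small_states)
```

With the real data, calling `state_pop.plot.bar()` or passing the same series to Seaborn's `barplot` would draw the chart itself.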


Private vs. Public Sector Role in Education in Different States

Public vs. private education is a persistent topic of discussion. Some academics believe that encouraging private education will increase options and foster healthy competition. Opponents of this school of thought contend that education should be a public good, free for all.


Literacy Rate Analysis:

Based on the National Statistical Office (NSO) survey data, India's average literacy rate is 77.70%, with male literacy at the country level in 2021 standing at 84.70% and female literacy at 70.30%. Kerala has a literacy rate of 93%, while Bihar, at 63.82%, is the least literate state in India. By comparison, the 2011 Census recorded a female literacy rate of 65.46%, against over 80% for men. Let's compare the female literacy rates of the top three and bottom three states. States with low overall literacy show wide gender disparities: the lowest-ranked states have poor female literacy rates even where male literacy rates are high.
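Ranking states and comparing the male-female gap at the top and bottom can be sketched with Pandas `nlargest` and `nsmallest`. Every state name and rate below is made up purely to exercise the code; none of it is survey data.

```python
import pandas as pd

# Made-up state-level literacy rates (%), for illustration only.
lit = pd.DataFrame(
    {
        "state": ["S1", "S2", "S3", "S4", "S5", "S6", "S7"],
        "overall": [93.0, 89.0, 86.0, 78.0, 70.0, 66.0, 64.0],
        "male": [95.0, 92.0, 90.0, 85.0, 80.0, 78.0, 76.0],
        "female": [91.0, 85.0, 82.0, 70.0, 60.0, 55.0, 52.0],
    }
)

# Gender gap in percentage points for each state.
lit["gender_gap"] = lit["male"] - lit["female"]

# Top three and bottom three states by overall literacy.
top3 = lit.nlargest(3, "overall")
bottom3 = lit.nsmallest(3, "overall")
print(top3[["state", "female", "gender_gap"]])
print(bottom3[["state", "female", "gender_gap"]])
```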


Comparison of Overall Gender Gap in Literacy Rate and Literacy Rate in Urban vs. Rural Areas:

About three decades ago, the adult male literacy rate in India was almost twice that of adult females. While this gap has narrowed substantially over the years, the adult male literacy rate still exceeds the adult female literacy rate by 17 percentage points. In India, the literacy rate for people aged 7 and over was 77.7%: 73.5% in rural areas and 87.7% in urban areas. Let's compare the literacy rates of the urban and rural populations. Data on the rural population is not available; as a result, in the rural versus urban comparison, one rural area is treated as equivalent to 100 urban areas.
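The gaps discussed above are simple differences in percentage points. This sketch uses only rates quoted in the text. Note that the NSO male and female figures give a gap of about 14.4 points, while the 17-point figure above refers to adult literacy, so the two need not match.

```python
# Rates (%) as quoted in the text.
male_nso, female_nso = 84.70, 70.30   # NSO survey, male vs. female literacy
rural, urban = 73.5, 87.7             # literacy rate, age 7+, rural vs. urban

# Gaps are differences, measured in percentage points.
gender_gap = round(male_nso - female_nso, 2)
urban_rural_gap = round(urban - rural, 2)

# A ratio is a different measure: urban literacy relative to rural literacy.
urban_to_rural = round(urban / rural, 3)

print(gender_gap, urban_rural_gap, urban_to_rural)  # 14.4 14.2 1.193
```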


Analysis of Educational Level across Different Age Groups

We took the percentage distribution of rural persons aged 15 years and above by the highest level of education completed. In the rural population, 31.5% were not literate and 20.9% were literate only up to the primary level. A further 17.2% had completed upper primary/middle level education, and 24.9% secondary or higher secondary. Only a small fraction, 5.7%, had completed graduation or above. The urban population showed a very different picture: the non-literate share was 13.9%, compared with 31.5% in the rural population; 14.7% were literate up to the primary level, 14.0% had completed upper primary/middle level, and 35.8% secondary or higher secondary. The share who went on to higher studies after the higher secondary level was 5.7% in the rural population but 21.7% in the urban population. This suggests that more awareness about the importance of higher studies after the higher secondary level is needed in rural areas.
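The rural and urban distributions above can be put side by side in a small Pandas table, which makes it easy to check that each column sums to roughly 100% and to compute the urban-minus-rural difference at each level. The values are those quoted in the paragraph; the row labels are shortened names chosen here.

```python
import pandas as pd

# Highest completed education level, persons aged 15+, as quoted in the text (%).
levels = [
    "Not literate",
    "Up to primary",
    "Upper primary/middle",
    "Secondary & higher secondary",
    "Graduation & above",
]
dist = pd.DataFrame(
    {"rural": [31.5, 20.9, 17.2, 24.9, 5.7], "urban": [13.9, 14.7, 14.0, 35.8, 21.7]},
    index=levels,
)

# Each column should sum to roughly 100% (small deviations come from rounding).
totals = dist.sum().round(1)
print(totals)

# Urban minus rural, in percentage points, for each level.
dist["urban_minus_rural"] = (dist["urban"] - dist["rural"]).round(1)
print(dist)
```

The columns sum to 100.2% (rural) and 100.1% (urban), consistent with rounding in the published figures.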


K-Means Algorithm used for Segment Extraction

K-means clustering is a simple and elegant method for partitioning a dataset into K distinct, non-overlapping clusters. Once we specify the desired number of clusters K, the K-means algorithm assigns each observation to exactly one of the K clusters. The image below shows the results of running K-means clustering on a synthetic example with 150 observations in two dimensions, using three different values of K. K-means is an unsupervised learning algorithm: unlike supervised learning, it works without labeled data. It groups objects so that items in the same cluster have things in common and differ from items in other clusters.

The elbow method selects K by computing a metric of clustering quality for different values of K and locating the elbow point. As K increases, clustering quality first improves quickly and then stabilizes; the elbow point is where the relative improvement is no longer significant. The two graphs below, for the mean within-cluster sum of squares metric, illustrate this visually.

In this case, we load the dataset in Python and build a machine learning model using K-means clustering. Using a scatter matrix, we select columns of the District dataset and create scatter plots.
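A minimal scikit-learn sketch of the synthetic example and the elbow metric described above. The data come from `make_blobs` with three assumed centers, and the K values (2, 3, 4) are an assumption, since the text does not say which three were used. scikit-learn's `inertia_` is the total within-cluster sum of squares; dividing by the number of observations gives the mean version.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic example: 150 observations in two dimensions.
# The three blob centers and the K values below are illustrative assumptions.
X, _ = make_blobs(n_samples=150, n_features=2, centers=3, random_state=0)

for k in (2, 3, 4):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # Every observation gets exactly one label in {0, ..., k-1}.
    sizes = np.bincount(km.labels_, minlength=k)
    print(f"K={k}: sizes={sizes.tolist()}, mean WCSS={km.inertia_ / len(X):.2f}")

# Elbow method: mean within-cluster sum of squares for K = 1..8.
mean_wcss = [
    KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_ / len(X)
    for k in range(1, 9)
]
print([round(w, 2) for w in mean_wcss])
```

Plotting `mean_wcss` against K (for example with Matplotlib) gives the elbow curve; the bend is where adding clusters stops paying off.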

The required packages were imported for analysis and model building: Plotly, Matplotlib, and Seaborn for data visualization; Pandas and NumPy for data manipulation and numerical analysis, respectively; and SciPy and scikit-learn for model building. During data preprocessing, certain columns, such as schools with ramps and schools that require ramps, were dropped.
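Dropping the ramp-related columns takes a single `DataFrame.drop` call. The article does not give the exact column headers, so `schools_with_ramp` and `schools_requiring_ramp` are hypothetical names, as is the rest of this toy frame.

```python
import pandas as pd

# Hypothetical columns and values; the real headers are not given in the article.
df = pd.DataFrame(
    {
        "district": ["D1", "D2"],
        "total_schools": [120, 95],
        "schools_with_ramp": [60, 40],
        "schools_requiring_ramp": [30, 35],
    }
)

cols_to_drop = ["schools_with_ramp", "schools_requiring_ramp"]
# errors="ignore" skips names that are absent instead of raising KeyError.
df = df.drop(columns=cols_to_drop, errors="ignore")
print(df.columns.tolist())
```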


PCA (principal component analysis), a linear dimensionality reduction method, was applied to the data. It transforms a set of p correlated variables into a smaller number k (k < p) of uncorrelated variables, called principal components, while preserving as much of the variance in the original dataset as possible.
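A sketch of PCA with scikit-learn on synthetic data with p = 5 variables, reduced to k = 3 components. Standardizing the variables first is an assumption (the article does not say whether it scaled the data); it is common when variables are on different scales. The final print checks that the components are uncorrelated.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic data: p = 5 variables, the first two strongly correlated.
base = rng.normal(size=(200, 1))
X = np.hstack([
    base,
    2 * base + 0.1 * rng.normal(size=(200, 1)),
    rng.normal(size=(200, 3)),
])

# Standardize so that no variable dominates just because of its scale (assumption).
X_std = StandardScaler().fit_transform(X)

pca = PCA(n_components=3)      # keep k = 3 < p = 5 components
Z = pca.fit_transform(X_std)

print(Z.shape)                                  # (200, 3)
print(pca.explained_variance_ratio_.round(3))   # share of variance per component
# Principal components are uncorrelated: off-diagonal covariances are ~0.
print(np.round(np.cov(Z, rowvar=False), 3))
```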


K-means Algorithm


Mapping and plotting the Elbow Method


The scree plot below shows the curve and the elbow point for the percentage-of-variance-explained metric on the data provided.
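The percentage of variance explained by a clustering can be computed as 1 minus the within-cluster sum of squares divided by the total sum of squares. This sketch computes it for k = 1 to 8 on synthetic data standing in for the real (PCA-transformed) features. The 5-point threshold used to pick the elbow automatically is a hypothetical rule, not the article's method, which reads the elbow off the plot.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the PCA-transformed district data (an assumption).
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=1.0, random_state=42)

# Total sum of squares: squared distances to the overall mean.
tss = ((X - X.mean(axis=0)) ** 2).sum()

ks = list(range(1, 9))
pct_explained = []
for k in ks:
    wcss = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
    # Share of total variance captured by the clustering (between-cluster SS / TSS).
    pct_explained.append(100 * (1 - wcss / tss))

# Marginal gain from adding one more cluster, in percentage points.
gains = np.diff(pct_explained)

# Hypothetical rule: the elbow is the first k after which the gain drops below 5 points.
threshold = 5.0
elbow_k = next((k for k, g in zip(ks[:-1], gains) if g < threshold), ks[-1])
print([round(p, 1) for p in pct_explained])
print("elbow at k =", elbow_k)
```

Plotting `pct_explained` against `ks` reproduces the kind of scree curve described in the text.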


Based on the scree plot, we iterate over values of k from 2 to 5. We start at 2 on the assumption that no real-world dataset has all of its data points effectively grouped into a single cluster.


The scatter plot below shows the result with 4 clusters. We plot components 1 and 2 because, once PCA has been applied to the initial data, the first two components carry the most information.
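The final step can be sketched as follows: reduce the data to two principal components, fit K-means for k = 2 to 5, and label the points with the 4-cluster model for the component 1 vs. component 2 scatter plot. The data are synthetic, and clustering on the two components (rather than on more components while plotting only the first two) is an assumption about the article's workflow.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic 6-feature data standing in for the district dataset (an assumption).
X, _ = make_blobs(n_samples=300, n_features=6, centers=4, random_state=7)
X_std = StandardScaler().fit_transform(X)

# Keep the first two principal components, which carry the most variance.
pcs = PCA(n_components=2).fit_transform(X_std)

# Try k = 2..5, as in the text, and report the within-cluster sum of squares.
models = {k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(pcs) for k in range(2, 6)}
for k, m in models.items():
    print(k, round(m.inertia_, 1))

# Final choice of 4 clusters: one label per point for the scatter plot.
labels = models[4].labels_
for c in range(4):
    pts = pcs[labels == c]
    print(f"cluster {c}: n={len(pts)}, centroid=({pts[:, 0].mean():.2f}, {pts[:, 1].mean():.2f})")
```

With Matplotlib, `plt.scatter(pcs[:, 0], pcs[:, 1], c=labels)` would draw the corresponding scatter plot, one color per cluster.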


Selecting the Target from the Analysis Above

Target marketing in education technology is the process of showcasing your teaching skills to a particular audience using strategies such as audience analysis and segmentation. With Edtech target marketing, you can connect better with your ideal customer or student. There are four ways to describe your target market, and you can choose the right one for your marketing initiatives by taking a few factors into account. To ensure that your Edtech advertising campaigns achieve their goals, it is important to be as precise as possible when choosing a target audience. Let's examine each of these factors in more detail.

While researching your target market, you must decide where to concentrate your marketing efforts. Geographic location plays a large role in choosing the focus area; this decision is made in light of your institution's physical location and the expected travel time for students. Promotional strategies should also focus on the target audience's demographics: when choosing which demographics to target, consider aspects such as gender, age, education, number of dependents, profession, and household income. A psychographic analysis takes into account your audience's lifestyle, behavior, and personality, including how open they are to novel concepts. Finally, behavior is also a crucial component: analyze the audience's needs and behavior to determine their level of understanding, and evaluate how responsive they are to various products and services.
