Types of Clustering / Segmentation Algorithms
Bharani Kumar Depuru is a well-known IT personality from Hyderabad. He is the Founder and Director of AiSPRY and 360DigiTMG. Bharani Kumar is an IIT and ISB alumnus with more than 18 years of experience and has held prominent positions at IT companies such as HSBC, ITC Infotech, Infosys, and Deloitte. He is an IT consultant specializing in Industrial Revolution 4.0 implementation, Data Analytics practice setup, Artificial Intelligence, Big Data Analytics, Industrial IoT, Business Intelligence and Business Management. Bharani Kumar is also the chief trainer at 360DigiTMG, with more than ten years of training experience.
A straightforward boxplot may be used to do clustering when there is only one variable present. Scatter diagrams can be used when there are two variables.
With more than two variables, other techniques are needed, such as K-Means and the variants covered in the rest of this article.
K-Means is a non-hierarchical clustering method.
We use a Scree plot or an Elbow Curve to determine the number of clusters up front.
When centroids are initialised at random, K-Means may converge to a local minimum rather than the global one (a minimum because the goal is to minimise the within-cluster sum of squares).
Solution: Run the algorithm several times with different initial partitions and keep the best result.
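The effect of restarting can be sketched as follows. This is a minimal illustration, assuming scikit-learn is available; the synthetic data from `make_blobs` and the choice of ten seeds are assumptions made for the example, not part of the original text.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with 5 groups (an assumption, used only for illustration).
X, _ = make_blobs(n_samples=500, centers=5, cluster_std=1.0, random_state=42)

# Run K-Means ten times, each with a single random initialisation.
runs = [
    KMeans(n_clusters=5, init="random", n_init=1, random_state=seed).fit(X)
    for seed in range(10)
]

# inertia_ is the within-cluster sum of squares of each fitted model.
inertias = [model.inertia_ for model in runs]

# Keep the run with the lowest within-cluster sum of squares.
best = min(runs, key=lambda model: model.inertia_)

print("inertia per run:", [round(v, 1) for v in inertias])
print("best inertia   :", round(best.inertia_, 1))
```

In scikit-learn the same idea is built in: setting `n_init=10` on a single `KMeans` object performs ten initialisations and keeps the best one.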
Although there are some general guidelines, they are not infallible. There is no established rule for choosing the K-value.
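Solution: Run the algorithm with a variety of different 'K' values and compare the 'Within Sum of Squares' and 'Between Sum of Squares' of the resulting clusters. Because the within sum of squares keeps falling as K grows, choose the K beyond which the improvement becomes small (the elbow) rather than simply the lowest value.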
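A rough sketch of this comparison is shown below. It assumes scikit-learn and uses synthetic data with four groups; the range of K values is an arbitrary choice for the example. The between sum of squares is computed here as the total sum of squares minus the within sum of squares.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with 4 groups (an assumption, used only for illustration).
X, _ = make_blobs(n_samples=600, centers=4, cluster_std=0.8, random_state=7)

# Total sum of squares around the overall mean.
total_ss = float(((X - X.mean(axis=0)) ** 2).sum())

k_values = list(range(1, 10))
within_ss = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    within_ss.append(model.inertia_)  # within-cluster sum of squares

# Between SS = Total SS - Within SS.
between_ss = [total_ss - w for w in within_ss]

for k, w, b in zip(k_values, within_ss, between_ss):
    print(f"K={k}  within SS={w:10.1f}  between SS={b:10.1f}")
```

Plotting `within_ss` against `k_values` gives the elbow curve; the bend in that curve is the usual candidate for K.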
K-Means is extremely sensitive to extreme values (outliers).
Solution: K-Medians and K-Medoids are two variants that handle outliers better.
K-Means is effective only when dealing with continuous data.
Solution: Use K-Modes for categorical data.
K-Means cannot find clusters with non-convex shapes.
Solution: Use Kernel K-Means or density-based clustering.
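The non-convex case can be illustrated with a small comparison. The sketch below assumes scikit-learn and uses DBSCAN as one example of a density-based method; the two-moons data set and the `eps` and `min_samples` values are choices made for this example and would need tuning on real data. Kernel K-Means is not shown.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Two interleaved half-moons: a standard example of non-convex clusters.
X, y_true = make_moons(n_samples=400, noise=0.05, random_state=0)

# K-Means splits the plane with a straight boundary.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# DBSCAN groups points that are connected through dense regions.
dbscan_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# Adjusted Rand index: 1.0 means perfect agreement with the true grouping.
kmeans_ari = adjusted_rand_score(y_true, kmeans_labels)
dbscan_ari = adjusted_rand_score(y_true, dbscan_labels)

print("K-Means ARI:", round(kmeans_ari, 3))
print("DBSCAN ARI :", round(dbscan_ari, 3))
```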
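K-Means++ mitigates the problem of different initialisations leading to different clusters. It spreads the initial centroids out: the first is chosen at random, and each subsequent one is chosen with probability proportional to its squared distance from the nearest centroid already chosen. It reduces the problem but does not remove it entirely.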
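The seeding idea behind K-Means++ can be sketched in a few lines of NumPy. The function name `kmeans_pp_init` and the synthetic data are hypothetical, introduced only for this illustration.

```python
import numpy as np

def kmeans_pp_init(X, k, rng):
    """Pick k initial centroids from the rows of X using K-Means++ style seeding."""
    n = X.shape[0]
    # First centroid: chosen uniformly at random.
    first = rng.integers(n)
    centroids = [X[first]]
    # Squared distance from every point to its nearest chosen centroid.
    closest_sq = ((X - X[first]) ** 2).sum(axis=1)
    for _ in range(1, k):
        # Next centroid: sampled with probability proportional to that squared distance,
        # so far-away points are more likely to be picked.
        probs = closest_sq / closest_sq.sum()
        idx = rng.choice(n, p=probs)
        centroids.append(X[idx])
        new_sq = ((X - X[idx]) ** 2).sum(axis=1)
        closest_sq = np.minimum(closest_sq, new_sq)
    return np.array(centroids)

rng = np.random.default_rng(0)
# Three well-separated groups (an assumption, used only for illustration).
X = np.vstack([
    rng.normal(loc=center, scale=0.3, size=(50, 2))
    for center in ([0, 0], [5, 5], [0, 5])
])

init_centroids = kmeans_pp_init(X, k=3, rng=rng)
print(init_centroids)
```

In scikit-learn this seeding is the default (`init="k-means++"` in `KMeans`), so it rarely needs to be written by hand.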
K-Medians is excellent at coping with outliers.
The distance measure used is the L1 norm, also known as Manhattan distance.
The steps are the same as in K-Means, except that the center of each cluster is computed as the median rather than the mean.
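A minimal NumPy sketch of K-Medians under these assumptions follows. The function names, the tiny data set and the stopping rule (stop when the centers no longer move) are illustrative choices, not taken from the original text.

```python
import numpy as np

def assign(X, centers):
    """Label each point with the index of its nearest center (Manhattan / L1 distance)."""
    dists = np.abs(X[:, None, :] - centers[None, :, :]).sum(axis=2)
    return dists.argmin(axis=1)

def k_medians(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = assign(X, centers)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                # Coordinate-wise median instead of the mean used by K-Means.
                new_centers[j] = np.median(members, axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, assign(X, centers)

# Two small groups plus one outlier in the last row (illustrative data).
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [1.1, 1.0],
              [8.0, 8.0], [8.2, 7.9], [7.9, 8.1], [8.1, 8.0],
              [1.5, 50.0]])

centers, labels = k_medians(X, k=2)
for j in range(2):
    members = X[labels == j]
    if len(members) > 0:
        print(f"cluster {j}: median center={centers[j]}, mean of members={members.mean(axis=0)}")
```

Printing the mean next to the median shows how much the outlier would shift a mean-based center in whichever cluster it lands.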
K-Medoids addresses the problem of K-Means being influenced by outliers.
1. Choose 'K' data points randomly as medoids.
2. Instead of taking the centroid of the data points in a cluster, the medoids are considered to be the centers.
3. Compute the distance from every data point to its nearest medoid and add these distances up. This value is called the total cost.
4. Select any other point randomly as a representative point (any point other than the medoid points).
5. Compute the distance from each of the points to the new representative point and add these distances up. This value is called the total cost of the new representative point.
6. If the total cost of step 3 is greater than the total cost of step 5, the representative point from step 4 becomes a new medoid and the process continues.
7. If the total cost of step 3 is less than the total cost of step 5, the algorithm ends.
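The steps above can be sketched in NumPy as follows. Two details are assumptions of this sketch rather than part of the description: the candidate point is tried in place of each current medoid and the cheapest replacement is kept, and the hypothetical `max_failures` parameter lets the search tolerate several non-improving candidates before stopping (setting it to 1 gives the stopping rule exactly as described).

```python
import numpy as np

def total_cost(X, medoid_idx):
    """Sum of Euclidean distances from every point to its nearest medoid."""
    dists = np.linalg.norm(X[:, None, :] - X[medoid_idx][None, :, :], axis=2)
    return dists.min(axis=1).sum()

def k_medoids(X, k, max_failures=20, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    medoids = list(rng.choice(n, size=k, replace=False))  # step 1: random medoids
    cost = total_cost(X, medoids)                          # steps 2-3: total cost
    failures = 0
    while failures < max_failures:
        non_medoids = [i for i in range(n) if i not in medoids]
        candidate = rng.choice(non_medoids)                # step 4: random non-medoid
        # Step 5: cost when the candidate replaces each medoid in turn.
        trials = []
        for pos in range(k):
            trial = medoids.copy()
            trial[pos] = candidate
            trials.append((total_cost(X, trial), trial))
        new_cost, new_medoids = min(trials, key=lambda t: t[0])
        if new_cost < cost:                                # step 6: accept the swap
            medoids, cost = new_medoids, new_cost
            failures = 0
        else:                                              # step 7: count a failure
            failures += 1
    return medoids, cost

rng = np.random.default_rng(1)
# Three groups of points (illustrative data).
X = np.vstack([
    rng.normal(loc=center, scale=0.5, size=(40, 2))
    for center in ([0, 0], [6, 6], [0, 6])
])

medoids, cost = k_medoids(X, k=3)
print("medoid rows:", X[medoids])
print("total cost :", round(cost, 2))
```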
Partitioning Around Medoids (PAM) is a classic example of the K-Medoids algorithm.
For large datasets, clustering by in-memory computation is often not feasible. Sampling is used to avoid this problem.
CLARA is a variant of PAM.
However, unlike PAM, the medoids are not computed from all the data points but only from a small sample.
The PAM algorithm is applied to the sample to find its best medoids.
CLARA then repeats the entire process for a specified number of samples to reduce sampling bias, and keeps the set of medoids that gives the lowest cost on the full dataset.
The shortcoming of CLARA is that its results depend on the sample size.
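A compact sketch of this sampling idea is given below. It is an illustration under stated assumptions, not a reference implementation: the helper `pam` is a simplified swap-based K-Medoids, and the parameter names and default values (`n_samples`, `sample_size`) are chosen for the example.

```python
import numpy as np

def cost(X, medoids):
    """Sum of distances from every row of X to its nearest medoid (medoids are points)."""
    dists = np.linalg.norm(X[:, None, :] - medoids[None, :, :], axis=2)
    return dists.min(axis=1).sum()

def pam(S, k, rng):
    """Simplified swap-based K-Medoids on a small sample S; returns indices into S."""
    idx = list(rng.choice(len(S), size=k, replace=False))
    best = cost(S, S[idx])
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for cand in range(len(S)):
                if cand in idx:
                    continue
                trial = idx.copy()
                trial[pos] = cand
                c = cost(S, S[trial])
                if c < best:  # accept any swap that lowers the cost
                    idx, best, improved = trial, c, True
    return idx

def clara(X, k, n_samples=5, sample_size=40, seed=0):
    rng = np.random.default_rng(seed)
    best_medoids, best_cost = None, np.inf
    for _ in range(n_samples):
        # Draw a small random sample and run PAM on it only.
        sample_idx = rng.choice(len(X), size=sample_size, replace=False)
        S = X[sample_idx]
        medoids = S[pam(S, k, rng)]
        # Judge the sample's medoids on the FULL data set.
        c = cost(X, medoids)
        if c < best_cost:
            best_medoids, best_cost = medoids, c
    return best_medoids, best_cost

rng = np.random.default_rng(2)
# Three groups of points (illustrative data).
X = np.vstack([
    rng.normal(loc=center, scale=0.5, size=(300, 2))
    for center in ([0, 0], [6, 6], [0, 6])
])

medoids, best_cost = clara(X, k=3)
print("medoids:", medoids)
print("cost on full data:", round(best_cost, 2))
```

Only the sample is passed to `pam`, so the expensive swap search never touches the full dataset; the full data is used just once per sample to score the resulting medoids.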
CLARANS is akin to double randomisation: the algorithm randomly selects the 'K' initial medoids, and then randomly selects a medoid and a non-medoid object to consider swapping (similar to K-Medoids).
CLARANS repeats this randomised process a finite number of times to obtain a good, though not necessarily globally optimal, solution.
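The randomised search can be sketched as below. This is a simplified illustration: the parameter names `num_local` (number of random restarts) and `max_neighbor` (number of consecutive non-improving swaps tolerated) and their values are assumptions of this example.

```python
import numpy as np

def cost(X, medoid_idx):
    """Sum of Euclidean distances from every point to its nearest medoid."""
    dists = np.linalg.norm(X[:, None, :] - X[medoid_idx][None, :, :], axis=2)
    return dists.min(axis=1).sum()

def clarans(X, k, num_local=3, max_neighbor=50, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    best_medoids, best_cost = None, np.inf
    for _ in range(num_local):
        # First randomisation: a random starting set of k medoids.
        current = list(rng.choice(n, size=k, replace=False))
        current_cost = cost(X, current)
        tried = 0
        while tried < max_neighbor:
            # Second randomisation: a random medoid / non-medoid pair to swap.
            pos = rng.integers(k)
            cand = rng.integers(n)
            if cand in current:
                continue
            neighbor = current.copy()
            neighbor[pos] = cand
            c = cost(X, neighbor)
            if c < current_cost:
                current, current_cost = neighbor, c
                tried = 0  # restart the count after an improvement
            else:
                tried += 1
        # Keep the best local solution found over all restarts.
        if current_cost < best_cost:
            best_medoids, best_cost = current, current_cost
    return best_medoids, best_cost

rng = np.random.default_rng(3)
# Three groups of points (illustrative data).
X = np.vstack([
    rng.normal(loc=center, scale=0.5, size=(60, 2))
    for center in ([0, 0], [6, 6], [0, 6])
])

medoids, best_cost = clarans(X, k=3)
print("medoid rows:", X[medoids])
print("total cost :", round(best_cost, 2))
```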