
Unsupervised Learning - Preliminaries

  • July 15, 2023

Meet the Author : Mr. Bharani Kumar

Bharani Kumar Depuru is a well-known IT personality from Hyderabad. He is the Founder and Director of Innodatatics Pvt Ltd and 360DigiTMG. An IIT and ISB alumnus with more than 18 years of experience, he has held prominent positions at leading IT firms such as HSBC, ITC Infotech, Infosys, and Deloitte. He is a prominent IT consultant specializing in Industrial Revolution 4.0 implementation, Data Analytics practice setup, Artificial Intelligence, Big Data Analytics, Industrial IoT, Business Intelligence and Business Management. Bharani Kumar is also the chief trainer at 360DigiTMG, with more than ten years of experience, and has been making the IT transition journey easy for his students. 360DigiTMG is at the forefront of delivering quality education, thereby bridging the gap between academia and industry.

 

Distance Calculation

Distance is calculated either between two records, between a record and a cluster, or between two clusters.


Distance Properties:

  • Should be non-negative (distance ≥ 0)
  • Distance from a record to itself is equal to 0
  • Satisfies Symmetry (Distance between records 'i' & 'j' is equal to the distance between records 'j' & 'i')
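
The three properties above can be checked numerically. The following is a minimal sketch, assuming Euclidean distance as the metric and a few made-up two-variable records (neither is prescribed by the text):

```python
import math
from itertools import product

def euclidean(a, b):
    """Straight-line distance between two equal-length numeric records."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Made-up records, used only to exercise the properties.
records = [(1.0, 2.0), (4.0, 6.0), (0.0, -1.0)]

for i, j in product(records, repeat=2):
    d_ij = euclidean(i, j)
    assert d_ij >= 0                  # non-negative
    assert d_ij == euclidean(j, i)    # symmetric
    if i == j:
        assert d_ij == 0              # a record is at distance 0 from itself

print(euclidean(records[0], records[1]))  # 5.0
```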

If the variables are on different scales or have different units, standardise or normalise them before computing the distance. Otherwise the variable with the largest numeric range dominates the result.
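
To see why scaling matters, here is a small sketch that assumes z-score standardisation and Euclidean distance. The records (age in years, income in rupees) are hypothetical, and every column is assumed to have a non-zero standard deviation:

```python
import math
import statistics

def euclidean(a, b):
    """Straight-line distance between two equal-length numeric records."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def standardise(rows):
    """Z-score each column: subtract the column mean, divide by its (population) standard deviation."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.pstdev(c) for c in cols]  # assumed non-zero for every column
    return [tuple((v - m) / s for v, m, s in zip(row, means, sds)) for row in rows]

# Hypothetical records: (age in years, income in rupees)
rows = [(25, 300000), (45, 310000), (30, 900000)]

raw = euclidean(rows[0], rows[1])          # driven almost entirely by income
scaled_rows = standardise(rows)
scaled = euclidean(scaled_rows[0], scaled_rows[1])  # both variables now contribute on a comparable scale

print(round(raw, 2), round(scaled, 2))
```

On the raw values the 20-year age gap is practically invisible next to the income difference; after standardisation the age gap is what separates the first two records.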



Distance Calculations

Distance Metrics for Continuous Data
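
The text does not name specific metrics here, so the sketch below assumes two widely used ones for continuous data, Manhattan and Euclidean distance, both expressed as special cases of the Minkowski distance. The function names are illustrative only:

```python
def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length numeric records."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

def manhattan(a, b):
    """Sum of absolute differences (Minkowski with p = 1)."""
    return minkowski(a, b, 1)

def euclidean(a, b):
    """Straight-line distance (Minkowski with p = 2)."""
    return minkowski(a, b, 2)

a, b = (1, 2), (4, 6)
print(manhattan(a, b))  # 7.0
print(euclidean(a, b))  # 5.0
```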


Distance Metrics for Binary Categorical Data

  • Binary Euclidean Distance
  • Simple Matching Coefficient
  • Jaccard's Coefficient
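
The three binary measures can be written in terms of four counts over a pair of 0/1 records: a (both 1), b (first 1, second 0), c (first 0, second 1) and d (both 0). A minimal sketch follows; the helper names and sample records are illustrative. Note that the simple matching and Jaccard coefficients are similarities, so 1 minus the coefficient gives the corresponding distance:

```python
import math

def match_counts(x, y):
    """Return (a, b, c, d): counts of (1,1), (1,0), (0,1) and (0,0) positions."""
    a = sum(1 for p, q in zip(x, y) if p == 1 and q == 1)
    b = sum(1 for p, q in zip(x, y) if p == 1 and q == 0)
    c = sum(1 for p, q in zip(x, y) if p == 0 and q == 1)
    d = sum(1 for p, q in zip(x, y) if p == 0 and q == 0)
    return a, b, c, d

def binary_euclidean(x, y):
    """Euclidean distance on 0/1 data: square root of the number of mismatches."""
    _, b, c, _ = match_counts(x, y)
    return math.sqrt(b + c)

def simple_matching(x, y):
    """Share of positions that agree, counting joint 0s as agreement."""
    a, b, c, d = match_counts(x, y)
    return (a + d) / (a + b + c + d)

def jaccard(x, y):
    """Share of positions that agree, ignoring joint 0s."""
    a, b, c, _ = match_counts(x, y)
    # Assumption: treat two all-zero records as identical (coefficient 1.0).
    return a / (a + b + c) if (a + b + c) else 1.0

x = [1, 0, 1, 1, 0, 0]
y = [1, 1, 0, 1, 0, 0]
print(binary_euclidean(x, y))  # sqrt(2), about 1.414
print(simple_matching(x, y))   # 4/6, about 0.667
print(jaccard(x, y))           # 0.5
```

Jaccard's coefficient is usually preferred when a shared 0 carries no information, for example when two customers both did not buy a rarely purchased item.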


Distance Metrics for Categorical Data (> 2 categories)

  • Distance is 0 if both items have the same category
  • Distance is 1 otherwise
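
The 0/1 rule above applies to a single categorical variable. The sketch below also averages it over several variables to get a distance between whole records; that averaging step is an assumption made for illustration, not something the text specifies:

```python
def category_distance(a, b):
    """0 if both items have the same category, 1 otherwise."""
    return 0 if a == b else 1

def mismatch_distance(rec_a, rec_b):
    """Fraction of categorical variables on which two records differ (assumed aggregation)."""
    diffs = [category_distance(a, b) for a, b in zip(rec_a, rec_b)]
    return sum(diffs) / len(diffs)

# Hypothetical records: (colour, body type, fuel)
car_1 = ("red", "suv", "petrol")
car_2 = ("red", "sedan", "diesel")

print(category_distance("red", "red"))  # 0
print(mismatch_distance(car_1, car_2))  # 2 of 3 variables differ
```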

Distance Metrics when both Quantitative & Categorical Data exist in a dataset

  • Gower's General Dissimilarity Coefficient
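
In its basic form, Gower's coefficient scores each variable on a 0 to 1 scale and averages the scores: a numeric variable contributes the absolute difference divided by that variable's range, and a categorical variable contributes 0 for a match and 1 for a mismatch. The sketch below assumes equal weights, no missing values and non-zero ranges; the `ranges` argument and the sample records are hypothetical:

```python
def gower_distance(a, b, ranges):
    """Gower dissimilarity between two mixed-type records.

    ranges[k] is the range (max - min) of numeric variable k over the dataset,
    or None if variable k is categorical.
    """
    parts = []
    for x, y, r in zip(a, b, ranges):
        if r is None:
            parts.append(0.0 if x == y else 1.0)  # categorical: simple mismatch
        else:
            parts.append(abs(x - y) / r)          # numeric: range-scaled difference
    return sum(parts) / len(parts)                # equal weights assumed

# Hypothetical records: (age, income, city)
rec_1 = (25, 300000, "Hyderabad")
rec_2 = (45, 600000, "Chennai")
ranges = (40, 600000, None)  # assumed dataset ranges for age and income

print(gower_distance(rec_1, rec_2, ranges))  # (0.5 + 0.5 + 1) / 3
```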



Linkages

Linkages - A linkage defines how distance is measured between a record & a cluster, or between two clusters.

  • Single Linkage - The smallest distance between a record and a cluster, or between two clusters, i.e. the distance between their closest pair of records.

    • Single Linkage is also called Nearest Neighbor
    • Emphasis is on close records or regions and not on overall structure of Data
    • Capable of clustering non-elliptical shaped regions
    • Gets influenced greatly by outliers or noisy data

     

  • Complete Linkage - The largest distance between a record and a cluster, or between two clusters, i.e. the distance between their farthest pair of records.

    • Complete Linkage is also called Farthest Neighbor
    • Complete Linkage is also sensitive to outliers

     

  • Average Linkage - The mean of all pairwise distances between a record and the records of a cluster, or between the records of two clusters.

    • Average Linkage is also called Group Average
    • Computationally expensive, because every pairwise distance between the two clusters has to be computed

     

  • Centroid Linkage - The distance between the centroids of two clusters, or between a record and a cluster's centroid.

    • Centroid Linkage is also called Centroid Similarity

     

  • Ward's Criterion - The increase in the clustering's SSE (sum of squared errors) criterion that results from merging two clusters into a single cluster.

    • This is also called Ward's Minimum Variance; at each step it merges the pair of clusters that gives the smallest increase in total within-cluster variance

     

  • Group Averaged Agglomerative Clustering (GAAC)

    • Two clusters are merged based on the cardinality and the centroids of the clusters
    • Cardinality is the number of elements in the cluster
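
The first five linkages above can be compared side by side on two small clusters. This is a minimal sketch, assuming Euclidean distance between records and made-up 2-D points. A single record can be handled by passing it as a one-element cluster. GAAC is not covered here.

```python
import math
from itertools import product

def euclidean(a, b):
    """Straight-line distance between two equal-length numeric records."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(cluster):
    """Column-wise mean of the records in a cluster."""
    return tuple(sum(col) / len(cluster) for col in zip(*cluster))

def sse(cluster):
    """Sum of squared distances from each record to the cluster centroid."""
    c = centroid(cluster)
    return sum(euclidean(p, c) ** 2 for p in cluster)

def single_linkage(A, B):
    """Distance between the closest pair of records (nearest neighbor)."""
    return min(euclidean(p, q) for p, q in product(A, B))

def complete_linkage(A, B):
    """Distance between the farthest pair of records (farthest neighbor)."""
    return max(euclidean(p, q) for p, q in product(A, B))

def average_linkage(A, B):
    """Mean of all pairwise distances between the two clusters."""
    return sum(euclidean(p, q) for p, q in product(A, B)) / (len(A) * len(B))

def centroid_linkage(A, B):
    """Distance between the two cluster centroids."""
    return euclidean(centroid(A), centroid(B))

def ward_increase(A, B):
    """Increase in SSE caused by merging A and B into one cluster."""
    return sse(A + B) - sse(A) - sse(B)

# Made-up clusters: two vertical pairs of points, 3 units apart.
A = [(0.0, 0.0), (0.0, 2.0)]
B = [(3.0, 0.0), (3.0, 2.0)]

print(single_linkage(A, B))    # 3.0
print(complete_linkage(A, B))  # sqrt(13), about 3.606
print(average_linkage(A, B))   # about 3.303
print(centroid_linkage(A, B))  # 3.0
print(ward_increase(A, B))     # 9.0
```

In agglomerative clustering, the pair of clusters with the smallest value of the chosen linkage is merged at each step, so the choice of linkage directly shapes the resulting clusters.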

     
