Decision Tree
Decision trees come in two types, depending on the output variable:
- “Categorical Variable Decision Tree”: used when the output is categorical
- “Continuous Variable Decision Tree”: used when the output is numerical
Decision Trees are:
- Nonparametric, hierarchical models that work on a divide-and-conquer strategy
- Rule-based algorithms that work on the principle of recursive partitioning; a path from the root node to a leaf node represents a rule
- Tree-like structures in which an internal node represents a test on an attribute, each branch represents an outcome of the test, and each leaf node represents a class label
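For illustration (a hypothetical example, not from the original text): in a tree trained on a weather dataset, the root-to-leaf path “IF Outlook = Sunny AND Humidity > 75 THEN Play = No” is one such rule.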
A Greedy Algorithm
To develop a Decision Tree, consider two important questions:
Q1. Which variable should we split on?
Q2. When should we stop growing the tree?
Information Theory 101
When an event occurs frequently, it carries very little information; rare events carry far more.
Therefore, “Information Content is Proportional to Rarity”.
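This idea can be made precise with the standard self-information formula from information theory (added here for reference), where $P(x)$ is the probability of event $x$:

$$I(x) = -\log_2 P(x)$$

An event that is certain ($P(x) = 1$) carries 0 bits of information; the rarer the event, the more bits it carries.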
Entropy:
- Entropy is the expected information content of all the events
- An entropy value of 0 means the sample is completely homogeneous
- An entropy value of 1 (for a two-class problem) means the sample is completely heterogeneous, i.e. a 50/50 split
Purity = Accuracy = 1 - Entropy
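Concretely, entropy is computed from the class proportions $p_i$ as $H(S) = -\sum_i p_i \log_2 p_i$. A minimal Python sketch (added here for illustration) showing the two extreme values mentioned above:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    # max(0.0, ...) clamps the IEEE -0.0 that appears for pure nodes
    return max(0.0, -np.sum(p * np.log2(p)))

print(entropy(["yes"] * 10))              # 0.0: completely homogeneous
print(entropy(["yes"] * 5 + ["no"] * 5))  # 1.0: completely heterogeneous (50/50)
```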
When measuring accuracy, each region is assigned its dominant class label.
Decision trees identify the attributes that yield the most homogeneous branches by measuring entropy, a measure of disorder or impurity (variation/heterogeneity).
The Gini measure, an expected measure of purity, can also be used: it corresponds to the accuracy obtained when each region's labels are assigned probabilistically rather than by the dominant class.
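Most implementations work with the complementary Gini impurity, $Gini(S) = 1 - \sum_i p_i^2$. A minimal sketch in the same style as the entropy example above:

```python
import numpy as np

def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

print(gini_impurity(["yes"] * 10))              # 0.0: pure node
print(gini_impurity(["yes"] * 5 + ["no"] * 5))  # 0.5: maximally impure for two classes
```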
After choosing a purity measure, one must decide which feature to split on. To do this, one quantifies the change in homogeneity brought about by splitting on each feature. This computation is called Information Gain.
The information gain of a feature is determined by comparing the entropy of the segment before the split (S1) with the combined entropy of the partitions that result from the split (S2).
The less variety in class labels there is after the split, the better the attribute.
Information Gain: a decrease in entropy (variation) as a result of dividing the dataset along an attribute.
Greater information gain indicates greater homogeneity after the split.
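A minimal sketch of this computation, InfoGain = E(S1) - E(S2), where E(S2) is the size-weighted entropy of the partitions produced by the split (the split shown is a hypothetical example):

```python
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return max(0.0, -np.sum(p * np.log2(p)))

def information_gain(parent, partitions):
    """InfoGain = E(S1) - E(S2): entropy before the split minus the
    size-weighted entropy of the partitions created by the split."""
    n = len(parent)
    e_s2 = sum(len(part) / n * entropy(part) for part in partitions)
    return entropy(parent) - e_s2

parent = ["yes"] * 5 + ["no"] * 5               # E(S1) = 1.0
left   = ["yes"] * 4 + ["no"] * 1               # mostly "yes"
right  = ["yes"] * 1 + ["no"] * 4               # mostly "no"
print(information_gain(parent, [left, right]))  # ~0.278 bits
```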
Pros and Cons of Decision Trees
| Strengths | Weaknesses |
|---|---|
| Uses the most important features when making decisions | Biased towards features that have many levels |
| Interpretation is very simple; no mathematical background is needed | Small changes in the data can result in large changes to the tree's decisions |
Regularisation techniques can be used to combat model overfitting.
The regularisation method employed for Decision Trees is pruning.
Pruning is the process of reducing the tree's size so that it generalises better to previously unseen data.
The two pruning techniques are:

Pre-Pruning (Early Stopping)
Stops the tree from growing once the desired condition is met.
Disadvantage: it is difficult to know when to stop the tree from growing; stopping too early may prevent an important pattern from being learned.
Post-Pruning
Grows the tree completely and then applies conditions to reduce its size. For example, if a branch reduces the error rate by less than 3%, it is removed; nodes and branches that offer little reduction in error are pruned away. This process of grafting branches is known as subtree raising or subtree replacement.
Post-Pruning is generally more effective than Pre-Pruning.
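A hedged sketch of both techniques with scikit-learn (library usage added here for illustration, not part of the original text): DecisionTreeClassifier supports pre-pruning through parameters such as max_depth and min_samples_leaf, and post-pruning through minimal cost-complexity pruning (ccp_alpha):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Pre-pruning (early stopping): cap depth and leaf size up front
pre_pruned = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5, random_state=42)
pre_pruned.fit(X_train, y_train)

# Post-pruning: grow the tree fully, then prune weak branches with
# cost-complexity pruning; larger ccp_alpha removes more branches
full_tree = DecisionTreeClassifier(random_state=42)
path = full_tree.cost_complexity_pruning_path(X_train, y_train)
alphas = path.ccp_alphas
# pick a mid-range alpha for the demo; tune with cross-validation in practice
post_pruned = DecisionTreeClassifier(ccp_alpha=alphas[len(alphas) // 2], random_state=42)
post_pruned.fit(X_train, y_train)

print("pre-pruned accuracy :", pre_pruned.score(X_test, y_test))
print("post-pruned accuracy:", post_pruned.score(X_test, y_test))
```

In practice the candidate alphas returned by cost_complexity_pruning_path are evaluated with cross-validation rather than picked by hand as above.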