Decision Tree in a Cheat Sheet

  • July 05, 2023

Meet the Author : Mr. Bharani Kumar

Bharani Kumar Depuru is a well-known IT personality from Hyderabad. He is the Founder and Director of Innodatatics Pvt Ltd and 360DigiTMG. An IIT and ISB alumnus with more than 17 years of experience, he has held prominent positions at firms such as HSBC, ITC Infotech, Infosys, and Deloitte. He is an IT consultant specializing in Industrial Revolution 4.0 implementation, Data Analytics practice setup, Artificial Intelligence, Big Data Analytics, Industrial IoT, Business Intelligence and Business Management. Bharani Kumar is also the chief trainer at 360DigiTMG, with more than ten years of experience helping students make the transition into IT. 360DigiTMG aims to deliver quality education that bridges the gap between academia and industry.

A decision tree is a supervised, non-parametric machine learning technique used for both classification and regression.

Decision Trees are represented as Nodes:

  • Root Node: represented as a rectangle or a square
  • Branch/Internal Node: represented as a circle
  • Leaf/Terminal Node: represented as a triangle or a dot

Information Gain:

Information gain is the decrease in entropy obtained when the dataset is split on an attribute. For a two-class problem with base-2 logarithms, it has a value between 0 and 1.

Information Gain (IG) = Entropy before the split - weighted average Entropy of the subsets after the split.
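The formula above can be sketched in a few lines of plain Python. This is a minimal illustration, not a library implementation: the function names `entropy` and `information_gain` and the toy "yes"/"no" labels are assumptions made up for this example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    probs = [count / total for count in Counter(labels).values()]
    return sum(-p * log2(p) for p in probs)


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child subsets."""
    total = len(parent)
    weighted_after = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted_after


# Hypothetical toy data: 10 samples split into two subsets by some attribute.
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 4

print("entropy before:", round(entropy(parent), 3))
print("information gain:", round(information_gain(parent, [left, right]), 3))
```

With this toy split the entropy before is 1.0 and the information gain comes out at roughly 0.278. A split that leaves the class mix unchanged would give a gain of 0.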

Entropy:

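Entropy is a measure of impurity, also described as a measure of uncertainty, in the class labels at a node.

For a two-class problem with base-2 logarithms, its value ranges from 0 (a pure node) to 1 (evenly mixed classes); with k classes the maximum is log2(k).

Entropy = -Σ p_i log2(p_i), where p_i is the proportion of samples in class i.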
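To see where the bounds on entropy come from, here is a small sketch that evaluates it for a few class distributions. The helper name `entropy_from_probs` is an assumption for this example, and base-2 logarithms are assumed throughout.

```python
from math import log2


def entropy_from_probs(probs):
    """Shannon entropy in bits; zero-probability classes contribute nothing."""
    return sum(-p * log2(p) for p in probs if p > 0)


print(entropy_from_probs([1.0, 0.0]))   # pure node
print(entropy_from_probs([0.5, 0.5]))   # two classes, evenly mixed
print(entropy_from_probs([0.25] * 4))   # four classes, evenly mixed
```

This prints 0.0, 1.0 and 2.0. The 0 to 1 range therefore holds for two classes; with four evenly mixed classes the base-2 entropy reaches 2.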

Gini Index:

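The Gini Index measures the impurity of a node: 0 means the node is pure, and larger values mean the classes are more mixed. It is the splitting criterion used by the CART algorithm for decision trees. It has a value between 0 and 1 (at most 0.5 for two classes).

Gini = 1 - Σ p_i^2, where p_i is the proportion of samples in class i.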
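A minimal sketch of the Gini calculation, assuming the usual definition (one minus the sum of squared class proportions) and a size-weighted average over the subsets of a split. The function names and toy labels are made up for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def weighted_gini(children):
    """Size-weighted Gini impurity of the subsets produced by a split."""
    total = sum(len(child) for child in children)
    return sum(len(child) / total * gini(child) for child in children)


# Hypothetical toy data: the same 10 samples before and after a split.
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 4

print("gini before split:", round(gini(parent), 3))
print("gini after split:", round(weighted_gini([left, right]), 3))
```

Here the impurity drops from 0.5 to about 0.32. A CART-style tree would prefer the candidate split with the lowest weighted impurity.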

Ensemble Techniques:

  • Stacking: An ensemble learning approach in which a meta-classifier or meta-regressor learns how to combine the predictions of several classification or regression models.
  • Voting: Combines the predictions from multiple machine learning models.
  • Hard Voting: The class that receives the most votes from the individual models is selected as the output class.
  • Soft Voting: The predicted probabilities for each class are averaged across the models, and the class with the highest average probability is selected.
  • Bagging: Short for bootstrap aggregating. Each model is trained on a bootstrap sample of the data and the predictions are aggregated, which reduces overfitting and often improves accuracy.
  • Random Forest: An extension of bagging that also considers a random subset of features at each split, which further reduces overfitting.
  • AdaBoost: Builds a powerful classifier by combining many weak classifiers in sequence, giving more weight to the samples that earlier classifiers got wrong.
  • Gradient Boosting: Adds models in sequence, each one fitted to reduce a chosen loss function. It works well with categorical and count data, and some implementations also handle missing data.
  • XGBoost: An optimized implementation of gradient boosting that can be applied to both classification and regression models.
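The difference between hard and soft voting is easiest to see on a single sample. The sketch below uses made-up probabilities from three hypothetical classifiers; `hard_vote` and `soft_vote` are illustrative helper names, not library functions.

```python
from collections import Counter

# Hypothetical predicted probabilities for one sample from three classifiers.
probas = [
    {"A": 0.90, "B": 0.10},
    {"A": 0.40, "B": 0.60},
    {"A": 0.45, "B": 0.55},
]


def hard_vote(probas):
    """Each classifier votes for its most likely class; the majority wins."""
    votes = [max(p, key=p.get) for p in probas]
    return Counter(votes).most_common(1)[0][0]


def soft_vote(probas):
    """Average the class probabilities and pick the highest average."""
    classes = probas[0].keys()
    averages = {c: sum(p[c] for p in probas) / len(probas) for c in classes}
    return max(averages, key=averages.get)


print("hard voting:", hard_vote(probas))
print("soft voting:", soft_vote(probas))
```

Hard voting returns B (two votes to one), while soft voting returns A, because the first classifier is very confident and pulls the average probability of A above that of B.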

Libraries to install in Python for Decision Tree and Ensemble

  • from sklearn.preprocessing import LabelEncoder - Encodes target labels as integers from 0 to n_classes-1 (for one-hot encoding of features, use OneHotEncoder)
  • from sklearn.preprocessing import scale - Standardizes the data to zero mean and unit variance
  • from sklearn.model_selection import train_test_split - Splits the data into train and test sets
  • from sklearn.tree import DecisionTreeClassifier as DT - Decision tree classifier, supporting binary and multiclass classification
  • from sklearn import tree - Module used to build and draw trees (for example, tree.plot_tree)
  • from sklearn.metrics import accuracy_score - Computes classification accuracy; in multilabel classification it returns the subset accuracy
  • from sklearn.metrics import confusion_matrix - Used to evaluate the quality of a classifier's output
  • from sklearn.ensemble import VotingClassifier - Combines several classifiers by majority (hard) or averaged-probability (soft) voting
  • from sklearn.ensemble import BaggingClassifier - Fits the base classifier on random subsets of the original dataset and aggregates the individual predictions
  • from sklearn.ensemble import RandomForestClassifier - Random forest for classification (RandomForestRegressor is the regression counterpart)
  • from sklearn.ensemble import AdaBoostClassifier - Combines multiple weak classifiers in sequence to increase accuracy
  • from sklearn.ensemble import GradientBoostingClassifier - Builds an additive model stage by stage to minimize the loss
  • import xgboost as xgb - XGBoost, an extension of gradient boosting built for speed and performance
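As a rough sketch of how these imports fit together, the example below trains a single decision tree and a few ensembles. The iris dataset, the 70/30 split, the seed and every hyperparameter value are assumptions chosen for illustration, not settings from this cheat sheet. xgboost is left out so the sketch depends only on scikit-learn.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier as DT

# Assumed example data: the iris dataset bundled with scikit-learn.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# A single decision tree, scored with accuracy and a confusion matrix.
tree_model = DT(criterion="entropy", max_depth=3, random_state=42)
tree_model.fit(X_train, y_train)
tree_pred = tree_model.predict(X_test)
print("decision tree:", round(accuracy_score(y_test, tree_pred), 3))
print(confusion_matrix(y_test, tree_pred))

# Ensembles from the list above, with illustrative settings.
models = {
    "bagging": BaggingClassifier(DT(), n_estimators=10, random_state=42),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "adaboost": AdaBoostClassifier(n_estimators=50, random_state=42),
    "gradient boosting": GradientBoostingClassifier(random_state=42),
    "hard voting": VotingClassifier(
        estimators=[
            ("dt", DT(max_depth=3, random_state=42)),
            ("rf", RandomForestClassifier(random_state=42)),
            ("gb", GradientBoostingClassifier(random_state=42)),
        ],
        voting="hard",
    ),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {scores[name]:.3f}")
```

The exact scores depend on the split and the library version, so treat them as a sanity check and not as a benchmark.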

 

Libraries to install in R for Decision Tree and Ensemble

  • library(caTools) - Basic utility functions, such as sample.split for train/test splitting
  • library(C50) - C5.0 classification model for Decision Tree
  • library(rpart) - R implementation of Recursive Partitioning And Regression Trees
  • library(gmodels) - Tools for model fitting, including CrossTable for cross-tabulations
  • library(caret) - Classification And Regression Training: a common interface for training and tuning models
  • library(randomForest) - Random forest algorithm for Classification and Regression
  • library(adabag) - AdaBoost and bagging for classification
  • library(gbm) - Gradient Boosting Machine (generalized boosted regression models)
  • library(xgboost) - An extension of gradient boosting that supports both classification and regression models

 

Hyperparameters in Decision Tree
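| Hyperparameter | Input Values | Default Value |
|---|---|---|
| max_depth | Integer or None, Optional | None |
| min_samples_split | Integer, Float, Optional | 2 |
| min_samples_leaf | Integer, Float, Optional | 1 |
| min_weight_fraction_leaf | Float, Optional | 0 |
| max_features | Integer, Float, String or None, Optional | None |
| random_state | Integer, RandomState instance or None, Optional | None |
| min_impurity_decrease | Float, Optional | 0 |
| base_estimator | Estimator object | Decision Tree |
| n_estimators | Integer | 10 |
| random_state | Integer (seed) | None |
| n_jobs | Integer, None | None |
| criterion | String | gini |
| min_samples_leaf | Integer | 1 |
| oob_score | Boolean | False |
| learning_rate | Float | 1 |
| colsample_bylevel | Integer, Float | 1 |
| colsample_bytree | Integer, Float | 1 |
| subsample | Integer, Float | 1 |
| eta | Integer, Float | 0.3 |
| min_child_weight | Integer | 1 |
| gamma | Integer, Float | 0 |
| alpha | Integer, Float | 0 |
| lambda | Integer, Float | 1 |

The first seven rows are decision tree parameters. The remaining rows appear to belong to the ensemble estimators (bagging, random forest, AdaBoost and XGBoost), which is why random_state and min_samples_leaf are listed twice.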
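To check defaults like these against an installed scikit-learn, the estimator's `get_params()` method can be printed directly. The sketch below also sets a few of the hyperparameters. The chosen values and the iris dataset are illustrative assumptions, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Inspect the defaults of an untouched decision tree.
defaults = DecisionTreeClassifier().get_params()
for name in (
    "max_depth",
    "min_samples_split",
    "min_samples_leaf",
    "min_weight_fraction_leaf",
    "max_features",
    "random_state",
    "min_impurity_decrease",
    "criterion",
):
    print(f"{name} = {defaults[name]!r}")

# Illustrative values that restrict how far the tree can grow.
X, y = load_iris(return_X_y=True)
small_tree = DecisionTreeClassifier(
    max_depth=2, min_samples_leaf=5, random_state=0
).fit(X, y)
print("depth:", small_tree.get_depth(), "leaves:", small_tree.get_n_leaves())

# Ensemble-level settings, shown here on a random forest.
forest = RandomForestClassifier(
    n_estimators=50, criterion="gini", oob_score=True, n_jobs=1, random_state=0
).fit(X, y)
print("out-of-bag score:", round(forest.oob_score_, 3))
```

Lower values of max_depth and higher values of min_samples_leaf give smaller trees, which is a common way to limit overfitting. Defaults can change between library versions, so the printed values are the ones to trust for a given installation.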
