High Level Project Management – Data Science
Data Collection
- Primary Data Sources – Data collected first-hand for the current problem – Surveys / Experiments
- Costly
- Time-consuming / Low quality
- Get the exact variable
- Secondary Data Sources – Data which is collected beforehand
- Quick access to data
- Free of cost
- May not contain the data of interest
Data Cleansing / Data Preparation / Exploratory Data Analysis / Feature Engineering
Data Cleansing / Data Preparation
- Outlier Analysis / Treatment – 3R (Rectify, Retain, Remove)
- Missingness of data – Imputation – Mean, Median, Mode, Regression, KNN
- Normalization ((X - Min(X)) / Range(X)) / Standardization ((X - Mu) / Sigma) – Unitless and scale-free
- Discretization / Binning / Grouping
- Transformation (log, exp, etc.)
- Non-linear
- Non-normal
- Heteroscedasticity – unequal variance
- Collinearity
- Dummy variable creation – One hot encoding
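
A few of the preparation steps listed above (imputation, the two rescaling formulas, and dummy variable creation) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a production pipeline: the `ages` and `colors` columns are hypothetical toy data, and median imputation is just one of the imputation options named in the list.

```python
from statistics import mean, median, pstdev

# Hypothetical toy column with one missing value (None).
ages = [22, 25, None, 31, 40]

# Median imputation: replace missing entries with the median of the observed values.
observed = [a for a in ages if a is not None]
fill = median(observed)
imputed = [fill if a is None else a for a in ages]

# Min-max scaling: (x - min) / range, which maps values into [0, 1].
lo, hi = min(imputed), max(imputed)
min_max = [(a - lo) / (hi - lo) for a in imputed]

# Z-score scaling: (x - mu) / sigma, which gives mean 0 and standard deviation 1.
mu, sigma = mean(imputed), pstdev(imputed)
z_scores = [(a - mu) / sigma for a in imputed]

# One-hot encoding: one 0/1 indicator column per category.
colors = ["red", "green", "red", "blue"]
categories = sorted(set(colors))
one_hot = [[int(c == cat) for cat in categories] for c in colors]
```

Both rescaled columns are unitless, so variables measured on different scales become comparable. In practice a library such as pandas or scikit-learn would usually handle these steps.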
Exploratory Data Analysis
- First-moment business decision / Measures of central tendency
- Mean, Median, Mode
- Second-moment business decision / Measures of dispersion
- Variance, Standard Deviation, Range
- Third-moment business decision – Skewness
- Fourth-moment business decision – Kurtosis
- Graphical Representation
- Univariate
- Box Plot
- Primary purpose – Identify outliers
- Secondary purpose – Identify shape of distribution
- Histogram
- Primary purpose – Identify Shape of distribution
- Secondary purpose – Identify outliers
- Q-Q plot – Check whether the data are normally distributed
- Bivariate
- Scatter plot
- Primary purposes
- Direction-Positive, Negative, no correlation
- Strength – Strong, moderate, weak – Reading it off the plot is subjective; the correlation coefficient r (from -1 to +1) is objective: |r| > 0.85 indicates a strong relationship, |r| < 0.4 a weak one
- Linear or Non-linear / Curvilinear
- Secondary purposes
- Clusters
- Outliers
- Feature Engineering / Feature Extraction – Using your given variables, try to apply domain knowledge to come up with more meaningful derived variables
- Feature Selection -> Decision Tree (Information Gain), Random Forest (Variable Importance plot), Hypothesis testing, Lasso regression, Ridge regression
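
The four moments and the correlation coefficient from the exploratory analysis list can be computed directly from their definitions. The sketch below assumes population (divide-by-n) formulas and reports plain kurtosis rather than excess kurtosis; the `hours` and `scores` data are hypothetical.

```python
from math import sqrt

def moments(xs):
    """Return mean, population variance, skewness, and kurtosis of xs."""
    n = len(xs)
    m = sum(xs) / n                                   # first moment
    var = sum((x - m) ** 2 for x in xs) / n           # second moment
    sd = sqrt(var)
    skew = sum(((x - m) / sd) ** 3 for x in xs) / n   # third moment
    kurt = sum(((x - m) / sd) ** 4 for x in xs) / n   # fourth moment (3 for a normal)
    return m, var, skew, kurt

def pearson_r(xs, ys):
    """Pearson correlation coefficient, always between -1 and +1."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: hours studied and exam scores.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]

m, var, skew, kurt = moments(hours)
r = pearson_r(hours, scores)
```

Here `hours` is symmetric around its mean, so its skewness is zero, and `r` comes out above 0.85, which the strength guideline in the list would call a strong positive relationship.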
Data Mining (Cross-Sectional)
Supervised Learning / Machine Learning / Predictive Modelling (Y known)
- Regression Analysis (Interpret the parameters)
- Y= Continuous -> Linear Regression
- Y = Discrete (2 categories) -> Logistic Regression
- Y = Discrete (> 2 categories) -> Multinomial / Ordinal Regression
- Y = Count -> Poisson / Negative Binomial Regression
- Excessive Zero – ZIP / ZINB / Hurdle
- KNN
- Black Box Techniques (No interpretation exists)
- Neural Networks
- SVM
- Ensemble Techniques
- Stacking
- Bagging (Random Forest)
- Boosting (Decision Tree)
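
To illustrate what "interpret the parameters" means for the simplest case above (continuous Y, linear regression), here is a minimal sketch of ordinary least squares with a single predictor. The function name `fit_line` and the `ad_spend` and `sales` data are hypothetical and chosen so the answer is easy to check by hand.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return intercept, slope

# Hypothetical data generated exactly from sales = 1 + 2 * ad_spend.
ad_spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]

intercept, slope = fit_line(ad_spend, sales)
```

The fitted slope reads as "each extra unit of ad spend is associated with two extra units of sales", which is the kind of direct interpretation that black box techniques do not offer.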
Unsupervised Learning (Y unknown)
- Clustering / Segmentation – Reduce the rows
- K-Means / non-hierarchical – Determine the number of clusters upfront – Scree plot / Elbow curve
- Hierarchical / Agglomerative – Dendrogram
- DBSCAN
- OPTICS
- CLARA
- K-medians / K-Medoids / K-modes
- Dimension Reduction – Reduce the columns
- PCA, Factor Analysis
- SVD
- Association Rules / Market Basket Analysis / Affinity Analysis
- Support
- Confidence
- Lift Ratio > 1 => Antecedent and Consequent occur together more often than expected if they were independent (positive association)
- Recommender Systems
- Network Analytics
- Degree
- Closeness
- Betweenness
- Eigenvector
- Page Rank
- Text Mining & NLP
- BoW
- TDM / DTM
- TF / TFIDF
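
The text mining terms above (bag of words, document-term matrix, TF-IDF) can be made concrete with a small sketch. It assumes whitespace tokenization and the basic `log(N / df)` form of inverse document frequency; libraries often use smoothed variants, so their numbers may differ. The three documents are hypothetical.

```python
from math import log

# Hypothetical mini corpus.
docs = ["data science is fun", "data cleaning is tedious", "science needs data"]

# Bag of words: each document becomes a list of tokens, word order is ignored afterwards.
tokenized = [d.split() for d in docs]
vocab = sorted({w for doc in tokenized for w in doc})

# Document-term matrix (DTM): one row per document, one column per vocabulary term.
dtm = [[doc.count(w) for w in vocab] for doc in tokenized]

def tf_idf(doc_index, word):
    """Term frequency in one document times inverse document frequency."""
    doc = tokenized[doc_index]
    tf = doc.count(word) / len(doc)
    df = sum(1 for d in tokenized if word in d)   # documents containing the word
    return tf * log(len(tokenized) / df)

common = tf_idf(0, "data")   # appears in every document
rare = tf_idf(0, "fun")      # appears in only one document
```

A word that occurs in every document, such as "data" here, gets a weight of zero, while a word unique to one document gets a positive weight. Transposing `dtm` gives the term-document matrix (TDM).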
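
The three association rule measures listed above (support, confidence, lift) follow directly from counting transactions. The sketch below uses five hypothetical baskets and evaluates the rule {bread} -> {butter}; the function names are illustrative, not taken from any library.

```python
# Hypothetical market baskets.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
    {"butter", "milk"},
]

def support(items):
    """Fraction of transactions that contain every item in `items`."""
    return sum(1 for t in transactions if items <= t) / len(transactions)

def confidence(antecedent, consequent):
    """Share of transactions with the antecedent that also contain the consequent."""
    return support(antecedent | consequent) / support(antecedent)

def lift(antecedent, consequent):
    """Confidence relative to the consequent's baseline support."""
    return confidence(antecedent, consequent) / support(consequent)

rule_support = support({"bread", "butter"})
rule_confidence = confidence({"bread"}, {"butter"})
rule_lift = lift({"bread"}, {"butter"})
```

For these baskets the lift is slightly above 1, meaning bread and butter appear together a little more often than they would if purchases were independent.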
Forecasting / Time Series
- Model-Based Approaches
- Trend
- Linear
- Exponential
- Quadratic
- Seasonality
- Additive
- Multiplicative
- Data-Based Approaches
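- AR (autoregressive)
- MA (moving average)
- ES (exponential smoothing)
- SES (simple exponential smoothing)
- Holt's method
- Holt-Winters method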
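
Of the data-based approaches above, simple exponential smoothing is the easiest to write out. The sketch below assumes the level is initialised with the first observation, which is a common but not the only choice; `alpha` and the `demand` series are hypothetical.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation.

    The last level serves as the one-step-ahead forecast.
    """
    level = series[0]            # assumption: start from the first observation
    levels = [level]
    for y in series[1:]:
        # New level = weighted average of the new observation and the old level.
        level = alpha * y + (1 - alpha) * level
        levels.append(level)
    return levels

# Hypothetical demand series.
demand = [10, 12, 11, 13]
levels = simple_exponential_smoothing(demand, alpha=0.5)
forecast = levels[-1]
```

A larger `alpha` makes the forecast react faster to recent observations, while a smaller one smooths more heavily. Holt's method adds a trend term to this recursion, and Holt-Winters adds a seasonal term as well.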