1. Why is linear algebra needed for Machine Learning algorithms?
Machine Learning models operate on data that can be expressed in matrix form, a 2-dimensional arrangement of rows (observations) and columns (features). Linear algebra is used in data preprocessing, data transformation, model evaluation, and other steps of a Machine Learning implementation to extract meaningful information from raw business data.
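As a minimal sketch of where linear algebra shows up, the snippet below uses NumPy (assumed to be installed) on a small hypothetical dataset: centering the feature matrix is a preprocessing step, fitting a linear model is a least-squares problem, and predicting is a single matrix-vector product. The data and variable names are illustrative assumptions only.

```python
import numpy as np

# Hypothetical dataset: 4 observations (rows) x 2 features (columns)
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0]])
y = np.array([0.0, 3.0, 2.0, 5.0])  # one target value per row

# Preprocessing as a matrix operation: subtract each column's mean
X_centered = X - X.mean(axis=0)

# Model fitting: least-squares solution of X @ w ≈ y
w, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
print(w)  # ≈ [ 2. -1.]

# Prediction: one matrix-vector product scores every row at once
predictions = X @ w
print(predictions)  # ≈ [0. 3. 2. 5.]
```

Expressing the whole dataset as one matrix lets a single call handle every row, which is why ML libraries are built on linear algebra routines.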
4. What is the concept behind the p-value?
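When you conduct a hypothesis test, the p-value helps you judge the strength of the evidence against the null hypothesis. It is the probability, assuming the null hypothesis is true, of obtaining a result at least as extreme as the one observed, so it always lies between 0 and 1. A small p-value (commonly below a chosen significance level such as 0.05) indicates strong evidence against the null hypothesis; a large p-value indicates weak evidence.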
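To make the definition concrete, here is a minimal sketch that computes an exact one-sided p-value for a coin-tossing experiment with Python's standard library. The data (9 heads in 10 tosses), the helper name `binomial_p_value`, and the 0.05 significance level are illustrative assumptions.

```python
from math import comb


def binomial_p_value(heads: int, flips: int, p_null: float = 0.5) -> float:
    """One-sided p-value: P(X >= heads) when X ~ Binomial(flips, p_null)."""
    return sum(
        comb(flips, k) * p_null**k * (1 - p_null) ** (flips - k)
        for k in range(heads, flips + 1)
    )


# Hypothetical experiment: 9 heads in 10 tosses; H0: the coin is fair
p = binomial_p_value(9, 10)
print(f"p-value = {p:.4f}")  # 11/1024, printed as 0.0107

alpha = 0.05  # assumed significance level
print("reject H0" if p < alpha else "fail to reject H0")  # reject H0
```

The p-value here is the chance that a fair coin would produce 9 or more heads; because that chance is small, the result counts as strong evidence against fairness.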
11. What is the concept behind the random forest technique?
A random forest is an ensemble technique that trains many decision trees, each on a different random subset of the data (a bootstrap sample) and of the features, to improve predictive accuracy. Instead of relying on a single model, the random forest combines the predictions of all its trees, taking the majority vote for classification. A larger number of trees generally gives more stable and accurate predictions and reduces the overfitting that a single deep tree is prone to.
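The idea can be sketched as a toy random forest in pure Python. To stay short, each "tree" here is only a depth-1 decision stump that may split on one randomly chosen feature; production implementations such as scikit-learn's `RandomForestClassifier` grow full trees and sample features at every split. The helper names and the small dataset are hypothetical.

```python
import random
from collections import Counter


def majority(labels):
    """Most common label in a list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, feature):
    """Depth-1 tree: pick the threshold on `feature` with the fewest training errors."""
    fallback = majority(y)  # used when one side of a split is empty
    best = None
    for threshold in sorted({row[feature] for row in X}):
        left = [lab for row, lab in zip(X, y) if row[feature] <= threshold]
        right = [lab for row, lab in zip(X, y) if row[feature] > threshold]
        left_label = majority(left) if left else fallback
        right_label = majority(right) if right else fallback
        errors = sum(lab != left_label for lab in left) + sum(lab != right_label for lab in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return feature, threshold, left_label, right_label


def fit_forest(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw len(X) rows with replacement
        rows = [rng.randrange(len(X)) for _ in range(len(X))]
        X_boot = [X[i] for i in rows]
        y_boot = [y[i] for i in rows]
        # Random feature selection: this stump may only split on one random feature
        feature = rng.randrange(len(X[0]))
        forest.append(fit_stump(X_boot, y_boot, feature))
    return forest


def predict(forest, row):
    # Every tree votes; the forest returns the majority vote
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return majority(votes)


# Hypothetical 2-feature dataset with two well-separated classes
X = [[1, 1], [2, 1], [1, 2], [2, 2], [8, 8], [9, 8], [8, 9], [9, 9]]
y = ["a", "a", "a", "a", "b", "b", "b", "b"]
forest = fit_forest(X, y)
print(predict(forest, [0.5, 1.0]))  # a
print(predict(forest, [9.5, 9.0]))  # b
```

Because each tree sees a different bootstrap sample and feature, individual trees make different mistakes, and the vote averages those mistakes out.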
12. What is the primary difference between the two sampling techniques: probability and non-probability sampling?
In probability sampling, every member of the population has a known, non-zero chance of being selected (an equal chance in simple random sampling), so the selection is random and avoids selection bias. This allows strong statistical inferences about the population. Non-probability sampling is a biased approach: units are selected non-randomly, based on convenience, judgment, or other criteria, so the sample may not support strong statistical inferences about the population.
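A minimal sketch of the contrast, using only the standard library: a simple random sample (probability sampling) versus a convenience sample that just takes the first records available. The population of customer IDs is hypothetical; if the list order is related to some property (for example, sign-up date), the convenience sample inherits that bias.

```python
import random
import statistics

# Hypothetical population: customer IDs 1..100
population = list(range(1, 101))
rng = random.Random(42)  # fixed seed so the example is reproducible

# Probability sampling (simple random sampling): every ID has the same chance
random_sample = rng.sample(population, k=10)

# Non-probability (convenience) sampling: take whoever is easiest to reach,
# here simply the first 10 IDs in the list
convenience_sample = population[:10]

print("population mean:", statistics.mean(population))                  # 50.5
print("random sample mean:", statistics.mean(random_sample))
print("convenience sample mean:", statistics.mean(convenience_sample))  # 5.5
```

The convenience sample's mean is far from the population mean because the selection rule, not chance, decided which IDs were included.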
13. What is Entropy in a decision tree algorithm?
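Entropy is a measure of the impurity (lack of homogeneity) of the sample data. If the entropy is zero, the sample is completely homogeneous: all records belong to one class. In contrast, for a two-class problem, an entropy of 1 means the sample is equally divided between the classes. Entropy controls how a decision tree splits the data: at each node, the tree prefers the split that reduces entropy the most, that is, the split with the highest information gain.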
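A short sketch of the base-2 entropy, H = -Σ p_i log2(p_i), and of the information gain of a candidate split, in plain Python. The helper names and the "yes"/"no" labels are illustrative assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    # p * log2(1/p) equals -p * log2(p) and keeps the homogeneous case at +0.0
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = ["yes"] * 5 + ["no"] * 5   # equally divided node
left = ["yes"] * 4 + ["no"]         # candidate split, left branch
right = ["yes"] + ["no"] * 4        # candidate split, right branch

print(entropy(["yes"] * 4))                                # 0.0 (homogeneous)
print(entropy(parent))                                     # 1.0 (equally divided)
print(round(information_gain(parent, [left, right]), 3))   # 0.278
```

A tree builder would compute this gain for every candidate split and keep the one with the largest value.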
18. What do we mean by the no-outliers assumption in multiple logistic regression?
The variables you care about should not contain outliers. Logistic regression is sensitive to outliers, that is, data points with unusually large or small values. You can tell whether your variables have outliers by plotting them and checking whether any points lie far away from all the others.
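As a numeric complement to the visual check, one common rule of thumb flags values whose z-score (distance from the mean in standard deviations) exceeds 3. The sketch below is an assumption-laden illustration on hypothetical data; note that an extreme value inflates the standard deviation itself, so on very small samples the z-score can miss outliers, and the IQR rule used by box plots is a more robust alternative.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Return values whose absolute z-score exceeds `threshold`."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # all values identical: nothing stands out
    return [v for v in values if abs(v - mean) / std > threshold]


# Hypothetical predictor values: 20 typical readings plus one extreme value
feature_values = [10, 11, 12, 13, 14] * 4 + [95]
print(zscore_outliers(feature_values))  # [95]
```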
21. What is a box plot and how do we interpret it?
A box plot is a graphical representation of a numerical variable that shows how its data is distributed and helps identify outliers. It is a descriptive statistical method that summarizes five values: 1. maximum, 2. third quartile (Q3), 3. median, 4. first quartile (Q1), and 5. minimum. The box spans Q1 to Q3, the line inside it marks the median, and the whiskers extend toward the minimum and maximum. A box plot uses the interquartile range (IQR = Q3 - Q1) to detect exceptional values: points below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR are called outliers.
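The numbers behind a box plot can be computed with the standard library alone. This sketch assumes Python 3.8+ for `statistics.quantiles`; quartile conventions differ between tools, so other libraries may report slightly different Q1/Q3 values. The data and helper names are hypothetical.

```python
import statistics


def five_number_summary(values):
    """(minimum, Q1, median, Q3, maximum) using the 'inclusive' quartile method."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]


def iqr_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


data = [2, 4, 4, 5, 6, 7, 8, 9, 30]
print(five_number_summary(data))  # (2, 4.0, 6.0, 8.0, 30)
print(iqr_outliers(data))         # [30]
```

Here IQR = 4, so the fences are -2 and 14, and only 30 falls outside them.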
23. How is cosine similarity used in text mining?
Cosine similarity is a metric used in text mining to measure how similar two documents or sentences are. Each text is represented as a vector (for example, of word counts), and the similarity is the cosine of the angle θ between the two vectors: cos θ = (A · B) / (‖A‖ ‖B‖).
If θ is 0 degrees, the cosine is 1 and the vectors point in the same direction, so the texts are similar. If θ is 90 degrees, the cosine is 0 and the vectors are orthogonal, so the texts are dissimilar (for word-count vectors, they share no terms).
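A minimal sketch of cosine similarity on bag-of-words count vectors, in plain Python. The whitespace tokenization and the example sentences are simplifying assumptions; real text-mining pipelines usually add normalization and weighting such as TF-IDF.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine of the angle between the word-count vectors of two texts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())  # only shared words contribute
    norm_sq = sum(c * c for c in a.values()) * sum(c * c for c in b.values())
    return dot / math.sqrt(norm_sq) if norm_sq else 0.0


sim = cosine_similarity("the cat sat on the mat", "the dog sat on the log")
print(sim)                               # 0.75
print(math.degrees(math.acos(sim)))      # about 41.4 degrees
print(cosine_similarity("cats purr", "dogs bark"))  # 0.0, i.e. theta = 90 degrees
```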
25. What is the second-moment business decision?
The second-moment business decision describes the spread of the data: on average, how far are the data points from their mean? It is measured by the variance, the standard deviation, and the range. A larger spread means more uncertainty in the data, while data with less spread is easier to analyze.
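A short sketch using Python's `statistics` module on two hypothetical datasets that share the same mean (5.5) but differ in spread. The sample variance and standard deviation (denominator n - 1) are used here; `statistics.pvariance` and `statistics.pstdev` give the population versions.

```python
import statistics


def spread(values):
    """Second-moment measures of dispersion for a list of numbers."""
    return {
        "variance": statistics.variance(values),  # sample variance, n - 1 denominator
        "std_dev": statistics.stdev(values),      # square root of the variance
        "range": max(values) - min(values),
    }


tight = [5, 5, 6, 6]   # values close to the mean
wide = [1, 2, 9, 10]   # same mean, values far from it

print(spread(tight))  # variance ≈ 0.333, range 1
print(spread(wide))   # variance ≈ 21.667, range 9
```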
27. What is the difference between univariate and bivariate analysis?
Univariate analysis examines a single variable at a time, typically with bar charts, box plots, and histograms. Bivariate analysis examines two variables together to study the relationship between them; a scatter plot helps identify that relationship.
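A small numeric illustration: univariate summaries describe one variable on its own, while a bivariate measure such as the Pearson correlation coefficient quantifies the relationship a scatter plot would show. The hours/scores data and the `pearson_r` helper are hypothetical; Python 3.10+ also ships `statistics.correlation` for the same purpose.

```python
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical data: hours studied and exam score for five students
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]

# Univariate analysis: describe one variable by itself
print("mean score:", statistics.mean(scores))      # 60.4
print("median score:", statistics.median(scores))  # 61

# Bivariate analysis: measure how the two variables move together
r = pearson_r(hours, scores)
print(f"r = {r:.3f}")  # about 0.993, a strong positive relationship
```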
28. Where do we use the Matplotlib library in Python?
We use the Matplotlib library to plot 2D numerical data, for example, to visualize the contents of a DataFrame. It can create bar charts, histograms, scatter plots, and more.
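A minimal sketch, assuming Matplotlib is installed, that draws the three chart types mentioned above side by side and saves them to a PNG file. The data and the output filename are hypothetical; the non-interactive "Agg" backend is selected so the script also runs without a display.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # render to a file instead of opening a window
import matplotlib.pyplot as plt

# Hypothetical data
months = ["Jan", "Feb", "Mar", "Apr"]
sales = [120, 135, 128, 150]
order_values = [12, 15, 15, 18, 20, 20, 20, 23, 25, 31]
ad_spend = [1, 2, 3, 4, 5]
revenue = [2.1, 3.9, 6.2, 8.1, 9.8]

fig, (ax_bar, ax_hist, ax_scatter) = plt.subplots(1, 3, figsize=(12, 3.5))
ax_bar.bar(months, sales)
ax_bar.set_title("Bar chart")
ax_hist.hist(order_values, bins=5)
ax_hist.set_title("Histogram")
ax_scatter.scatter(ad_spend, revenue)
ax_scatter.set_title("Scatter plot")
fig.tight_layout()

out_path = Path("matplotlib_examples.png")
fig.savefig(out_path)
plt.close(fig)
```

With a pandas DataFrame, the same charts can be drawn by passing its columns (for example `df["sales"]`) to these calls.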
29. What is meant by ANN?
ANN stands for Artificial Neural Network. It is loosely inspired by the human brain: each node corresponds to a neuron. A basic ANN consists of an input layer, one or more hidden layers, and an output layer; this feed-forward architecture is also called a multi-layer perceptron (MLP). Deep learning uses ANNs with many hidden layers to model complex datasets.
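To show the input, hidden, and output layers in action, here is a minimal forward pass of a tiny MLP written with NumPy (assumed installed). The weights are hand-picked for illustration rather than learned, and they happen to make the network compute XOR, a function no single-layer perceptron can represent.

```python
import numpy as np


def relu(z):
    """ReLU activation applied element-wise."""
    return np.maximum(z, 0)


# Hand-picked weights (an assumption for illustration, not trained values)
W_hidden = np.array([[1.0, 1.0],
                     [1.0, 1.0]])  # shape (2 inputs, 2 hidden neurons)
b_hidden = np.array([0.0, -1.0])
W_out = np.array([1.0, -2.0])      # shape (2 hidden neurons,)
b_out = 0.0


def forward(X):
    hidden = relu(X @ W_hidden + b_hidden)  # input layer -> hidden layer
    return hidden @ W_out + b_out           # hidden layer -> output layer


X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
print(forward(X))  # [0. 1. 1. 0.]
```

In practice the weights are learned from data with backpropagation and gradient descent instead of being set by hand.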
30. How is the seaborn package useful for data analysis?
Seaborn is a Python data visualization library built on top of Matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics, such as bar charts, scatter plots, and box plots, with less code than plain Matplotlib requires.
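A minimal sketch, assuming seaborn and pandas are installed, that draws a box plot, a bar plot, and a scatter plot from one small hypothetical DataFrame and saves them to a file. Each seaborn call takes the DataFrame plus column names, and seaborn handles grouping, aggregation (the bar plot shows the mean with error bars), and colouring by `hue`.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # render to a file; no display needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data: a numeric measurement for two groups over five days
df = pd.DataFrame({
    "group": ["A"] * 5 + ["B"] * 5,
    "value": [3, 4, 4, 5, 6, 7, 8, 8, 9, 15],
    "day": list(range(1, 6)) * 2,
})

fig, (ax_box, ax_bar, ax_scatter) = plt.subplots(1, 3, figsize=(12, 3.5))
sns.boxplot(data=df, x="group", y="value", ax=ax_box)                      # distribution per group
sns.barplot(data=df, x="group", y="value", ax=ax_bar)                      # mean per group
sns.scatterplot(data=df, x="day", y="value", hue="group", ax=ax_scatter)  # coloured by group
fig.tight_layout()

out_path = Path("seaborn_examples.png")
fig.savefig(out_path)
plt.close(fig)
```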