Linear Regression Interview Questions & Answers
Table of Contents
- Is it possible for a transformation to increase the R² value and the RMSE value at the same time?
- Which of the following is correct about Heteroscedasticity?
- Which of these points reflect the assumption of multicollinearity?
- The best evaluation metric for linear regression is ____?
- Which of the following is not an assumption of linear regression?
Is it possible for a transformation to increase the R² value and the RMSE value at the same time?
- a) Not possible in simple linear regression but possible in multiple linear regression
- b) Not possible
- c) Possible
- d) Not possible in multiple linear regression but possible in simple linear regression
Answer - c) Possible
When data is transformed, the intention is usually to improve the correlation between the features and the target. A side effect is that the transformation may increase the correlation while also increasing the error. R² combines two quantities, the SSR and the SSE: for a least-squares fit with an intercept, R² = SSR / (SSR + SSE). The SSR (regression sum of squares) measures how much of the variation in the output is explained by the inputs. In simple linear regression, it can be understood as how far the fitted line departs from the baseline, which is the mean of the target. The SSE measures the squared errors, i.e. the sum of the squared differences between the actual and predicted values. After a transformation that changes the target, the SSR may increase sharply while the SSE also increases, but by a smaller proportion. The overall R² then increases, but the trade-off is that the RMSE (due to the increased SSE) also increases. Note that this requires the target itself to be transformed or rescaled. If only the features change, the total sum of squares is fixed on the same data, so a higher R² implies a lower SSE and therefore a lower RMSE.
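The effect described above can be reproduced with a small sketch. It assumes NumPy is available and uses made-up, noise-free data: the target is the fourth root of x, and the transformation squares the target. The helper name `fit_metrics` and the data are illustrative assumptions, not part of the original question.

```python
import numpy as np

def fit_metrics(x, y):
    """Fit y = a + b*x by least squares and return (R^2, RMSE)."""
    slope, intercept = np.polyfit(x, y, 1)
    pred = intercept + slope * x
    sse = np.sum((y - pred) ** 2)        # sum of squared errors
    sst = np.sum((y - y.mean()) ** 2)    # total sum of squares
    return 1.0 - sse / sst, np.sqrt(sse / len(y))

# Hypothetical data: the target is strongly curved in x.
x = np.arange(1, 11, dtype=float)
y = x ** 0.25

# Squaring the target makes it closer to linear in x,
# but it also stretches the scale on which errors are measured.
y_transformed = y ** 2

r2_before, rmse_before = fit_metrics(x, y)
r2_after, rmse_after = fit_metrics(x, y_transformed)

print(f"before: R^2={r2_before:.3f}  RMSE={rmse_before:.3f}")
print(f"after:  R^2={r2_after:.3f}  RMSE={rmse_after:.3f}")
```

Both numbers go up after the transformation because the two RMSE values are expressed in different units (the original target versus its square), which is why RMSE values should only be compared on the same target scale.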
Which of the following is correct about Heteroscedasticity?
- a) The variance of the errors is not constant
- b) The variance of the dependent variable is not constant
- c) The errors are not linearly independent of one another
- d) The errors have non-zero mean
Answer - a) The variance of the errors is not constant
Explanation: Heteroscedasticity is the situation in which the variance of the errors is not constant: the error variance is high in some regions and low in others, or follows a pattern such as a funnel shape. A residual plot helps to diagnose it. Calculate the squared residuals and plot them against the explanatory variable. If the spread of the points varies in magnitude across the range of the explanatory variable, the error variances are likely to be unequal. If this problem exists, the population used in the regression contains unequal variance, and the analysis results (in particular the standard errors and significance tests) may be invalid. To fix this problem, we can transform the variables, for example by taking the logarithm of the dependent variable.
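As a rough numerical counterpart to the residual plot, the sketch below simulates data whose noise grows with x and compares the mean squared residual in the lower and upper halves of the x range. It assumes NumPy; the simulated data, the seed, and the half-split comparison are illustrative choices, not a formal test.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (an assumption for illustration):
# the noise standard deviation is proportional to x, giving a funnel shape.
x = np.linspace(1.0, 10.0, 200)
y = 2.0 + 3.0 * x + rng.normal(0.0, 0.5 * x)

# Fit a straight line and compute the squared residuals.
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)
squared_residuals = residuals ** 2

# Compare the average squared residual for small x and for large x.
half = len(x) // 2
msr_low = squared_residuals[:half].mean()
msr_high = squared_residuals[half:].mean()

print(f"mean squared residual, lower half of x: {msr_low:.2f}")
print(f"mean squared residual, upper half of x: {msr_high:.2f}")
```

A clearly larger value in one half suggests that the error variance is not constant. Plotting `squared_residuals` against `x` would show the same pattern visually.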
Which of these points reflect the assumption of multicollinearity?
- a) There must not be any extreme scores in the data set
- b) An independent variable cannot be a combination of other independent variables
- c) The variance across the variables must be equal
- d) The relationship between your independent variables must not be above r = 0.7
Answer - d) The relationship between your independent variables must not be above r = 0.7
Multicollinearity refers to a situation in which two or more explanatory variables in a multiple regression model are highly linearly related. For example, HDL and LDL may both appear as independent variables in a regression. When one independent variable is an exact linear combination of the others, the situation is called perfect collinearity. Multicollinearity can also be caused by the incorrect use of dummy variables. Multicollinearity inflates the variance of the estimated coefficients, so the estimates become unstable and unreliable. It also makes it difficult to separate the individual effect of each independent variable on the target variable. As a result, standard errors are inflated and t values are depressed. It can be detected with the Variance Inflation Factor (VIF).
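One way to compute the Variance Inflation Factor is to regress each independent variable on all the others and take VIF = 1 / (1 - R²) from that auxiliary regression. The sketch below does this with NumPy alone. The function name `vif` and the simulated columns are assumptions made for illustration.

```python
import numpy as np

def vif(X):
    """Return the VIF of each column of X, using 1 / (1 - R^2) from
    regressing that column on the remaining columns plus an intercept."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    factors = []
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)

# Simulated inputs (illustrative): x2 is nearly a copy of x1, x3 is unrelated.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.1 * rng.normal(size=200)
x3 = rng.normal(size=200)

factors = vif(np.column_stack([x1, x2, x3]))
print(factors)  # large values for x1 and x2, close to 1 for x3
```

A commonly quoted rule of thumb treats a VIF above roughly 5 to 10 as a sign of problematic multicollinearity, although the exact cutoff is a convention rather than a fixed rule.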
The best evaluation metric for linear regression is ____?
- a) RMSE
- b) MAE
- c) ME
- d) All the above
Answer - a) RMSE
The absolute error is the amount of deviation in a prediction: the absolute value of the difference between the predicted and the actual value. The Mean Absolute Error (MAE) is the mean of all absolute errors. RMSE is often preferred because the squared error it is built on is differentiable everywhere. To minimize the squared error, we can take the derivative, set it equal to 0, and solve. To minimize the absolute error, which is not differentiable at zero, we need more complex techniques that involve more computation. We use the Root Mean Squared Error instead of the Mean Squared Error so that RMSE has the same unit as the dependent variable and the results are interpretable. MAE is preferred when the dataset contains many outliers, because MAE is robust to outliers, whereas MSE and RMSE are very sensitive to them: squaring the error terms, commonly known as residuals, magnifies the large errors.
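The different sensitivity of MAE and RMSE to outliers can be seen in a small worked example. The numbers below are made up for illustration, and the helper names `mae` and `rmse` are not from any particular library.

```python
import numpy as np

def mae(actual, predicted):
    """Mean of the absolute errors."""
    return np.mean(np.abs(actual - predicted))

def rmse(actual, predicted):
    """Square root of the mean of the squared errors."""
    return np.sqrt(np.mean((actual - predicted) ** 2))

actual = np.array([10.0, 12.0, 14.0, 16.0, 18.0])

# Every prediction is off by exactly 1.
pred_clean = np.array([11.0, 11.0, 15.0, 15.0, 19.0])

# Same predictions, except the last one is off by 10 (an outlier).
pred_outlier = np.array([11.0, 11.0, 15.0, 15.0, 28.0])

print(f"clean:   MAE={mae(actual, pred_clean):.2f}  RMSE={rmse(actual, pred_clean):.2f}")
# clean:   MAE=1.00  RMSE=1.00
print(f"outlier: MAE={mae(actual, pred_outlier):.2f}  RMSE={rmse(actual, pred_outlier):.2f}")
# outlier: MAE=2.80  RMSE=4.56
```

A single large error raises the MAE from 1.0 to 2.8 but raises the RMSE from 1.0 to about 4.56, because the squared term for that one point dominates the sum.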
Which of the following is not an assumption of linear regression?
- a) There should be a linear and additive relationship between input and output
- b) There should be no correlation between residual terms
- c) Inputs should not be correlated
- d) Error terms should not be normally distributed
Answer - d) Error terms should not be normally distributed
Regression is a parametric approach, so it makes assumptions about the data. If the assumptions are not satisfied, the results are not reliable. There should be a linear relationship between x and y, i.e. a one-unit change in x produces a constant change in y. An additive relationship means that the effect of one input on y is independent of the other variables in the data. If the errors are correlated with one another, we end up with an autocorrelation problem. If the inputs have strong relationships among them, it is a multicollinearity problem. Errors should be normally distributed and should have constant variance. If the error terms are not normally distributed, confidence intervals may become too wide or too narrow.
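The autocorrelation assumption can be checked numerically. The sketch below uses the Durbin-Watson statistic, which the text above does not name but which is a standard diagnostic for this purpose: values near 2 suggest uncorrelated residuals, and values well below 2 suggest positive autocorrelation. The simulated data and the function name `durbin_watson` are assumptions for illustration, and NumPy is assumed to be available.

```python
import numpy as np

def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squares."""
    return np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)

def fit_residuals(x, y):
    """Residuals of a least-squares straight-line fit of y on x."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (intercept + slope * x)

rng = np.random.default_rng(0)
n = 500
x = np.arange(n, dtype=float)

# Case 1: independent errors.
errors_independent = rng.normal(size=n)

# Case 2: AR(1) errors, where each error carries over 90% of the previous one.
shocks = rng.normal(size=n)
errors_autocorrelated = np.zeros(n)
for t in range(1, n):
    errors_autocorrelated[t] = 0.9 * errors_autocorrelated[t - 1] + shocks[t]

dw_independent = durbin_watson(fit_residuals(x, 1.0 + 0.5 * x + errors_independent))
dw_autocorrelated = durbin_watson(fit_residuals(x, 1.0 + 0.5 * x + errors_autocorrelated))

print(f"Durbin-Watson, independent errors:    {dw_independent:.2f}")
print(f"Durbin-Watson, autocorrelated errors: {dw_autocorrelated:.2f}")
```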