Top Machine Learning Interview Questions & Answers

  • January 09, 2023

Meet the Author : Mr. Bharani Kumar

Bharani Kumar Depuru is a well-known IT personality from Hyderabad. He is the Founder and Director of Innodatatics Pvt Ltd and 360DigiTMG. An IIT and ISB alumnus with 18+ years of experience, he has held prominent positions at leading IT firms such as HSBC, ITC Infotech, Infosys, and Deloitte. He is an IT consultant specializing in Industrial Revolution 4.0 implementation, Data Analytics practice setup, Artificial Intelligence, Big Data Analytics, Industrial IoT, Business Intelligence, and Business Management. Bharani Kumar is also the chief trainer at 360DigiTMG, with more than ten years of experience making the IT transition journey easy for his students. 360DigiTMG is at the forefront of delivering quality education, thereby bridging the gap between academia and industry.

  • What is the difference between AI, Data Science, ML, and DL?

    Artificial intelligence (AI) is an area of computer science that focuses on building systems that exhibit human-like intelligence. Such systems mimic human behavior and perform tasks that normally require a human. Many modern AI systems are built using deep learning.

    Data Science is a field that overlaps with AI and focuses mainly on data: it develops methods to store and analyze data in order to extract meaningful insights.

    Machine learning is a subset of AI, widely used in data science, that focuses on building models from data. The models are built so that they can learn on their own and improve with experience.

    Deep Learning is a subset of Machine Learning that uses multi-layer neural networks to teach computers tasks that come naturally to humans. It is a key technology behind speech recognition, driverless cars, text generation, and many other such tasks.

  • What is the difference between Supervised Learning, Unsupervised Learning, and Reinforcement Learning?

    There are different types of Machine Learning; the most widely used are:

    Supervised Learning: In this type of Machine Learning, both the inputs and the outputs are known. The algorithm learns from a labelled dataset in order to generate reasonable predictions on new data.

    There are different types of Supervised learning:

    • Classification: Predict a categorical class.
    • Regression: Predict a numerical value.
    • Recommendation: Predict a user’s preferences from a large pool of options.
    • Retrieval: Predict the relevance of an entity to a “query”.

    Unsupervised learning: In this type, the output is unknown. The algorithm tries to make sense of unlabelled data by extracting features, co-occurrences, and underlying patterns on its own.

    There are different types of Unsupervised learning:

    • Clustering
    • Anomaly Detection
    • Association
    • Autoencoders

    Reinforcement Learning: Here there are no labelled outputs. A learning agent interacts with an environment, tries different possible actions, and uses the rewards it receives to arrive at the best possible solution.

  • What do you understand by the term Regression and Classification?

    Classification is used to produce discrete results: it assigns data to specific categories, for example, whether or not a person has cancer. Regression is used when the target is continuous, for example, predicting a house price.

  • What is Overfitting and how to ensure that your model is not Overfitting?

    A model is said to be overfitting when its testing error is much greater than its training error, that is, when it fits the training data (including its noise) closely but fails to generalize. Overfitting can be detected with k-fold cross-validation and reduced by using regularization techniques such as Lasso and by limiting noise, for example by considering fewer variables.
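    The gap between training and testing error can be seen on a toy problem. The sketch below is only an illustration under assumed data: the rule `y = 1 when x >= 10`, the two flipped labels, and the function names `memorizer` and `simple_model` are all hypothetical and not from the article.

```python
# Toy 1-D data. The true rule (an assumption for this sketch) is y = 1 when x >= 10.
def true_rule(x):
    return 1 if x >= 10 else 0

train_x = list(range(0, 20, 2))               # 0, 2, ..., 18
train_y = [true_rule(x) for x in train_x]
train_y[train_x.index(4)] = 1                 # inject label noise
train_y[train_x.index(14)] = 0                # inject label noise

test_x = [x + 0.6 for x in train_x]           # unseen points close to the training points
test_y = [true_rule(x) for x in test_x]

def memorizer(x):
    """1-nearest-neighbour: copies the label of the closest training point, noise included."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

def simple_model(x):
    """A single threshold, standing in for a simpler, more constrained model."""
    return 1 if x >= 10 else 0

def error_rate(model, xs, ys):
    wrong = sum(1 for x, y in zip(xs, ys) if model(x) != y)
    return wrong / len(xs)

memorizer_train_error = error_rate(memorizer, train_x, train_y)
memorizer_test_error = error_rate(memorizer, test_x, test_y)
simple_train_error = error_rate(simple_model, train_x, train_y)
simple_test_error = error_rate(simple_model, test_x, test_y)

print(memorizer_train_error, memorizer_test_error)
print(simple_train_error, simple_test_error)
```

    The memorizing model has no training error but makes mistakes on the unseen points because it has learned the noisy labels, while the simpler model ignores the noise and does better on the test points.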

  • What do you understand by ‘Training set’ and ‘Test Set’?

    When working with data in Machine Learning, the data is divided into a training set and a test set. The training set is used to fit the model, and the test set, which the model has not seen during training, is used to estimate how accurate its predictions are.
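    A minimal sketch of such a split, using only the Python standard library. The function name `train_test_split`, the 20% test fraction, and the fixed seed are assumptions chosen for illustration.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and hold out a fraction of them as the test set."""
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

train_rows, test_rows = train_test_split(list(range(10)))
print(len(train_rows), len(test_rows))
```

    Shuffling before splitting avoids a test set that only contains, say, the most recent rows; for time series data a chronological split would usually be preferred instead.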

  • What do you understand by Type I vs Type II error?

    A Type I error is a false positive: an event that did not occur is incorrectly classified as having occurred. A Type II error is a false negative: an event that did occur is incorrectly classified as not having occurred.

  • What are Loss Functions and Cost Functions? Is there any difference between them?

    When the error is calculated for a single data point, the term Loss Function is used; when the error is aggregated (summed or averaged) over multiple data points, the term Cost Function is used. There is no major difference: both functions are used to measure how far the predictions are from the actual values.

  • What is Feature Engineering? What is the difference between model-centric and feature centric?

    The process of extracting features from raw data or deriving new features from the existing features is called Feature Engineering.

    Model-centric approach: Considering all the features and building a complex model is called a model-centric approach.

    Feature-centric approach: Carefully selecting and crafting the features and then building a model is called a feature-centric approach.

  • What is an Imbalanced dataset?

    A dataset is called imbalanced when its classes are not equally represented, that is, the number of observations per class varies widely. It mostly occurs in classification problems, where there is a large amount of data for one class and far less data for one or more of the other classes.

    You can handle imbalanced data using

    • Random Resampling - Under & Over Sampling
    • K fold Cross-Validation
    • SMOTE - Synthetic Minority Oversampling Technique
    • MSMOTE - Modified SMOTE
    • Cluster Based sampling
    • Ensemble Techniques - Bagging & Boosting (Ada Boost, Gradient Tree Boosting, XG Boost)
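
    Of the techniques listed, random oversampling is the simplest to sketch. The version below is a standard-library illustration only; the function name `random_oversample` and the toy labels are assumptions, and it duplicates existing minority rows rather than synthesizing new ones as SMOTE does.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [row for row, label in zip(rows, labels) if label == cls]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels

rows = list(range(10))
labels = [0] * 8 + [1] * 2                    # 8 majority rows, 2 minority rows
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

    Resampling should be applied to the training data only, so that the test set still reflects the real class distribution.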
  • What to do if a dataset has missing values and is not normally distributed?

    Impute the missing values with the Mean, Median, or Mode, choosing among them by looking at the data (for example, the median for skewed numeric data and the mode for categorical data). To bring the data closer to a normal distribution, apply a transformation function such as log, exponential, sqrt, or reciprocal.
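
    A short sketch of both steps with the standard library. It assumes missing values are represented as `None` and that the values are non-negative so a log transform applies; the helper names are hypothetical.

```python
import math
import statistics

def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]

def log_transform(values):
    """log(1 + x) compresses large values and reduces right skew; it requires x > -1."""
    return [math.log1p(v) for v in values]

raw = [1, None, 3, 100]                       # right-skewed, with one missing value
filled = impute_median(raw)
transformed = log_transform(filled)
print(filled)
print(transformed)
```

    The median is used here because the large value 100 would pull the mean far away from the typical value.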

  • To predict the next action of a person, which type of Machine Learning will you use and why?

    It is a multi-class classification problem, as the output variable is categorical.

    Classification is used when Y, i.e. the target variable, is categorical. If there are only two classes, the problem is a binary classification problem, and if there are more than two target classes, it is a multi-class classification problem.

    In the example provided, there can be multiple classes, i.e. multiple actions that a person can perform, so it falls under multi-class classification. The actions (Y) are also known, hence it falls under the category of Supervised learning.

  • Explain the Bias-Variance tradeoff. Why is it important?

    A model's prediction error can come from two sources. The first is the difference between the actual value and the value the model predicts for the data it was trained on, also known as the fitted value. When this difference is systematically large, the error is called Bias.

    The second shows up when the model predicts on new data, or is retrained on a different sample. If the predictions change a lot and the error on new data is much larger than on the training data, the error is called Variance.

    A high bias means the model has not learned the relevant relationship between the features and the target values. This is also called Underfitting. A high variance means the model has learned the training data so closely that it has also learned the noise. This is also called Overfitting.

    We need a model that is neither underfitting nor overfitting. Finding this balance is called the Bias-Variance trade-off.

    If there is no balance between bias and variance, the model will fail to generalize, that is, it will not perform well on new data.
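
    This trade-off is often written as a decomposition of the expected squared error. The sketch below assumes the data follow y = f(x) + ε with zero-mean noise of variance σ², and writes the trained model as f̂; these symbols are introduced here for illustration and do not appear in the article.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{Bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{Variance}}
  + \underbrace{\sigma^2}_{\text{Irreducible error}}
```

    The expectation is taken over different training samples and the noise, so the first two terms depend on the model while the last one does not.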

  • Which one is important? Model’s Performance or the Model’s Accuracy?

    Model accuracy is a subset of model performance: accuracy is only one of the metrics used to judge performance. A model cannot have the best performance and the worst accuracy, but it can have high accuracy and poor performance, for example on an imbalanced dataset where predicting the majority class for every observation scores well on accuracy. Such cases can be caught by inspecting a confusion matrix (cross table). Therefore, the better the model’s overall performance, the better the results will be.
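
    A small illustration of high accuracy with poor performance. The numbers are made up for the sketch: 100 patients of whom 5 have a disease, and a hypothetical model that predicts "no disease" for everyone.

```python
y_true = [1] * 5 + [0] * 95       # 5 positives, 95 negatives (assumed toy data)
y_pred = [0] * 100                # a model that always predicts the majority class

accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

# Confusion-matrix counts for the positive class.
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
recall = tp / (tp + fn)

print(accuracy, recall)
```

    The accuracy looks excellent, yet the model never finds a single patient with the disease, which only the confusion-matrix counts reveal.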

  • What is Entropy?

    Entropy is a measure of the randomness (uncertainty) in the information being processed. The higher the entropy, the harder it is to draw conclusions from the information. For example, the outcome of flipping a fair coin is random: over ‘n’ tosses each outcome is equally likely, so the result is very difficult to predict. This unpredictability is exactly what entropy measures.
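
    For a discrete distribution, this is usually computed as Shannon entropy. A minimal sketch, assuming base-2 logarithms (so the result is in bits); the function name `entropy` is chosen for illustration.

```python
import math

def entropy(probabilities):
    """Shannon entropy in bits; outcomes with zero probability contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

fair_coin = entropy([0.5, 0.5])       # maximum uncertainty for two outcomes
biased_coin = entropy([0.9, 0.1])     # more predictable, so lower entropy
print(fair_coin, biased_coin)
```

    The fair coin gives exactly 1 bit, while the biased coin gives less than half a bit because its outcome is easier to guess.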

  • What is Cross-Entropy?

    It is a measure of the dissimilarity between two probability distributions.

    Cross entropy is represented with the below expression:

    $$H(p, q) = -\sum_{x} p(x) \log q(x)$$

    Where ‘x’ ranges over the possible classes, p(x) represents the probability distribution of the true labels of the training examples, and q(x) represents the probabilities predicted by the model.

    There are two types of cross-entropy: binary cross-entropy and categorical cross-entropy. Binary cross-entropy is used when there are two output classes, and categorical cross-entropy is used when there are more than two output classes.

  • In which cases would you prefer cross-entropy over MSE?

    Cross entropy is preferred for classification problems, while MSE is best used for regression problems. When dealing with classification problems, MSE does not penalize misclassification enough.

    Let's consider an example where an image is actually a cat, but the model predicts “not a cat” with 95% probability and “cat” with the remaining 5%. If we use MSE, the error for this prediction is (1 - 0.05)^2 ≈ 0.90, and it can never exceed 1 however confident the wrong prediction is.

    But when using cross-entropy for the same example, the loss is -ln(0.05) ≈ 3.0 and keeps growing as the wrong prediction becomes more confident, hence the penalty is more, and therefore the model learns better.
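
    The two losses can be computed directly for a cat image that the model scores as 5% “cat”. This is a sketch under assumptions: a single output probability, a true label of 1 for “cat”, and the natural logarithm for cross-entropy.

```python
import math

y_true = 1.0        # the image really is a cat
p_cat = 0.05        # the model assigns only 5% probability to "cat"

squared_error = (y_true - p_cat) ** 2
cross_entropy = -math.log(p_cat)

# An even more confident wrong prediction: squared error stays below 1,
# cross-entropy keeps growing.
squared_error_worse = (y_true - 0.001) ** 2
cross_entropy_worse = -math.log(0.001)

print(round(squared_error, 4), round(cross_entropy, 4))
print(round(squared_error_worse, 4), round(cross_entropy_worse, 4))
```

    Squared error is bounded by 1 for probabilities, while cross-entropy is unbounded, which is why it penalizes confident mistakes far more strongly.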

  • How are K nearest neighbour, K-means clustering, and K-Fold cross-validation different from each other?

    The letter K in their names can make these three techniques easy to confuse, but they are very different from each other.

    K-means clustering is used for solving Unsupervised learning problems where there is no Y. It divides the data into K clusters such that each cluster is homogeneous in nature, with the points in each cluster very close to each other.

    K nearest neighbours is used to solve Supervised learning problems. It classifies an unlabelled data point using its neighbouring data points. K can be any positive integer and represents the number of neighbours to consider while labelling the unlabelled data point. As it does almost no work at training time and defers the computation to prediction time, it is called a lazy learner.
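
    A minimal K nearest neighbours classifier, to make the "lazy" behaviour concrete: there is no training step, only a distance computation at prediction time. The function name `knn_predict`, the Euclidean distance, and the toy points are assumptions for this sketch.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Label the query point by majority vote among its k closest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "b", "b", "b"]

prediction_near_origin = knn_predict(points, labels, (0.5, 0.5))
prediction_far = knn_predict(points, labels, (5.5, 5.5))
print(prediction_near_origin, prediction_far)
```

    An odd value of K is a common choice for two-class problems because it avoids tied votes.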

    K-Fold cross-validation is a technique that is normally used in supervised learning to evaluate a model. It is especially useful when the dataset is small, since a model trained and tested on a single small split may overfit and give an unreliable estimate of performance. In this technique, K decides the number of folds: the data is divided into K parts, and in each round one fold is used as testing data while the remaining K-1 folds are used as training data, so that every observation is tested exactly once.
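
    The fold construction can be sketched with plain index lists. The function name `k_fold_indices` is hypothetical, and the sketch splits the data in its existing order without shuffling.

```python
def k_fold_indices(n_samples, k):
    """Return k (train_indices, test_indices) pairs; each index is tested exactly once."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        folds.append((train_idx, test_idx))
        start += size
    return folds

folds = k_fold_indices(10, 3)
for train_idx, test_idx in folds:
    print(len(train_idx), test_idx)
```

    The model is trained and scored once per fold, and the K scores are then averaged to give the final estimate.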

  • Explain Semi-Supervised Learning and Active Learning.

    These are types of Machine Learning.

    The most famous ones are Supervised Learning, Unsupervised Learning, and Reinforcement Learning, but there are a few more that add to this list.

    One of them is Semi-Supervised Learning. As the name says, it is a combination of Supervised and Unsupervised Learning. Here a small part of the data is labelled and most of the data is unlabelled. A model can be built on the labelled data, and the same model can then be used to label the unlabelled data.

    Real-world example: If a child is taught to recognize only 5 of 20 cars, the child can still identify the remaining 15 cars using the knowledge of those 5 cars.

    Active Learning: This is a special case of Machine Learning where the model actively queries a user to get unlabelled data labelled. There are situations where the amount of unlabelled data is huge but manual labelling is expensive. In such cases, the model can ask the user to label only the most informative examples and label the rest by itself. Let’s say we want to detect faces in YouTube videos: the data given to the model would be bounding boxes around the faces in each video frame, but there are far too many frames to label by hand. Active Learning can be used here, as it not only learns the task but also tells us which labels would be most useful.

    Real-world example: Facebook asking users to identify faces in photos.

  • Explain Covariance and Correlation.

    Covariance measures how two variables vary together. A positive value means there is a direct relationship: one variable tends to increase when the other increases. A negative value means one tends to decrease when the other increases. Its magnitude depends on the units of the variables, so it is hard to compare across datasets.

    Correlation gives the strength and direction of the linear relationship between two random variables on a fixed scale from -1 to 1. A value of 1 represents a perfect positive relationship: as one variable increases, the other also increases following a linear rule. A value of -1 represents a perfect negative relationship: as one variable increases, the other decreases following a linear rule. A value of 0 means there is no linear relationship between the two variables (they may still be related in a non-linear way).

    If the value falls between 0 and 0.3, it indicates a weak positive relationship between the variables.
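
    Both quantities are short to compute by hand. The sketch below assumes the sample (n - 1) form of covariance and the Pearson correlation coefficient; the function names and toy data are chosen for illustration.

```python
import math

def covariance(xs, ys):
    """Sample covariance: average product of deviations from the means (n - 1 denominator)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance rescaled by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

hours = [1, 2, 3, 4]
score = [2, 4, 6, 8]                 # rises linearly with hours
score_reversed = [8, 6, 4, 2]        # falls linearly with hours

print(covariance(hours, score), correlation(hours, score))
print(correlation(hours, score_reversed))
```

    Multiplying `score` by 100 would multiply the covariance by 100 but leave the correlation unchanged, which is why correlation is easier to interpret.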

  • Explain different data types with an example?

    There are different types of data:

    • Continuous
    • Discrete
    • Structured
    • Unstructured
    • Time series data

    Continuous: When the values in the data can take any value within a range, such as floating-point or decimal numbers, the data is called Continuous data. There are two types of continuous data: Interval data and Ratio data.

    Interval data has meaningful differences between values but no true zero point, so ratios of values are not meaningful.

    For example, temperature in degrees Celsius: the difference between 19 and 29 deg Celsius is meaningful, but 0 deg Celsius does not mean there is no temperature.

    Ratio data has a true zero point, so values can be meaningfully compared as ratios.

    For example, weight or height: a value of 0 means none of the quantity is present, and 10 kg is twice as heavy as 5 kg.

    Discrete: When the values in the data are whole numbers or a countable set of categories, it is called a discrete data type.

    There are two types of discrete data types: Categorical and Count. If the data has some categories or classes, it is called Categorical data.

    For example, the different actions performed by a person in a video are Categorical data.

    Categorical data is further classified as binary or multiple. Binary categorical data has only two categories or classes.

    For example, a person having COVID-19 or not.

    Categorical data with more than two classes is called multi-categorical data.

    For example, a movie review can be positive, negative, or neutral.

    Categorical data can also be classified as Nominal or Ordinal. In the nominal data type, the order of the categories doesn't matter. In the above example, the actions performed by a person in a video are Nominal data, as there is no specific order of actions.

    But the order of a person’s education can't be altered: primary education comes before secondary, and a degree comes before a masters. Here the order cannot be modified; if it is, the data no longer makes sense. Such data is called an Ordinal data type.

    Structured: Structured data has a well-defined structure and follows a consistent order. It can be easily accessed and used by a person or a computer. This type of data is usually stored in well-defined schemas, as in a database, with multiple rows and columns.

    Unstructured: Data that does not have a well-structured format (no rows or columns). Text, images, audio, and video are examples of Unstructured data.

    Time series data: A sequence of data that has a time factor associated with it is called Time series data. Each data point is associated with a time.

    Stock prices, house prices over a decade, and temperature readings are examples of Time series data.

  • There are multiple Machine Learning algorithms, how do you decide which one to use?

    This largely depends on the type of data that we have. If the target is discrete, we could make use of a perceptron or an SVM. If the target is continuous, we could make use of Linear regression or an MLP.

    There is no specific rule or metric that tells us which Machine Learning algorithm to use. It’s all about exploratory data analysis, where we understand the business problem, understand the data using domain knowledge, and come up with the best-fit algorithm for a particular dataset.

  • What is the difference between Inductive learning and Deductive learning?

    In Inductive learning, the model learns from a set of observed instances to draw a general conclusion, whereas in Deductive learning, the model first starts from general conclusions (rules) and applies them to draw conclusions about specific instances.

    To explain this with an example, consider teaching a kid that fire burns. You can teach this in two ways: one is to show them pictures of fire incidents and label them as hazardous, and the other is to let the kid play with fire and experience the burn.

    The first method is called Inductive and the second method is called Deductive.

  • What is Precision, Recall, Specificity, and F1 ratio?

    Consider an example of a model built to predict a person having a disease or not.

    Considering the above context, Precision is the proportion of people classified as having the disease who actually have it.

    Precision is calculated using:

    $$\text{Precision} = \frac{TP}{TP + FP}$$

    Where TP is True Positive and FP is False Positive (TP: a person correctly classified as having the disease; FP: a person incorrectly classified as having the disease when he doesn't have it).

    Sensitivity, hit rate, or Recall is the proportion of people with the disease who are correctly classified as having the disease.

    Sensitivity is calculated using:

    $$\text{Sensitivity} = \frac{TP}{TP + FN}$$

    Where TP is True Positive and FN is False Negative (TP: a person correctly classified as having the disease; FN: a person incorrectly classified as not having the disease).

    Specificity, or True Negative Rate, is the proportion of people with no disease who are correctly classified as not having the disease.

    Specificity is calculated using:

    $$\text{Specificity} = \frac{TN}{TN + FP}$$

    Where TN is True Negative and FP is False Positive (TN: a person correctly classified as not having the disease; FP: a person incorrectly classified as having the disease).

    F1 ratio (also called the F1 score) is calculated using the below formula:

    $$F_1 = 2 \cdot \frac{P \cdot R}{P + R}$$ where P is Precision and R is Recall.

    The value of F1 always falls between 0 and 1. It is the harmonic mean of precision and recall, a single measure that balances the two.
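
    All four measures follow from the confusion-matrix counts. A minimal sketch with made-up counts for the disease example; the function name `classification_metrics` and the numbers are assumptions.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute precision, recall (sensitivity), specificity and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }

# Hypothetical screening results: 45 sick people (40 found), 55 healthy people (45 cleared).
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(name, round(value, 3))
```

    The sketch does not guard against empty denominators, so it assumes at least one predicted positive, one actual positive, and one actual negative.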

  • What do you understand by ensemble learning?

    Ensemble learning is a method of combining multiple models to improve the stability and the predictive power of the overall model.

    The models in an ensemble can differ from each other for several reasons. A few of the top reasons are:

    • Different Population
    • Different Hypothesis
    • Different Modeling Techniques
    • Different Initial Seed

    When training and testing a model, we will always see some error. This error can be broken down into Bias, Variance, and Irreducible error.

    A model should have a balance between bias and variance, which we call the bias-variance trade-off.

    Ensemble learning is one way to manage this trade-off.

  • Explain different Ensemble techniques?

    Some of the top ensemble techniques are:

    • Bagging: The historical dataset is resampled into multiple datasets (bootstrap samples drawn with replacement), a classifier is trained on each dataset, and the results of the different classifiers are combined by taking the mean or a majority vote. In general, there can be different learners for the different datasets.
    • Boosting: It is an iterative process that adjusts the weight of an observation based on the past classification. If an observation was incorrectly classified, its weight is increased, and vice versa. Boosting decreases the bias error and builds strong predictive models.
    • Stacking: It is a technique that combines multiple classifiers or regressors using a meta-classifier or meta-regressor. The base-level models are usually trained on the complete data set, and then the meta-model is trained on the outputs of the base models.
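
    Bagging is the easiest of the three to sketch from scratch. The example below is an illustration only: the weak learner is a hypothetical one-feature threshold ("stump"), the data are a toy set where the label is 1 when x >= 5, and the results are combined by majority vote.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement."""
    return [rng.choice(rows) for _ in rows]

def fit_stump(rows):
    """Weak learner: choose the threshold t (from the sampled x values) with the
    fewest training errors for the rule 'predict 1 when x >= t'."""
    candidates = sorted({x for x, _ in rows})

    def training_errors(threshold):
        return sum(1 for x, y in rows if int(x >= threshold) != y)

    return min(candidates, key=training_errors)

def bagging_predict(thresholds, x):
    """Majority vote over all the stumps."""
    votes = Counter(int(x >= t) for t in thresholds)
    return votes.most_common(1)[0][0]

rows = [(x, int(x >= 5)) for x in range(10)]          # toy (x, label) pairs
rng = random.Random(0)
thresholds = [fit_stump(bootstrap_sample(rows, rng)) for _ in range(25)]

print(bagging_predict(thresholds, 0), bagging_predict(thresholds, 9))
```

    Each stump sees a slightly different sample and may pick a different threshold; averaging their votes smooths out those differences, which is how bagging reduces variance.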
  • What is one hot encoding and label encoding?

    One-hot encoding represents categorical data as binary data: a new 0/1 variable is created for each level of the variable, so the dimensionality of the data set increases. Label encoding changes labels into numbers: each level is mapped to an integer (for example 0, 1, 2, ...) in a single column, so the dimensionality of the data set is not affected.
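
    The difference is easy to see side by side. A minimal sketch in plain Python; the helper names and the choice to order levels alphabetically are assumptions made for the example.

```python
def label_encode(values):
    """Map each distinct level to an integer; the output stays a single column."""
    levels = sorted(set(values))
    mapping = {level: index for index, level in enumerate(levels)}
    return [mapping[v] for v in values], mapping

def one_hot_encode(values):
    """Create one 0/1 column per distinct level."""
    levels = sorted(set(values))
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return rows, levels

colors = ["red", "green", "blue", "green"]
encoded_labels, mapping = label_encode(colors)
encoded_rows, levels = one_hot_encode(colors)

print(encoded_labels)       # one column of integers
print(levels)
print(encoded_rows)         # three 0/1 columns, one per level
```

    Label encoding implies an order between the integers, so it suits ordinal data, while one-hot encoding avoids that implied order at the cost of extra columns.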
