Top AI & Deep Learning Interview Questions and Answers
Bharani Kumar Depuru is a well-known IT personality from Hyderabad. He is the Founder and Director of AiSPRY and 360DigiTMG. An IIT and ISB alumnus with more than 18 years of experience, he has held prominent positions at IT majors such as HSBC, ITC Infotech, Infosys, and Deloitte. He is a sought-after IT consultant specializing in Industrial Revolution 4.0 implementation, Data Analytics practice setup, Artificial Intelligence, Big Data Analytics, Industrial IoT, Business Intelligence, and Business Management. Bharani Kumar is also the chief trainer at 360DigiTMG, with more than ten years of training experience, and has been making the IT transition journey easy for his students. 360DigiTMG is at the forefront of delivering quality education, thereby bridging the gap between academia and industry.
Let us assume our cost function is C(w) and our penalty term is c‖w‖². The gradient descent iterations will then follow the pattern:
w = w − grad(C)(w) − 2cw = (1 − 2c)w − grad(C)(w)
Notice that the weight is multiplied by a factor less than 1 at every iteration; hence L2 regularization can be thought of as weight decay.
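As an illustration, here is a minimal NumPy sketch of this penalized update on a made-up quadratic cost; the data, the penalty strength c, and the learning rate are illustrative assumptions, not values from the question.

```python
import numpy as np

# Toy cost: C(w) = 0.5 * ||X w - y||^2, so grad C(w) = X.T (X w - y)
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
y = rng.normal(size=20)

def grad_C(w):
    return X.T @ (X @ w - y)

c = 0.01      # strength of the L2 penalty c * ||w||^2
lr = 0.01     # learning rate (added for numerical stability)
w = np.zeros(5)

for _ in range(1000):
    # Penalized update: w <- (1 - 2*lr*c) * w - lr * grad C(w)
    # The factor (1 - 2*lr*c) < 1 shrinks the weights each step ("weight decay").
    w = (1 - 2 * lr * c) * w - lr * grad_C(w)

print(w)
```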
It is widely known that training deep learning networks with many layers (sometimes tens of them) is very difficult because they are highly sensitive to the initial weights and to the configuration of the learning algorithm.
One probable reason for this difficulty is that the distribution of the inputs to the deeper layers can change after every mini-batch as the weights are updated.
As a result, the learning algorithm ends up chasing a moving target. This deviation in the distribution of a layer's inputs is known as internal covariate shift.
To remediate this problem, batch normalization can be used. This technique standardizes the inputs to a layer for each mini-batch before passing them on to the next layer.
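A minimal NumPy sketch of the standardization step batch normalization applies to a mini-batch; the learnable scale (gamma) and shift (beta) are shown here as plain constants for simplicity, and the example batch is made up.

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Standardize a mini-batch of layer inputs feature-wise,
    then apply the learnable scale (gamma) and shift (beta)."""
    mean = x.mean(axis=0)            # per-feature mean over the mini-batch
    var = x.var(axis=0)              # per-feature variance over the mini-batch
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

# A mini-batch of 8 examples with 4 features each
batch = np.random.default_rng(1).normal(loc=5.0, scale=3.0, size=(8, 4))
normalized = batch_norm(batch)
print(normalized.mean(axis=0))       # ~0 per feature
print(normalized.std(axis=0))        # ~1 per feature
```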
Since the input size and the filter size do not always match up, we use a technique called padding. Padding is the method of adding a border of 0s around the input matrix so that the filter fits the input and the output has the desired size. For example:
Image dimension = (n, n) = 3 X 3
Filter Dimension = (f,f) = 5 X 5
Padding = 1 (add 1 pixel all around the edges with value 0)
The output dimension will become (n+2p-f+1) X (n+2p-f+1) = 1 X 1
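The same arithmetic in a short NumPy sketch; the 3 X 3 image values are arbitrary.

```python
import numpy as np

n, f, p = 3, 5, 1                    # image size, filter size, padding
image = np.arange(n * n).reshape(n, n)

# Add a border of p zeros around the image
padded = np.pad(image, pad_width=p, mode="constant", constant_values=0)
print(padded.shape)                  # (5, 5)

# Output dimension of the convolution: (n + 2p - f + 1)
out_dim = n + 2 * p - f + 1
print(out_dim)                       # 1, i.e. a 1 x 1 output
```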
The gradient descent algorithm works by taking small steps in the direction that decreases the error, moving towards a local/global minimum.
At each step, the weights and biases of the network are updated based on the computed gradients.
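A minimal sketch of these small steps, using a single-parameter cost C(w) = (w − 3)² chosen purely for illustration; the starting point and learning rate are arbitrary.

```python
# Gradient descent on a one-parameter cost C(w) = (w - 3)^2
def grad(w):
    return 2 * (w - 3)               # dC/dw

w = 0.0                              # arbitrary starting point
learning_rate = 0.1                  # size of each small step

for step in range(50):
    w = w - learning_rate * grad(w)  # move against the gradient

print(w)                             # close to the minimum at w = 3
```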
Exploding gradients - this occurs when the gradients become too large, causing the weights and biases to overflow or turn into NaN values.
Vanishing gradients - this scenario is the exact opposite of the exploding gradients problem. Here, the gradients become so small that the updates to the weights and biases are negligible, and the network may never reach the minimum.
There are multiple approaches to fix the exploding gradients problem - the most common are gradient clipping (rescaling gradients whose norm exceeds a threshold), using a smaller learning rate, weight regularization, and redesigning the network with fewer layers. A gradient clipping sketch follows below.
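The following is a hand-rolled NumPy sketch of gradient clipping by global norm; the max_norm threshold and the example gradient are illustrative assumptions.

```python
import numpy as np

def clip_by_norm(grads, max_norm=1.0):
    """Rescale a list of gradient arrays so their global L2 norm
    does not exceed max_norm (a common fix for exploding gradients)."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads

# Example: an "exploded" gradient of norm 500 gets scaled back to norm 1.0
grads = [np.array([300.0, -400.0])]
clipped = clip_by_norm(grads, max_norm=1.0)
print(clipped)                       # [array([ 0.6, -0.8])]
```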
The problem of vanishing gradients occurs in feed-forward networks (FFNs) when the error that is back-propagated from the final layer through the hidden layers towards the input layer becomes exponentially small, so the weight updates lose their significance. The easiest fix is to use ReLU (Rectified Linear Unit) as the activation function.
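A rough illustration of why ReLU helps: the derivative of the sigmoid is at most 0.25, so multiplying such local gradients across many layers shrinks the error signal exponentially, whereas ReLU's derivative is exactly 1 for positive inputs. The input value and layer count below are arbitrary, and the weights are ignored to keep the sketch simple.

```python
import numpy as np

def sigmoid_grad(x):
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)              # at most 0.25, so it shrinks the signal

def relu_grad(x):
    return 1.0 if x > 0 else 0.0      # exactly 1 for positive inputs

x = 2.0
# Back-propagating through 10 layers multiplies the local gradients together
print(sigmoid_grad(x) ** 10)          # ~1.6e-10: the signal has all but vanished
print(relu_grad(x) ** 10)             # 1.0: the gradient passes through intact
```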
Consider an artificial neural network (ANN) with a multi-layer perceptron (MLP) model. The architecture is 128-500-500-2: input size = 128, two hidden layers with 500 neurons each, and an output layer of size 2 (two-class classification).
In such a scenario, the non-trainable parameters are the architectural choices, such as the number of hidden layers and the number of neurons in each hidden layer.
This is because the values of the non-trainable parameters cannot be optimized with the training data. When the model goes through back-propagation and updates the weights, that does not affect the number of hidden layers or the number of neurons in each layer, which remain fixed for those epochs or model runs.
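By contrast, the trainable parameters are the weights and biases themselves. Below is a quick sketch counting them for the 128-500-500-2 architecture, assuming fully connected layers with one bias per neuron.

```python
# Trainable parameters of the 128-500-500-2 MLP: weights + biases per layer
layer_sizes = [128, 500, 500, 2]

total = 0
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    params = n_in * n_out + n_out     # weight matrix + bias vector
    print(f"{n_in} -> {n_out}: {params} parameters")
    total += params

print("total trainable parameters:", total)   # 316002
```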
Stochastic Gradient Descent - in this variant, the gradients are calculated and the weights are updated for one training sample at a time.
Batch Gradient Descent - in this variant, the gradients are calculated and the weights are updated for the entire dataset in a single step.
Mini-Batch Gradient Descent - in this variant, the gradients are calculated on batches of a fixed batch size (a hyperparameter that can be tuned later for optimal performance), and the weights are updated after each batch. This variant generally makes the best use of computational resources; a short sketch follows below.
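A minimal NumPy sketch of the mini-batch variant on a made-up linear-regression problem; the data, learning rate, batch size, and epoch count are all illustrative.

```python
import numpy as np

# Mini-batch gradient descent for linear regression on synthetic data
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=1000)

w = np.zeros(3)
lr, batch_size, epochs = 0.05, 32, 20             # batch_size is the tunable hyperparameter

for _ in range(epochs):
    idx = rng.permutation(len(X))                 # shuffle each epoch
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        Xb, yb = X[batch], y[batch]
        grad = Xb.T @ (Xb @ w - yb) / len(batch)  # gradient on the mini-batch
        w -= lr * grad                            # update after each mini-batch

print(w)                                          # close to [2.0, -1.0, 0.5]
```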