CNN & its various Layers
The human brain is a powerful machine. It can see and capture the world and process its meaning within seconds, without us even realizing the process behind it. For machines, however, perceiving the visual world is different and far more complex. Artificial Intelligence (AI) has become a significant factor in bridging the gap between human cognitive abilities and machines. Many researchers and AI enthusiasts are on a quest to explore the innovative aspects of AI, and one of them is the domain of Computer Vision. Inspired by the biological mechanism of the human brain and its perception of the visual world, researchers have developed a set of algorithms that mimic essential attributes of the brain's neural network. Algorithms are a crucial part of AI and are considered the core of deep learning. “An algorithm can be explained as a procedure, process, or rules followed by a computer in calculating or problem-solving operations”. Algorithms are employed to automate tasks such as logical reasoning and data processing, or to make machines autonomous. Deep learning and machine learning have a wide range of applications across industries, such as self-driving vehicles, facial recognition, industrial safety, medical diagnosis, crime detection, and many more.
The Neural Network
To better appreciate what deep learning is all about, it helps to first understand its parent field, Machine Learning. In simple words, machine learning is a scientific approach to solving problems and providing solutions using machines. The machine learns from the experience gained while performing a task and, based on its performance, improves its learning to further enhance the task. Deep learning, on the other hand, is a subset of machine learning. “It is based on a set of algorithms that are designed to learn and perform high-level data abstraction and presentation using a computing model architecture with multiple layers of non-linear transformations.” These algorithms are usually termed Artificial Neural Networks (ANN). Just as our brain is capable of performing complex computations, scientists have developed the Artificial Neuron (also known as the Perceptron) as a powerful modeling tool for performing complex computations in machines. It can be illustrated as a simple tree structure that has many input nodes and a single output node.
Figure 1 The Biological & Artificial Neuron (Source: Deep Learning in Python Tutorial, Datacomp)
Three terms are crucial to an ANN:
1. Input: The data that we wish to feed the network, shown as X1, X2, ..., Xn in Figure 1. The input might take the form of a picture, video, or text, and it is used to determine the result. Each input value, or input neuron, is a numeric value.
2. Activation Function: The processing unit that produces the output (see Figure 2). There are many different kinds of activation functions, including sigmoid, exponential, step, and others. These activation functions are applied to the input to map it to the output.
3. Output: The activation function is applied to each input to produce its output.
There are always input and output layers in a neural network. Between them lies an intermediate layer where the real processing or computing occurs: the hidden layer. It receives data from the input layer, carries out the required computations, and then sends the result to the output layer for the user. Each input node (neuron or unit) in a neural network is linked to nodes in the following layer, and so on (see Figure 3 for an illustration).
Each neuron link is assigned a particular ‘weight’ and ‘bias’ (referred to as w and b in Figures 1 and 2). This creates the learnable parameters of a machine learning model, because a neural network is an adaptive system. The link between nodes carries a weight, a value that controls the signal passed between them. When the input is transmitted from one node to another, the weights are applied to the inputs along with the bias. Weights act as the strength of the connection between two nodes: “weights decide how much influence the input will have on the output”. The higher the weight of an input, the greater its impact on the network. Bias is another parameter that adjusts the output along with the weighted sum of inputs. It lets us shift the activation function by adding a constant value to the input, so that the neuron can still activate even when all inputs are zero. The model learns these parameters during the training process and updates them with each training iteration. The values are initially randomized; as the model continues to learn, they are adjusted toward the values that give the most accurate output. In mathematical terms, this computation is represented as output = f(w1·X1 + w2·X2 + ... + wn·Xn + b), where f is the activation function.
Figure 2 Structure of a neural network and prediction calculation (Source: Basics of Neural Networks @JayAlamar)
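The weighted-sum computation described above can be sketched in a few lines of Python. This is only a minimal illustration: the function names `sigmoid` and `neuron_output` and the sample weights are hypothetical, and sigmoid is used simply because it is one of the activation functions mentioned earlier.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus the bias, passed through the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

# Illustrative values only (assumed, not taken from the article's figures).
x = [0.5, -1.0, 2.0]
w = [0.4, 0.3, -0.2]
b = 0.1

print(neuron_output(x, w, b))
# With all-zero inputs, the bias alone determines the activation.
print(neuron_output([0.0, 0.0, 0.0], w, b))
```

The second call shows the role of the bias: the weighted sum is zero, yet the neuron still produces an output driven by `b`.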
Learning is one of a neural network's most important capabilities. The network is an adaptive system that can modify its internal structure in response to the input it receives. By changing the weights, we adjust how precisely the network learns and refines its processing. If the model makes an incorrect prediction, it adjusts the weight values to increase its accuracy. The weights remain unchanged when the output is deemed satisfactory. Depending on the size or kind of the dataset, there can be several hidden (processing) layers.
Figure 3 An Artificial Neural Network Layers (Source: DataWow, CNN Explained)
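The learning behaviour described above (adjust the weights after a wrong prediction, leave them alone after a correct one) can be sketched with the classic perceptron update rule. This is a simplified stand-in rather than the training procedure of a full network. The names `step`, `predict`, and `train`, the AND-gate dataset, and the learning rate are all assumptions made for illustration. The weights start at zero instead of random values so that the run is reproducible.

```python
def step(z):
    """Step activation: fire (1) when the weighted sum is non-negative."""
    return 1 if z >= 0 else 0

def predict(inputs, weights, bias):
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)

def train(samples, epochs=10, lr=1.0):
    """Perceptron rule: nudge the weights and bias only when a prediction is wrong."""
    weights = [0.0, 0.0]  # zeros instead of random values, for reproducibility
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(inputs, weights, bias)
            # error is 0 for a correct prediction, so nothing changes in that case.
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias

# Toy dataset (assumed): the logical AND of two binary inputs.
and_gate = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]

weights, bias = train(and_gate)
print([predict(x, weights, bias) for x, _ in and_gate])  # prints [0, 0, 0, 1]
```

Multi-layer networks replace this rule with back-propagation, but the idea of correcting weights in proportion to the error carries over.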
This computing process with interconnected layers and nodes works similarly to how our brain solves a problem. Using algorithms, the machine can likewise recognize hidden meanings, patterns, and correlations in raw data, cluster and classify it, and continuously learn and improve. There are various classes of neural networks. One of the leading ones is the Convolutional Neural Network, or CNN, which is used to solve complex Computer Vision problems.
The Convolutional Neural Network - CNN
The biological visual perception process of living things served as the inspiration for the CNN, a well-known deep learning architecture. The goal is to give machines the capacity to "see" the environment with meaning, just as a person would. Machines apply this capability to a wide range of tasks, including object identification, self-driving automobiles, image processing for medical diagnosis, and so forth. CNNs are intended for working with multidimensional visual data, but they may also be applied to non-visual data such as time series, audio, and signals. CNNs play a major role in computer vision. By learning features directly from the data, the CNN algorithm allows developers to achieve highly accurate recognition results and do away with laborious manual feature extraction.
Figure 4 CNN and its Layers (Source: CNN Mathlab)
A convolutional neural network has multiple layers that work together to learn and recognise features, analyse images, and turn input into output. Convolutional layers, pooling layers, and fully connected layers make up the majority of the CNN architecture.
A CNN's inputs are an image's numerical pixel values. When an image is fed in as input, each of the CNN's layers produces an activation map, or feature map, which highlights the image's important details. Basic characteristics like edges are detected by the first layer. As we go deeper into the CNN, the layers start extracting higher-level characteristics like objects and faces; each layer's output is passed on to the next layer, where more sophisticated features are extracted.
Convolutional Layer (Feature Extraction)
The Convolutional Layer is also known as the feature extractor layer because this is where the low-level features of an image are extracted. Convolution is a mathematical operation that combines two functions; here these are (1) the image matrix and (2) a filter matrix, and the result is a feature map. A filter, or kernel, can be a 3x3, 5x5, or 7x7 matrix. In the illustration below, a 3x3 filter is used to perform a convolution that extracts interesting features from the input. The green area is where the convolution is taking place. The computation is performed by sliding the filter over the input: at every location, the filter and the underlying patch of the input are multiplied element by element, and the results are summed to produce one value of the feature map.
Figure 5 Convolutional Operation shown in 2D using 3x3 Filter (Source: Applied Deep Learning, Part 4 CNN, @Arden Dertat)
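The sliding-window computation described above can be written out directly. The sketch below assumes a square kernel, a stride of 1, and no padding. The function name `convolve2d` and the sample 5x5 image and 3x3 kernel are illustrative choices, not values taken from the figure. As in most deep learning libraries, the kernel is applied without flipping it.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map."""
    k = len(kernel)
    out_rows = len(image) - k + 1
    out_cols = len(image[0]) - k + 1
    feature_map = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            # Multiply the window by the kernel element by element, then sum.
            total = sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(k)
                for b in range(k)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map

# Assumed sample data: a 5x5 binary image and a 3x3 kernel.
image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

for row in convolve2d(image, kernel):
    print(row)
# [4, 3, 4]
# [2, 4, 3]
# [2, 3, 4]
```

A 5x5 input and a 3x3 kernel give a 3x3 feature map, because the kernel fits in only three positions along each axis.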
Stride and padding are two parameters that control how the convolutional filter is applied. Stride is the distance by which the kernel window moves at each step. In Figure 5, the window shifts by 1 position at a time. The default stride is 1; a larger stride produces a smaller feature map because the filter skips over positions, and with them potential features. Padding surrounds the input with zeros in order to maintain the source dimensions. In the figure below, the white area around the blue input is the padding. It defines how the border is handled so that the dimensions of the output match those of the input.
Figure 6 2D Convolution using a kernel size of 3, Stride of 1 and padding (Source: Types of Convolutions, @Paul-Louis, TowardsDataScience)
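The effect of stride and padding on the size of the feature map follows the standard relation output = floor((input + 2 × padding − kernel) / stride) + 1. The sketch below applies that formula and shows what zero padding does to a small input. The helper names `output_size` and `zero_pad` are hypothetical.

```python
def output_size(input_size, kernel_size, stride=1, padding=0):
    """Length of one side of the feature map for a square input and kernel."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

def zero_pad(image, padding):
    """Surround a 2D list with `padding` rows and columns of zeros."""
    width = len(image[0]) + 2 * padding
    padded = [[0] * width for _ in range(padding)]
    for row in image:
        padded.append([0] * padding + list(row) + [0] * padding)
    padded.extend([0] * width for _ in range(padding))
    return padded

print(output_size(5, 3))             # 3: no padding shrinks the output
print(output_size(5, 3, padding=1))  # 5: padding of 1 preserves the input size
print(output_size(5, 3, stride=2))   # 2: a larger stride shrinks it further

for row in zero_pad([[1, 2], [3, 4]], 1):
    print(row)
# [0, 0, 0, 0]
# [0, 1, 2, 0]
# [0, 3, 4, 0]
# [0, 0, 0, 0]
```

For a 3x3 kernel with stride 1, a padding of 1 is exactly what keeps the output the same size as the input.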
The purpose of this layer is to filter features. As the filter/kernel moves over the source, it looks for patterns. There are many predefined kernels available to filter an image, such as sharpen, blur, and edge detection. They detect features by altering the image. The layer that follows the convolutional operation in a CNN is known as the Pooling Layer. Pooling works similarly to the convolutional operation.
Pooling Layer (Sub-Sampling)
A pooling operation is used to decrease the number of parameters and speed up computation. It does so by reducing the dimensions of a convolved feature. The pooling layer is also known as subsampling: it downsamples each feature map individually while maintaining the depth. Max Pooling and Average Pooling are the two most prevalent types of pooling operations. In max pooling, a window is slid across the input and the maximum value is taken from each window. Average pooling instead takes the average of all the values in the window of the convolved feature. No filter is applied to the image, but a window size and stride value are specified just as in the convolution process. Along with reducing dimensions, max pooling also accomplishes denoising. As images become more complicated, the number of convolutional and pooling layers is increased in order to capture more of the information in an image. Because pooling shrinks the dimensions of the input, fewer weights are needed in the layers that follow. This reduction is significant since a CNN architecture uses a large number of weights. The typical pooling operation uses a 2x2 window, a stride of 2, and no padding.
Figure 7 Max & Average Pooling Operation (Source: Guide to CNN @Sumit Saha, towardsdatascience)
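Both pooling variants can be sketched with one small function. It uses the typical configuration mentioned above (2x2 window, stride 2, no padding) as its defaults. The name `pool2d` and the sample feature map are assumptions made for illustration.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Downsample a 2D feature map with max or average pooling."""
    if mode == "max":
        aggregate = max
    else:
        aggregate = lambda values: sum(values) / len(values)
    pooled = []
    for i in range(0, len(feature_map) - size + 1, stride):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, stride):
            # Collect the values covered by the window at this position.
            window = [
                feature_map[i + a][j + b]
                for a in range(size)
                for b in range(size)
            ]
            row.append(aggregate(window))
        pooled.append(row)
    return pooled

# Assumed 4x4 feature map.
feature_map = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 1],
    [3, 4, 2, 8],
]

print(pool2d(feature_map, mode="max"))      # [[6, 4], [7, 9]]
print(pool2d(feature_map, mode="average"))  # [[3.75, 2.25], [4.0, 5.0]]
```

Either way, the 4x4 map shrinks to 2x2, which is where the saving in computation comes from.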
Once the pooling process is complete and the model has learned the features, the output is passed to the neural network for classification.
Fully Connected Layer (Classification)
The final output and classification of the data are produced by the Fully Connected Layers. Once the output of the convolution and pooling operations has been generated, the next step is to flatten it so that the pooled data can be fed into an artificial neural network. The output of the earlier layers takes the form of 3D volumes, which flattening converts into a form the fully connected layer accepts as input: a 1D, or column, vector. The figure below illustrates multiple pooled feature maps being flattened; the result is one long encoded vector that can now be passed through the fully connected neural network for further processing.
Figure 8 Flattening Operation (Source: CNN Step 3, SuperDataScience)
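Flattening is simple enough to show in a single comprehension. In this sketch, the 3D volume is represented as a list of 2D pooled feature maps. The name `flatten` and the sample values are illustrative assumptions.

```python
def flatten(pooled_maps):
    """Unroll a stack of 2D feature maps into one long 1D vector."""
    return [value for fmap in pooled_maps for row in fmap for value in row]

# Two assumed 2x2 pooled feature maps, i.e. a 2x2x2 volume.
pooled_maps = [
    [[6, 4], [7, 9]],
    [[1, 0], [2, 5]],
]

print(flatten(pooled_maps))  # [6, 4, 7, 9, 1, 0, 2, 5]
```

The resulting vector has one entry per value in the volume (here 2 × 2 × 2 = 8), and each entry becomes one input to the fully connected layer.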
A fully connected network's primary objective is to take the input and combine the features into a wider range of attributes so that images can be categorised effectively. This is necessary to obtain the final output, since the data must be assigned to the appropriate classes. Training relies on back-propagation, which updates the weights and biases explained earlier. The loss function is another component; it measures the prediction errors of this layer. The network repeatedly evaluates and optimises its learning until it reaches the target state.
Figure 9 Fully Connected Network Operation (CNN Step 4, SuperDataScience)
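The forward pass of a fully connected classification layer, together with a loss that scores its prediction, might look like the sketch below. The article does not name a specific output activation or loss function, so softmax and cross-entropy are assumed here as common choices. The names `dense`, `softmax`, and `cross_entropy` and all the numbers are illustrative. Back-propagation itself is not shown.

```python
import math

def dense(inputs, weights, biases):
    """One fully connected layer: a weighted sum plus bias for each output neuron."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    highest = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probabilities, true_class):
    """Loss is small when the true class receives a high probability."""
    return -math.log(probabilities[true_class])

# Assumed flattened feature vector and parameters for 3 classes.
features = [0.5, 1.0, -0.5, 2.0]
weights = [
    [0.2, -0.1, 0.4, 0.3],
    [-0.3, 0.5, 0.1, -0.2],
    [0.1, 0.1, -0.2, 0.0],
]
biases = [0.0, 0.1, -0.1]

scores = dense(features, weights, biases)
probs = softmax(scores)
loss = cross_entropy(probs, true_class=0)

print(probs)
print(loss)
```

During training, back-propagation would use the gradient of this loss to adjust `weights` and `biases`, and the loop would repeat until the loss stops improving.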
In the diagram below, we can see the process of CNN and its layers. This article briefly discussed a few key concepts behind CNN in simplified terms. Many layers form a CNN architecture. It is a combination of feature extraction (Convolutional + Pooling Layer) and classification (Fully Connected Layer) and further operations. The Convolutional Layers are considered the building blocks of CNN as they detect meaningful features through layers. The Fully connected layers learn to use these extracted features to accurately classify the output.
Artificial intelligence has undergone a significant shift as a result of the Convolutional Neural Network, or CNN. Researchers have developed prominent CNN architectures such as LeNet, AlexNet, VGGNet, and DenseNet to build more effective deep learning algorithms, and numerous computer vision applications use them. Their accomplishments are outstanding and useful in several areas. In recognising unusual patterns, a well-trained convolutional network may even outperform a person.