Introduction to Convolutional Neural Network(CNN)

Convolutional neural networks (CNNs)

Posted by : Md. Jahid Hasan at Apr 13, 2020

Category : CNN

👉Part : 01

✍ Topic : Convolutional neural networks (CNNs)

Introduction

Convolutional neural networks (CNNs) emerged from the study of the brain’s visual cortex, and they have been used in image recognition since the 1980s. These studies of the visual cortex inspired the neocognitron, introduced in 1980,4 which gradually evolved into what we now call convolutional neural networks. An important milestone was a 1998 paper by Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner, which introduced the famous LeNet-5 architecture, widely used to recognize handwritten check numbers.

Artificial Intelligence has been witnessing a monumental growth in bridging the gap between the capabilities of humans and machines. Researchers and enthusiasts alike, work on numerous aspects of the field to make amazing things happen. One of many such areas is the domain of Computer Vision. The agenda for this field is to enable machines to view the world as humans do, perceive it in a similar manner and even use the knowledge for a multitude of tasks such as Image & Video recognition, Image Analysis & Classification, Media Recreation, Recommendation Systems, Natural Language Processing, etc. The advancements in Computer Vision with Deep Learning has been constructed and perfected with time, primarily over one particular algorithm — a Convolutional Neural Network.

About CNN

A convolutional neural network (CNN) is a type of artificial neural network used in image recognition and processing that is specifically designed to process pixel data.

CNNs are powerful image processing, artificial intelligence (AI) tools that use deep learning to perform both generative and descriptive tasks.They power image search services, self-driving cars, automatic video classification systems, and more. Moreover, CNNs are not restricted to visual perception: they are also successful at other tasks, such as voice recognition or natural language processing (NLP).

A simple Architecture of CNN

Nowadays, Deep Convolutional Neural Network (also called ConvNet) leverage spatial information and are therefore very well suited for image classification. A CNN mainly comprised of three layers namely convolutional layer, pooling layer and fully connected layer. These three layers can be repeatedly used to form a deep CNN architecture shown in Fig. 1. A particular organization of these layers begins with input layer and finish up with output layer. Following subsections contains description of these layers.

👉 Input Layer

Feature is an important and unique property of an image. The first layer in every CNN is input layer .It retains the image’s raw pixel value. Input image can be fed to the input layer of the CNN by applying pre-processing technique in order to achieve better accuracy. Formation of an input image would-be (image height) x (image width) x (image depth). For RGB image, the image depth is equal to 3 but for grayscale images it would be 1.

👉 Convolution Layer

Convolutional layer is the major building block of CNN. It is consist of a set of learnable convolution filters or kernels. These filters can learn feature representations of the input image to create a feature map. Each neuron in a feature map is connected via set of trainable weights to a neighborhood of neurons in the previous layer. In order to obtain a new feature, the input feature maps are first convolved with a learned kernel and then the results are passed into a nonlinear activation function. We will get different feature maps by applying different kernels. The typical activation functions are sigmoid, tanh and ReLU [20]. Several features can be extracted at each location by applying different feature maps within the same convolutional layer.

👉 Pooling Layer

Pooling layer is responsible for extracting dominant features by reducing the spatial resolution of the feature maps. This helps to achieve spatial invariance to input distortions and translations as well as increased the computational performance. It is usually placed between convolutional layers. The size of feature maps in the pooling layer is determined according to the moving step of kernels. The typical pooling operations are average pooling and max pooling. We can extract the high-level characteristics of inputs by stacking several convolutional layers and pooling layers.

👉 Fully connected Layer

In a fully connected layer each neuron is connected to every neuron in the previous layer, and each connection has its own weight. Thus, it is very expensive in terms of memory (weights) and computation (connections).This layer flattens the input feature representation into a feature vector and performs the function of high level reasoning.

👉Output Layer

The last layer of CNN is output layer. It is responsible for producing the output probability of each given input class. A softmax unit is used to obtain the output probability. Softmax is commonly used because of it generate a well-performed probability distribution. The probability of each output classes sum up to 1. The class containing the largest value will be the correct class.

CNN applications into the following areas:

Image recognition and OCR
Object detection for self-driving cars
Face recognition on social media
Image analysis in healthcare
Face Frontal View Generation
Generate Cartoon Characters
Image-to-Image Translation
Text-to-Image Translation
Generate New Human Poses
3D Object Generation
Clothing Translation
Photograph Editing
Video Prediction
Photos to Emojis
Photo Inpainting
Super Resolution
Photo Blending