Face Recognition using Convolutional Neural Network in Tensorflow

We are going to build a CNN model in TensorFlow while working with the face dataset collected by AT&T Laboratories Cambridge. This is going to be a fun ride: we will clarify the concepts along the way while guiding you through the code, so just keep up with me on this journey. Before you start, I suggest you go through my article on TensorFlow so that you won't find this journey hard.

Where is the data?

The GitHub link to my repository is down below; you can fork it. The dataset is imported from the scikit-learn library via sklearn.datasets.fetch_olivetti_faces. I suggest you keep the code open while I walk you through this journey.

Why and what the heck is this Convolution?


In terms of the neural network, convolution is the first layer of the CNN architecture. You must be wondering what these numbers are, where we use them and where they come from. Good, you are exactly where I wanted you to be. Have you ever wondered what an image actually is and what it is made of? It is made of three channels: RED, GREEN and BLUE. By channel we mean a matrix with values ranging from 0 to 255. And by convolution we mean sliding small matrices (filters) over each channel and multiplying element-wise, extracting the different features that help differentiate one image from another.
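To make the idea concrete, here is a minimal NumPy sketch of a 2D convolution on a single channel (no padding, stride 1); the `edge_kernel` is an assumed example filter, not one from my notebook:

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # element-wise multiply the patch with the kernel and sum
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)   # toy single-channel "image"
edge_kernel = np.array([[-1., 0., 1.],
                        [-1., 0., 1.],
                        [-1., 0., 1.]])            # vertical edge detector
result = convolve2d(image, edge_kernel)
print(result.shape)  # (2, 2)
```

Each output value measures how strongly the patch under the kernel matches that feature, which is exactly what the first CNN layer learns to do.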

Do you want to see an example of what effect convolution has on an image?

Yes, I will show you the magic convolution can do. Check what I did with David Cameron's image using my own custom convolutional filter and with a scikit-learn library filter.

Code for Creating Custom Convolutional Filters :- https://github.com/SalilVishnuKapur/MachineLearning_Workbook/blob/master/ML_Assignment6/Soln1.ipynb

The pooling layer concept is also important to understand before heading forward. Want to learn it?

Yes, we have to learn it, because without it we can't finish this CNN journey. Pooling is an easy concept: it came into the picture when there was a need to keep only the most important (or least important) part of each region of an image. Hence there are max-pooling and min-pooling. First, have a look at the image below.

So what are we getting from the left mini image matrix?

The max value from each coloured part, which is equivalent to extracting the most significant colour from each part of the image; this is what we call max pooling. Similarly, min pooling finds the least significant colour from each part of the image.
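The coloured-parts idea above can be sketched in a few lines of NumPy; the 4x4 `patch` matrix is a made-up example, and swapping `max` for `min` gives min pooling:

```python
import numpy as np

def max_pool(image, size=2):
    """Non-overlapping max pooling: take the max of each size x size patch."""
    h, w = image.shape
    trimmed = image[:h - h % size, :w - w % size]   # drop ragged edges
    blocks = trimmed.reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))                  # use .min(...) for min pooling

patch = np.array([[1, 3, 2, 4],
                  [5, 6, 1, 0],
                  [7, 2, 9, 8],
                  [4, 1, 3, 5]])
pooled = max_pool(patch)
print(pooled)
# [[6 4]
#  [7 9]]
```

Each 2x2 coloured part collapses to its single largest value, shrinking the image while keeping the strongest responses.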

How about starting the thriller ride: writing down the CNN model in TensorFlow?

For this we just need to create a computational graph through which we can load the data, make all the computations and then get the desired results.

Building the Computational Graph for Face Recognition :-

        1. Creating Placeholders for Loading the Data:    Here 4096 is the total number of pixels in one flattened image, whose dimensions were 64*64, that we feed into our model. We keep the batch dimension as None so that we can pass any number of images through these placeholders to our CNN. 40 refers to the number of class labels (one per subject) we have.
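A minimal sketch of this step, assuming TensorFlow 1.x-style graph mode (on TensorFlow 2 you reach it through `tf.compat.v1`); the placeholder names are my own choice:

```python
import tensorflow as tf
tf.compat.v1.disable_eager_execution()   # graph mode, as in TF 1.x

# 4096 = 64 * 64 flattened pixels per Olivetti face; 40 = number of subjects.
# None lets us feed any batch size.
x = tf.compat.v1.placeholder(tf.float32, shape=[None, 4096], name="x")
y_ = tf.compat.v1.placeholder(tf.float32, shape=[None, 40], name="y_")
```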

       2. Writing down the Input, Convolution and Pooling Operation Equation:

             So we are just reshaping the flattened input data back into a 64*64 image. Next we take the weight variables, perform the convolution, add the bias, and finally apply a pooling layer on top.
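The reshape-convolve-pool chain can be sketched as below, again in TF 1.x-style graph mode; the filter size (5x5) and channel count (32) are assumptions for illustration, since the article's exact sizes are not shown:

```python
import tensorflow as tf
tf.compat.v1.disable_eager_execution()

x = tf.compat.v1.placeholder(tf.float32, shape=[None, 4096])

# reshape the flat 4096-vector back into a 64x64 single-channel image
x_image = tf.reshape(x, [-1, 64, 64, 1])

# conv layer: 5x5 filters, 1 input channel, 32 output channels (assumed sizes)
W_conv1 = tf.Variable(tf.random.truncated_normal([5, 5, 1, 32], stddev=0.1))
b_conv1 = tf.Variable(tf.constant(0.1, shape=[32]))
h_conv1 = tf.nn.relu(
    tf.nn.conv2d(x_image, W_conv1, strides=[1, 1, 1, 1], padding="SAME") + b_conv1)

# 2x2 max pooling halves the spatial size: 64x64 -> 32x32
h_pool1 = tf.nn.max_pool2d(h_conv1, ksize=[1, 2, 2, 1],
                           strides=[1, 2, 2, 1], padding="SAME")
```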

       3. Creating the Utilities for the above functionalities in step 2:

             The weight and bias variable functions return initial variable values for any requested matrix shape. For convolving across the whole image we have the conv2d method, which moves the filter over the image using strides (which define how the filter steps across the image). For max pooling we define a patch size and take the maximum out of each patch of every image.
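These utilities commonly look like the following sketch (the helper names mirror the description above; the initialization constants are assumptions):

```python
import tensorflow as tf

def weight_variable(shape):
    """Small random initial weights for a filter/weight matrix of any shape."""
    return tf.Variable(tf.random.truncated_normal(shape, stddev=0.1))

def bias_variable(shape):
    """Small positive initial bias."""
    return tf.Variable(tf.constant(0.1, shape=shape))

def conv2d(x, W):
    """Stride-1 convolution in every direction; SAME padding keeps the size."""
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding="SAME")

def max_pool_2x2(x):
    """Take the max over each 2x2 patch, halving height and width."""
    return tf.nn.max_pool2d(x, ksize=[1, 2, 2, 1],
                            strides=[1, 2, 2, 1], padding="SAME")

# quick shape check on a dummy batch
out = max_pool_2x2(conv2d(tf.zeros([1, 64, 64, 1]), weight_variable([5, 5, 1, 32])))
```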

        4. Adding the Dropout Layer, connecting the layers to the Fully Connected Layer: 

             Now we connect the 128 output channels from the last layer to the fully connected output layer. For this we flatten all the pixels of the final feature map and create the weight and bias variables for the fully connected layers. We also add a dropout layer, which randomly drops activations during training so the model can't rely on any single feature and generalises better.
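A sketch of this step, assuming the last conv/pool stage left a 16x16 map with the 128 channels mentioned above, and an assumed 1024-unit hidden layer:

```python
import tensorflow as tf
tf.compat.v1.disable_eager_execution()

# stand-in for the last pooling layer's output: 16x16 map, 128 channels (assumed)
h_pool2 = tf.compat.v1.placeholder(tf.float32, [None, 16, 16, 128])
keep_prob = tf.compat.v1.placeholder(tf.float32)   # fed as e.g. 0.5 in training

# flatten every pixel of the feature map into one long vector
h_flat = tf.reshape(h_pool2, [-1, 16 * 16 * 128])

# fully connected hidden layer (1024 units is an assumed size)
W_fc1 = tf.Variable(tf.random.truncated_normal([16 * 16 * 128, 1024], stddev=0.1))
b_fc1 = tf.Variable(tf.constant(0.1, shape=[1024]))
h_fc1 = tf.nn.relu(tf.matmul(h_flat, W_fc1) + b_fc1)

# dropout randomly zeroes activations during training to reduce overfitting
h_fc1_drop = tf.nn.dropout(h_fc1, rate=1 - keep_prob)

# final fully connected layer maps to the 40 class scores
W_fc2 = tf.Variable(tf.random.truncated_normal([1024, 40], stddev=0.1))
b_fc2 = tf.Variable(tf.constant(0.1, shape=[40]))
y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2
```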

        5. Running the computational graph :

                Here we first add the loss computation to our previous computational graph, then the loss-function optimization, and finally we execute the computational graph.
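A runnable sketch of this step: softmax cross-entropy loss plus an Adam optimizer, executed for one batch inside a session. To keep it self-contained I use a single linear layer as a stand-in for the CNN built above, and random data in place of the Olivetti batch; the learning rate is an assumption:

```python
import numpy as np
import tensorflow as tf
tf.compat.v1.disable_eager_execution()

x = tf.compat.v1.placeholder(tf.float32, [None, 4096])
y_ = tf.compat.v1.placeholder(tf.float32, [None, 40])

# stand-in model: one linear layer (the real graph would be the CNN above)
W = tf.Variable(tf.zeros([4096, 40]))
b = tf.Variable(tf.zeros([40]))
y_conv = tf.matmul(x, W) + b

# loss computation, then its optimization
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))
train_step = tf.compat.v1.train.AdamOptimizer(1e-4).minimize(cross_entropy)

# executing the computational graph for one (random) batch
with tf.compat.v1.Session() as sess:
    sess.run(tf.compat.v1.global_variables_initializer())
    batch_x = np.random.rand(8, 4096).astype(np.float32)
    batch_y = np.eye(40, dtype=np.float32)[np.random.randint(0, 40, 8)]
    _, loss = sess.run([train_step, cross_entropy],
                       feed_dict={x: batch_x, y_: batch_y})
    print(loss)
```

In real training this `sess.run` call sits inside a loop over batches of the face data.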

        6. Calculating the Prediction Accuracy of the Model:

                Here we just restore the stored session and calculate the accuracy of our model.
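The accuracy computation can be sketched as below; the checkpoint restore is shown only as a commented hint, since the actual checkpoint path depends on how the session was saved:

```python
import numpy as np
import tensorflow as tf
tf.compat.v1.disable_eager_execution()

# placeholders standing in for the model's logits and the one-hot labels
y_conv = tf.compat.v1.placeholder(tf.float32, [None, 40])
y_ = tf.compat.v1.placeholder(tf.float32, [None, 40])

# a prediction is correct when the highest-scoring class matches the label
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# restoring a previously saved session would look like (path assumed):
#   saver = tf.compat.v1.train.Saver()
#   saver.restore(sess, "model.ckpt")

with tf.compat.v1.Session() as sess:
    # perfect predictions on a toy batch give accuracy 1.0
    eye = np.eye(40, dtype=np.float32)
    acc = sess.run(accuracy, feed_dict={y_conv: eye, y_: eye})
    print(acc)  # 1.0
```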

Code available on github :
