# Convolutional Neural Networks

In the previous network we reduced our images to 128-dimensional vectors of eigenface weights.
This dimensionality reduction greatly cuts the amount of information that needs to be processed.
Without it, applying traditional (fully connected) neural networks to raw images rapidly becomes impractical.
Even for our very small (\\(128 \times 128\\)) images and a moderately sized hidden layer (say 256 units) we would have
\\(128 \times 128 \times 256 = 4,194,304\\) weights to learn.

Convolutional Neural Networks (CNNs) get around this by using two specific types of layer: *convolutional* layers and *pooling* layers.

A convolutional layer consists of small (almost always square) arrays of weights, each of which is applied as a filter at every point in the image.
Typically these are \\(3 \times 3\\) or \\(5 \times 5\\) filters, and the weights are learned through training.
Generally a convolutional layer will consist of a number of filters. A layer of 128 \\(5 \times 5\\) filters applied to a single-channel image, for example, has just \\(128 \times 5 \times 5 = 3200\\) weights to learn (ignoring biases), fewer even than the \\(64 \times 64 = 4096\\) weights of a fully connected network between two 64-unit layers.
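
To make "applied as a filter at each point" concrete, here is a minimal NumPy sketch of a single filter sliding over a small image. The function name `conv2d_same`, the toy \\(6 \times 6\\) image and the fixed averaging filter are illustrative assumptions; in a CNN the filter values would be learned. Like `tf.nn.conv2d`, it does not flip the filter, and it zero-pads the border so that the output is the same size as the input.

```python
import numpy as np

def conv2d_same(image, kernel):
    """Apply one square, odd-sized filter at every pixel of a 2D image.

    The border is zero-padded so the output has the same shape as the input.
    """
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(image, pad, mode="constant")
    out = np.zeros(image.shape, dtype=float)
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            # Weighted sum of the k x k window centred on pixel (r, c)
            window = padded[r:r + k, c:c + k]
            out[r, c] = np.sum(window * kernel)
    return out

# Toy 6x6 image and a fixed 3x3 averaging filter (assumed for illustration)
image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.ones((3, 3)) / 9.0
smoothed = conv2d_same(image, kernel)
print(smoothed.shape)

# Weight counts quoted in the text (single input channel, biases ignored)
conv_weights = 128 * 5 * 5    # 128 filters of size 5x5
dense_weights = 64 * 64       # fully connected, 64 units to 64 units
print(conv_weights, dense_weights)
```

The same nine weights are reused at every pixel, which is why the number of weights depends on the filter size and the number of filters but not on the image size.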

A pooling layer performs data reduction, usually by taking the maximum value over some window. 
A \\(2 \times 2\\) max-pooling layer will divide its input into a grid of \\(2 \times 2\\) squares, and replace each set of 4 units by a single unit whose activation is the maximum of the inputs.
This reduces the amount of information to be processed by later layers, while keeping the strongest signals from earlier layers.
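
As a rough illustration of \\(2 \times 2\\) max-pooling, the NumPy sketch below splits a small array into \\(2 \times 2\\) squares and keeps the maximum of each. The helper name `max_pool_2x2` and the example values are made up for this illustration, and the sketch assumes the height and width are even.

```python
import numpy as np

def max_pool_2x2(a):
    """2x2 max-pooling with stride 2 on a 2D array with even height and width."""
    h, w = a.shape
    # Axes 1 and 3 index the position inside each 2x2 square
    blocks = a.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

a = np.array([[1, 5, 2, 0],
              [3, 4, 1, 1],
              [0, 2, 9, 6],
              [7, 1, 3, 8]])
pooled = max_pool_2x2(a)
print(pooled)
# [[5 2]
#  [7 9]]
```

The \\(4 \times 4\\) input becomes a \\(2 \times 2\\) output, so each pooling layer cuts the number of activations by a factor of four.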

A CNN typically consists of interleaved convolutional and pooling layers, followed by one or more fully-connected layers to do final classification.

The final trick to getting CNNs to work is the ReLU (rectified linear unit) activation function.
Its output is zero if the unit's total input is negative, and equal to the input otherwise.
ReLU allows for non-linearity while keeping gradient computations simple (its gradient is 0 for negative inputs and 1 for positive inputs).
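
The definition above can be sketched in a few lines of NumPy. The function names `relu` and `relu_grad` are chosen for this illustration; the gradient at exactly zero is undefined, and taking it to be 0 there is a common convention that is assumed here.

```python
import numpy as np

def relu(z):
    """Zero for negative inputs, identity for positive inputs."""
    return np.maximum(z, 0.0)

def relu_grad(z):
    """Gradient of ReLU: 0 for negative inputs, 1 for positive inputs.

    The value at exactly 0 is a convention; 0 is used here.
    """
    return (z > 0).astype(float)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z))       # [0.  0.  0.  0.5 2. ]
print(relu_grad(z))  # [0. 0. 0. 1. 1.]
```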

The example below has the following layers:
- The input is a batch of \\(128 \times 128\\) single-channel images
- A convolutional layer with six \\(5 \times 5\\) filters produces a \\(128 \times 128 \times 6\\) output array (the borders are zero-padded, so the image size is preserved)
- A max-pooling layer reduces this to \\(64 \times 64 \times 6\\)
- ReLU is applied
- A convolutional layer with sixteen \\(5 \times 5\\) filters produces a \\(64 \times 64 \times 16\\) output array
- A max-pooling layer reduces this to \\(32 \times 32 \times 16\\)
- ReLU is applied
- The output is flattened into a 1D array
- A fully connected layer with 128 outputs is applied
- ReLU is applied
- A final fully connected layer gives a single output, trained towards 0 for male and 1 for female
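
The sizes in the list above can be checked with some simple arithmetic. The sketch below is plain Python, not TensorFlow; the helper names are hypothetical, and it assumes zero-padded convolutions with stride 1 and \\(2 \times 2\\) pooling with stride 2, as in the code further down.

```python
def same_conv_shape(shape, num_filters):
    """Zero-padded, stride-1 convolution: height and width are unchanged."""
    h, w, _ = shape
    return (h, w, num_filters)

def pool_2x2_shape(shape):
    """2x2 pooling with stride 2: height and width are halved (rounded up)."""
    h, w, c = shape
    return ((h + 1) // 2, (w + 1) // 2, c)

shape = (128, 128, 1)                 # one single-channel input image
shape = same_conv_shape(shape, 6)     # (128, 128, 6)
shape = pool_2x2_shape(shape)         # (64, 64, 6)
shape = same_conv_shape(shape, 16)    # (64, 64, 16)
shape = pool_2x2_shape(shape)         # (32, 32, 16)

num_features = shape[0] * shape[1] * shape[2]   # length of the flattened array
print(shape, num_features)                      # (32, 32, 16) 16384

# Weights per layer, ignoring biases
conv1_weights = 5 * 5 * 1 * 6         # 150
conv2_weights = 5 * 5 * 6 * 16        # 2400
fc1_weights = num_features * 128      # 2097152
fc2_weights = 128 * 1                 # 128
print(conv1_weights, conv2_weights, fc1_weights, fc2_weights)
```

Almost all of the weights sit in the first fully connected layer; the two convolutional layers together contribute only a few thousand.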

Even this relatively small network takes a long time to train, so the code below is set up to run for just a few epochs.

In [4]:
import numpy as np
import tensorflow as tf
import cv2

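# Descriptor file listing the samples. The count and the loading loop
# must read the same file so that the arrays are sized correctly.
descriptorFile = "MITFaces/faceDS"

# Go through the file to count number of samples
with open(descriptorFile) as f:
    n = sum(1 for line in f)

# Image data, n 128x128 images
I = np.zeros([n, 128, 128])

# Output labels: 0 for male, 1 for female
L = np.zeros(n)

ix = 0
with open(descriptorFile) as f:
    for line in f:
        parts = line.split()
        fileName = "MITFaces/rawdata/" + parts[0]
        # Each image is stored as 128*128 raw bytes; scale to [0, 1]
        with open(fileName, 'rb') as fileIn:
            rawdata = np.fromfile(fileIn, dtype=np.uint8, count=128*128)/255.0
        I[ix,:,:] = rawdata.reshape(128,128)
        # The third field holds the sex descriptor
        if parts[2].startswith('female'):
            L[ix] = 1
        ix = ix+1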

In [11]:
# ConvNet adapted from
# https://medium.com/data-science-group-iitr/building-a-convolutional-neural-network-in-python-with-tensorflow-d251c3ca8117

def new_conv_layer(input, num_input_channels, filter_size, num_filters, name):

    with tf.variable_scope(name) as scope:
        # Shape of the filter-weights for the convolution
        shape = [filter_size, filter_size, num_input_channels, num_filters]

        # Create new weights (filters) with the given shape
        weights = tf.Variable(tf.truncated_normal(shape, stddev=0.05))

        # Create new biases, one for each filter
        biases = tf.Variable(tf.constant(0.05, shape=[num_filters]))

        # TensorFlow operation for convolution
        # (stride 1, zero-padded so the output is the same size as the input)
        layer = tf.nn.conv2d(input=input, filter=weights, strides=[1, 1, 1, 1], padding='SAME')

        # Add the biases to the results of the convolution.
        layer += biases

        return layer, weights


def new_pool_layer(input, name):

    with tf.variable_scope(name) as scope:
        # TensorFlow operation for 2x2 max-pooling with stride 2
        layer = tf.nn.max_pool(value=input, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

        return layer


def new_relu_layer(input, name):

    with tf.variable_scope(name) as scope:
        # TensorFlow operation for the ReLU activation
        layer = tf.nn.relu(input)

        return layer


def new_fc_layer(input, num_inputs, num_outputs, name):

    with tf.variable_scope(name) as scope:

        # Create new weights and biases.
        weights = tf.Variable(tf.truncated_normal([num_inputs, num_outputs], stddev=0.05))
        biases = tf.Variable(tf.constant(0.05, shape=[num_outputs]))

        # Multiply the input and weights, and then add the bias-values.
        layer = tf.matmul(input, weights) + biases

        return layer

# Training parameters
batchSize = 199 
batchesPerEpoch = 10

# Input layer - batchSize 128x128 images at a time
x = tf.placeholder(tf.float32, shape=[batchSize, 128, 128,1], name='x')

# Output layer - batchSize 1-D outputs
y = tf.placeholder(tf.float32, shape=[batchSize, 1], name='y')

 
# Convolutional Layer 1
layer_conv1, weights_conv1 = new_conv_layer(
 input=x, num_input_channels=1, 
 filter_size=5, num_filters=6, name ="conv1")

# Pooling Layer 1
layer_pool1 = new_pool_layer(layer_conv1, name="pool1")

# RelU layer 1
layer_relu1 = new_relu_layer(layer_pool1, name="relu1")

# Convolutional Layer 2
layer_conv2, weights_conv2 = new_conv_layer(
 input=layer_relu1, num_input_channels=6, 
 filter_size=5, num_filters=16, name= "conv2")

# Pooling Layer 2
layer_pool2 = new_pool_layer(layer_conv2, name="pool2")

# RelU layer 2
layer_relu2 = new_relu_layer(layer_pool2, name="relu2")

# Flatten Layer
num_features = layer_relu2.get_shape()[1:4].num_elements()
layer_flat = tf.reshape(layer_relu2, [-1, num_features])

# Fully-Connected Layer 1
layer_fc1 = new_fc_layer(layer_flat, num_inputs=num_features, num_outputs=128, name="fc1")

# RelU layer 3
layer_relu3 = new_relu_layer(layer_fc1, name="relu3")

# Fully-Connected Layer 2
layer_fc2 = new_fc_layer(input=layer_relu3, num_inputs=128, num_outputs=1, name="fc2")

loss = tf.reduce_mean(tf.squared_difference(layer_fc2, y))

# Training method - gradient descent minimisation
training_step = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

# Training
init = tf.global_variables_initializer()
sess = tf.Session()

sess.run(init)

for epoch in range(11):
    for batch in range(batchesPerEpoch):
        startIx = batch*batchSize
        endIx = (batch+1)*batchSize
        x_sample = I[startIx:endIx,:,:].reshape(batchSize, 128,128,1)
        y_sample = L[startIx:endIx].reshape(batchSize,1)
        sess.run(training_step, feed_dict={x: x_sample, y: y_sample})
    print("_"*80)
    print('Epoch: ', epoch)
    # Loss is reported on the last batch of the epoch
    print('Loss: ', sess.run(loss, feed_dict={x: x_sample, y: y_sample}))
print("_"*80)

________________________________________________________________________________
Epoch: 0
Loss: 0.290118
________________________________________________________________________________
Epoch: 1
Loss: 0.284181
________________________________________________________________________________
Epoch: 2
Loss: 0.284118
________________________________________________________________________________
Epoch: 3
Loss: 0.284135
________________________________________________________________________________
Epoch: 4
Loss: 0.284149
________________________________________________________________________________
Epoch: 5
Loss: 0.284161
________________________________________________________________________________
Epoch: 6
Loss: 0.284168
________________________________________________________________________________
Epoch: 7
Loss: 0.284174
________________________________________________________________________________
Epoch: 8
Loss: 0.284176
_______________________________________________________