{ "cells": [ { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%%html\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Convolutional Neural Networks\n", "\n", "In the previous network we reduced our images to 128-dimensional vectors of eigenface weights.\n", "This dimensionality reduction greatly reduces the amount of information that needs to be processed.\n", "Applying traditional (fully connected) neural networks to raw images rapidly becomes impractical.\n", "Even for our very small (128x128) images and a moderately sized hidden layer (say 256 units) we would have \n", "\\\\(128 \\times 128 \\times 256 = 4,194,304\\\\) weights to learn.\n", "\n", "Convolutional Neural Networks (CNNs) get around this by using two specific types of layer - *convolutional* layers and *pooling* layers.\n", "\n", "A convolutional layer consists of small (almost always square) arrays of weights which are applied as a filter at each point in the image. \n", "Typically these are \\\\(3 \\times 3\\\\) or \\\\(5 \\times 5\\\\) filters, and the weights are learned through training.\n", "Generally a convolutional layer will consist of a number of filters, so you might have a layer which consists of 128 \\\\(5 \\times 5\\\\) filters, which (for a single-channel input) is just \\\\(128 \\times 5 \\times 5 = 3200\\\\) weights to learn - fewer even than the \\\\(64 \\times 64 = 4096\\\\) weights of a fully connected layer between two 64-unit layers.\n", "\n", "A pooling layer performs data reduction, usually by taking the maximum value over some window. 
\n", "A \\\\(2 \\times 2\\\\) max-pooling layer will divide its input into a grid of \\\\(2 \\times 2\\\\) squares, and replace each set of 4 units by a single unit whose activation is the maximum of the inputs.\n", "This reduces the amount of information to be processed by later layers, while keeping the strongest signals from earlier layers.\n", "\n", "A CNN typically consists of interleaved convolutional and pooling layers, followed by one or more fully-connected layers to do final classification.\n", "\n", "The final trick to getting CNNs to work is the ReLU (rectified linear unit) activation function.\n", "This is zero if the unit's total input is less than 0, and the identity function otherwise.\n", "ReLU allows for non-linearity while keeping gradient computations simple (its gradient is 0 for negative inputs and 1 for positive inputs).\n", "\n", "The example below has the following layers:\n", "- The input is a batch of single-channel \\\\(128 \\times 128\\\\) images\n", "- A convolutional layer with six \\\\(5 \\times 5\\\\) filters produces a \\\\(128 \\times 128 \\times 6\\\\) array of output\n", "- A max-pooling layer reduces this to \\\\(64 \\times 64 \\times 6\\\\)\n", "- ReLU is applied\n", "- A convolutional layer with sixteen \\\\(5 \\times 5\\\\) filters produces a \\\\(64 \\times 64 \\times 16\\\\) array of output\n", "- A max-pooling layer reduces this to \\\\(32 \\times 32 \\times 16\\\\)\n", "- ReLU is applied\n", "- The output is flattened into a 1D array\n", "- A fully connected layer with 128 outputs is applied\n", "- ReLU is applied\n", "- A final fully connected layer gives a single output, with targets 0 for male and 1 for female\n", "\n", "Even this relatively small network takes a long time to train, so the code below is set up to apply just a few epochs." 
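, "\n", "\n", "As a rough check on the sizes in the layer list, the number of learnable parameters in each layer can be worked out by hand. This is only a sketch: it assumes that each filter spans all of its input channels and that there is one bias per filter or output unit, which is how the code below builds its layers.\n", "\n", "- First convolutional layer: \\\\(5 \\times 5 \\times 1 \\times 6 + 6 = 156\\\\)\n", "- Second convolutional layer: \\\\(5 \\times 5 \\times 6 \\times 16 + 16 = 2416\\\\)\n", "- First fully connected layer: \\\\(32 \\times 32 \\times 16 \\times 128 + 128 = 2,097,280\\\\)\n", "- Final fully connected layer: \\\\(128 \\times 1 + 1 = 129\\\\)\n", "\n", "Under those assumptions the total is \\\\(156 + 2416 + 2,097,280 + 129 = 2,099,981\\\\) parameters, almost all of them in the first fully connected layer."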
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import tensorflow as tf\n", "\n", "# Descriptor file with one line per image; the sample count and the labels\n", "# are both read from this file so that the array sizes match the data loaded\n", "labelFile = \"MITFaces/faceDS\"\n", "\n", "# Go through the file to count number of samples\n", "with open(labelFile) as f:\n", "    n = sum(1 for line in f)\n", "\n", "# Image data, n 128x128 images\n", "I = np.zeros([n, 128, 128])\n", "\n", "# Output labels (1 = female, 0 = otherwise)\n", "L = np.zeros(n)\n", "\n", "with open(labelFile) as f:\n", "    for ix, line in enumerate(f):\n", "        parts = line.split()\n", "        fileName = \"MITFaces/rawdata/\" + parts[0]\n", "        # Read 128*128 unsigned bytes and scale them to [0, 1]\n", "        with open(fileName, 'rb') as fileIn:\n", "            rawdata = np.fromfile(fileIn, dtype=np.uint8, count=128*128)/255.0\n", "        I[ix,:,:] = rawdata.reshape(128,128)\n", "        if parts[2].startswith('female'):\n", "            L[ix] = 1" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "________________________________________________________________________________\n", "Epoch: 0\n", "Loss: 0.290118\n", "________________________________________________________________________________\n", "Epoch: 1\n", "Loss: 0.284181\n", "________________________________________________________________________________\n", "Epoch: 2\n", "Loss: 0.284118\n", "________________________________________________________________________________\n", "Epoch: 3\n", "Loss: 0.284135\n", "________________________________________________________________________________\n", "Epoch: 4\n", "Loss: 0.284149\n", "________________________________________________________________________________\n", "Epoch: 5\n", "Loss: 0.284161\n", "________________________________________________________________________________\n", "Epoch: 6\n", "Loss: 0.284168\n", "________________________________________________________________________________\n", "Epoch: 7\n", "Loss: 0.284174\n", "________________________________________________________________________________\n", "Epoch: 8\n", "Loss: 
0.284176\n", "________________________________________________________________________________\n", "Epoch: 9\n", "Loss: 0.284175\n", "________________________________________________________________________________\n", "Epoch: 10\n", "Loss: 0.284173\n", "________________________________________________________________________________\n" ] } ], "source": [ "# ConvNet adapted from\n", "# https://medium.com/data-science-group-iitr/building-a-convolutional-neural-network-in-python-with-tensorflow-d251c3ca8117\n", "# Written against the TensorFlow 1.x graph API (tf.placeholder, tf.Session)\n", "\n", "def new_conv_layer(input, num_input_channels, filter_size, num_filters, name):\n", "\n", "    with tf.variable_scope(name) as scope:\n", "        # Shape of the filter-weights for the convolution\n", "        shape = [filter_size, filter_size, num_input_channels, num_filters]\n", "\n", "        # Create new weights (filters) with the given shape\n", "        weights = tf.Variable(tf.truncated_normal(shape, stddev=0.05))\n", "\n", "        # Create new biases, one for each filter\n", "        biases = tf.Variable(tf.constant(0.05, shape=[num_filters]))\n", "\n", "        # TensorFlow operation for convolution; stride 1 and SAME padding\n", "        # keep the output the same width and height as the input\n", "        layer = tf.nn.conv2d(input=input, filter=weights, strides=[1, 1, 1, 1], padding='SAME')\n", "\n", "        # Add the biases to the results of the convolution.\n", "        layer += biases\n", "\n", "    return layer, weights\n", "\n", "\n", "def new_pool_layer(input, name):\n", "\n", "    with tf.variable_scope(name) as scope:\n", "        # TensorFlow operation for 2x2 max pooling with stride 2\n", "        layer = tf.nn.max_pool(value=input, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')\n", "\n", "    return layer\n", "\n", "def new_relu_layer(input, name):\n", "\n", "    with tf.variable_scope(name) as scope:\n", "        # TensorFlow operation for the ReLU activation\n", "        layer = tf.nn.relu(input)\n", "\n", "    return layer\n", "\n", "def new_fc_layer(input, num_inputs, num_outputs, name):\n", "\n", "    with tf.variable_scope(name) as scope:\n", "\n", "        # Create new weights and biases.\n", "        weights = 
tf.Variable(tf.truncated_normal([num_inputs, num_outputs], stddev=0.05))\n", "        biases = tf.Variable(tf.constant(0.05, shape=[num_outputs]))\n", "\n", "        # Multiply the input and weights, and then add the bias-values.\n", "        layer = tf.matmul(input, weights) + biases\n", "\n", "    return layer\n", "\n", "# Training parameters\n", "batchSize = 199\n", "batchesPerEpoch = 10\n", "\n", "# Input layer - batchSize 128x128 images at a time\n", "x = tf.placeholder(tf.float32, shape=[batchSize, 128, 128, 1], name='x')\n", "\n", "# Output layer - batchSize 1-D outputs\n", "y = tf.placeholder(tf.float32, shape=[batchSize, 1], name='y')\n", "\n", "\n", "# Convolutional Layer 1\n", "layer_conv1, weights_conv1 = new_conv_layer(\n", "    input=x, num_input_channels=1,\n", "    filter_size=5, num_filters=6, name=\"conv1\")\n", "\n", "# Pooling Layer 1\n", "layer_pool1 = new_pool_layer(layer_conv1, name=\"pool1\")\n", "\n", "# ReLU layer 1\n", "layer_relu1 = new_relu_layer(layer_pool1, name=\"relu1\")\n", "\n", "# Convolutional Layer 2\n", "layer_conv2, weights_conv2 = new_conv_layer(\n", "    input=layer_relu1, num_input_channels=6,\n", "    filter_size=5, num_filters=16, name=\"conv2\")\n", "\n", "# Pooling Layer 2\n", "layer_pool2 = new_pool_layer(layer_conv2, name=\"pool2\")\n", "\n", "# ReLU layer 2\n", "layer_relu2 = new_relu_layer(layer_pool2, name=\"relu2\")\n", "\n", "# Flatten Layer\n", "num_features = layer_relu2.get_shape()[1:4].num_elements()\n", "layer_flat = tf.reshape(layer_relu2, [-1, num_features])\n", "\n", "# Fully-Connected Layer 1\n", "layer_fc1 = new_fc_layer(layer_flat, num_inputs=num_features, num_outputs=128, name=\"fc1\")\n", "\n", "# ReLU layer 3\n", "layer_relu3 = new_relu_layer(layer_fc1, name=\"relu3\")\n", "\n", "# Fully-Connected Layer 2\n", "layer_fc2 = new_fc_layer(input=layer_relu3, num_inputs=128, num_outputs=1, name=\"fc2\")\n", "\n", "# Loss - mean squared error between the network output and the 0/1 label\n", "loss = tf.reduce_mean(tf.squared_difference(layer_fc2, y))\n", "\n", "# Training method - gradient descent 
minimisation\n", "training_step = tf.train.GradientDescentOptimizer(0.1).minimize(loss)\n", "\n", "# Training\n", "init = tf.global_variables_initializer()\n", "sess = tf.Session()\n", "\n", "sess.run(init)\n", "\n", "for epoch in range(11):\n", "    for batch in range(batchesPerEpoch):\n", "        # Take the next batchSize consecutive samples\n", "        startIx = batch*batchSize\n", "        endIx = (batch+1)*batchSize\n", "        x_sample = I[startIx:endIx,:,:].reshape(batchSize, 128, 128, 1)\n", "        y_sample = L[startIx:endIx].reshape(batchSize, 1)\n", "        sess.run(training_step, feed_dict={x: x_sample, y: y_sample})\n", "    print(\"_\"*80)\n", "    print('Epoch: ', epoch)\n", "    # The loss is reported on the last batch of the epoch only\n", "    print('Loss: ', sess.run(loss, feed_dict={x: x_sample, y: y_sample}))\n", "print(\"_\"*80)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.6" } }, "nbformat": 4, "nbformat_minor": 2 }