{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%html\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Neural Networks with TensorFlow\n", "\n", "The descriptor for a face from Eigenfaces can be fed into any machine learning algorithm for classification.\n", "This could be a random forest or a support vector machine, for example.\n", "In this case we'll consider a traditional neural network, and then look at how *convolutional* networks make it practical for us to deal with image data.\n", "We'll be working with the TensorFlow library, so to begin we'll look at a much smaller problem in neural networks: XOR.\n", "\n", "XOR is a classic problem in neural networks because it is a very simple problem that is not *linearly separable*.\n", "That means that a simple network that connects the inputs (two values, each either 1 or 0) directly to the output (1 if exactly one of the inputs is 1, 0 otherwise) cannot solve it.\n", "However, a simple extension allowing hidden layers is sufficient." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The XOR Network\n", "\n", "The architecture for the XOR network is quite simple - there are two input units, two hidden units, and one output unit.\n", "The network is completely connected, so there are 4 weights (\\\\(2 \\times 2\\\\)) to learn between the input layer and the hidden layer, and 2 weights (\\\\(2 \\times 1\\\\)) between the hidden layer and the output unit.\n", "\n", "![XOR Network](images/xorNet.svg)\n", "\n", "In TensorFlow we don't explicitly represent individual units or weights, but treat them as multidimensional arrays.\n", "This is where the name TensorFlow comes from - mathematically a vector is a 1D array, or 1st order tensor; a matrix is a 2D array, or 2nd order tensor; and 3rd order and higher tensors can be represented by 3D or higher dimensional arrays.\n", "\n", "We begin, as usual, by importing some libraries - in this case `numpy` and `tensorflow`. We'll also want OpenCV once we get to faces." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import numpy as np\n", "import tensorflow as tf\n", "import cv2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next we set up somewhere for the input and output to go.\n", "In general, different samples are fed in over the course of training, so we make `placeholder`s to store them.\n", "We're going to train on all the possibilities at once, so there are 4 samples per training iteration and each sample has 2 dimensions (a and b for a XOR b).\n", "We'll also need 4 outputs per training iteration, but these are just one dimensional.\n", "\n", "The inputs and outputs will be treated as floating point values, since we need continuous functions to compute gradients, and we'll call them \\\\(x\\\\) and \\\\(y\\\\) respectively." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Input layer - 4 two dimensional samples at a time\n", "x = tf.placeholder(tf.float32, shape=[4,2], name='x')\n", "\n", "# Output node - 4 one dimensional outputs\n", "y = tf.placeholder(tf.float32, shape=[4,1], name='y')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next we set up the weights. \n", "We'll need two layers of weights: \\\\(w_1\\\\) are the weights from the input to the hidden layer, and \\\\(w_2\\\\) are the weights from the hidden layer to the output.\n", "To make things a little more general, we can set the number of hidden units to be a parameter.\n", "We need to initialise the weights, and we'll use a truncated normal distribution for that." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Number of hidden neurons\n", "nh = 2\n", "\n", "# Weights\n", "w1 = tf.Variable(tf.truncated_normal([2,nh]), name = 'w1')\n", "w2 = tf.Variable(tf.truncated_normal([nh,1]), name = 'w2')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can set up the network.\n", "For each layer we specify how the value of each unit is computed.\n", "Typically each unit sums the products of its weights and the previous layer's outputs, and the sum is then passed through an *activation function*. \n", "We'll use \\\\(\\tanh\\\\) as our activation function." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Set up the hidden layer\n", "h = tf.tanh(tf.matmul(x, w1))\n", "\n", "# Set up the output layer\n", "y_est = tf.tanh(tf.matmul(h, w2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that the output layer provides an estimate of \\\\(y\\\\), so we need a function to compare that estimate to the expected values. 
\n", "This is called the *loss* function, and we'll use the mean squared difference.\n", "We also need a method to minimise this loss function, and we'll use gradient descent." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Loss function - mean squared difference\n", "loss = tf.reduce_mean(tf.squared_difference(y_est, y))\n", "\n", "# Training method - gradient descent minimisation\n", "training_step = tf.train.GradientDescentOptimizer(1.0).minimize(loss)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now train the network by providing example input and output pairs.\n", "These are used to fill the placeholders \\\\(x\\\\) and \\\\(y\\\\), and the training method is used to reduce the loss function." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "________________________________________________________________________________\n", "Epoch: 0\n", " y_est: \n", " [ 0.]\n", " [ 0.06177974]\n", " [-0.00363681]\n", " [ 0.05911928]\n", " w1: \n", " [-0.10252517 -0.32058024]\n", " [-0.09108513 0.43799755]\n", " w2: \n", " [-0.25165987]\n", " [ 0.09466221]\n", "Loss: 0.47276\n", "________________________________________________________________________________\n", "Epoch: 1000\n", " y_est: \n", " [ 0.]\n", " [ 0.87483197]\n", " [ 0.87484235]\n", " [-0.14284287]\n", " w1: \n", " [-3.31105232 0.86222512]\n", " [-3.32401752 0.86235982]\n", " w2: \n", " [-5.74507236]\n", " [-6.27536106]\n", "Loss: 0.0129339\n", "________________________________________________________________________________\n", "Epoch: 2000\n", " y_est: \n", " [ 0.]\n", " [ 0.92088181]\n", " [ 0.92088515]\n", " [-0.11240981]\n", " w1: \n", " [-3.46395564 0.86743873]\n", " [-3.47355652 0.86751229]\n", " w2: \n", " [-6.63605738]\n", " [-7.18247938]\n", "Loss: 0.0062887\n", 
"________________________________________________________________________________\n", "Epoch: 3000\n", " y_est: \n", " [ 0.]\n", " [ 0.93723285]\n", " [ 0.93723488]\n", " [-0.09742302]\n", " w1: \n", " [-3.53892827 0.87426752]\n", " [-3.54720807 0.87432265]\n", " w2: \n", " [-7.12753153]\n", " [-7.67651987]\n", "Loss: 0.00434261\n", "________________________________________________________________________________\n", "Epoch: 4000\n", " y_est: \n", " [ 0.]\n", " [ 0.94619894]\n", " [ 0.94620019]\n", " [-0.08812159]\n", " w1: \n", " [-3.58886075 0.88014007]\n", " [-3.59635925 0.88018578]\n", " w2: \n", " [-7.47405243]\n", " [-8.02342606]\n", "Loss: 0.0033886\n", "________________________________________________________________________________\n", "Epoch: 5000\n", " y_est: \n", " [ 0.]\n", " [ 0.95207101]\n", " [ 0.95207226]\n", " [-0.08153133]\n", " w1: \n", " [-3.62633443 0.8851108 ]\n", " [-3.63330007 0.88515061]\n", " w2: \n", " [-7.74407291]\n", " [-8.29319954]\n", "Loss: 0.0028104\n", "________________________________________________________________________________\n", "Epoch: 6000\n", " y_est: \n", " [ 0.]\n", " [ 0.95629793]\n", " [ 0.95629889]\n", " [-0.07654067]\n", " w1: \n", " [-3.65633678 0.88940161]\n", " [-3.66289949 0.88943708]\n", " w2: \n", " [-7.96636581]\n", " [-8.51503754]\n", "Loss: 0.00241953\n", "________________________________________________________________________________\n", "Epoch: 7000\n", " y_est: \n", " [ 0.]\n", " [ 0.95953166]\n", " [ 0.95953262]\n", " [-0.07256035]\n", " w1: \n", " [-3.68134904 0.89316314]\n", " [-3.68760014 0.89319533]\n", " w2: \n", " [-8.15587139]\n", " [-8.70401096]\n", "Loss: 0.00213507\n", "________________________________________________________________________________\n", "Epoch: 8000\n", " y_est: \n", " [ 0.]\n", " [ 0.96211338]\n", " [ 0.96211416]\n", " [-0.06927154]\n", " w1: \n", " [-3.7027967 0.89649653]\n", " [-3.70878887 0.89652634]\n", " w2: \n", " [-8.3213768]\n", " [-8.86898136]\n", "Loss: 
0.00191732\n", "________________________________________________________________________________\n", "Epoch: 9000\n", " y_est: \n", " [ 0.]\n", " [ 0.96423453]\n", " [ 0.96423519]\n", " [-0.0664968]\n", " w1: \n", " [-3.72156572 0.89950365]\n", " [-3.72733855 0.89953142]\n", " w2: \n", " [-8.46852779]\n", " [-9.01559448]\n", "Loss: 0.00174503\n", "________________________________________________________________________________\n", "Epoch: 10000\n", " y_est: \n", " [ 0.]\n", " [ 0.96601981]\n", " [ 0.96602011]\n", " [-0.06409619]\n", " w1: \n", " [-3.73824978 0.90223545]\n", " [-3.7438314 0.90226156]\n", " w2: \n", " [-8.6011467]\n", " [-9.14768696]\n", "Loss: 0.0016044\n", "________________________________________________________________________________\n" ] } ], "source": [ "x_sample = [[0,0], [0,1], [1,0], [1,1]]\n", "y_sample = [[0], [1], [1], [0]]\n", "\n", "init = tf.global_variables_initializer()\n", "sess = tf.Session()\n", "\n", "sess.run(init)\n", "\n", "for epoch in range(10001):\n", "    sess.run(training_step, feed_dict={x: x_sample, y: y_sample})\n", "    # Report progress every 1000 epochs\n", "    if epoch % 1000 == 0:\n", "        print(\"_\"*80)\n", "        print('Epoch: ', epoch)\n", "        print(' y_est: ')\n", "        for element in sess.run(y_est, feed_dict={x: x_sample, y: y_sample}):\n", "            print(' ',element)\n", "        print(' w1: ')\n", "        for element in sess.run(w1):\n", "            print(' ',element)\n", "        print(' w2: ')\n", "        for element in sess.run(w2):\n", "            print(' ',element)\n", "        print('Loss: ', sess.run(loss, feed_dict={x: x_sample, y: y_sample}))\n", "print(\"_\"*80)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Networks and Eigenfaces\n", "\n", "A network for classification on the basis of Eigenfaces can be made in a similar way.\n", "We will need more units at each layer, and will introduce biases to the activation functions, and add more layers.\n", "Firstly, though, we'll need some face data with labels attached.\n", "One such data set is available from 
[http://courses.media.mit.edu/2004fall/mas622j/04.projects/faces/](http://courses.media.mit.edu/2004fall/mas622j/04.projects/faces/) and has the following labels attached:\n", "- sex (male/female)\n", "- age (child/teen/adult)\n", "- race (white/black/asian/hispanic/other)\n", "- face (serious/smiling)\n", "- prop (list such as hat, moustache, glasses)\n", "\n", "The images themselves are \\\\(128 \\times 128\\\\) pixels in 'RAW' format - just a byte per pixel with no header information. They are divided into two sets, whose descriptors are given in the files faceDR and faceDS.\n", "The following code reads the faces from faceDR into a data array \\\\(F\\\\), and the sex into a label array \\\\(L\\\\)." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "# Go through the file to count number of samples\n", "n = 0\n", "for line in open(\"MITFaces/faceDR\"):\n", "    n += 1\n", "\n", "# Dimensionality is 128x128 pixels\n", "d = 128*128\n", "\n", "# Face data, one sample to a row\n", "F = np.zeros([n,d])\n", "# Label data - 0 for male, 1 for female\n", "L = np.zeros(n)\n", "\n", "# Read from the same descriptor file that was counted above (faceDR)\n", "ix = 0\n", "for line in open(\"MITFaces/faceDR\"):\n", "    parts = line.split()\n", "    fileName = \"MITFaces/rawdata/\" + parts[0]\n", "    fileIn = open(fileName, 'rb')\n", "    # Scale the byte values to the range [0, 1]\n", "    F[ix,:] = np.fromfile(fileIn, dtype=np.uint8,count=d)/255.0\n", "    fileIn.close()\n", "    if parts[2].startswith('female'):\n", "        L[ix] = 1\n", "    ix = ix+1\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now compute the Eigenface representation of \\\\(F\\\\)." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "m = np.mean(F, 0)\n", "Z = F - m\n", "\n", "meanImg = m.reshape(128,128)\n", "cv2.imshow('Display', meanImg)\n", "cv2.waitKey()\n", "\n", "C = np.matmul(Z, np.transpose(Z))/(n-1)\n", "\n", "# eigh is for symmetric matrices like C - more stable\n", "eVal, eVec = np.linalg.eigh(C)\n", "\n", "# Reverse order\n", "e = 
eVal[::-1]\n", "V = eVec[:,::-1]\n", "\n", "# Eigen faces\n", "U = np.matmul(np.transpose(V), Z)\n", "\n", "for i in range(0,n):\n", "    # Normalise each eigenface to unit length, then rescale for display\n", "    U[i,:] /= np.linalg.norm(U[i,:])\n", "    ef = U[i,:].reshape(128,128)/max(abs(U[i,:])) + 0.5\n", "    cv2.imshow('Display', ef)\n", "    if (i < 10):\n", "        cv2.waitKey(1000)\n", "    elif (i < 100):\n", "        cv2.waitKey(100)\n", "    else:\n", "        cv2.waitKey(10)\n", "cv2.destroyAllWindows()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It should be clear from the images that the later eigenfaces don't convey a lot of information. We can quantify this from the eigenvalues, and it turns out that the first 128 eigenvalues account for about 95% of the total variance:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.95227129216014905" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sum(e[0:128])/sum(e)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "128 is a good computer-sciency number and 95% is a reasonable cut-off, so we can represent the images as 128-dimensional weight vectors, \\\\(w_i\\\\), which we can arrange as the columns of a \\\\(128 \\times n\\\\) matrix \\\\(W = U^*Z^T\\\\), where \\\\(U^*\\\\) is the matrix made up of the first 128 rows of \\\\(U\\\\)." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(128, 1997)" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "W = np.matmul(U[0:128, :], np.transpose(Z))\n", "W.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now set up a network that takes 128 input values (columns of \\\\(W\\\\)) and tries to estimate a single label (male/female as indicated in \\\\(L\\\\)).\n", "To train this we'll feed in batches of images from \\\\(W\\\\), but sadly there are 1997 training examples, which is a prime number. 
\n", "To keep things simple we'll just use the first 1990 samples and feed in 10 batches of 199 images (or eigenface weights) in each epoch.\n", "We'll also just measure our performance on the training data - ideally we would have separate *validation* and *test* data sets.\n", "The *validation* data would be used to monitor our training to prevent over-fitting, while the *test* data would be used to give a final measure of performance." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "________________________________________________________________________________\n", "Epoch: 0\n", "Loss: 1.29814\n", "________________________________________________________________________________\n", "Epoch: 100\n", "Loss: 0.218479\n", "________________________________________________________________________________\n", "Epoch: 200\n", "Loss: 0.0455054\n", "________________________________________________________________________________\n", "Epoch: 300\n", "Loss: 0.0399744\n", "________________________________________________________________________________\n", "Epoch: 400\n", "Loss: 0.0296133\n", "________________________________________________________________________________\n", "Epoch: 500\n", "Loss: 0.0240649\n", "________________________________________________________________________________\n", "Epoch: 600\n", "Loss: 0.0216135\n", "________________________________________________________________________________\n", "Epoch: 700\n", "Loss: 0.0209576\n", "________________________________________________________________________________\n", "Epoch: 800\n", "Loss: 0.0205943\n", "________________________________________________________________________________\n", "Epoch: 900\n", "Loss: 0.020558\n", "________________________________________________________________________________\n", "Epoch: 1000\n", "Loss: 0.0154411\n", 
"________________________________________________________________________________\n", "Tensor(\"strided_slice:0\", shape=(10, 1), dtype=float32)\n", "Tensor(\"strided_slice_1:0\", shape=(10, 1), dtype=float32)\n" ] } ], "source": [ "batchSize = 199\n", "batchesPerEpoch = 10\n", "\n", "nh1 = 256 # First hidden layer\n", "nh2 = 256 # Second hidden layer\n", "nh3 = 64 # Final hidden layer\n", "\n", "# Input layer - batchSize 128-D samples at a time\n", "x = tf.placeholder(tf.float32, shape=[batchSize, 128], name='x')\n", "\n", "# Output layer - batchSize 1-D outputs\n", "y = tf.placeholder(tf.float32, shape=[batchSize, 1], name='y')\n", "\n", "# Weights\n", "w1 = tf.Variable(tf.truncated_normal([128,nh1]), name = 'w1')\n", "w2 = tf.Variable(tf.truncated_normal([nh1,nh2]), name = 'w2')\n", "w3 = tf.Variable(tf.truncated_normal([nh2,nh3]), name = 'w3')\n", "wo = tf.Variable(tf.truncated_normal([nh3,1]), name = 'wo')\n", "\n", "# Activation functions\n", "h1 = tf.tanh(tf.matmul(x,w1))\n", "h2 = tf.tanh(tf.matmul(h1,w2))\n", "h3 = tf.tanh(tf.matmul(h2,w3))\n", "y_est = tf.tanh(tf.matmul(h3,wo))\n", "\n", "# Loss function\n", "loss = tf.reduce_mean(tf.squared_difference(y_est, y))\n", "\n", "# Training method - gradient descent minimisation\n", "training_step = tf.train.GradientDescentOptimizer(0.1).minimize(loss)\n", "\n", "# Training\n", "init = tf.global_variables_initializer()\n", "sess = tf.Session()\n", "\n", "sess.run(init)\n", "\n", "for epoch in range(1001):\n", "    for batch in range(batchesPerEpoch):\n", "        startIx = batch*batchSize\n", "        endIx = (batch+1)*batchSize\n", "        x_sample = np.transpose(W[:, startIx:endIx])\n", "        y_sample = L[startIx:endIx].reshape(batchSize,1)\n", "        sess.run(training_step, feed_dict={x: x_sample, y: y_sample})\n", "    if epoch % 100 == 0:\n", "        print(\"_\"*80)\n", "        print('Epoch: ', epoch)\n", "        # Loss is reported on the last batch of the epoch only\n", "        print('Loss: ', sess.run(loss, feed_dict={x: x_sample, y: y_sample}))\n", "print(\"_\"*80)" ] }, { "cell_type": "code", "execution_count": null, 
"metadata": { "collapsed": true }, "outputs": [], "source": [ "cv2.destroyAllWindows()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.6" } }, "nbformat": 4, "nbformat_minor": 2 }