Wednesday, 15 February 2017

Perceptron Algorithm (Web App included)




Introduction 

The Perceptron Algorithm forms the backbone of a famous family of algorithms: neural networks. During its initial stages the algorithm was overrated for its capabilities, and its limitations were only acknowledged later. The algorithm itself is extremely simple.
The Perceptron Algorithm is a two-class classification algorithm, which can be extended to multi-class problems using either the One-vs-All or the One-vs-One method. It has three sub-types:
  • Vanilla Algorithm
  • Voted Algorithm
  • Averaged Algorithm
The order simply represents the evolution of the algorithm towards better results. The Voted algorithm is generally not preferred: it is less efficient computationally, and instead of a single weight vector for prediction it keeps an array of weight vectors, making it less efficient storage-wise as well. We will cover prediction later on this page. Finally, before jumping into the algorithms, note that we will only look at the two-class versions, that the target variable (stored in vector 'y') should contain the numbers -1 and +1 representing the two classes, and that the datapoints (stored in 'X') should be numeric.

1)Vanilla Algorithm

We shall first discuss the Vanilla Algorithm.
The algorithm first initializes the weights and bias randomly (for our purposes we can initialize them to zero). It then iterates through every point in the dataset and checks whether the current weight vector classifies the datapoint correctly. If the datapoint is classified correctly, the weight vector and bias are left unchanged; otherwise both are updated. The loop cycles over the complete dataset for a fixed number of iterations specified by the hyperparameter MaxIter. Finally, any new datapoint is classified by whether the dot product of the weight vector with the datapoint, plus the bias, is greater or less than a particular threshold (for simplicity I have chosen zero).
Vanilla Training Algorithm:
// Vanilla algorithm pseudocode:
1) Initialize weights W and bias b (randomly or to zero); fix the hyperparameter MaxIter
2) For a fixed number of iterations MaxIter {
3)   For every datapoint X in the dataset, from the first to the last {
4)     If y(<W,X>+b) > 0 then do nothing
5)     Else W = W + y*X , b = b + y } }
6) Return W, b
Vanilla prediction Algorithm:
if (<W,X>+b)>0 predict 1
else predict -1
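The training and prediction pseudocode above can be sketched in Python (rather than the R the web app uses); `train_vanilla`, `predict_vanilla`, and `max_iter` are my own names for MaxIter and the two routines:

```python
import numpy as np

def train_vanilla(X, y, max_iter=10):
    """Vanilla perceptron: X is an (n, d) feature matrix, y holds labels -1/+1."""
    n, d = X.shape
    W = np.zeros(d)          # weight vector, initialized to zero
    b = 0.0                  # bias
    for _ in range(max_iter):
        for i in range(n):
            # if misclassified (or exactly on the boundary), update W and b
            if y[i] * (np.dot(W, X[i]) + b) <= 0:
                W = W + y[i] * X[i]
                b = b + y[i]
    return W, b

def predict_vanilla(W, b, x):
    # threshold at zero, as in the prediction pseudocode
    return 1 if np.dot(W, x) + b > 0 else -1
```

On a tiny linearly separable dataset this converges after a single update; real data would need MaxIter tuned as described below.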

Terms in the pseudocode

Let me first explain a few terms. MaxIter is a hyperparameter that is assigned arbitrarily and later tuned to obtain the best performance; W is the weight vector; X is the datapoint; y is the target variable; and <W,X> denotes the dot product between the weight vector and the feature vector.
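To make the check and the update rule concrete, here is a tiny worked example in Python with made-up numbers (W, X, y, and b are illustrative values, not from any real dataset):

```python
W = [1.0, -2.0]       # current weight vector
b = 0.5               # current bias
X = [3.0, 1.0]        # a datapoint
y = -1                # its true label

score = sum(w * x for w, x in zip(W, X)) + b   # <W,X> + b = 1*3 + (-2)*1 + 0.5 = 1.5
if y * score <= 0:                             # y*score = -1.5: misclassified, so update
    W = [w + y * x for w, x in zip(W, X)]      # W + y*X = [1-3, -2-1] = [-2.0, -3.0]
    b = b + y                                  # b + y = 0.5 - 1 = -0.5
```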

Disadvantages of the algorithm

Now that you have an overview of the algorithm, consider an example scenario: suppose that even after looping through 99% of the dataset the weight vector remains unchanged (i.e. it correctly classifies 99% of the datapoints), but then gets updated on the last 1%. This is a terrible situation, because the updated weight vector may completely ruin the performance on the first 99% of the datapoints. Hence the algorithm needed modification, and thus emerged the Voted Perceptron.

2) Voted Perceptron Algorithm

The algorithm is similar to the Vanilla Perceptron, except that for each weight vector we keep a count of how long it survives without being updated. For example, if a weight vector [1,2,3] lasted about 20 examples before being updated, it is assigned the number 20 as its count or vote; if the weight vector got updated after a single datapoint, its count is 1. The prediction is then made according to the algorithm below.
Voted Training Algorithm:
    //Initialize variables
1)  W = zero vector //Weight vector
    b = 0           //bias
    Wm = NA matrix  /*This is the matrix which contains the list of all weights
                      as well as the count appended to its end*/
    beta= NA vector /* This is a vector containing list of biases*/
    c = 0           /* Count variable-keeps count of how long a weight 
                       vector has survived */
    j = 0           // Weight Index
    //Main Algorithm
2) For a Fixed number of Iterations MaxIter{
3) For Every datapoint i in dataset {
4) If  y(<W,X>+b)>0 then c = c+1
5) Else  j=j+1 
        Wm[j,] = [W,c]  // storing the weight vector, with its count appended, in the weight matrix
        beta[j] = b     // Storing the bias in beta 
        W = W + y*X     // Updating Weight vector   
        b = b + y      // Updating bias
        c=1             // Restarting the count variable
               } } 
6) return Wm,beta
Voted prediction Algorithm:
if sum((Wm[,col]')*sign(Wm[,1:(col-1)]*X+beta)) > 0 predict 1
else predict -1
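The same training and prediction steps can be sketched in Python, using plain lists instead of the Wm matrix and beta vector; note that this sketch also stores the final surviving weight vector with its count, which the pseudocode above leaves out (`train_voted` and `predict_voted` are my own names):

```python
import numpy as np

def train_voted(X, y, max_iter=10):
    """Voted perceptron: returns parallel lists of weight vectors, biases, counts."""
    n, d = X.shape
    W, b, c = np.zeros(d), 0.0, 0
    weights, biases, counts = [], [], []
    for _ in range(max_iter):
        for i in range(n):
            if y[i] * (np.dot(W, X[i]) + b) > 0:
                c += 1                    # current vector survives one more datapoint
            else:
                weights.append(W.copy())  # retire the old vector along with its vote
                biases.append(b)
                counts.append(c)
                W = W + y[i] * X[i]       # update weight vector
                b = b + y[i]              # update bias
                c = 1                     # restart the count
    weights.append(W.copy())              # also keep the final surviving vector
    biases.append(b)
    counts.append(c)
    return weights, biases, counts

def predict_voted(weights, biases, counts, x):
    # each stored vector casts `count` votes for the sign it predicts
    total = sum(c * np.sign(np.dot(w, x) + b)
                for w, b, c in zip(weights, biases, counts))
    return 1 if total > 0 else -1
```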

Terms in the pseudocode

All terms common with the Vanilla algorithm remain the same. Wm is a weight matrix containing every weight vector encountered during the run, with its count appended to the end of each row; beta contains the list of biases encountered during the run. The sign function returns the sign element-wise (1 if positive, -1 if negative), and the sum function returns the sum of all elements in the matrix. In the prediction algorithm, all mathematical operations are matrix operations.

Disadvantages of the algorithm

Although the performance is considerably better than the Vanilla algorithm's, the computational cost at prediction time is very high, and it also requires greater storage space.

3) Averaged Perceptron Algorithm

The training algorithm is the same as that of the Voted Perceptron; only the prediction algorithm differs. You may object that if the training algorithm is the same, this algorithm would also require greater storage space. In fact, because the prediction algorithm is different, the training algorithm can be modified to eliminate that storage.
Averaged prediction Algorithm:
if sum((Wm[,col]')*(Wm[,1:(col-1)]*X+beta)) > 0 predict 1
else predict -1
Okay, now that you have seen the prediction algorithm, I leave it as an exercise for you to modify the voted training algorithm to optimize it.
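One possible answer to that exercise, sketched in Python: since the averaged prediction drops the sign function, the count-weighted sum of stored vectors collapses into a single running sum, so nothing beyond two accumulators needs to be stored (`train_averaged`, `W_sum`, and `b_sum` are my own names, and this is one way to do it, not necessarily the app's):

```python
import numpy as np

def train_averaged(X, y, max_iter=10):
    """Averaged perceptron via running sums: no per-vector storage needed."""
    n, d = X.shape
    W, b = np.zeros(d), 0.0
    W_sum, b_sum = np.zeros(d), 0.0   # running totals of W and b
    for _ in range(max_iter):
        for i in range(n):
            if y[i] * (np.dot(W, X[i]) + b) <= 0:
                W = W + y[i] * X[i]   # update weight vector
                b = b + y[i]          # update bias
            W_sum += W                # accumulate after every datapoint: each vector
            b_sum += b                # is added once per datapoint it survives
    return W_sum, b_sum

def predict_averaged(W_avg, b_avg, x):
    # equivalent to the averaged prediction rule, up to a positive scale factor
    return 1 if np.dot(W_avg, x) + b_avg > 0 else -1
```

Accumulating W after every datapoint adds each weight vector into W_sum once for each datapoint it survived, which reproduces the count-weighted sum without keeping the Wm matrix at all.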

Web Application

Finally, the web app that implements the perceptron algorithm: Perceptron.
You can use this sample dataset; it is randomly generated and has no meaning to it.

WebApp Description

The app was written in the R language and is deployed on a Shiny server. On the app page, any dataset (under 1 MB, in CSV format) on which the algorithm is to be run can be uploaded. The classification results are then produced in the form of a table containing several classification accuracy metrics, such as Precision, Recall, F1 score, etc. Graphs are also produced in real time to help visualize the accuracy. Currently the app only performs the Vanilla and Averaged algorithms, for efficiency reasons.

