Sunday, 19 February 2017

CFD Tutorial - 2 (Laasonen scheme)

                           Computational Fluid Dynamics 

Problem statement:
Given a rod of length $L$, with boundary and initial conditions as follows:

Boundary condition:
\begin{eqnarray}
T(0,t) = 0^{\circ}C \\
T(L,t)= 1^{\circ}C
\end{eqnarray}

Initial condition:

\begin{eqnarray}
T(x,0) = 0^{\circ}C
\end{eqnarray}

Compute the temperature from t = 0 s to 20 s for various time steps $\Delta t = 0.1\,s, 0.01\,s, 0.001\,s$.

Governing Equations:
PDE:
\begin{eqnarray}
\frac{\partial T}{\partial t} &=& \alpha \frac{\partial ^2 T}{\partial x^2}
\end{eqnarray}

Finite difference formulation using the BTCS scheme (implicit):
\begin{equation}
-\gamma T^{n+1}_{i-1}+(2 \gamma +1)\,T^{n+1}_{i} -\gamma T^{n+1}_{i+1} = T^n_{i}
\end{equation}
where $\gamma = \frac{\alpha \Delta t}{\Delta x^2}$.

Pseudo Code


  • Initialize the variables $\alpha, \Delta t, T, \Delta x, N_x, L$.
     (Note that $T$ is a matrix with $N_x$ columns and $N_y = 20/\Delta t + 1$ rows.)
  •  For n = 2 to $N_y$:
     Solve the system below for row $n$ of $T$ (using row $n-1$ as the right-hand side) with the TDMA algorithm.
     $$A\,T^{n} = T^{n-1}$$
    Where,
    $A =\begin{bmatrix}
    1 & 0 & 0 & \cdots & \cdots & 0 \\
    -\gamma & 2\gamma + 1 & -\gamma & 0 & \cdots & 0 \\
    0 & -\gamma & 2\gamma + 1 & -\gamma & \ddots & \vdots \\
    \vdots & \ddots & \ddots & \ddots & \ddots & 0 \\
    0 & \cdots & \cdots & \cdots & 0 & 1 \\
    \end{bmatrix}$

The TDMA consists of converting the tridiagonal matrix to an upper triangular matrix and then solving the resulting system of equations by back substitution.

TDMA Algorithm:


  •  Forward elimination: for i = 2 to $(N_x-1)$
     $m = A[i,i-1]/A[i-1,i-1]$
     $A[i,:] = A[i,:] - m\,A[i-1,:]$
     $d[i] = d[i] - m\,d[i-1]$, where $d$ is a working copy of the right-hand side (row $n-1$ of $T$).
  •  Back substitution: $T[n,N_x] = d[N_x]/A[N_x,N_x]$, then for i = $N_x-1$ down to 1, $T[n,i] = (d[i] - A[i,i+1]\,T[n,i+1])/A[i,i]$.
  •  End (a MATLAB sketch of the complete solver is given below).
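To make the pseudo code concrete, here is a minimal MATLAB sketch of the Laasonen (BTCS) solver with the Thomas algorithm written out explicitly. The values of $L$, $\alpha$, the grid size and the time step are illustrative assumptions (they are not fixed by the problem statement), and this is not the repository code linked below.

% Minimal BTCS (Laasonen) sketch -- illustrative values, not the repository code
L = 1;                          % rod length (assumed)
alpha = 1e-4;                   % thermal diffusivity (assumed)
Nx = 21;                        % number of grid points (assumed)
dt = 0.1;                       % time step
dx = L/(Nx-1);
Ny = round(20/dt) + 1;          % number of time levels for t = 0 s to 20 s
gamma = alpha*dt/dx^2;

T = zeros(Ny,Nx);               % rows = time levels, columns = grid points
T(:,1) = 0;                     % T(0,t) = 0 C
T(:,Nx) = 1;                    % T(L,t) = 1 C
T(1,2:Nx-1) = 0;                % T(x,0) = 0 C

% Tridiagonal matrix A with boundary rows [1 0 ... 0] and [0 ... 0 1]
A = diag([1, (2*gamma+1)*ones(1,Nx-2), 1]) ...
  + diag([0, -gamma*ones(1,Nx-2)], 1) ...
  + diag([-gamma*ones(1,Nx-2), 0], -1);

for n = 2:Ny
    B = A;                      % working copy of A (a real implementation would factor A once)
    d = T(n-1,:)';              % right-hand side = previous time level
    d(1) = 0;  d(Nx) = 1;       % enforce the boundary values
    for i = 2:Nx                % Thomas algorithm: forward elimination
        m = B(i,i-1)/B(i-1,i-1);
        B(i,:) = B(i,:) - m*B(i-1,:);
        d(i) = d(i) - m*d(i-1);
    end
    T(n,Nx) = d(Nx)/B(Nx,Nx);   % back substitution
    for i = Nx-1:-1:1
        T(n,i) = (d(i) - B(i,i+1)*T(n,i+1))/B(i,i);
    end
end
plot(linspace(0,L,Nx), T(Ny,:)) % temperature profile at t = 20 s

Because the system is solved implicitly, the same sketch runs without blowing up for all three time steps in the problem statement.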

Results :

Below are sample graphs showing the unconditional stability of the implicit scheme.




Matlab Code:

https://github.com/RAAKASH/Intro-to-CFD-/tree/master/Assignment%202.

Note : 
  • Assignment 1.m is the previous assignment/tutorial written as a function.
  • Assignment 2.m is the present assignment/tutorial written as a function.
  • BTCS.m is the same as Assignment 2.m but optimized to run faster.

Friday, 17 February 2017

FTCS Tutorial -1

Computational Fluid Dynamics


Problem Statement

 Given a rod of length $L$, with boundary conditions as follows:
   $T(0,t) = 0^{\circ}C $
   $T(L,t) = 0^{\circ}C $
Also the initial condition is as follows:
  $T(x,0) = 0^{\circ}C $
Find the temperature distribution along the complete length of the rod over a time period of 300 s.

Governing Equations:

The governing physical equation is the one-dimensional heat conduction equation:
$ \frac{\partial T}{\partial t} = \alpha \frac{\partial ^2 T}{\partial x^2}$

Applying the FTCS scheme to the above equation, we get:

$T_{i}^{n+1} = T_{i}^{n} + \alpha \Delta t \frac{(T_{i+1}^n - 2T_{i}^n + T_{i-1}^n)}{(\Delta x)^2}$

Pseudo code :


  1.  Initialize the variables $\alpha, \Delta t, T, \Delta x, N_x, L$.
    (Note that $T$ is a matrix with $N_x$ columns and $N_y = 300/\Delta t + 1$ rows.)
  2.  For n = 2 to $N_y$ execute statements 3 and 4.
  3.  For i = 2 to $N_x-1$ execute statement 4.
  4.  $T[i][n] = T[i][n-1] + \frac{\alpha \Delta t}{(\Delta x)^2} \left(T[i+1][n-1] - 2T[i][n-1] + T[i-1][n-1]\right)$
  5.  End (a MATLAB sketch of these steps is given below).
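A minimal MATLAB sketch of these steps follows; $L$, $\alpha$ and the grid parameters are illustrative assumptions, not values from the assignment.

% Minimal FTCS sketch -- illustrative values only
L = 1;                          % rod length (assumed)
alpha = 1e-4;                   % thermal diffusivity (assumed)
Nx = 21;                        % number of grid points (assumed)
dt = 0.1;                       % time step (assumed)
dx = L/(Nx-1);
Ny = round(300/dt) + 1;         % number of time levels for t = 0 s to 300 s
gamma = alpha*dt/dx^2;          % must satisfy gamma <= 1/2 for stability

T = zeros(Ny,Nx);               % rows = time levels, columns = grid points
T(:,1) = 0;                     % boundary condition at x = 0
T(:,Nx) = 0;                    % boundary condition at x = L
T(1,:) = 0;                     % initial condition
% (with the all-zero conditions above the solution stays zero;
%  set a non-zero boundary or initial value to see the evolution)

for n = 2:Ny
    for i = 2:Nx-1
        T(n,i) = T(n-1,i) + gamma*(T(n-1,i+1) - 2*T(n-1,i) + T(n-1,i-1));
    end
end
plot(linspace(0,L,Nx), T(Ny,:)) % temperature profile at t = 300 s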


Stability of scheme : 

Let $\gamma = \alpha \frac{\Delta t}{\Delta x^2}$.
The scheme is stable if $\gamma \leq \frac{1}{2}$.
Derivation : 
$$e(n,x) = A_n  e^{ikx}$$
Here $k$ is the wave number and $i = \sqrt{-1}$.
Substituting this into the finite difference scheme we get
$$ \frac{A_{n+1}}{A_{n}} = 1 + \frac{\alpha \Delta t}{\Delta x^2}\left ( e^{i k \Delta x} -2 + e^{-i k \Delta x}   \right )$$
$$ \frac{A_{n+1}}{A_{n}} = 1 - \frac{4\alpha \Delta t}{\Delta x^2} \sin^2 \left(\frac{k \Delta x}{2}\right)$$


$$ \left \vert \frac{A_{n+1}}{A_{n}} \right \vert   \leq 1$$
$$  -1 \leq 1 - \frac{4\alpha \Delta t}{\Delta x^2} \sin ^2 \left(\frac{k \Delta x}{2}\right) \leq 1 $$
The upper bound holds automatically; the lower bound requires $\gamma \sin^2(k\Delta x/2) \leq \frac{1}{2}$ for every wave number, and taking the worst case $\sin^2(k\Delta x/2) = 1$ gives the required criterion $\gamma \leq \frac{1}{2}$. The short MATLAB check below illustrates this.
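As a quick illustration (not part of the original tutorial), the amplification factor $G(k) = 1 - 4\gamma\sin^2(k\Delta x/2)$ can be evaluated numerically for a few values of $\gamma$; the grid spacing below is an arbitrary assumption.

% Amplification factor of the FTCS scheme for a few values of gamma (illustrative)
dx = 0.05;                              % grid spacing (assumed)
k  = linspace(0, pi/dx, 200);           % resolvable wave numbers
for gamma = [0.25 0.5 0.75]
    G = 1 - 4*gamma*sin(k*dx/2).^2;     % amplification factor G(k)
    fprintf('gamma = %.2f  ->  max |G| = %.2f\n', gamma, max(abs(G)));
end
% gamma = 0.25 and 0.5 give max |G| = 1 (stable); gamma = 0.75 gives max |G| = 2 (unstable)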

MATLAB Run : 





Web App Run:




MATLAB /Web App:





  

    

Wednesday, 15 February 2017

Perceptron Algorithm (Web App included)



                     Perceptron Algorithm

Introduction 

The perceptron algorithm forms the backbone of the well-known neural network. In its initial stages the algorithm was overrated for its capabilities, although its limitations were later acknowledged. The algorithm itself is extremely simple.
The perceptron is a two-class classification algorithm, which can be extended to multi-class problems using either the one-vs-all or the one-vs-one method. It has three sub-types:
  • Vanilla Algorithm
  • Voted Algorithm
  • Averaged Algorithm
The order simply represents the evolution of the algorithm towards better results. The voted algorithm is generally not preferred: it is less efficient computationally, and instead of a single weight vector it keeps an array of weight vectors for prediction, which makes it less efficient storage-wise as well; we will cover this later on the page. Finally, before jumping into the algorithm, note that we will only look at the two-class classification algorithms, that the target variable should contain the numbers -1 and +1 (stored in vector 'y') representing the two classes, and that the data points should be numeric (stored in 'X').

1) Vanilla Algorithm

We shall first discuss the vanilla algorithm.
The algorithm first randomly initializes the weights and the bias (for our purposes we can initialize them to zero). It then iterates through every point in the dataset and checks whether the current weight vector classifies the data point correctly. If the data point is classified correctly, the weight vector and the bias are left unchanged; otherwise both are updated. The loop cycles over the complete dataset for a fixed number of iterations specified by the hyperparameter MaxIter. Finally, any new data point is classified according to whether the dot product of the weight vector with the data point, plus the bias, is greater or less than a particular threshold (for simplicity I have chosen zero). A MATLAB sketch is given after the pseudocode.
Vanilla Training Algorithm:
// Vanilla algorithm pseudo code:
1) Randomly initialize weights W, bias b, hyperparameter MaxIter
2) For a fixed number of iterations MaxIter {
3)   For every datapoint X (with label y) in the dataset, from first to last {
4)     If y(<W,X>+b) > 0 then do nothing
5)     Else W = W + y*X, b = b + y } }
6) Return W, b
Vanilla prediction Algorithm:
if (<W,X>+b)>0 predict 1
else predict -1
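For concreteness, here is a minimal MATLAB sketch of the vanilla training and prediction steps. The web app itself is written in R; the function name and data layout here are assumptions for illustration.

% Vanilla perceptron sketch in MATLAB.  X is N-by-M (one datapoint per row),
% y is N-by-1 with labels in {-1,+1}.  (Illustrative; not the web app's R code.)
function [W, b] = vanilla_perceptron(X, y, MaxIter)
    [N, M] = size(X);
    W = zeros(M, 1);                      % weight vector (initialised to zero)
    b = 0;                                % bias
    for iter = 1:MaxIter
        for i = 1:N
            if y(i)*(X(i,:)*W + b) <= 0   % misclassified: update
                W = W + y(i)*X(i,:)';
                b = b + y(i);
            end
        end
    end
end
% Prediction for a new datapoint x (1-by-M): predict +1 if x*W + b > 0, else -1.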

Terms in the pseudocode

Let me first explain a few terms: MaxIter is a hyperparameter that is assigned arbitrarily and later tuned to obtain the best performance, W is the weight vector, X is the data point, y is the target variable, and <W,X> denotes the dot product between the weight vector and the feature vector.

Disadvantages of the algorithm

Now that you have an overview of the algorithm, consider an example scenario: suppose that after looping through 99% of the dataset the weight vector remains unchanged (i.e. it correctly classifies 99% of the data points), but it then gets updated on the last 1% of the data points. This is a terrible situation, because the updated weight vector may completely ruin the performance on the first 99% of data points. Hence the algorithm needs to be modified, and thus emerged the voted perceptron.

2) Voted Perceptron Algorithm

The algorithm is similar to the vanilla perceptron, except that for each weight vector we keep a count of how long it survives without being updated. For example, if a weight vector [1,2,3] took about 20 examples to get updated, then the number 20 is attached to it as its count or vote; if the weight vector got updated within one data point, its count is 1. Finally, the prediction is made according to the algorithm below.
Voted Training Algorithm:
    //Initialize variables
1)  W = zero vector //Weight vector
    b = 0           //bias
    Wm = NA matrix  /*This is the matrix which contains the list of all weights
                      as well as the count appended to its end*/
    beta= NA vector /* This is a vector containing list of biases*/
    c = 0           /* Count variable-keeps count of how long a weight 
                       vector has survived */
    j = 0           // Weight Index
    //Main Algorithm
2) For a Fixed number of Iterations MaxIter{
3) For Every datapoint i in dataset {
4) If  y(<W,X>+b)>0 then c = c+1
5) Else  j=j+1 
        Wm[j,] = [W,c]  // storing the weight vector appended with its count in the weight matrix
        beta[j] = b     // Storing the bias in beta 
        W = W + y*X     // Updating Weight vector   
        b = b + y      // Updating bias
        c=1             // Restarting the count variable
               } }
   Wm[j+1,] = [W,c] , beta[j+1] = b  // also store the final surviving weight vector, its count and bias
6) return Wm,beta
Voted  prediction Algorithm:
if sum((Wm[,col]')*sign((Wm[,1:(col-1)]*X+beta))) > 0 predict 1
else predict -1

Terms in the pseudocode

All terms common with the vanilla algorithm remain the same. Wm is a weight matrix containing every weight vector encountered in the run, with its count appended at the end of the row; beta contains the list of biases encountered in the run. The sign function returns the sign (1 if positive, -1 if negative) elementwise, and the sum function returns the sum of all elements of its argument. In the prediction algorithm all the operations are matrix operations (a MATLAB sketch of the prediction step is given below).
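For clarity, here is a minimal MATLAB sketch of the voted prediction step under the same data layout (each row of Wm holding a weight vector with its count in the last column, beta holding the matching biases). The function name is an assumption.

% Voted perceptron prediction sketch in MATLAB (illustrative).
function yhat = voted_predict(Wm, beta, x)
    W = Wm(:, 1:end-1);                   % stored weight vectors, one per row
    c = Wm(:, end);                       % survival counts ("votes")
    s = sum(c .* sign(W*x(:) + beta(:))); % each stored perceptron votes, weighted by its count
    if s > 0
        yhat = 1;
    else
        yhat = -1;
    end
end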

Disadvantages of the algorithm

Although the performance is considerably better than the vanilla algorithm, the computational cost of classification is much higher, and it also requires more storage space.

3) Averaged Perceptron Algorithm

The training algorithm is the same as that of the voted perceptron; only the prediction algorithm is different. You may object that if the training algorithm is the same, then this algorithm would also require more storage space. In fact the training algorithm can be modified to eliminate that storage, precisely because the prediction algorithm is different.
Averaged  prediction Algorithm:
if sum((Wm[,col]')*((Wm[,1:(col-1)]*X+beta))) > 0 predict 1
else predict -1
Okay, now that you have seen the prediction algorithm, I leave it as an exercise for you to modify the voted training algorithm to optimize it.

Web Application

Finally, the app webpage that implements the perceptron algorithm: Perceptron.
You can use this sample dataset; it is randomly generated and has no particular meaning.

WebApp Description

The app was written in R and runs on a Shiny server. On the app page, any dataset (in CSV format, less than 1 MB) can be uploaded and the algorithm is run on it. The results of the classification are produced as a table containing several classification accuracy measures such as precision, recall, F1 score, etc. Graphs are also produced in real time to help visualize the accuracy. Currently the app only runs the vanilla and averaged algorithms, for efficiency reasons.


Kmeans Algorithm (Web App included)



                                    K-Means Algorithm

Supervised learning algorithms are those that require prior labeling, or the existence of a target variable. For example, in my previous post on the perceptron algorithm every data point had a label attached to it; another example is predicting the price of an object from its features by running a supervised algorithm on objects whose prices are known. Long story short, supervised algorithms are those that require a tag on each and every data point.
Unsupervised algorithms are those that do not require such tagging of data points to make a prediction. We will look at a specific algorithm of this category: K-means clustering.

INTRODUCTION

This is a classification algorithm that segregates data points based on how they cluster in n-dimensional space: points that are close together form groups, or clusters. To get a visual feel, let us look at a 3-D plot.
(Figure: 3-D graph produced by the MATLAB code)
The above 3-D plot shows the segregation of students based on their scores (in this case three test scores are taken into consideration); although there aren't clear clusters, the algorithm forms its own.
If you look closely at the 3-D plot you will see 3 X marks apart from all the circles. These X marks define the territory of the data points, i.e. the data points closest to an X mark belong to that particular X mark (and hence to a particular category). So the question is: how do we find those X marks?
Why determine those X marks?
It is necessary to find the position of those X marks in order to categorize the data points.
So let us look into the algorithm.

Algorithm

1. Initially, randomly assign values to the K vertices in the n-dimensional space.
2. Form clusters by assigning each data point to its closest vertex.
3. For each vertex, compute the mean (centroid) of the data points assigned to it and move the vertex there.
4. If there is no change in the K vertices, break out of the loop.
5. Else go to step 2.
The above algorithm should give you a clear idea of how it functions; there may still be questions about convergence and performance. About convergence: the algorithm always converges, and this fact can be proved easily. The performance is reasonably decent and depends strongly on the initial values of the K vertices; the results can even be disastrous if the initial K vertices are assigned badly (given the simplicity of the algorithm).
It should also be noted that in every iteration the sum of Euclidean distances (aka the cost) of the data points from their respective K vertices monotonically decreases, i.e. the algorithm converges. A MATLAB sketch of the algorithm is given below.
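Here is a minimal MATLAB sketch of the algorithm. This is not the linked assignment code; the function name and the choice of initialising the vertices at K random data points are assumptions, and the subtraction X - C(k,:) relies on implicit expansion (MATLAB R2016b or later).

% Minimal K-means sketch in MATLAB.  X is N-by-M (one data point per row), K is
% the number of vertices.  Returns the vertices C and the cluster index of each point.
function [C, idx] = simple_kmeans(X, K)
    [N, ~] = size(X);
    C = X(randperm(N, K), :);             % step 1: initialise vertices at K random data points
    while true
        D = zeros(N, K);
        for k = 1:K
            D(:,k) = sum((X - C(k,:)).^2, 2);   % squared distance of every point to vertex k
        end
        [~, idx] = min(D, [], 2);         % step 2: assign each point to its closest vertex
        Cold = C;
        for k = 1:K
            if any(idx == k)
                C(k,:) = mean(X(idx == k, :), 1);   % step 3: move vertex to the cluster centroid
            end
        end
        if isequal(C, Cold)               % step 4: stop when no vertex moves
            break;
        end                               % step 5: otherwise repeat from step 2
    end
end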

Proof of Convergence

1) Each data point is assigned to its closest vertex, hence the cost reduces.
2) Each vertex is then reassigned to the centroid of the data points closest to it, so the cost reduces further.
Hence we have shown that the cost monotonically decreases. Also, since the data set is finite, there are only a finite number of possible clusterings ($K^{n}$, where K is the number of vertices and n is the number of data points); hence the algorithm converges in a finite number of steps.

Further information

At times it is necessary to choose the right number of vertices (K) for clustering;
in such cases K can be chosen by examining the Cost vs K curve.
(Figure: Cost vs K)

Looking at the above graph we can see an elbow; typically the number of vertices (K)
at which the elbow occurs is taken as the ideal value. There are situations in which such an elbow is not apparent; in such cases heuristics have to be used.

MatLab Code

This is a sample MATLAB code that implements the algorithm; you should get the same graph as above, so try tinkering with it. The program shows how the vertices move towards the least-cost configuration iteration by iteration (you need to press Enter to move from one iteration to the next).
The data set used is from the KDD Cup. Code Link
Note: the convergence condition doesn't imply that the least-cost configuration has been attained.
Things for you to try: 1) Find an algorithm that attains the least-cost configuration.

Web App

The Web application can be found here.


Regression Models

There are two schools of thought when it comes to analytics models:
  1. The data obtained from the experiment is accurate; the model applied to it is not, hence the errors.
  2. The data obtained from the experiment has errors, but the model applied is accurate.
There are many regression models; for starters we are going to discuss Ordinary Least Squares (OLS) and Total Least Squares (TLS).
  • OLS  :  OLS follows the first school of thought.
    The optimization problem is:
    $ \min_{\alpha} \sum _{i=1}^{N} (y_i - \vec{\alpha}\cdot\vec{x}_i)^2$ .
    Now we are going to employ matrix notation, i.e.
    $ Y  = [y_1,y_2,\ldots,y_N]^T$ ,
    $ X  = [\vec{x}_1,\vec{x}_2,\ldots,\vec{x}_N]^T$ with a column of ones appended (for the bias term),
    $ \alpha  = [\alpha _{1},\alpha _{2},\ldots,\alpha _{M},\beta]^T$ .
    Here the subscripts on $y$ and $\vec{x}$ are the indices of the $N$ datapoints and $M$ is the number of features. The problem becomes $ E = (X \alpha - Y)^T  (X \alpha - Y)$, to be minimised over $\alpha$. Expanding $E$ and setting $ \vec{\nabla}(E) = 0 $ we get:
    $ \implies  \vec{ \nabla }\left((X \alpha  )^T(X \alpha )-(X \alpha )^T Y-Y^T (X \alpha )+Y^T  Y\right) = 0$
    $ \implies  \vec{ \nabla }\left(\alpha ^T (X^T X) \alpha - 2 (Y^T  X)\alpha + Y^T Y \right)= 0$
    $ \implies  2(X^T X) \alpha - 2 (X^T  Y) = 0$
    $ \implies  (X^T X) \alpha = X^T  Y$
    $ \implies  \alpha =  (X^T X)^{-1} ( X^T Y) \qquad (1)$
    Hence equation (1) can be used for finding the regression model (a MATLAB sketch is given at the end of this post).
  • TLS  :  This model follows the second school of thought.
    Optimisation problem: $ \min \sum _{i=1}^N \left((y_i - \hat{y}_i)^2 + (x_i^1 - \hat{x}_i^1)^2+ (x_i^2 - \hat{x}_i^2)^2 + \cdots  + (x_i^M - \hat{x}_i^M)^2\right)$ ,
    subject to $ \hat{y}_i = \alpha _ 1 \hat{x}_i^1 + \alpha _ 2 \hat{x}_i^2 + \cdots + \alpha _ M \hat{x}_i^M$. Here $ \hat{x}_i ,\hat{y}_i$ are the corrected data points that we are trying to estimate. Let us now put the data points in matrix form:
    $  Z = [X, Y] =[\vec{x^1}, \vec{x^2},\ldots, \vec{x^M}, \vec{y}]$
    The rank of the $[X,Y]$ matrix is $(M+1)$, assuming $ M < N$ .
    The problem is thus reformulated as follows:
    Find the smallest (in the Frobenius-norm sense) perturbation $\hat{Z}$ which reduces the rank of the matrix $ Z + \hat{Z}$ to $M$.
    This formulation, using the Eckart-Young theorem, yields the solution for both $ \alpha$ and the corrected matrix $ [Z + \hat{Z}]$ .
    Final Algorithm:
    1)    $ [U,S,V]  =  \mathrm{svd}(Z)$ .
    2)    $ \alpha = -V[1:M,\,M+1 ]\;/\;V[M+1,\,M+1]$ .
    3)    $ \hat{Z} = -Z\, V[ : ,M+1]\, V[ : ,M+1]^T$
    Matlab code :
    [U,S,V] = svd(D(1:m1,:));            % SVD of the first m1 rows of the data matrix D = [X, Y] (m1 defined elsewhere)
    reg = V(:,end);                      % last right singular vector, i.e. the (M+1)-th column of V
    reg = -reg/reg(end);                 % normalise so that the coefficient of y is -1
    reg = reg(1:(end-1));                % final regression coefficients alpha
    d = D + (-D*V(:,end))*(V(:,end)');   % corrected (rank-reduced) data, i.e. Z + Z_hat
Note: the regression model obtained this way, applied to a new set of data, may give poor results. As can be seen, if $ X$ is transformed to $ \hat{X}$ it gives better results than OLS, but the computation of $ \hat{X}$ is not possible on a new set of data.
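As a quick companion to equation (1), here is a minimal MATLAB sketch of the OLS fit. It assumes the same layout as the TLS snippet above, i.e. a data matrix D whose last column is $y$ (this layout is an assumption, not something fixed by the post).

% OLS via the normal equation, alpha = (X'X)^(-1) X'Y  -- equation (1).
y = D(:, end);                          % target values (last column of D)
X = [D(:, 1:end-1), ones(size(D,1),1)]; % features plus a column of ones for the bias term beta
alpha_ols = (X'*X) \ (X'*y);            % backslash solves the normal equations (avoids an explicit inverse)
yhat = X*alpha_ols;                     % fitted values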