In supervised learning, every example in the dataset is labelled, which means the ‘correct answer’ has been given to the computer. The algorithm therefore learns to predict results from the labelled dataset it has been given. If you want to predict a target value, supervised learning is the family of algorithms to look at. Supervised learning has two main tasks: classification and regression. In a classification problem the predicted output is a discrete value (for example, a class label), while in a regression problem the predicted output is a continuous value. This is the main difference between the two problems [1].

Once supervised learning has been chosen for a classification or regression problem, the next step is to train the algorithm and let it learn from the dataset. The data used to train the algorithm is called the training set. A training set consists of many training examples, each of which has a number of features and one labelled value (the target variable) [1]. In general, the full dataset is divided into three parts: a training set, a cross-validation set and a test set, which make up 60%, 20% and 20% of the total dataset respectively. How these three sets are used will be discussed later. The learning algorithm uses the training examples to obtain a hypothesis, which is the resulting function. This function takes new data as input and predicts the corresponding output. There are many powerful algorithms in supervised learning, and they cannot all be covered in this report. In this part, linear regression, logistic regression, neural networks and Support Vector Machines (SVMs) are introduced.
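As an illustration of the 60/20/20 split described above, the following minimal Python sketch shuffles a list of labelled examples and slices it into the three sets. Shuffling before splitting, the fixed seed and the function name `split_dataset` are assumptions made for this example rather than details from the source, and the dataset is hypothetical.

```python
import random

def split_dataset(examples, train_frac=0.6, cv_frac=0.2, seed=0):
    """Shuffle labelled examples and split them into training,
    cross-validation and test sets (hypothetical helper)."""
    data = list(examples)
    # Assumption: shuffle first so each set is a random sample of the data
    random.Random(seed).shuffle(data)
    n_train = int(len(data) * train_frac)
    n_cv = int(len(data) * cv_frac)
    train = data[:n_train]
    cv = data[n_train:n_train + n_cv]
    test = data[n_train + n_cv:]  # the remainder (about 20%) is the test set
    return train, cv, test

# Hypothetical dataset: 10 examples, each a (features, label) pair
dataset = [([float(i), float(i) * 2], i % 2) for i in range(10)]
train_set, cv_set, test_set = split_dataset(dataset)
print(len(train_set), len(cv_set), len(test_set))  # 6 2 2
```

Putting the remainder into the test set means no example is lost when the fractions do not divide the dataset size exactly.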

The linear regression model is one of the simplest models in supervised learning. The hypothesis of linear regression is $$ h = \theta_0 + \theta_1 X_1 + \dots + \theta_n X_n $$ where $X_i$ is the value of feature $i$ and $\theta_0, \theta_1, \dots, \theta_n$ are the parameters of the model.
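To make the hypothesis concrete, here is a minimal Python sketch that evaluates $h$ for one example. The function name `hypothesis` and the parameter and feature values are hypothetical; `theta[0]` plays the role of $\theta_0$ and `x[i - 1]` holds $X_i$.

```python
def hypothesis(theta, x):
    """Evaluate h = theta_0 + theta_1*X_1 + ... + theta_n*X_n.

    theta holds n + 1 parameters (theta[0] is theta_0, the intercept);
    x holds the n feature values X_1, ..., X_n.
    """
    if len(theta) != len(x) + 1:
        raise ValueError("theta must have exactly one more entry than x")
    # theta_0 plus the weighted sum of the features
    return theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))

# Hypothetical parameters and a single example with two features
theta = [1.0, 2.0, 3.0]
x = [4.0, 5.0]
print(hypothesis(theta, x))  # 1 + 2*4 + 3*5 = 24.0
```

Training the model then amounts to choosing the values in `theta` so that `hypothesis` matches the labelled targets in the training set as closely as possible.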

[1] P. Harrington, *Machine Learning in Action*. Shelter Island, NY: Manning Publications Co., 2012.