After importing our libraries and dataset into Python, we are going to split the dataset into a Training Set and a Testing Set.
In the previous section, we separated our data into the independent variable (X) and the dependent variable (Y).
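The function ‘train_test_split’ from sklearn.model_selection (this module was called sklearn.cross_validation before scikit-learn 0.18, and the old name was removed in 0.20) is going to help us split the data into four sets easily.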
- Training Set of X
- Testing Set of X
- Training Set of Y
- Testing Set of Y
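import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Load the dataset
dataset = pd.read_csv('Data.csv')
# Independent variable(s): every column except the last
x = dataset.iloc[:, :-1].values
# Dependent variable: assumed here to be the last column of Data.csv
y = dataset.iloc[:, -1].values
# Splitting training set and testing set
# (sklearn.cross_validation was removed in scikit-learn 0.20; use sklearn.model_selection)
from sklearn.model_selection import train_test_split
# test_size=0.25 keeps 25% of the rows for testing and 75% for training
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.25)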
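To see what the split actually returns without needing Data.csv, here is a small sketch on made-up data. The 20-row array and the relation y = 2x + 1 are assumptions for illustration only, and random_state=0 is an optional argument added here so the shuffle gives the same split on every run.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up stand-in for the real dataset: 20 rows, 1 feature column
x = np.arange(20).reshape(-1, 1)
y = 2 * x.ravel() + 1

# 25% of the rows go to the testing set, the rest to the training set;
# random_state fixes the shuffle so the split is repeatable
xtrain, xtest, ytrain, ytest = train_test_split(
    x, y, test_size=0.25, random_state=0
)

print(xtrain.shape, xtest.shape)  # (15, 1) (5, 1)
print(ytrain.shape, ytest.shape)  # (15,) (5,)
```

The rows are shuffled before splitting, but each x row stays paired with its own y value, so the training and testing sets remain consistent with each other.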