Simple Linear Regression in Python – Step 2.) Split the Training Set and Testing Set


With the libraries and the dataset imported into Python, the next step is to split the dataset into a training set and a testing set.

In the previous section, we separated the data into the independent variable (X) and the dependent variable (Y).

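The function ‘train_test_split’ from sklearn.model_selection (it lived in sklearn.cross_validation in older scikit-learn releases, a module that was removed in version 0.20) will help us split the data into four parts easily: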

  1. Training Set of X
  2. Testing Set of X
  3. Training Set of Y
  4. Testing Set of Y
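To make the four parts concrete, here is a minimal pure-Python sketch of the idea behind such a split. It is only an illustration, not scikit-learn's actual implementation: the helper name simple_split, its seed parameter, and the sample data are all made up for this example.

```python
import math
import random

def simple_split(x, y, test_size=0.25, seed=0):
    """Hypothetical helper: shuffle the row indices, then cut them into test and train parts."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of rows")
    indices = list(range(len(x)))
    random.Random(seed).shuffle(indices)    # shuffle row positions reproducibly
    n_test = math.ceil(len(x) * test_size)  # number of rows held out for testing
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    # Use the same indices for x and y so each row keeps its matching target
    xtrain = [x[i] for i in train_idx]
    xtest = [x[i] for i in test_idx]
    ytrain = [y[i] for i in train_idx]
    ytest = [y[i] for i in test_idx]
    return xtrain, xtest, ytrain, ytest

# Made-up data: 8 rows with one feature each, and y = 10 * x
x = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [8.0]]
y = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0]

xtrain, xtest, ytrain, ytest = simple_split(x, y, test_size=0.25)
print(len(xtrain), len(xtest), len(ytrain), len(ytest))
```

The key point is that X and Y are cut with the same shuffled indices, so every row of X stays paired with its own Y value in both the training and the testing set.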

#Import Libraries

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

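# Import data
dataset = pd.read_csv('Data.csv')
x = dataset.iloc[:, :-1].values  # independent variable: every column except the last
y = dataset.iloc[:, 1].values    # dependent variable: the column at index 1 (the second column)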

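# Splitting training set and testing set
# (sklearn.cross_validation was removed in scikit-learn 0.20; use sklearn.model_selection)
from sklearn.model_selection import train_test_split
# test_size=0.25 holds out 25% of the rows for testing
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.25)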
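
Because the split is random, the call above can return different rows on every run. A minimal sketch of how to make it repeatable and check the result is shown below. It assumes a current scikit-learn install and uses made-up data in place of Data.csv; the value 0 for random_state is an arbitrary choice.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up stand-in for Data.csv: 20 rows, one feature column, y = 2x + 1
x = np.arange(20, dtype=float).reshape(-1, 1)
y = 2 * x.ravel() + 1

# random_state fixes the shuffle, so the same rows land in each set on every run
xtrain, xtest, ytrain, ytest = train_test_split(
    x, y, test_size=0.25, random_state=0
)

# With 20 rows and test_size=0.25, 5 rows go to testing and 15 to training
print(xtrain.shape, xtest.shape, ytrain.shape, ytest.shape)
```

Printing the shapes is a quick sanity check that the training and testing sets have the expected sizes before fitting a model.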

