Multivariate Linear Regression in Python – Step 3.) Split the Training Set and Testing Set


In Python, a dataset can be split into a training set and a testing set with a single function call.

The train_test_split function from scikit-learn creates xtrain, xtest, ytrain and ytest all at once.

This is one of the quickest ways to prepare training and testing data.
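
To make the behaviour concrete, here is a minimal sketch on made-up data. The toy arrays and the random_state value are assumptions for illustration only and are not part of the tutorial's dataset. With 10 samples and test_size=0.2, 8 rows go to the training set and 2 to the testing set, and each row of x stays paired with its y value.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (hypothetical): 10 samples with 2 features each.
# Row i of x is [2*i, 2*i + 1] and its target is y[i] = i.
x = np.arange(20).reshape(10, 2)
y = np.arange(10)

# test_size=0.2 keeps 20% of the rows for testing.
# random_state is fixed here only so the split is repeatable.
xtrain, xtest, ytrain, ytest = train_test_split(
    x, y, test_size=0.2, random_state=0
)

print(xtrain.shape, xtest.shape)
print(ytrain.shape, ytest.shape)
```

Leaving out random_state gives a different random split on every run, which is what the example in this post does.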

Example of Splitting the Training Set and Testing Set in Python

#Import libraries

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

#Import data
dataset = pd.read_csv('multivariate_data.csv')
x = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 4].values

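#Encode Categorical Data (column 3) using OneHotEncoder inside a ColumnTransformer
#(the categorical_features argument is not available in newer scikit-learn releases)
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
#sparse_threshold=0 returns a dense array; the encoded columns are placed first
columntransformer = ColumnTransformer([('encoder', OneHotEncoder(), [3])], remainder='passthrough', sparse_threshold=0)
x = np.array(columntransformer.fit_transform(x), dtype=float)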

#Remove Dummy Variable Trap
x=x[:, 1:]

#Splitting training set and testing set
#(train_test_split lives in sklearn.model_selection; sklearn.cross_validation was removed)
from sklearn.model_selection import train_test_split
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2)
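
As a side note on the dummy variable trap step above, the encoder itself can drop the first dummy column, assuming a scikit-learn release in which OneHotEncoder accepts the drop argument. The sketch below uses a made-up category column (the names are hypothetical, not taken from multivariate_data.csv) and is only one possible alternative to slicing with x[:, 1:].

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical categorical column with three distinct categories
states = np.array([["New York"], ["California"], ["Florida"], ["New York"]])

# drop="first" removes the dummy column of the first category
# (categories are sorted, so "California" is dropped here)
encoder = OneHotEncoder(drop="first")
dummies = encoder.fit_transform(states).toarray()

print(encoder.categories_)
print(dummies)
```

Three categories produce two dummy columns, and the dropped category is represented by a row of zeros.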
