Start by Importing the Libraries and the Data
- Setting up a Principal Component Analysis (PCA) model is straightforward. As with any other regression or machine learning model, we start by importing the libraries and the data.
- NumPy is the library for numerical computing on arrays. The feature matrix we pass to a PCA model is a NumPy array, so we need it here.
- Pandas is the library for loading and handling tabular data. We use Pandas in all of our regression and machine learning models.
- In this example, we use Pandas to import the CSV data into Python.
- In the example below, Pandas is imported under the alias pd.
- We import the CSV data by calling the “read_csv” function from Pandas.
- “dataset” is the variable that stores all of our CSV data as a Pandas DataFrame.
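As a minimal sketch of what “read_csv” returns, the snippet below loads a small CSV from an in-memory string instead of a file, so it runs without the tutorial's data file. The column names and values are made up for illustration and are not taken from the real dataset.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk.
csv_text = "feature_a,feature_b,label\n1.0,2.0,0\n3.0,4.0,1\n5.0,6.0,0\n"

# read_csv accepts a file path or a file-like object and returns a DataFrame.
dataset = pd.read_csv(io.StringIO(csv_text))

print(dataset.shape)   # (number of rows, number of columns)
print(dataset.head())  # first rows, handy for a quick sanity check
```

Checking the shape right after loading is a cheap way to confirm that the file has the number of columns you expect before splitting it.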
Dependent Variables vs Independent Variables
- iloc is the Pandas indexer that selects rows and columns by integer position. We use it to split the data from the CSV file into X and y in Python.
- X is the set of independent variables.
- y is the dependent variable.
- In the example below, columns 0 to 12 are the independent variables. The slice 0:13 ends before column 13.
- Hence we take all the rows and columns 0 to 12 as “X”.
- Column 13 holds the dependent variable, so we take all the rows of column 13 as “y”.
- We are going to find the relationship between X and y.
# Import the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Import the data
dataset = pd.read_csv('PCA data.csv')
# All rows, columns 0 to 12: independent variables
X = dataset.iloc[:, 0:13].values
# All rows, column 13: dependent variable
y = dataset.iloc[:, 13].values
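To see how the iloc slices behave, here is a small sketch that applies the same split to a made-up table with 5 rows and 14 columns. The values are placeholders generated with NumPy, not the contents of the real CSV file.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the tutorial's data: 5 rows, 14 columns.
data = np.arange(5 * 14).reshape(5, 14)
dataset = pd.DataFrame(data)

# The slice 0:13 is half-open: it selects columns 0 through 12.
X = dataset.iloc[:, 0:13].values
# A single integer selects one column and gives a 1-D array.
y = dataset.iloc[:, 13].values

print(X.shape)  # (5, 13)
print(y.shape)  # (5,)
```

X comes out as a two-dimensional array with one column per independent variable, while y is a one-dimensional array with one value per row.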