Start Tutorial On Python Pandas
Iam going to start the session on Python Pandas Tutorial,
which is very commonly used in Python Machine Learning(ML) and Artificial
Intelligence(AI) field’s.Basically Pandas used for creating DataFrames in the
projects.
Introduction
Pandas is an open source Python
library which provides data analysis and manipulation in Python programming language.
Basically this library is used in the data representation, filtering, and statistical programming. The most important topic in pandas is the DataFrame where you store and play with the data.
In this tutorial, we will show you guys, what is the DataFrame, how you guys will create it
by python pandas libray, how we will export our data to different outputs, and
how we will manipulate thevalues/data.
I will teach you each step-by-step contents of the
Pandas :-
Table of Contents
·
Install pandas on system
·
Excel file’s
·
CSV file’s
·
Text file’s
·
Use SQL
·
Select columns
·
Select rows by value
·
Select row by index
·
Change column data-type
·
Use the function to columns and rows
·
Sorting Values and Sorting the Columns
·
Drop/Remove duplicate values
·
Drop duplicates by column’s
·
Delete column’s
·
Delete rows
·
Sum the column’s
·
Count unique values in columns
·
Subset rows
·
Changes in Excel
·
Changes in CSV
·
Changes in SQL
·
Changes in JSON
·
Make changes in HTML file by pandas
Install pandas on
system
We will use it by python and we can install the
pandas library in the Python by using pip.
To install it on the system we have to install it by command’s on the
Linux/Window on our PC.Command is given below :-
With it, we can install pandas by using conda in the
anaconda like this :-
$ conda install pandas
Excel file’s
We can read the Excel file contents easily by an
Excel file by using the read_excel() method of
pandas. For this,we have to read contents.We
need to import a specific module called xlrd.
Install xlrd byusing pip:
$ pip install xlrd

We have to use some step’s to read Excel file’s.you guys have to know how to read from an Excel file’s :-
· First we have to craete a Excel Sheet and after that we could start our operation’s.
·
Import the
pandas module in the Python :-
import pandas
·
We have to
wrote the name of the Excel file and with it
the sheet number we need to read data from to the read_excel() method.
pandas.read_excel('Excel_file.xlsx',
'First_Sheet')
If we have to check the data type of the output
using this type of result on the screen, The result will show something like
that :-
<class 'pandas.core.frame.DataFrame'>
This output is called a DataFrame.This is the basic unit of pandas and we will discuss this
unit till the end of the tutorial.
DataFrame
The DataFrame is
a 2 Dimensional structure where we can
store data of different types. DataFrame is similar to a SQL table or an Excel spreadsheet.
CSV file’s
If we want to read from a CSV file, we could use the
read_csv() method in the pandas.
Import the pandas module in the python :-
import pandas
Now call the read_csv() method like this :-
pandas.read_csv('data.csv')
Text file’s
We could also use the
read_csv() method of pandas to read fromthe text file’s.Let’s see the
example in the python program :-
import pandas
pandas.read_csv('text.txt')
This text file’sare treated same like a CSV file’s because we have same comma separated
elements in the file. The file also may use another delimiter such as a
semicolon, tab, etc.
Since pandas has no idea about the delimiter, it
translates the tab to \t.
To define the tab character as a delimiter, pass the
delimiter argument like this :-
pandas.read_csv('text.txt', delimiter='\t')
Use SQL

We will use the read_sql() method of pandas to read the SQL from an SQL database. I will show the program in the example in python :-
import sqlite3
import pandas
conn =
sqlite3.connect('emp_database.db')
pandas.read_sql('select * from Employee', conn)
In this example, we are using a database called SQLite3 database that uses a table,that name
is“Employee”.Here we are using the read_sql() method of pandas, then we used some
queries and a connection object to the read_sql() method.
The query fetches all the data on the table of the database.
Select columns
To select the columns from the table, we will use
the following query to display contents :-
select Name, designation from Employee;
Now, we will see how we will done same in the
python.The pandas code will be use like this, show below :-
pandas.read_sql('select Name, designation from Employee',
conn)
We can also select a column from a table by
accessing the data frame. We have shown codes , which is given below :-
i =
pandas.read_sql('select * from Employee', conn)
i['Name']
Select rows
by value
First, we have to create a DataFrame
from which we will select rows.
To create a DataFrame, we have to do that,code ‘s
are given below :-
import pandas
data = {'name':
['naveen', 'tyagi', 'caption'], 'age': [23, 45, 65], 'designation': ['Boss',
'Manager', 'Clerk']}
df = pandas.DataFrame(data)
In this code, we created a DataFrame with three
columns and three rows using the DataFrame() method of pandas.
To select a row tht based on some value, following
statement use to execute :-
df.loc[df['name'] == 'naveen']
df.loc[] or
DataFrame.loc[] is a Boolean array that can be used to access rows or
columns by values or labels. In the above code, the row will be fetched where
the name equals caption.
Select row
by index
To select a row by its index, we could use by either
use the slicing (:) operator or the df.loc[] array.
Code’s are given :-
data = {'name':
['naveen', 'tyagi', 'caption'], 'age': [23, 45, 65], 'designation': ['Boss',
'Manager', 'Clerk']}
df = pandas.DataFrame(data)
We created a DataFrame.
Now let’s access a row using df.loc[] :-
df.loc[1]
Now we will fetched a row of the dictionary that
name is data. We can do the same using the
slicing operator or indexing as following is given :-
df[1:2]
Change column
data-type
The Data type of a column can be changed by using astype() attribute of DataFrame.
To check the data type of the columns, we use the dtypes
attribute of DataFrame.
df.dtypes
Now to convert the data type from one to another :-
df.name = df.name.astype(str)
We fetched the column ‘name’ from our DataFrame and
changed its data type from object to string.
Use the
function to columns and rows
To use the
function on the column’sandthe row’s, you can use the apply() method of DataFrame.
Use the following code’s as a example below :-
data = {'first':
[1, 2, 3], 'second': [4, 5, 6], 'third': [7, 8, 9]}
df = pandas.DataFrame(data)
We created a DataFrame and added values of integer
type in the rows. How we willuse the function for
example square root on the values, we will import the numpy module to
use the sqrt function it look like this :-
import numpy as np
To use the sum function, tyou have to use the
following code :-
df.apply(np.sum)
To apply the function to a specific column, you can
specify the column like this :-
df['first'].apply(np.sqrt)
Sorting
Values and Sorting the columns
To sorting the values and sorting the columns in a DataFrame, use the sort_values()
method of the DataFrame.
Now we have to create a Dataframe like this, which
is shown below :-
data = {'first':
[1, 2, 3], 'second': [4, 5, 6], 'third': [7, 8, 9]}
df = pandas.DataFrame(data)
Now to sorting the values :-
df.sort_values(by=['first'])
The sort_values() method
has an attribute ‘by’ show only
those values which is necessary. In the above code, the values are
sorted by column first. To sort by multiple columns, the code will be given
below :-
df.sort_values(by=['first', 'second'])
If you want to sort in descending order, set
ascending attributed of sort_values()to False as follows :-
df.sort_values(by=['first'], ascending=False)
Drop/Remove
duplicate values
To delete/drop duplicate rows from a DataFrame, use the drop_duplicates()
method of the DataFrame.
Use the following codes, which is given below :-
import pandas
data = {'name':
['naveen', 'tyagi', 'caption',’naveen’], 'age': [23, 45, 65, 23],
'designation': ['Boss', 'Manager', 'Clerk', ‘Boss’]}
df = pandas.DataFrame(data)
Here we created a DataFrame with a duplicate row. To
check if any duplicate rows are present in the DataFrame, use the duplicated() method of the DataFrame.
df.duplicated()
It can be seen that the last row is a duplicate. To
drop or remove this row, execute the following codes in the line, which is
shown here :-
df.drop_duplicates()
Drop
duplicates by column’s
We have the file values where the column values are
the same and we want to delete them. We can drop a row by column by passing the
name of the column we need to delete.
We have shown a
DataFrame, which is given below :-
import pandas
data = {'name':
['naveen', 'tyagi', 'caption',’naveen’], 'age': [23, 45, 65, 23],
'designation': ['Boss', 'Manager', 'Clerk', ‘Boss’]}
df = pandas.DataFrame(data)
Here you can see that naveen is repeated two times.
If you want to remove duplicate by column, just pass the column name as follows
:-
df.drop_duplicates(['name'])
Delete column’s

To delete an entire column or row, we can use the drop() method of the DataFrame by specifying the name of the column or row.
Use the following code, which is given below :-
df.drop(['designation'], axis=1)
In this code, we are shown above, we are deleting
the column named ‘designation’. The axis
argument is necessary here. If the axis value is 1
it means we want to delete columns, if axis value
is 0 it means that row will be deleted. In axis values, 0 is for index
and 1 is for columns.
Delete rows
We can use the drop() method
to drop or delete a row by passing the index value which is given on the row.
Use the following code, which is given below :-
import pandas
data = {'name':
['naveen', 'tyagi', 'caption'], 'age': [23, 45, 65], 'designation': ['Boss',
'Manager', 'Clerk']}
df = pandas.DataFrame(data)
To drop a row with index 0 where the name is naveen,
age is 23 and the designation is Boss, use the following code, which is given
below :-
df.drop([0])
Let’s create a DataFrame where the indexes are the
names :-
data = {'name':
['naveen', 'tyagi', 'caption'], 'age': [23, 45, 65], 'designation': ['Boss', 'Manager',
'Clerk']}
df = pandas.DataFrame(data, index = ['naveen', 'tyagi',
'caption'])
Now we can delete a row with a certain value. For example, if we want to delete a row where the
name is tyagi, then the code will be given below :-
df.drop(['tyagi'])
You can also delete a range of row as :-
df.drop(df.index[[0, 1]])
This will delete rows from index 0 to 1 and one row
left only since our DataFrame composed of 3 rows.
If you want to delete the last row from the
DataFrame and do not know what is the total number of rows then you can use the
negative indexing as below :-
df.drop(df.index[-1])
-1 deletes the last row. Similarly -2 will delete
last 2 rows and so on.
Sum the column’s
You can use the sum() method of the DataFrame to sum
the column items.
Suppose we have the following DataFrame :-
data = {'first':
[1, 2, 3], 'second': [4, 5, 6], 'third': [7, 8, 9]}
df = pandas.DataFrame(data)
Now to sum the items of column A, use the following
line of code, which is given below :-
df['first'].sum()
You can also use the apply()
method of the DataFrame and pass in the sum
method of numpy to sum the values.
Count unique
values in columns
To count unique values in a column, you can use the nunique() method of the DataFrame.
Use the following code, which is given below :-
data = {'first':
[1, 2, 2], 'second': [4, 5, 6], 'third': [7, 8, 9]}
df = pandas.DataFrame(data)
To count the unique values in column first :-
df['first'].nunique()
As you can see, column first has only 2 unique
values 2 and 2 and another 2 is a duplicate that’s why we have 2 in the output.
If you want to count all the values in a column, you
can use the count() method which is given
below :-
df['first'].count()
Subset rows
To select a subset of a DataFrame,
You can use the square brackets.
For example, we
have a DataFrame that contains some integers. We can select or subset a row
like this :-
df.[start:count]
The start point will be included in the subset but
stop point is not included. For example, to
select 3 rows starting from the first row, you will write like this :-
df[0:3]
That code means start from the first row which is 0
and select 3 rows.
Similarly, to select the first 2 rows, you will
write :-
df[0:2]
To select or subset the last row, use the negative
indexing as :-
df[-1:]
Changes in Excel
To change/modify a DataFrame of an Excel sheet, we
can use the to_excel() method.
To write to an Excel sheet, you have to open the
sheet and to open an Excel sheet we will have to import
openpyxl module.
Install openpyxl
using pip:
$ pip install openpyxl
Use the following code, which is given below :-
import openpyxl
data = {'name':
['naveen', 'tyagi', 'caption'], 'age': [23, 45, 65], 'designation': ['Boss',
'Manager', 'Clerk']}
df =
pandas.DataFrame(data)
df.to_excel("sheet.xlsx", "page1")
Changes in CSV
Now we have learn how we covert or show a DataFrame to CSV,
you can use the to_csv() method as in the
following codes, which is given below :-
df.to_csv("file.csv")
Changes in SQL
To convert the data to SQL, we can use the to_sql() method.
Use the following code, which is given below :-
import sqlite3
import pandas
conn =
sqlite3.connect('emp_database.db')
data = {'name':
['naveen', 'tyagi', 'caption'], 'age': [23, 45, 65], 'designation': ['Boss',
'Manager', 'Clerk']}
df =
pandas.DataFrame(data)
df.to_sql('name', conn)
In this code, we created a connection with the use
of sqlite3
database. Then we created a DataFrame
with three rows and three columns.
Finally, we used the to_sql()method
of our DataFrame (df) and passed the name of
the table where the data will be stored along with the connection object.
Changes in JSON
You can use the to_json()
method to convert and create a
DataFrame to write to a JSON file.
Use the following code, which is given below :-
df.to_json("file.json")
In this line of code, the name of the JSON file is
passed as an argument. The DataFrame will be stored in the JSON file.
Make changes in HTML file by pandas

You can use the to_html() method of the DataFrame to create an HTML file with the DataFrame content.
Use the following code, which is given below :-
df.to_html("index.html")
We can use pandas and work on it is very easy. I
fell very nice to work with pandas and it’s
give me happiness and increase my experience on python pandas.pandas
DataFrame is a very flexible piece library you can ever use in your life.
I hope you find the tutorial useful,plz comment here >>>
If you Guys have any queries or want to read some new tutorials, Comment
If you Guys have any queries or want to read some new tutorials, Comment
0 Comments