Pandas Tutorial


    Start Tutorial On Python Pandas


Iam going to start the session on Python Pandas Tutorial, which is very commonly used in Python Machine Learning(ML) and Artificial Intelligence(AI) field’s.Basically Pandas used for creating DataFrames in the projects.

Introduction


Pandas is an open source Python library which provides data analysis and manipulation in Python programming language.




Basically this  library is used in the data representation, filtering, and statistical programming. The most important topic in pandas is the DataFrame where you store and play with the data.

In this tutorial, we will show you guys, what is the DataFrame, how you guys will create it by python pandas libray, how we will export our data to different outputs, and how we will manipulate thevalues/data.

I will teach you each step-by-step contents of the Pandas :-

Table of Contents

·     Install pandas on system
·     Excel file’s
·     CSV file’s
·     Text file’s
·     Use SQL
·     Select columns
·     Select rows by value
·     Select row by index
·     Change column data-type
·      Use the function to columns and rows
·     Sorting Values and Sorting the Columns
·     Drop/Remove duplicate values
·     Drop duplicates by column’s
·     Delete column’s
·     Delete rows
·     Sum the column’s
·     Count unique values in columns
·     Subset rows
·      Changes in Excel
·     Changes in CSV
·     Changes in SQL
·     Changes in JSON
·      Make changes in HTML file by pandas

Install pandas on system

We will use it by python and we can install the pandas library in the Python by using pip. To install it on the system we have to install it by command’s on the Linux/Window on our PC.Command is given below :-

$ pip install pandas
With it, we can install pandas by using conda in the anaconda like this :-

$ conda install pandas

Excel file’s

We can read the Excel file contents easily by an Excel file by using the read_excel() method of pandas. For this,we have to read contents.We  need to import a specific module called xlrd.

Install xlrd byusing pip:
$ pip install xlrd






We have to use some step’s to read Excel file’s.you guys have to know how to read from an Excel file’s :-






·         First we have to craete a Excel Sheet and after that we could start our operation’s.

·         Import the pandas module in the Python :-

import pandas
·         We have to wrote the name of the Excel file and with it  the sheet number we need to read data from to the read_excel() method.

pandas.read_excel('Excel_file.xlsx', 'First_Sheet')
If we have to check the data type of the output using this type of result on the screen, The result will show something like that :-

<class 'pandas.core.frame.DataFrame'>
This output is called a DataFrame.This is the basic unit of pandas and we will discuss this unit till the end of the tutorial.

DataFrame

The DataFrame is a  2 Dimensional structure where we can store data of different types. DataFrame is similar to a SQL table or an Excel spreadsheet.

CSV file’s

If we want to read from a CSV file, we could use the read_csv() method in the pandas.

Import the pandas module in the python :-

import pandas
Now call the read_csv() method like this :-

pandas.read_csv('data.csv')

Text file’s

We could also use the read_csv() method of pandas to read fromthe text file’s.Let’s see the example in the python program :-

import pandas

pandas.read_csv('text.txt')
This text file’sare treated same like a CSV file’s because we have same comma separated elements in the file. The file also may use another delimiter such as a semicolon, tab, etc.

Since pandas has no idea about the delimiter, it translates the tab to \t.
To define the tab character as a delimiter, pass the delimiter argument like this :-

pandas.read_csv('text.txt', delimiter='\t')

Use SQL



We will use the read_sql() method of pandas to read  the SQL from an SQL database. I will show the program in the example in python :-

import sqlite3

import pandas

conn = sqlite3.connect('emp_database.db')

pandas.read_sql('select * from Employee', conn)
In this example, we are using a database called SQLite3 database that uses a table,that name is“Employee”.Here we are using the read_sql() method of pandas, then we used some queries and a connection object to the read_sql() method. The query fetches all the data on the table of the database.

Select columns

To select the columns from the table, we will use the following query to display contents :-

select Name, designation from Employee;
Now, we will see how we will done same in the python.The pandas code will be use like this, show below :-

pandas.read_sql('select Name, designation from Employee', conn)
We can also select a column from a table by accessing the data frame. We have shown codes , which is given below :-

i = pandas.read_sql('select * from Employee', conn)

i['Name']

Select rows by value

First, we have to create a DataFrame from which we will select rows.
To create a DataFrame, we have to do that,code ‘s are given below :-

import pandas

data = {'name': ['naveen', 'tyagi', 'caption'], 'age': [23, 45, 65], 'designation': ['Boss', 'Manager', 'Clerk']}

df = pandas.DataFrame(data)
In this code, we created a DataFrame with three columns and three rows using the DataFrame() method of pandas.

To select a row tht based on some value, following statement  use to execute :-

df.loc[df['name'] == 'naveen']
df.loc[] or DataFrame.loc[] is a Boolean array that can be used to access rows or columns by values or labels. In the above code, the row will be fetched where the name equals caption.

Select row by index

To select a row by its index, we could use by either use the slicing (:) operator or the df.loc[] array.

Code’s are given :-

data = {'name': ['naveen', 'tyagi', 'caption'], 'age': [23, 45, 65], 'designation': ['Boss', 'Manager', 'Clerk']}

df = pandas.DataFrame(data)
We created a DataFrame. Now let’s access a row using df.loc[] :-

df.loc[1]
Now we will fetched a row of the dictionary that name is data. We can do the same using the slicing operator or indexing as following is given :-

df[1:2]

Change column data-type

The Data type of a column can be changed by using astype() attribute of DataFrame. To check the data type of the columns, we use the dtypes attribute of DataFrame.

df.dtypes
Now to convert the data type from one to another :-

df.name = df.name.astype(str)
We fetched the column ‘name’ from our DataFrame and changed its data type from object to string.

Use the function to columns and rows

To use the  function on the column’sandthe row’s, you can use the apply() method of DataFrame.

Use the following code’s as a example below :-

data = {'first': [1, 2, 3], 'second': [4, 5, 6], 'third': [7, 8, 9]}

df = pandas.DataFrame(data)
We created a DataFrame and added values of integer type in the rows. How we willuse the function for example square root on the values, we will import the numpy module to use the sqrt function it look like this :-

import numpy as np

df.apply(np.sqrt)
To use the sum function, tyou have to use the following code :-

df.apply(np.sum)

To apply the function to a specific column, you can specify the column like this :-

df['first'].apply(np.sqrt)

Sorting Values and Sorting the columns

To sorting the values and sorting the columns in a DataFrame, use the sort_values() method of the DataFrame.

Now we have to create a Dataframe like this, which is shown below :-

data = {'first': [1, 2, 3], 'second': [4, 5, 6], 'third': [7, 8, 9]}

df = pandas.DataFrame(data)
Now to sorting the values :-

df.sort_values(by=['first'])
The sort_values() method has an attribute ‘by’  show only those values which is necessary. In the above code, the values are sorted by column first. To sort by multiple columns, the code will be given below :-

df.sort_values(by=['first', 'second'])
If you want to sort in descending order, set ascending attributed of sort_values()to False as follows :-

df.sort_values(by=['first'], ascending=False)

Drop/Remove duplicate values

To delete/drop duplicate rows from a DataFrame, use the drop_duplicates() method of the DataFrame.

Use the following codes, which is given below :-

import pandas

data = {'name': ['naveen', 'tyagi', 'caption',’naveen’], 'age': [23, 45, 65, 23], 'designation': ['Boss', 'Manager', 'Clerk', ‘Boss’]}

df = pandas.DataFrame(data)
Here we created a DataFrame with a duplicate row. To check if any duplicate rows are present in the DataFrame, use the duplicated() method of the DataFrame.

df.duplicated()
It can be seen that the last row is a duplicate. To drop or remove this row, execute the following codes in the line, which is shown here :-

df.drop_duplicates()

Drop duplicates by column’s

We have the file values where the column values are the same and we want to delete them. We can drop a row by column by passing the name of the column we need to delete.

We have shown a  DataFrame, which is given below :- 

import pandas

data = {'name': ['naveen', 'tyagi', 'caption',’naveen’], 'age': [23, 45, 65, 23], 'designation': ['Boss', 'Manager', 'Clerk', ‘Boss’]}

df = pandas.DataFrame(data)
Here you can see that naveen is repeated two times. If you want to remove duplicate by column, just pass the column name as follows :-

df.drop_duplicates(['name'])

Delete column’s



To delete an entire column or row, we can use the drop() method of the DataFrame by specifying the name of the column or row.

Use the following code, which is given below :-

df.drop(['designation'], axis=1)
In this code, we are shown above, we are deleting the column named ‘designation’. The axis argument is necessary here. If the axis value is 1 it means we want to delete columns, if axis value is 0 it means that row will be deleted. In axis values, 0 is for index and 1 is for columns.

Delete rows

We can use the drop() method to drop or delete a row by passing the index value which is given on the row.

Use the following code, which is given below :-

import pandas

data = {'name': ['naveen', 'tyagi', 'caption'], 'age': [23, 45, 65], 'designation': ['Boss', 'Manager', 'Clerk']}

df = pandas.DataFrame(data)
To drop a row with index 0 where the name is naveen, age is 23 and the designation is Boss, use the following code, which is given below :-

df.drop([0])
Let’s create a DataFrame where the indexes are the names :-

data = {'name': ['naveen', 'tyagi', 'caption'], 'age': [23, 45, 65], 'designation': ['Boss', 'Manager', 'Clerk']}

df = pandas.DataFrame(data, index = ['naveen', 'tyagi', 'caption'])
Now we can delete a row with a certain value. For example, if we want to delete a row where the name is tyagi, then the code will be given below :-

df.drop(['tyagi'])

You can also delete a range of row as :-

df.drop(df.index[[0, 1]])
This will delete rows from index 0 to 1 and one row left only since our DataFrame composed of 3 rows.

If you want to delete the last row from the DataFrame and do not know what is the total number of rows then you can use the negative indexing as below :-

df.drop(df.index[-1])
-1 deletes the last row. Similarly -2 will delete last 2 rows and so on.

Sum the column’s

You can use the sum() method of the DataFrame to sum the column items.

Suppose we have the following DataFrame :-

data = {'first': [1, 2, 3], 'second': [4, 5, 6], 'third': [7, 8, 9]}

df = pandas.DataFrame(data)
Now to sum the items of column A, use the following line of code, which is given below :-

df['first'].sum()
You can also use the apply() method of the DataFrame and pass in the sum method of numpy to sum the values.

Count unique values in columns

To count unique values in a column, you can use the nunique() method of the DataFrame.

Use the following code, which is given below :-

data = {'first': [1, 2, 2], 'second': [4, 5, 6], 'third': [7, 8, 9]}

df = pandas.DataFrame(data)
To count the unique values in column first :-

df['first'].nunique()
As you can see, column first has only 2 unique values 2 and 2 and another 2 is a duplicate that’s why we have 2 in the output.

If you want to count all the values in a column, you can use the count() method which is given below :-

df['first'].count()

Subset rows

To select a subset of a DataFrame, You can use the square brackets.
For example, we have a DataFrame that contains some integers. We can select or subset a row like this :-

df.[start:count]
The start point will be included in the subset but stop point is not included. For example, to select 3 rows starting from the first row, you will write like this :-

df[0:3]
That code means start from the first row which is 0 and select 3 rows.
Similarly, to select the first 2 rows, you will write :-

df[0:2]
To select or subset the last row, use the negative indexing as :-

df[-1:]

Changes in Excel

To change/modify a DataFrame of an Excel sheet, we can use the to_excel() method.

To write to an Excel sheet, you have to open the sheet and to open an Excel sheet we will have to import openpyxl module.

Install openpyxl using pip:
$ pip install openpyxl
Use the following code, which is given below :-

import openpyxl

data = {'name': ['naveen', 'tyagi', 'caption'], 'age': [23, 45, 65], 'designation': ['Boss', 'Manager', 'Clerk']}

df = pandas.DataFrame(data)

df.to_excel("sheet.xlsx", "page1")

Changes in CSV

Now we have learn how we covert or show a DataFrame to CSV, you can use the to_csv() method as in the following codes, which is given below :-

df.to_csv("file.csv")

Changes in SQL

To convert the data to SQL, we can use the to_sql() method.

Use the following code, which is given below :-

import sqlite3

import pandas

conn = sqlite3.connect('emp_database.db')

data = {'name': ['naveen', 'tyagi', 'caption'], 'age': [23, 45, 65], 'designation': ['Boss', 'Manager', 'Clerk']}

df = pandas.DataFrame(data)

df.to_sql('name', conn)
In this code, we created a connection with the use of  sqlite3 database. Then we created a DataFrame with three rows and three columns.

Finally, we used the to_sql()method of our DataFrame (df) and passed the name of the table where the data will be stored along with the connection object.

Changes in JSON

You can use the to_json() method to convert and create a  DataFrame to write to a JSON file.

Use the following code, which is given below :-

df.to_json("file.json")
In this line of code, the name of the JSON file is passed as an argument. The DataFrame will be stored in the JSON file.

Make changes in HTML file by pandas



You can use the to_html() method of the DataFrame to create an HTML file with the DataFrame content.

Use the following code, which is given below :-

df.to_html("index.html")
We can use pandas and work on it is very easy. I fell very nice to work with pandas and it’s  give me happiness and increase my experience on python pandas.pandas DataFrame is a very flexible piece library you can ever use in  your life.

I hope you find the tutorial useful,plz comment here >>>

If you Guys have any queries or want to read some new tutorials, Comment 


Reactions

Post a Comment

0 Comments