Use Different Formats To Import Data in Python
Welcome back friends one’s again Iam come with a new
topic and Today topic is on explaination
on various methods to read data in
Python. Data could be in any of the popular formats - CSV,
TXT, XLS/XLSX (Excel), sas7bdat (SAS), Stata, Rdata (R) etc. Loading
data in python environment is the most initial step of analyzing data.
Importing
Data into Python
When we
importing external files, we need to check the following points : -
·
Check whether
header row exists or not
·
Treatment of
special values as missing values
·
Consistent data
type in a variable (column)
·
Date Type
variable in consistent date format.
·
No truncation
of rows while reading external data
Contents
·
Import
CSV files from PC
·
Import
File from URL by user
·
Read
Text File in python
·
Read
Excel File in python
·
Read
delimited file in python
·
Read
SAS File in python
·
Read
Stata File in python
·
Importing
R Data Files
·
Reading
SQL Tables
·
Import
ing Datas from SPSS Files
·
Read
example of rows and columns
·
Skip
rows while importing
·
Specify
values as missing values
Installing
pandas Package
pandas is a very powerful data analysis package. It
makes data exploration and manipulation easy. It has several functions to read
data from various sources.
If you are using Anaconda, pandas must be already
installed. You need to load the package by using the following command -
import pandas as pd
If pandas package is not installed, you can install it
by running the following code in Ipython Console. If you are using Spyder, you
can submit the following code in Ipython console within Spyder.
$ pip install pandas
If you are using Anaconda, you can try the following
line of code to install pandas -
$ conda install pandas
1. Import
CSV files from PC
We have to use a Path because It is important to note
that a singlebackslash(/)is working until we
are specifying the file path. Example is
given below :-
import pandas as pd
value=
pd.read_csv("home/file.csv")
If no header (title) in raw data file
value=
pd.read_csv("home/file.csv", header = None)
You need to include header = None option to tell
Python there is no column name (header) in data.
Add
Column Names
We can include column names by using names= option.
value1=
pd.read_csv("home/file.csv", header = None), names = ['ID', 'first_name', 'salary'])
The variable names can also be added separately by
using the following command.
value1.columns = ['ID',
'first_name', 'salary']
2. Import
File from URL by user
You don't need to perform additional steps to fetch
data from URL. Simply put URL in read_csv() function (applicable only for CSV
files stored in URL).
value=
pd.read_csv("home/file.csv")
3. Read
Text File in python
We can use read_table() function to pull data from
text file. We can also use read_csv() with sep= "\t" to read data
from tab-separated file.
value=
pd.read_csv("home/file.txt")
value=
pd.read_csv("home/file.txt", sep = “\t”)
4. Read
Excel File in python
The read_excel() function can be used to import excel
data into Python.
value=
pd.read_excel("https://www.enemy.com/dnew_data.xls",sheetname="Data",
skiprows=2)
If you do not specify name of sheet in sheetname=
option, it would take by default first sheet.
5. Read
delimited file in python
Assume you want to import a file that is separated
with white spaces.
Value1 =
pd.read_excel("https://www.enemy.com/dnew_data.xls",sep="\s+",
header = None)
To
include variable names, use the names= option which is given below: -
Value2 =
pd.read_excel("https://www.enemy.com/dnew_data.xls",sep="\s+",
names=['a', 'b', 'c', 'd'])
6. Read
SAS File in python
Before we start our next thing we have to know, We can
import SAS data file by using read_sas() function.
value4 =
pd.read_sas('data.sas7bdat')
If you have a large SAS File, you can try package
named pyreadstat which is faster than pandas. It is equivalent to haven package
in R which provides easy and fast way to read data from SAS, SPSS and Stata. To
install this package, you can use the command pip install pyreadstat
import pyreadstat
df, meta =
pyreadstat.read_sas7bdat('data.sas7bdat')
print(df.head())
print(meta.column_names)
print(meta.column_labels)
print(meta.number_rows)
print(meta.number_columns)
7. Read
Stata Filesin Python
We can load Stata data file via read_stata() function.
value5 =
pd.read_stata('data.dta')
pyreadstat package lets you to pull value labels from
stata files.
import pyreadstat
df, meta =
pyreadstat.read_dta("data.dta")
To get labels, set apply_value_formats as TRUE
df, meta =
pyreadstat.read_dta("data.dta", apply_value_formats=True)
8.
Importing R Data Files
Using pyreadr package, you can load .RData and .Rds
format files which in general contains R data frame. You can install this
package using the command is given below : -
$ pip install pyreadr
With the use of read_r( )
function, we can import R data format files.
import pyreadr
value =
pyreadr.read_r('home/ravi/a.RData')
print(value.keys())
df =
result["df"]
Same code you have to use or you can read .Rds
formatted file.
9.
Reading SQL Tables
We can
extract table from SQL database (SQL Server/ sqlite3). See the program which is
given below: -
SQL
Server
You can read data from tables stored in SQL Server by
building a connection. You need to have server, User ID (UID), database details
to establish connection.
import pandas as pd
import pyodbc
conn =
pyodbc.connect("Driver={SQL
Server};Server=serverName;UID=UserName;PWD=Password;Database=database;")
df =
pd.read_sql_query('select * from dbo.Table WHERE ID > 10', conn)
df.head()
Sqlite3
Suppose you have .db extension file which is a
database file and you want to extract data from it.
import sqlite3
from pandas.io import sql
conn =
sqlite3.connect('C:/Users/Deepanshu/Downloads/flight.db')
query = "SELECT *
FROM flight"
results =
pd.read_sql(query, con=conn)
print results.head()
Explanation
of databases :-
·
UdaExec
provides DevOps support features such as configuration and logging.
·
You can assign
any name and version in appName and version
·
logConsole=False
tells Python not to log to the console.
·
system="tdprod"
refers to name of the system we are connecting using ODBC as the connection
method
·
USEREGIONALSETTINGS="N"
is used to ensure that float values can be loaded and make decimal separator be
‘.’
10.
Importing Datas from SPSS Files
import pyreadstat
df, meta =
pyreadstat.read_sav("file.sav", apply_value_formats=True)
If you don't want value labels, make
apply_value_formats as False.
11. Read
example of rows and columns
By specifying nrows= and usecols=, you can fetch
specified number of rows and columns.
value =
pd.read_csv("http://winterseller.com/data/value.csv", nrows=5,
usecols=(1,4,10))
nrows = 5 implies you want
to import only first 5 rows and usecols= refers to specified columns you want
to import.
12. Skip
rows while importing
Assume you want to skip first 10 rows and wants to
read data from 9th row (9th row would be a header row)
value8 =
pd.read_csv("http://winterseller.com/data/value.csv", skiprows=10)
13. Specify
values as missing values
By including na_values= option, you can specify values
as missing values. In this case, we are telling python to consider dot (.) as
missing cases.
value_data =
pd.read_csv("data/value.csv", na_values=['.'])
That's all for today,GoodBye Friend's :-
Guys I hope you like this tutorial or this tutorial is helpful to you, then please Share it with your friends……...
That's all for today,GoodBye Friend's :-
Guys I hope you like this tutorial or this tutorial is helpful to you, then please Share it with your friends……...
If you guys do not understand the concept of this program or your program is not worked properly...
Leave a Comment below friends I will give you guys, all of your answers.
Leave a Comment below friends I will give you guys, all of your answers.
0 Comments