Use CURD Operations In Hive By Python
Hello Friends, Today my
tutorial are based on those guys who worked on HIVE
and use CURD
Operations on it by Python Programming Language.I saw many developers use HIVE on their projects, and their many
developers have problems.
So,without wasting time let’s start tutorial but first we have to
know what is HIVE :-
Introduction
Hive is a data warehouse infrastructure tool to
process structured data in Hadoop. It resides on top of Hadoop to summarize Big
Data, and makes querying and analyzing easy.
We know Hive
is a type of data warehouse software which have some facilitates of
reading, writing, and managing large datasets residing in distributed storage
using SQL. Structure can be projected onto data already in storage.
A command
line tool and JDBC driver are provided to connect users to Hive.
Use Hive In Python
In python we have to use Hive and the easiest way is to use Hive
in python is use PyHive :-
To install we will have to need these libraries :-
$ pip install sasl
$ pip install thrift
$ pip install
thrift-sasl
$ pip install PyHive
After installation, you can connect to Hive like
this:-
from pyhive import hive
conn =
hive.Connection(host="HOST_ADDRESS", port=PORT,
username="Username")
Now that you have the hive connection, you have
options how to use it. You can just straight-up query:-
cursor = conn.cursor()
cursor.execute("SELECT
database FROM table_name")
for result in
cursor.fetchall():
use_result(result)
Or we can use the connection to make a Pandas dataframe….
import pandas as pd
df =
pd.read_sql("SELECT database FROM table_name", conn)
CURD Opeartion’s On Hive
If you want to perform Hive CRUD using ACID
operations, we have to know some basics of the Hive CURD Operation on Hive.
In order to perform
CREATE, SELECT, UPDATE, DELETE, We have to ensure while creating the table with
the following conditions :-
·
File format
should be in ORC file format with
TBLPROPERTIES(‘transactional’=’true’)
·
Table should
be CLUSTERED BY with some Buckets,
please refer the below CREATE TABLE statement.
How to Check Hive Version?
$ hive –version
Hive 0.14.0.2.2.0.0-2041
HIVE CREATE TABLE
Let us CREATE Database table with following fields Values
CREATE TABLE Database
(
STD_ID INT,
STD_NAME STRING,
AGE INT,
ADDRESS STRING
)
CLUSTERED BY (ADDRESS)
into 3 buckets
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED as orc
tblproperties('transactional'='true');
HIVE INSERT INTO TABLE
Here let us see how to INSERT rows Manually into
Newly Created Database Table.
use the following Hiveql to insert the data into Database table.
INSERT INTO TABLE
Database VALUES (1,'rau',14,'name'),
> (2,'mannu',15,'29
pallu'),
> (3,'raman',27,'34
new jarv'),
> (4,'jay',34,'uttar
pradesh'),
> (5,'kannu',38,'new
delhi');
HIVE SELECT
Let us select all the rows from Database table :-
SELECT * FROM Database;
HIVE UPDATE
Let us try to update student id from 5 to 100 using
UPDATE.
update Database> SET
std_id = 110 > WHERE std_id = 5;
update Database
> SET std_id = 100
> WHERE std_id = 5;
SELECT FROM STUDENT TABLE
SELECT * FROM Database;
HIVE DELETE FROM TABLE
Let us delete the row where STD_ID =10.
DELETE FROM Database>
where std_id=5; Usage: delete [FILE|JAR|ARCHIVE] []* Query returned non-zero code: 1, cause:
null
DELETE FROM Database>
where std_id=10;
DELETE FROM Database
> where std_id=105;
Usage: delete
[FILE|JAR|ARCHIVE] []*
Query returned non-zero code: 1, cause: null
DELETE FROM Database
> where std_id=110;
Let us try to delete all the Rows from STUDENT Table
:-
DELETE FROM Database;
Let us check whether all the values are removed or
not .
The DELETE statement has removed all the rows
successfully.
SELECT * FROM Database;
If you Guys like my tutorial plz share it with your best friends↝↜ that's all for today,
Thank you Guys and Good Bye...
>>> Compare two voices by python
0 Comments