Hive Tutorial
Updated: 01/20/2021 by Computer Hope
Developed at Facebook, Apache Hive is licensed under the Apache License 2.0.
Hive is a data warehousing infrastructure based on Apache Hadoop.
Hadoop provides massive scale out and fault tolerance capabilities for data storage and processing on commodity hardware.
Hive is designed to enable easy data summarization, ad-hoc querying, and analysis of large volumes of data.
It provides a SQL-like language called HiveQL that lets users query large datasets stored in Hadoop's HDFS and other compatible file systems.
At the same time, HiveQL gives users multiple places to integrate their own functionality for custom analysis, such as User Defined Functions (UDFs).
Here's a basic tutorial to get you started with Hive; these concepts also come up frequently in Hive interviews.
Hive is not designed for online transaction processing. It is best used for traditional data warehousing tasks.
Hive Data Types
Hive supports primitive and complex data types, as described below.
- Integers
- TINYINT—1 byte integer
- SMALLINT—2 byte integer
- INT—4 byte integer
- BIGINT—8 byte integer
- Boolean type
- BOOLEAN—TRUE/FALSE
- Floating point numbers
- FLOAT—single precision
- DOUBLE—double precision
- Fixed point numbers
- DECIMAL—a fixed point value of user defined scale and precision
- String types
- STRING—sequence of characters in a specified character set
- VARCHAR—sequence of characters in a specified character set with a maximum length
- CHAR—sequence of characters in a specified character set with a defined length
- Date and time types
- TIMESTAMP — A date and time without a timezone ("LocalDateTime" semantics)
- TIMESTAMP WITH LOCAL TIME ZONE — A point in time measured down to nanoseconds ("Instant" semantics)
- DATE—a date
- Binary types
- BINARY—a sequence of bytes
Complex Types can be built up from primitive types and other composite types using:
- Structs
- Maps
- Arrays
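As a sketch of how these complex types compose in a table definition (the table and column names below are illustrative, not part of the tutorial):

```sql
-- Hypothetical table using each complex type: an ARRAY of strings,
-- a MAP from string keys to int values, and a STRUCT grouping
-- related fields into one column.
CREATE TABLE employee_profile(
  name    STRING,
  skills  ARRAY<STRING>,
  scores  MAP<STRING, INT>,
  address STRUCT<street:STRING, city:STRING, zip:STRING>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '|'
MAP KEYS TERMINATED BY ':';
```

In queries, elements are accessed as `skills[0]`, `scores['math']`, and `address.city`.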
Creating Tables in Hive
hive> CREATE TABLE page_view(viewTime INT, userid BIGINT,
          page_url STRING, referrer_url STRING,
          ip STRING COMMENT 'IP Address of the User')
      COMMENT 'This is the page view table'
      PARTITIONED BY(dt STRING, country STRING)
      STORED AS SEQUENCEFILE;
Altering Tables in Hive
To rename an existing table. If a table with the new name already exists, an error is returned:
ALTER TABLE old_table_name RENAME TO new_table_name;
To rename the columns of an existing table. Be sure to use the same column types, and to include an entry for each preexisting column:
ALTER TABLE old_table_name REPLACE COLUMNS (col1 TYPE, ...);
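For example, renaming a single column with REPLACE COLUMNS requires restating the whole column list (the table `tab2` and its columns here are hypothetical):

```sql
-- Assume tab2 was originally defined as (col1 INT, col2 STRING).
-- REPLACE COLUMNS restates the entire column list; here only the
-- first column's name changes, and both types stay the same.
ALTER TABLE tab2 REPLACE COLUMNS (new_col1 INT, col2 STRING);
```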
To add columns to an existing table:
ALTER TABLE tab1 ADD COLUMNS (c1 INT COMMENT 'a new int column', c2 STRING DEFAULT 'def val');
Dropping Tables
Dropping tables is fairly trivial. A drop on the table would implicitly drop any indexes (this is a future feature) built on the table. The associated command is:
hive> DROP TABLE pv_users;
To drop a partition, alter the table:
hive> ALTER TABLE pv_users DROP PARTITION (ds='2008-08-08');
Loading Data into an Internal Table
hive> LOAD DATA INPATH '/user/developer/data.txt' INTO TABLE pv_users;
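Note that INPATH above refers to a file already in HDFS. If the file lives on the local filesystem instead, Hive accepts a LOCAL variant (the path below is illustrative):

```sql
-- LOCAL copies the file from the local filesystem into the table's
-- location, whereas the plain INPATH form moves the file within HDFS.
LOAD DATA LOCAL INPATH '/home/developer/data.txt' INTO TABLE pv_users;
```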
External Tables
hive> CREATE EXTERNAL TABLE developer_view(id INT, name STRING)
      ROW FORMAT DELIMITED
      FIELDS TERMINATED BY '\t'
      LOCATION '/user/developer/external';
Loading Data into an External Table
hive> LOAD DATA INPATH '/user/developer/data.txt' INTO TABLE developer_view;
Conclusion
This is merely an introduction to Hive to get you going. As you explore more, Hive provides a plethora of additional capabilities and functionalities for Hadoop data analysis and manipulation.
As covered in this article, Hive supports various SQL-like operations such as aggregations, joins, filtering, and sorting, which you can use to analyze your data.
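For instance, against the `page_view` table created earlier, an aggregation and a join might look like the following (the `users` table and its columns are hypothetical, added only to illustrate a join):

```sql
-- Count views per page for one partition:
-- filtering (WHERE), aggregation (GROUP BY), and sorting (ORDER BY).
SELECT page_url, COUNT(*) AS views
FROM page_view
WHERE dt = '2008-08-08' AND country = 'US'
GROUP BY page_url
ORDER BY views DESC;

-- Join against a hypothetical users table to enrich the result.
SELECT u.name, pv.page_url
FROM page_view pv
JOIN users u ON (pv.userid = u.id);
```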
Exit Hive: Once you're done, you can exit the Hive shell by typing exit;.