Hive tutorial | basic hive tutorial

Updated:01/20/2021 by Computer Hope

Apache Hive, originally developed at Facebook and released under the Apache License 2.0, is a data warehousing infrastructure built on Apache Hadoop. Hadoop provides massive scale-out and fault-tolerance capabilities for data storage and processing on commodity hardware.
Hive is designed to enable easy data summarization, ad-hoc querying, and analysis of large volumes of data. It provides SQL for these tasks, and Hive's SQL gives users multiple places to integrate their own functionality for custom analysis, such as User Defined Functions (UDFs).
It allows users to query large datasets stored in Hadoop's HDFS and other compatible file systems using a SQL-like language called HiveQL. Here's a basic tutorial to get you started with Hive.
These concepts are also useful when preparing for Hive interviews.

Hive is not designed for online transaction processing. It is best used for traditional data warehousing tasks.

What is Architecture & Modes


Data Types in Hive

Hive supports primitive and complex data types, as described below.

Integers
  • TINYINT: 1-byte integer
  • SMALLINT: 2-byte integer
  • INT: 4-byte integer
  • BIGINT: 8-byte integer
Boolean type
  • BOOLEAN: TRUE/FALSE
Floating point numbers
  • FLOAT: single precision
  • DOUBLE: double precision
Fixed point numbers
  • DECIMAL: a fixed point value of user-defined scale and precision
String types
  • STRING: sequence of characters in a specified character set
  • VARCHAR: sequence of characters in a specified character set with a maximum length
  • CHAR: sequence of characters in a specified character set with a defined length
Date and time types
  • TIMESTAMP: a date and time without a timezone ("LocalDateTime" semantics)
  • TIMESTAMP WITH LOCAL TIME ZONE: a point in time measured down to nanoseconds ("Instant" semantics)
  • DATE: a date
Binary types
  • BINARY: a sequence of bytes
Complex types can be built up from primitive types and other composite types using:
  • Structs
  • Maps
  • Arrays
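
As a rough illustration of how these types fit together, the sketch below declares a table that mixes primitive and complex types and then reads individual elements. The table name user_profile and all of its columns are hypothetical and are not used elsewhere in this tutorial.

```sql
-- Hypothetical table combining primitive and complex types.
CREATE TABLE user_profile (
  userid     BIGINT,
  name       STRING,
  balance    DECIMAL(10,2),
  signup     DATE,
  emails     ARRAY<STRING>,
  properties MAP<STRING, STRING>,
  address    STRUCT<street:STRING, city:STRING>
);

-- Arrays are indexed from 0, maps are accessed by key, structs by field name.
SELECT emails[0], properties['language'], address.city
FROM user_profile;
```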

Creating Tables in Hive

hive> CREATE TABLE page_view(viewTime INT, userid BIGINT, page_url STRING, referrer_url STRING, ip STRING COMMENT 'IP Address of the User')
COMMENT 'This is the page view table'
PARTITIONED BY(dt STRING, country STRING)
STORED AS SEQUENCEFILE;
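
Once the table exists, a few statements can confirm what was created. This is a minimal sketch that assumes the page_view table defined above; the partition values are made-up examples.

```sql
-- List the tables in the current database.
SHOW TABLES;

-- Show the columns, including the partition columns dt and country.
DESCRIBE page_view;

-- Register one partition (example values), then list the partitions.
ALTER TABLE page_view ADD PARTITION (dt='2008-06-08', country='US');
SHOW PARTITIONS page_view;
```

Note that dt and country are not stored in the data files themselves; Hive derives them from the partition a row belongs to.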

Altering Tables in Hive

To rename an existing table. If a table with the new name already exists, an error is returned:
ALTER TABLE old_table_name RENAME TO new_table_name;
To rename the columns of an existing table. Be sure to use the same column types, and to include an entry for each preexisting column:
ALTER TABLE old_table_name REPLACE COLUMNS (col1 TYPE, ...);
To add columns to an existing table:
ALTER TABLE tab1 ADD COLUMNS (c1 INT COMMENT 'a new int column', c2 STRING DEFAULT 'def val');
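
When only one column needs a new name, listing every column with REPLACE COLUMNS can be avoided. The sketch below uses the CHANGE clause instead; it assumes tab1 already has the INT column c1 from the example above, and the new name c1_renamed is purely illustrative.

```sql
-- Rename a single column, keeping its type.
ALTER TABLE tab1 CHANGE c1 c1_renamed INT;

-- Check the resulting schema.
DESCRIBE tab1;
```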

Dropping Tables

Dropping a table is straightforward. A drop on the table also implicitly drops any indexes (a future feature) that have been built on it. The associated command is:
hive> DROP TABLE pv_users;
To drop a partition, alter the table:
hive> ALTER TABLE pv_users DROP PARTITION (ds='2008-08-08');
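
In scripts that may run more than once, it can help to make drops tolerant of objects that are already gone. A minimal sketch, assuming pv_users is partitioned by ds as in the example above:

```sql
-- See which partitions currently exist.
SHOW PARTITIONS pv_users;

-- IF EXISTS avoids an error when the partition or table is already absent.
ALTER TABLE pv_users DROP IF EXISTS PARTITION (ds='2008-08-08');
DROP TABLE IF EXISTS pv_users;
```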

Load the data into an internal table

hive> LOAD DATA INPATH '/user/developer/data.txt' INTO TABLE pv_users;
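
LOAD DATA has a few variants worth knowing. Without LOCAL, the file is moved from its HDFS location into the table's directory; with LOCAL, it is copied from the local file system. The sketch below assumes pv_users is partitioned by ds (as the DROP PARTITION example suggests), and the local path is a hypothetical placeholder.

```sql
-- Copy a file from the local file system into one partition.
LOAD DATA LOCAL INPATH '/tmp/data.txt'
INTO TABLE pv_users PARTITION (ds='2008-08-08');

-- Move a file that is already in HDFS, replacing the partition's existing data.
LOAD DATA INPATH '/user/developer/data.txt'
OVERWRITE INTO TABLE pv_users PARTITION (ds='2008-08-08');
```

LOAD DATA does not transform the file, so its layout has to match the table's storage format.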

External table

hive> CREATE EXTERNAL TABLE developer_view(id INT, Name STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION '/user/developer/external';

Load the data into an external table

hive> LOAD DATA INPATH '/user/developer/data.txt' INTO TABLE developer_view;
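
The practical difference between an internal (managed) table and an external one shows up when the table is dropped: for an external table, Hive removes only the metadata and leaves the files under the table's LOCATION in place. A small sketch, assuming the developer_view table defined above:

```sql
-- The detailed description should list the table type as EXTERNAL_TABLE.
DESCRIBE FORMATTED developer_view;

-- Read a few rows to confirm the data is visible.
SELECT id, Name FROM developer_view LIMIT 5;

-- Drops the table definition only; the files in the external location remain.
DROP TABLE developer_view;
```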

Features of Hive

  • It stores the schema in a database (the metastore) and the processed data in HDFS. It is designed for OLAP
  • It provides a SQL-like query language called HiveQL or HQL. It is familiar, fast, scalable, and extensible
  • File formats: Hive supports various file formats such as TextFile, ORC, Avro, SequenceFile, Parquet, and RCFile, as well as compression codecs such as LZO
  • Hive supports ETL operations and is an effective ETL tool
  • Hive allows us to run ad-hoc queries, that is, one-off queries written as needed for a particular analysis rather than defined in advance
  • Hive can serve as a data source for data visualization tools. Running Hive on Apache Tez instead of MapReduce can reduce query latency, although Hive remains a batch-oriented system rather than a real-time one.
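
Two of the features above, file formats and the Tez execution engine, can be tried with very little code. The following is a sketch only: it assumes Tez is installed on the cluster and reuses the page_view table from earlier; the table name page_view_orc is hypothetical.

```sql
-- Ask Hive to run queries in this session on Tez (requires Tez on the cluster).
SET hive.execution.engine=tez;

-- Copy page_view into a new table stored in the ORC columnar format.
CREATE TABLE page_view_orc
STORED AS ORC
AS SELECT * FROM page_view;
```

The new table is not partitioned; dt and country become ordinary columns in the copy.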

Conclusion

This is merely an introduction to Hive to get you going. As you explore more, Hive provides a plethora of additional capabilities and functionalities for Hadoop data analysis and manipulation.
Hive supports various SQL-like operations such as aggregations, joins, filtering, and sorting, which you can use to analyze your data.
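
To give a feel for these operations together, here is a small sketch of a query that filters, joins, aggregates, and sorts. It reuses the page_view table from earlier; the users table and its id and gender columns are hypothetical and only serve the illustration.

```sql
-- Page views per country and gender for one day, ten busiest groups first.
SELECT pv.country, u.gender, COUNT(*) AS views
FROM page_view pv
JOIN users u ON (pv.userid = u.id)   -- join (users is a hypothetical table)
WHERE pv.dt = '2008-06-08'           -- filter on a partition column
GROUP BY pv.country, u.gender        -- aggregation
ORDER BY views DESC                  -- sorting
LIMIT 10;
```

Filtering on the partition column dt lets Hive read only the matching partition instead of scanning the whole table.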
Exit Hive: Once you're done, you can exit the Hive shell by typing exit;.
