Figure: Comparison chart showing the use of operators like LIKE, IN, and BETWEEN in HiveQL.
Figure: Diagram showing the workflow of filtering data using the WHERE clause in HiveQL.

To get started with Hive, make sure a Hadoop cluster is set up and running, then install Hive on it. The examples in this section use a small sample table, created and populated as follows:
CREATE TABLE IF NOT EXISTS users (
  id INT,
  name STRING,
  email STRING
)
STORED AS TEXTFILE;
INSERT INTO TABLE users
VALUES (1, 'Rahul Doe', '[email protected]'),
       (2, 'Jane Smith', '[email protected]'),
       (3, 'Bob Johnson', '[email protected]');
The WHERE clause in Hive filters rows: a query returns only the rows for which the condition evaluates to true. It supports various operators, including:
=, <, >, <=, >=, <> (not equal)
LIKE
IN
BETWEEN
SELECT *
FROM users
WHERE name = 'Rahul Doe';
This query retrieves all records where the name column is exactly 'Rahul Doe'. String comparison in Hive is case-sensitive, so the value must match the stored capitalization.
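The other comparison operators follow the same pattern. The sketch below assumes the users table created earlier with its three sample rows; the choice of columns in the SELECT list is only for illustration.

```sql
-- <> means "not equal": every row except the one with id 1
-- (with the three sample rows, this returns ids 2 and 3)
SELECT id, name
FROM users
WHERE id <> 1;

-- >= keeps rows whose id is 2 or greater
-- (again ids 2 and 3 for the sample data)
SELECT id, name
FROM users
WHERE id >= 2;
```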
You can combine conditions using logical operators:
SELECT *
FROM users
WHERE name = 'Rahul Doe' AND email LIKE '%@example.com';
This query fetches records where name is 'Rahul Doe' and the email ends with '@example.com'. In a LIKE pattern, % matches any sequence of characters.
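Conditions can also be combined with OR and NOT. The following is a minimal sketch against the users table created earlier; the specific patterns ('B%' and '_ane%') are chosen here only to illustrate the two LIKE wildcards.

```sql
-- AND binds more tightly than OR, so parentheses make the intended grouping explicit.
-- % matches any sequence of characters, so 'B%' means "starts with B".
SELECT id, name
FROM users
WHERE (id = 1 OR id = 3)
  AND name LIKE 'B%';

-- NOT negates a condition; _ matches exactly one character,
-- so '_ane%' matches names such as 'Jane Smith'.
SELECT id, name
FROM users
WHERE NOT (name LIKE '_ane%');
```

Without the parentheses in the first query, the condition would be read as id = 1 OR (id = 3 AND name LIKE 'B%'), which also returns the row with id 1 regardless of its name.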
IN Operator
SELECT *
FROM users
WHERE id IN (1, 3);
This query selects users with id values of 1 or 3.
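IN is not limited to numbers, and it can be negated with NOT IN. A short sketch, again assuming the users table and sample rows from the setup step:

```sql
-- IN with string values: matches rows whose name equals any listed value
SELECT id, name
FROM users
WHERE name IN ('Jane Smith', 'Bob Johnson');

-- NOT IN excludes the listed values
-- (with the three sample rows, only id 2 remains)
SELECT id, name
FROM users
WHERE id NOT IN (1, 3);
```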
BETWEEN Operator
SELECT *
FROM users
WHERE id BETWEEN 1 AND 2;
This query retrieves users with id values between 1 and 2 (inclusive).
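Because both bounds are inclusive, BETWEEN is shorthand for a pair of comparisons. The sketch below, which assumes the same users table, shows the equivalent form and the negated variant:

```sql
-- Equivalent to: WHERE id BETWEEN 1 AND 2
SELECT *
FROM users
WHERE id >= 1 AND id <= 2;

-- NOT BETWEEN keeps rows outside the range
-- (with the three sample rows, only id 3 remains)
SELECT *
FROM users
WHERE id NOT BETWEEN 1 AND 2;
```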
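Understanding the WHERE clause is essential for querying large datasets efficiently in Hive. A selective filter reduces the number of rows that later stages of a query have to process, and a filter on a partition column lets Hive skip whole partitions instead of scanning the full table.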
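One common way a filter affects performance is partition pruning. The sketch below is illustrative only: the table users_partitioned and its partition column signup_year are hypothetical names that do not appear elsewhere in this article.

```sql
-- Hypothetical partitioned variant of the users table (names are assumptions).
-- Each distinct signup_year value is stored in its own directory.
CREATE TABLE IF NOT EXISTS users_partitioned (
  id INT,
  name STRING,
  email STRING
)
PARTITIONED BY (signup_year INT)
STORED AS TEXTFILE;

-- Filtering on the partition column allows Hive to read only the
-- matching partition rather than every file in the table.
SELECT id, name
FROM users_partitioned
WHERE signup_year = 2023;

-- EXPLAIN prints the query plan, which can be used to check how the filter is applied.
EXPLAIN
SELECT id, name
FROM users_partitioned
WHERE signup_year = 2023;
```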
Together, the comparison operators, LIKE, IN, BETWEEN, and logical operators such as AND cover most of the row-level filtering you will need when analyzing data in your Hive tables.