Step-by-Step Guide: Built-in Functions in Hive

9/11/2025

Built-in Functions in Hive

Go Back

Step-by-Step Guide: Built-in Functions in Hive

Apache Hive provides a rich set of built-in functions to perform various operations on data. These functions are categorized for easier understanding and usage.

Hive Built-in Functions are predefined functions readily available within Apache Hive for performing various operations on data stored in Hadoop Distributed File System (HDFS). These functions allow users to process and manipulate data using SQL-like queries without needing to write complex custom code.

 


  Built-in Functions in Hive

Step 1: What are Built-in Functions in Hive?

  • Hive provides a wide range of built-in functions to manipulate and analyze data.

  • These functions simplify operations like string processing, date calculations, mathematical computations, and aggregations.


Step 2: String Functions

Common string functions used to manipulate text data.

SELECT CONCAT('Hello', ' ', 'World');           -- Output: Hello World
SELECT LENGTH('Hive');                         -- Output: 4
SELECT LOWER('HIVE');                          -- Output: hive
SELECT UPPER('hive');                          -- Output: HIVE
SELECT SUBSTR('Hadoop', 2, 4);                 -- Output: adoo
SELECT TRIM('  data  ');                       -- Output: data
FunctionDescriptionExampleOutput
CONCATConcatenate stringsCONCAT('Hello','World')HelloWorld
LENGTHReturns string lengthLENGTH('Hive')4
LOWERConverts to lowercaseLOWER('HIVE')hive
UPPERConverts to uppercaseUPPER('hive')HIVE
SUBSTRExtract substringSUBSTR('Hadoop',2,4)adoo
TRIMRemove leading/trailing spacesTRIM(' data ')data

Step 3: Date Functions

Useful for working with date and time data.

SELECT CURRENT_DATE();                          -- Returns current date
SELECT CURRENT_TIMESTAMP();                     -- Returns current timestamp
SELECT YEAR('2025-09-12');                      -- Output: 2025
SELECT MONTH('2025-09-12');                     -- Output: 9
SELECT DAY('2025-09-12');                       -- Output: 12
SELECT DATEDIFF('2025-09-12','2025-09-01');     -- Output: 11
FunctionDescriptionExampleOutput
CURRENT_DATEReturns current system dateCURRENT_DATE()2025-09-12
CURRENT_TIMESTAMPReturns current system timestampCURRENT_TIMESTAMP()2025-09-12 12:00:00
YEARExtract year from dateYEAR('2025-09-12')2025
MONTHExtract month from dateMONTH('2025-09-12')9
DAYExtract day from dateDAY('2025-09-12')12
DATEDIFFDifference between two datesDATEDIFF('2025-09-12','2025-09-01')11

Step 4: Mathematical Functions

Perform arithmetic and advanced mathematical calculations.

SELECT ROUND(123.456, 2);                       -- Output: 123.46
SELECT CEIL(4.3);                               -- Output: 5
SELECT FLOOR(4.9);                              -- Output: 4
SELECT SQRT(25);                                -- Output: 5
SELECT POWER(2, 3);                             -- Output: 8
SELECT ABS(-15);                                -- Output: 15
FunctionDescriptionExampleOutput
ROUNDRounds number to decimalsROUND(123.456,2)123.46
CEILCeiling valueCEIL(4.3)5
FLOORFloor valueFLOOR(4.9)4
SQRTSquare rootSQRT(25)5
POWERExponentiationPOWER(2,3)8
ABSAbsolute valueABS(-15)15

Step 5: Conditional Functions

Used to apply conditional logic inside queries.

SELECT IF(10 > 5, 'TRUE', 'FALSE');              -- Output: TRUE
SELECT COALESCE(NULL, 'Hive', 'Spark');           -- Output: Hive
SELECT NVL(NULL, 'default');                      -- Output: default
SELECT CASE WHEN salary > 50000 THEN 'High' ELSE 'Low' END FROM employees;
FunctionDescriptionExampleOutput
IFConditional checkIF(10>5,'TRUE','FALSE')TRUE
COALESCEReturns first non-null valueCOALESCE(NULL,'Hive')Hive
NVLReplaces NULL with given valueNVL(NULL,'default')default
CASEMultiple condition checkCASE WHEN salary>50000 THEN 'High' ELSE 'Low' ENDHigh/Low

Step 6: Aggregation Functions

Used to summarize data across rows.

SELECT COUNT(*) FROM employees;                   -- Returns total rows
SELECT SUM(salary) FROM employees;                -- Returns total salary
SELECT AVG(salary) FROM employees;                -- Returns average salary
SELECT MAX(salary) FROM employees;                -- Returns highest salary
SELECT MIN(salary) FROM employees;                -- Returns lowest salary
FunctionDescriptionExampleOutput
COUNTCounts total rowsCOUNT(*)100
SUMSum of valuesSUM(salary)500000
AVGAverage of valuesAVG(salary)5000
MAXMaximum valueMAX(salary)90000
MINMinimum valueMIN(salary)10000

Best Practices

  • Use appropriate data types to avoid type conversion errors.

  • Apply functions on filtered data to improve performance.

  • Use built-in functions in SELECT, WHERE, and GROUP BY clauses for efficiency.

  • Refer to SHOW FUNCTIONS; in Hive to see all available functions.


This tutorial helps you understand and use Hive’s built-in functions effectively to clean, transform, and analyze your data.