ACID Transactions in Hive: A Complete Guide

Introduction

Apache Hive is widely used for batch processing and analytical queries on large datasets stored in Hadoop. Early versions of Hive, however, could only append to or overwrite whole tables and partitions; they had no row-level INSERT ... VALUES, UPDATE, or DELETE. To overcome this limitation, Hive introduced ACID (Atomicity, Consistency, Isolation, Durability) transactions, with transaction support arriving in Hive 0.13 and row-level INSERT, UPDATE, and DELETE in Hive 0.14. This lets Hive apply row-level changes to the same tables it serves analytical queries from.

With ACID support, Hive ensures:

Atomicity → Transactions are processed completely or not at all.
Consistency → Data transitions from one valid state to another.
Isolation → Concurrent transactions do not see each other's uncommitted changes; Hive provides snapshot isolation for readers.
Durability → Once committed, data remains safe even after failures.

In this tutorial, we will explore ACID transactions in Hive, their configuration, usage examples, and best practices.

[Figure: Diagram explaining ACID transactions in Hive]

1. What are ACID Transactions in Hive?

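ACID transactions allow Hive tables to support row-level operations. Hive remains an OLAP (Online Analytical Processing) system, and ACID does not turn it into a low-latency OLTP (Online Transactional Processing) database. What it adds is the ability to modify individual rows, which suits workloads such as slowly changing dimensions, data corrections, and streaming ingestion.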

This means you can now:

Insert new records
Update existing records
Delete unwanted records
Merge datasets efficiently

ACID support is available only for transactional tables in Hive.

2. Enabling ACID Transactions in Hive

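In a stock Apache Hive installation (at least through the 2.x line), ACID transactions are disabled by default; some Hive 3 distributions enable them out of the box. To enable them, set the properties below. The client-side properties can go in hive-site.xml or be set per session in the Hive CLI or Beeline. The hive.compactor.* properties are read by the metastore service, so they only take effect when set in the metastore's hive-site.xml:

-- Client side: hive-site.xml or per session
SET hive.support.concurrency = true;
SET hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
SET hive.exec.dynamic.partition.mode = nonstrict;
-- Metastore side: written as SET for brevity, but these belong in the metastore's hive-site.xml
SET hive.compactor.initiator.on = true;
SET hive.compactor.worker.threads = 1;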

Key Points:

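hive.support.concurrency → Turns on Hive's concurrency (locking) support, which transactions depend on.
hive.txn.manager → Selects DbTxnManager, the transaction manager that implements ACID; the default, DummyTxnManager, provides no transactions.
hive.compactor.* → Start the compaction initiator and worker threads that merge the files written to transactional tables.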
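
To confirm that a session actually picked up these settings, the values can be read back. The following is a minimal sketch, assuming a Hive CLI or Beeline session; SET followed by a property name with no value prints the current setting.

```sql
-- Print the current value of each property (no "= value" means "show")
SET hive.support.concurrency;
SET hive.txn.manager;

-- List the transactions currently known to the metastore
SHOW TRANSACTIONS;
```

If hive.txn.manager still reports DummyTxnManager, the session is not using the transactional configuration.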

3. Creating Transactional Tables in Hive

To use ACID transactions, tables must be created as transactional:

CREATE TABLE employees (
    emp_id INT,
    emp_name STRING,
    emp_salary DOUBLE
) CLUSTERED BY (emp_id) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');

Requirements:

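Tables must use ORC format for full ACID support (UPDATE, DELETE, and MERGE).
Tables must be bucketed in Hive versions before 3.0; from Hive 3.0 onward, bucketing is optional.
Tables must be managed tables; external tables cannot be transactional.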
'transactional' = 'true' property must be set.
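
A quick way to check these requirements on an existing table is to read the property back. The sketch below also shows what an unbucketed variant might look like; it assumes Hive 3.0 or later, where bucketing is no longer mandatory, and employees_unbucketed is a hypothetical table name used only for illustration.

```sql
-- Confirm the property on the employees table created above
SHOW TBLPROPERTIES employees ('transactional');

-- Hypothetical table; assumes Hive 3.0+, where CLUSTERED BY can be omitted
CREATE TABLE employees_unbucketed (
    emp_id INT,
    emp_name STRING,
    emp_salary DOUBLE
)
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');
```

DESCRIBE FORMATTED employees; shows the same property together with the storage format and table type.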

4. Examples of ACID Operations

Insert Operation

INSERT INTO employees VALUES (101, 'Alice', 75000);

Update Operation

UPDATE employees SET emp_salary = 80000 WHERE emp_id = 101;

Delete Operation

DELETE FROM employees WHERE emp_id = 101;

Merge Operation (Hive 2.2+)

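-- new_data is a source table with the same columns as employees.
-- Column names in the SET clause are written without the target alias.
MERGE INTO employees e
USING new_data n
ON e.emp_id = n.emp_id
WHEN MATCHED THEN UPDATE SET emp_salary = n.emp_salary
WHEN NOT MATCHED THEN INSERT VALUES (n.emp_id, n.emp_name, n.emp_salary);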
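
The MERGE statement reads from a new_data table that is not defined elsewhere in this guide. One way it might be set up is sketched below; the column layout and the sample rows are assumptions made for illustration, chosen to mirror the employees table.

```sql
-- Hypothetical staging table; the source of a MERGE does not have to be transactional
CREATE TABLE new_data (
    emp_id INT,
    emp_name STRING,
    emp_salary DOUBLE
)
STORED AS ORC;

-- Rows whose emp_id matches an existing employee are updated by MERGE;
-- rows with an unmatched emp_id are inserted
INSERT INTO new_data VALUES
    (101, 'Alice', 82000),
    (102, 'Bob', 68000);

-- After running the MERGE, inspect the result
SELECT emp_id, emp_name, emp_salary FROM employees ORDER BY emp_id;
```

Each emp_id should appear at most once in the source, since a target row matched by several source rows makes the update ambiguous.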

5. Compaction in Hive

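Every ACID write creates new delta files rather than rewriting existing data, and a large number of deltas slows down reads. Hive uses compaction to merge them into fewer, larger files:

Minor Compaction → Merges a set of delta files into a single delta file per bucket.
Major Compaction → Rewrites the base file and its delta files into a new base file per bucket.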

Run compaction manually if needed:

ALTER TABLE employees COMPACT 'major';
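
A compaction request is placed in a queue and picked up by the compactor worker threads, so it helps to check on its progress. A minimal sketch, assuming the employees table from earlier sections:

```sql
-- Queue a cheaper minor compaction instead of a major one
ALTER TABLE employees COMPACT 'minor';

-- List compaction requests together with their current state
SHOW COMPACTIONS;
```

For a partitioned table, the ALTER TABLE statement also needs a PARTITION clause naming the partition to compact.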

6. Best Practices for Using ACID in Hive

Use ORC format for transactional tables, and add bucketing on Hive versions before 3.0.
Schedule regular compactions to improve query performance.
Avoid many small transactions; batch rows into fewer, larger inserts, since every statement writes its own delta files.
Use MERGE for efficient upserts instead of multiple updates.
Enable vectorized execution for faster performance with ORC.
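
Two of these practices can be sketched directly in HiveQL. The rows below are made-up sample data, and the session-level SET is only one way to turn vectorization on; it can also be configured in hive-site.xml.

```sql
-- One statement and one transaction for all three rows,
-- instead of three single-row inserts
INSERT INTO employees VALUES
    (201, 'Dana', 72000),
    (202, 'Eli', 66000),
    (203, 'Farah', 88000);

-- Enable vectorized query execution for this session
SET hive.vectorized.execution.enabled = true;
```

For larger loads, INSERT INTO employees SELECT ... from a staging table follows the same batching idea.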

Conclusion

ACID transactions in Hive bring row-level insert, update, delete, and merge operations to big data environments, so analytical (OLAP) tables can be kept current without rewriting whole tables or partitions. Hive is still not a substitute for an OLTP database, but with proper configuration, regular compaction, and the best practices above, transactional tables stay performant and reliable in large-scale production environments.
