ACID Transactions in Hive: A Complete Guide

8/25/2025

Diagram explaining ACID transactions in Hive

Go Back

ACID Transactions in Hive: A Complete Guide


Introduction

Apache Hive has been widely used for batch processing and analytical queries on large datasets stored in Hadoop. However, traditional Hive lacked support for row-level operations such as INSERT, UPDATE, and DELETE, which are critical for transactional workloads. To overcome this limitation, Hive introduced ACID (Atomicity, Consistency, Isolation, Durability) transactions, enabling it to handle both analytical and transactional use cases effectively.

With ACID support, Hive ensures:

  • Atomicity → Transactions are processed completely or not at all.

  • Consistency → Data transitions from one valid state to another.

  • Isolation → Multiple transactions execute independently without conflicts.

  • Durability → Once committed, data remains safe even after failures.

In this tutorial, we will explore ACID transactions in Hive, their configuration, usage examples, and best practices.


Diagram explaining ACID transactions in Hive

1. What are ACID Transactions in Hive?

ACID transactions allow Hive tables to support row-level operations, making Hive suitable for OLTP (Online Transactional Processing) alongside OLAP (Online Analytical Processing).

This means you can now:

  • Insert new records

  • Update existing records

  • Delete unwanted records

  • Merge datasets efficiently

ACID support is available only for transactional tables in Hive.


2. Enabling ACID Transactions in Hive

By default, ACID transactions are disabled in Hive. To enable them, update the following configuration properties in hive-site.xml or via Hive CLI:

SET hive.support.concurrency = true;
SET hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
SET hive.compactor.initiator.on = true;
SET hive.compactor.worker.threads = 1;

Key Points:

  • hive.support.concurrency → Enables concurrent transactions.

  • hive.txn.manager → Ensures Hive uses the transactional manager.

  • hive.compactor.* → Manages compaction for transactional tables.


3. Creating Transactional Tables in Hive

To use ACID transactions, tables must be created as transactional:

CREATE TABLE employees (
    emp_id INT,
    emp_name STRING,
    emp_salary DOUBLE
) CLUSTERED BY (emp_id) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');

 Requirements:

  • Tables must use ORC format.

  • Tables must be bucketed.

  • 'transactional' = 'true' property must be set.


4. Examples of ACID Operations

Insert Operation

INSERT INTO employees VALUES (101, 'Alice', 75000);

Update Operation

UPDATE employees SET emp_salary = 80000 WHERE emp_id = 101;

Delete Operation

DELETE FROM employees WHERE emp_id = 101;

Merge Operation (Hive 2.2+)

MERGE INTO employees e
USING new_data n
ON e.emp_id = n.emp_id
WHEN MATCHED THEN UPDATE SET e.emp_salary = n.emp_salary
WHEN NOT MATCHED THEN INSERT VALUES (n.emp_id, n.emp_name, n.emp_salary);

5. Compaction in Hive

ACID operations generate multiple delta files, which can affect performance. Hive uses compaction to merge small files into bigger ones:

  • Minor Compaction → Merges delta files.

  • Major Compaction → Merges base and delta files.

Run compaction manually if needed:

ALTER TABLE employees COMPACT 'major';

6. Best Practices for Using ACID in Hive

  • Always use ORC format with bucketing for transactional tables.

  • Schedule regular compactions to improve query performance.

  • Avoid too many small transactions → batch inserts are better.

  • Use MERGE for efficient upserts instead of multiple updates.

  • Enable vectorized execution for faster performance with ORC.


Conclusion

ACID transactions in Hive bring strong transactional capabilities to big data environments. By enabling insert, update, delete, and merge operations, Hive can now handle both analytical (OLAP) and transactional (OLTP) workloads efficiently. Proper configuration and best practices ensure high performance and data reliability in large-scale production environments.

Table of content