Hive Locks

Published Jun 28 2022 03:59 AM 998 Views

Locks in database can be either Read Lock or Write Lock. Locks are used when concurrent applications tries to access the same table. Locks prevents data from being corrupted or invalidated when multiple users try to reach while others write to database.

Multiple shared(S) locks can be acquired at the same time, whereas exclusive(X) lock blocks all other locks.

 

In the Shared lock there are two types one is Shared_read and Shared_write. Shared_read means anyother shared_read and shared_write query can run at a time. Shared_write lock which means any other shared_read can be performed but no shared_write lock can acquire at that time.

In Exclusive locks no shared_read or shared_write can perform at the same time.

There are three types of lock state:

   (a) Acquired - transaction initiator hold the lock
   (b) Waiting - transaction initiator is waiting for the lock
   (c) Aborted - the lock has timed out but has not yet been cleaned

 

Below are the hive commands and it's lock type:

                  HIVE COMMAND                           LOCK TYPE
select in Table1 shared_read
Insert into table1 shared_read
Insert overwrite on table1 exclusive
Update on table1 shared_write
Alter table rename on table1 exclusive
select a partition P1 on a table1 shared_read lock on Table1 and partition P1
Insert into table2(partition P2) select....Table1 shared lock on partition P1 on T2, T1 and T1.P1 and exclusive lock on T2.P2
Drop table

exclusive

 

Let's look at few examples for shared_read, shared_write and exclusive lock. Let's take two tables sample1 and sample2.

 

1) Initially when no query is running on the table there shouldn't be any lock. This can be verified using show locks <table_name>; command

Now will look at the example for shared_read lock:

 

Example-1:

Open two terminals and run the select * from sample1 command on one terminal and insert query on other terminal. As the query1 acquires a shared_read lock it will allow the second query to execute. Both the insert and select query acquires a shared_read lock.

 

Terminal-1:

 

 

0: jdbc:hive2://zk0-parser.mbnioixhzipermnly3> insert into sample1 values(8,'xxx');
INFO  : Compiling command(queryId=hive_20220601103700_7997a3d8-102c-48c7-a96f-6adcfb01fd94): insert into sample1 values(8,'xxx')
INFO  : Semantic Analysis Completed (retrial = false)
.....
.....
INFO  : Completed executing command(queryId=hive_20220601103700_7997a3d8-102c-48c7-a96f-6adcfb01fd94); Time taken: 16.636 seconds
INFO  : OK
No rows affected (17.318 seconds)

 

 

Terminal-2:

 

 

0: jdbc:hive2://zk0-parser.mbnioixhzipermnly3> select * from sample1;
INFO  : Compiling command(queryId=hive_20220601103704_6104c121-5371-4bd7-bda5-c9fe8252c4bf): select * from sample1
INFO  : Semantic Analysis Completed (retrial = false)
......
......
INFO  : Completed executing command(queryId=hive_20220601103704_6104c121-5371-4bd7-bda5-c9fe8252c4bf); Time taken: 13.62 seconds
INFO  : OK

 

 

To identify what locks has been acquired you can either check the HS2 logs or use the explain locks command.

 

HS2 log:
======
INSERT INTO TABLE:

 

 

2022-06-01T10:37:01,107  INFO [HiveServer2-Background-Pool: Thread-40355] lockmgr.DbTxnManager: Setting lock request transaction to txnid:41 for queryId=hive_20220601103700_7997a3d8-102c-48c7-a96f-6adcfb01fd94
2022-06-01T10:37:01,107  INFO [HiveServer2-Background-Pool: Thread-40355] lockmgr.DbLockManager: Requesting: queryId=hive_20220601103700_7997a3d8-102c-48c7-a96f-6adcfb01fd94 LockRequest(component:[LockComponent(type:SHARED_READ, level:TABLE, dbname:_dummy_database, tablename:_dummy_table, operationType:SELECT, isTransactional:false), LockComponent(type:SHARED_READ, level:TABLE, dbname:default, tablename:sample1, operationType:INSERT, isTransactional:true, isDynamicPartitionWrite:false)], txnid:41, user:hive, hostname:hn0-parser.mbnioixhzipermnly3wmgelmfc.tx.internal.cloudapp.net, agentInfo:hive_20220601103700_7997a3d8-102c-48c7-a96f-6adcfb01fd94)
2022-06-01T10:37:01,140  INFO [HiveServer2-Background-Pool: Thread-40355] lockmgr.DbLockManager: Response to queryId=hive_20220601103700_7997a3d8-102c-48c7-a96f-6adcfb01fd94 LockResponse(lockid:39, state:ACQUIRED)
2022-06-01T10:37:01,154  INFO [HiveServer2-Background-Pool: Thread-40355] ql.Driver: Executing command(queryId=hive_20220601103700_7997a3d8-102c-48c7-a96f-6adcfb01fd94): insert into sample1 values(8,'xxx')
....
....
2022-06-01T10:37:17,791  INFO [HiveServer2-Background-Pool: Thread-40355] lockmgr.DbTxnManager: Stopped heartbeat for query: hive_20220601103700_7997a3d8-102c-48c7-a96f-6adcfb01fd94

 

 

SELECT * FROM TABLE:

 

 

2022-06-01T10:37:05,095  INFO [HiveServer2-Background-Pool: Thread-40365] lockmgr.DbTxnManager: Setting lock request transaction to txnid:42 for queryId=hive_20220601103704_6104c121-5371-4bd7-bda5-c9fe8252c4bf
2022-06-01T10:37:05,096  INFO [HiveServer2-Background-Pool: Thread-40365] lockmgr.DbLockManager: Requesting: queryId=hive_20220601103704_6104c121-5371-4bd7-bda5-c9fe8252c4bf LockRequest(component:[LockComponent(type:SHARED_READ, level:TABLE, dbname:default, tablename:sample1, operationType:SELECT, isTransactional:true)], txnid:42, user:hive, hostname:hn0-parser.mbnioixhzipermnly3wmgelmfc.tx.internal.cloudapp.net, agentInfo:hive_20220601103704_6104c121-5371-4bd7-bda5-c9fe8252c4bf)
2022-06-01T10:37:05,127  INFO [HiveServer2-Background-Pool: Thread-40365] lockmgr.DbLockManager: Response to queryId=hive_20220601103704_6104c121-5371-4bd7-bda5-c9fe8252c4bf LockResponse(lockid:40, state:ACQUIRED)
2022-06-01T10:37:05,136  INFO [HiveServer2-Background-Pool: Thread-40365] ql.Driver: Executing command(queryId=hive_20220601103704_6104c121-5371-4bd7-bda5-c9fe8252c4bf): select * from sample1

 

 

 

Here the insert query lock acquired at 10:37:01 and released at 10:37:17. Also we can see that the select command for the sample ran successfully and acquired lock at 10:37:05 even before releasing the lock by the first query. This is because insert query acquires a shared_read lock.

 

The other way to find the kind of lock for a query is using explain locks command as below:

 

 

0: jdbc:hive2://zk0-parser.mbnioixhzipermnly3> explain locks insert into sample1 values(8,'xxx');
....
....
INFO  : Completed executing command(queryId=hive_20220601105505_4be7b686-bcc4-4be9-ab3a-88153dfe5f69); Time taken: 0.09 seconds
INFO  : OK
+----------------------------------------------+
|                   Explain                    |
+----------------------------------------------+
| LOCK INFORMATION:                            |
| _dummy_database._dummy_table -> SHARED_READ  |
| default.sample1 -> SHARED_READ               |
+----------------------------------------------+
3 rows selected (4.316 seconds)


0: jdbc:hive2://zk0-parser.mbnioixhzipermnly3> explain locks select * from sample1;
....
....
INFO  : OK
+---------------------------------+
|             Explain             |
+---------------------------------+
| LOCK INFORMATION:               |
| default.sample1 -> SHARED_READ  |
+---------------------------------+
2 rows selected (0.615 seconds)

 

 

EXAMPLE 2:

Let's run update query on the table sample1 from two terminal.

 

TERMINAL 1:

 

 

0: jdbc:hive2://zk0-parser.mbnioixhzipermnly3> update sample1 set name='kkk' where id=8;
INFO  : Compiling command(queryId=hive_20220601110457_8d1f61ce-a415-426f-a5ab-d65375d6a0a8): update sample1 set name='kkk' where id=8
......
......
INFO  : Completed executing command(queryId=hive_20220601110457_8d1f61ce-a415-426f-a5ab-d65375d6a0a8); Time taken: 19.651 seconds
INFO  : OK
No rows affected (20.711 seconds)

 

 

TERMINAL 2:

 

 

0: jdbc:hive2://zk0-parser.mbnioixhzipermnly3> update sample1 set name='fff' where id=2;
INFO  : Compiling command(queryId=hive_20220601110501_2d637b0c-0a1f-4071-bf12-8e64d151c33a): update sample1 set name='fff' where id=2
INFO  : Semantic Analysis Completed (retrial = false)
......
......
INFO  : Completed executing command(queryId=hive_20220601110501_2d637b0c-0a1f-4071-bf12-8e64d151c33a); Time taken: 17.564 seconds
INFO  : OK
ERROR : FAILED: Hive Internal Error: org.apache.hadoop.hive.ql.lockmgr.LockException(Transaction manager has aborted the transaction txnid:48.  Reason: Aborting [txnid:48,48] due to a write conflict on default/sample1 committed by [txnid:47,48] u/u)
org.apache.hadoop.hive.ql.lockmgr.LockException: Transaction manager has aborted the transaction txnid:48.  Reason: Aborting [txnid:48,48] due to a write conflict on default/sample1 committed by [txnid:47,48] u/u
....
....
Caused by: TxnAbortedException(message:Aborting [txnid:48,48] due to a write conflict on default/sample1 committed by [txnid:47,48] u/u)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$commit_txn_result$commit_txn_resultStandardScheme.read(ThriftHiveMetastore.java)

 

 

Here we see that the second query got failed because when one query acquires a shared_write lock then no other query with shared_write lock can't run at that time. From the error we can clearly see that it is giving a lock exception and aborting the transaction with txnid 48 due to conflict with the txnid 47.

 

HIVESERVER2 LOG:

=====

FIRST SUBMITTED QUERY:

 

 

2022-06-01T11:04:58,718  INFO [HiveServer2-Background-Pool: Thread-40018] lockmgr.DbLockManager: Requesting: queryId=hive_20220601110457_8d1f61ce-a415-426f-a5ab-d65375d6a0a8 LockRequest(component:[LockComponent(type:SHARED_WRITE, level:TABLE, dbname:default, tablename:sample1, operationType:UPDATE, isTransactional:true, isDynamicPartitionWrite:false)], txnid:47, user:hive, hostname:hn1-parser.mbnioixhzipermnly3wmgelmfc.tx.internal.cloudapp.net, agentInfo:hive_20220601110457_8d1f61ce-a415-426f-a5ab-d65375d6a0a8)
2022-06-01T11:04:58,748  INFO [HiveServer2-Background-Pool: Thread-40018] lockmgr.DbLockManager: Response to queryId=hive_20220601110457_8d1f61ce-a415-426f-a5ab-d65375d6a0a8 LockResponse(lockid:44, state:ACQUIRED)
2022-06-01T11:04:58,758  INFO [HiveServer2-Background-Pool: Thread-40018] ql.Driver: Executing command(queryId=hive_20220601110457_8d1f61ce-a415-426f-a5ab-d65375d6a0a8): update sample1 set name='kkk' where id=8

 

 

SECOND SUBMITTED QUERY:

 

 

2022-06-01T11:05:02,209  INFO [HiveServer2-Background-Pool: Thread-40031] lockmgr.DbTxnManager: Setting lock request transaction to txnid:48 for queryId=hive_20220601110501_2d637b0c-0a1f-4071-bf12-8e64d151c33a
2022-06-01T11:05:02,209  INFO [HiveServer2-Background-Pool: Thread-40031] lockmgr.DbLockManager: Requesting: queryId=hive_20220601110501_2d637b0c-0a1f-4071-bf12-8e64d151c33a LockRequest(component:[LockComponent(type:SHARED_WRITE, level:TABLE, dbname:default, tablename:sample1, operationType:UPDATE, isTransactional:true, isDynamicPartitionWrite:false)], txnid:48, user:hive, hostname:hn1-parser.mbnioixhzipermnly3wmgelmfc.tx.internal.cloudapp.net, agentInfo:hive_20220601110501_2d637b0c-0a1f-4071-bf12-8e64d151c33a)
2022-06-01T11:05:02,245  INFO [HiveServer2-Background-Pool: Thread-40031] lockmgr.DbLockManager: Response to queryId=hive_20220601110501_2d637b0c-0a1f-4071-bf12-8e64d151c33a LockResponse(lockid:45, state:WAITING)
....
....
2022-06-01T11:05:45,664 ERROR [HiveServer2-Background-Pool: Thread-40031] ql.Driver: FAILED: Hive Internal Error: org.apache.hadoop.hive.ql.lockmgr.LockException(Transaction manager has aborted the transaction txnid:48.  Reason: Aborting [txnid:48,48] due to a write conflict on default/sample1 committed by [txnid:47,48] u/u)
org.apache.hadoop.hive.ql.lockmgr.LockException: Transaction manager has aborted the transaction txnid:48.  Reason: Aborting [txnid:48,48] due to a write conflict on default/sample1 committed by [txnid:47,48] u/u

 

 

From the logs we can see that the first submitted query lock state is acquired and for the second query lock state is waiting.

 

Now let's run the update command on one terminal and select command on the other terminal. This time both the query succeeded because other shared_read queries(like select , insert) can be performed but no shared_write lock queries like(another update,insert overwrite) can't be performed.

 

 

0: jdbc:hive2://zk0-parser.mbnioixhzipermnly3> update sample1 set name='sss' where id=3;
INFO  : Compiling command(queryId=hive_20220601112616_6268aa16-9f46-4ceb-8048-3e6ed89eb8f7): update sample1 set name='sss' where id=3
INFO  : Semantic Analysis Completed (retrial = false)
....
....
INFO  : Completed executing command(queryId=hive_20220601112616_6268aa16-9f46-4ceb-8048-3e6ed89eb8f7); Time taken: 17.013 seconds
INFO  : OK
No rows affected (17.842 seconds)

0: jdbc:hive2://zk0-parser.mbnioixhzipermnly3> select * from sample1;
INFO  : Compiling command(queryId=hive_20220601112619_5f1c2033-7fd5-486d-ba88-23a9a39166be): select * from sample1
INFO  : Semantic Analysis Completed (retrial = false)
....
....
INFO  : Completed executing command(queryId=hive_20220601112619_5f1c2033-7fd5-486d-ba88-23a9a39166be); Time taken: 13.228 seconds
INFO  : OK

 

 

Example -3:

Now let's have a look at the exclusive lock:

 

 

0: jdbc:hive2://zk0-parser.mbnioixhzipermnly3> alter table test_sample add columns (xxx string);
INFO  : Compiling command(queryId=hive_20220606092610_29f3e65a-ff48-4a5b-a8bc-0e575fcc1815): alter table test_sample add columns (xxx string)
.....
.....
INFO  : Completed executing command(queryId=hive_20220606092610_29f3e65a-ff48-4a5b-a8bc-0e575fcc1815); Time taken: 0.541 seconds
INFO  : OK
No rows affected (1.214 seconds)

 

 

 

HIVESERVER2 LOGS:

 

 

2022-06-06T09:26:11,042  INFO [HiveServer2-Background-Pool: Thread-32233] lockmgr.DbTxnManager: Setting lock request transaction to txnid:89 for queryId=hive_20220606092610_29f3e65a-ff48-4a5b-a8bc-0e575fcc1815
2022-06-06T09:26:11,042  INFO [HiveServer2-Background-Pool: Thread-32233] lockmgr.DbLockManager: Requesting: queryId=hive_20220606092610_29f3e65a-ff48-4a5b-a8bc-0e575fcc1815 LockRequest(component:[LockComponent(type:EXCLUSIVE, level:TABLE, dbname:default, tablename:test_sample, operationType:NO_TXN, isTransactional:true, isDynamicPartitionWrite:false)], txnid:89, user:hive, hostname:hn0-parser.mbnioixhzipermnly3wmgelmfc.tx.internal.cloudapp.net, agentInfo:hive_20220606092610_29f3e65a-ff48-4a5b-a8bc-0e575fcc1815)

 

 

 

Common steps to check when dealing with ACID lock issues:

Lock issues can be divided into 2 types:

1) Queries unable to acquire locks
2) Queries not releasing locks upon completion/termination

 

You can check the below points to troubleshoot the issue:

1) From the 'Explain Locks' output, check what all locks are needed.
2) From the 'show locks' output, check if there are already existing locks on the table/partition being held by some other query. If so, check if the other query is still running and is a valid query.
We will need to capture a 'show locks' output in order to see what are the current locks being acquired by other queries. The 'blocked by' column in the output will also help us determine if a particular query is unable to acquire locks because of some other query already holding a lock on that particular table/partition.

 

Below is a sample output of 'show locks' command:

 

 

| Lock ID | Database | Table | Partition | State | Blocked By | Type | Transaction ID | Last Heartbeat | Acquired At | User | Hostname | Agent Info | 
| 1518.1 | xxxx | xxxx_xxxxxx | NULL | ACQUIRED | | SHARED_READ | NULL | 1562222941000 | 1562222941000 | xxxxxxxx | xxxxxxxx.cloudera.com | xxxxxxxxxxxxxx_f951bb5b-b19b-47ba-a147-a2a3fbce5ed3 

 

 

 

Additionally, we can also check the 'Acquired At' and 'Last Heartbeat' columns to identify old/stale locks. If it is old then it is a stale lock and we can clean up manually.

 

3) To get the list of locks any Hive query will acquire, we can run the 'Explain Locks' command as mentioned earlier.

4) Also check the current/default value of 'hive.txn.timeout' and 'hive.timedout.txn.reaper.interval' will need to be tuned to allow for sufficient time to acquire all the locks. Locks are acquired sequentially for one table or partition at a time. So depending on the number of tables/partitions, the time taken to acquire locks will increase. By default hive.txn.timeout is set to 300s (5 mins) and hive.timedout.txn.reaper.interval is set to 180s (3 mins).

5) We will need to first identify the progress of the query and whether it completed successfully or not. This can be verified from /var/log/hive/hiveserver2.log OR hiveserver2Interactive.log (LLAP) OR hive.log (Hive CLI).Typically a query does not release lock when it does not complete successfully and terminates abruptly. Reviewing the above mentioned logs will help in isolating this.

6) Additionally if there is an error/issue communicating with backend DB while trying to release locks, or while acquiring locks then check logs of backend DB as well. Also check whether the backend DB is configured correctly to handle the load.

7) For stale locks, check the HIVE_LOCKS table and identify if there are old/stale locks existing. To clear old/stale locks, follow the below steps:

FROM HIVE:

 

 

hive>show locks;

 

 

FROM MYSQL:

 

 

mysql> select hl_lock_ext_id from HIVE_LOCKS where HL_TABLE=’prcs_task’;

 

 

Then delete the locks:

 

 

delete from hive_locks where hl_lock_ext_id = 125542 
delete from hive_locks where hl_lock_ext_id = 12739

 

 

NOTE:

Kindly make sure whether you are deleting the stale lock.

 

REFERENCE LINK:

https://cwiki.apache.org/confluence/display/Hive/Locking 

 

 

 

Co-Authors
Version history
Last update:
‎Jun 28 2022 04:25 AM
Updated by: