Keep your MySQL workloads running smoothly and efficiently by monitoring and managing the history list length (HLL) metric.
Efficient database management often requires digging deeply into metrics that reflect the internal state of the database server. While Azure Database for MySQL is already tuned for the best performance out of the box, workloads can vary significantly. One common performance issue arises from open transactions, those that stay active unintentionally because of bugs, errors in application logic, or a lack of proper transaction handling. Open transactions can hold undo logs longer than necessary, degrading a server’s performance over time.
With MySQL, understanding and managing these open transactions is critical. One way to gauge their impact is by viewing the history list length (HLL), which tracks the number of undo log records that are used to store earlier versions of rows in the database. This blog explores the significance of the HLL metric, its implications for performance, and how to use it for troubleshooting with Azure Database for MySQL.
HLL metric in Azure Database for MySQL
Monitoring the MySQL history list length metric is crucial because an excessively increasing HLL value can degrade database performance by:
- Increasing the overhead for undo log management, which leads to slower query processing.
- Consuming more storage in the InnoDB system tablespace.
You can get insights into the state of the undo log system view by checking the HLL using the SHOW ENGINE INNODB STATUS command in MySQL. To make monitoring easier, Azure Database for MySQL exposes this value using a dedicated metric, the MySQL HLL metric.
This metric is particularly important in the Repeatable Read isolation level, as it relies on the Multi-Version Concurrency Control (MVCC) to maintain consistent snapshots, which causes undo logs to persist for longer periods if transactions are not committed promptly. InnoDB cannot purge undo records (transaction history) from other transactions until the “hung” or “open” transactions are completed. For more information, see Transaction isolation levels in the MySQL documentation.
Troubleshooting with HLL and process lists
When the MySQL HLL metric increases beyond normal values, indicating potential performance issues, you can use Information_schema, Processlist, or Performance_schema to analyse further and identify:
- Long-running transactions causing history list length to grow.
- Blocking threads that may stall operations.
Identifying long running and active transactions
To view all currently running transactions, execute the following query on the INNODB_TRX table:
mysql> SELECT * FROM information_schema.innodb_trx\G
To list threads with open transactions, run the following query:
mysql> SELECT t.trx_id AS Transaction_ID, t.trx_state AS Transaction_State, t.trx_started AS Transaction_Start_Time, p.id AS Thread_ID, p.user AS User, p.host AS Host, p.time AS Thread_Runtime, p.info AS Query FROM information_schema.innodb_trx t JOIN information_schema.processlist p ON t.trx_mysql_thread_id=p.id WHERE t.trx_state='RUNNING' ORDER BY t.trx_started ASC;
The query results provide a list of all currently running transactions and their associated details, including the transaction ID, state, start time, and the thread executing the transaction.
Detecting blocking and blocked threads
To identify transactions waiting for locks and those causing the blockage, use the following queries based on the mentioned version.
MySQL version 8.0+:
mysql> SELECT wt.trx_id AS Waiting_Transaction_ID, wt.trx_mysql_thread_id AS Waiting_Thread_ID, wt.trx_query AS Waiting_Query, bt.trx_id AS Blocking_Transaction_ID, bt.trx_mysql_thread_id AS Blocking_Thread_ID, bt.trx_query AS Blocking_Query FROM performance_schema.data_lock_waits lw INNER JOIN information_schema.innodb_trx bt ON bt.trx_id = lw.blocking_engine_transaction_id INNER JOIN information_schema.innodb_trx wt ON wt.trx_id = lw.requesting_engine_transaction_id;
MySQL version 5.7:
mysql> SELECT req_trx.trx_id AS Waiting_Transaction_ID, req_trx.trx_mysql_thread_id AS Waiting_Thread_ID, req_trx.trx_query AS Waiting_Query, blk_trx.trx_id AS Blocking_Transaction_ID, blk_trx.trx_mysql_thread_id AS Blocking_Thread_ID, blk_trx.trx_query AS Blocking_Query FROM information_schema.innodb_lock_waits lw INNER JOIN information_schema.innodb_trx blk_trx ON blk_trx.trx_id = lw.blocking_trx_id INNER JOIN information_schema.innodb_trx req_trx ON req_trx.trx_id = lw.requesting_trx_id;
Mitigating blocked transactions
- Kill problematic queries: Identify and terminate problematic threads from above queries.
- Use CALL mysql.az_kill_query(<thread_id>); on your Azure MySQL server to terminate offending queries. Ensure this is a last resort after confirming the issue.
- Optimize transactions:
- Commit transactions promptly.
- Break large transactions into smaller units to avoid long-lived row locks.
Monitoring and setting alerts on the HLL metric
To ensure that your database stays performant, effectively monitor and manage the HLL metric by:
- Monitoring the HLL metric regularly
- Regularly track the MySQL HLL metric in the Azure portal under Metrics.
- Compare current values with historical trends to identify workload anomalies or sustained growth in the metric.
- Unusual spikes often indicate long-running transactions or frequent updates/deletes not being optimized.
- Setting alerts for proactive management
- Use Azure Monitor to set up alerts for this metric.
- Define alert thresholds tailored to your workload. For example, trigger an alert if the metric exceeds a baseline that could indicate potential slowness, depending on your workload.
- Alerts help notify your team of issues before they affect end users, allowing you to take immediate action. For more information, see Configure Metrics Alerts - Azure Database for MySQL - Flexible Server | Microsoft Learn
It’s essential to adjust your monitoring and alert thresholds based on the specific workload patterns of your MySQL instance. For OLTP workloads, where transactions are frequent and can be long-running, HLL is more likely to grow rapidly. In these environments, proactive monitoring and timely transaction management are critical to prevent performance bottlenecks. OLAP workloads, on the other hand, tend to generate fewer updates, but they can still experience significant growth in HLL due to large analytical queries or batch jobs. Be sure to monitor the metric for any unexpected spikes during these operations and adjust alerting thresholds accordingly.
Conclusion
Effectively understanding the history list length metric is crucial for maintaining optimal MySQL server performance, especially in high-transaction environments. In Azure Database for MySQL, monitoring the HLL metric is even easier with integrated monitoring and alerting capabilities that provide real-time insights. By using the queries to identify long-running transactions and detect blocking threads, you can proactively diagnose and resolve potential performance bottlenecks before they impact your workloads. Regularly reviewing and monitoring the HLL metric, combined with setting up customized alerts, ensures that your server remains healthy and efficient.
If you have any queries or suggestions, please let us know by leaving a comment below or by contacting directly us at AskAzureDBforMySQL@service.microsoft.com. Thank you!