Monitor MySQL server performance using performance_schema and Azure Log Analytics

Published 12-20-2020 10:08 PM by Microsoft

Let's be honest, performance tuning can be stressful. Luckily, MySQL natively provides a system schema called performance_schema that logs server runtime information. Using performance_schema, you can learn a great deal about what is slowing your server down, and by leveraging the power of Azure Log Analytics, you can achieve even more!

 

This blog illustrates how to automatically upload performance_schema metrics to Azure Log Analytics and then use Log Analytics to generate different types of reports for performance monitoring and investigation.

 

Azure Database for MySQL offers metrics in the Azure Portal that let users check CPU, memory, connection activity, etc. However, at times this information is not sufficient to understand why CPU utilization is going high. With the help of performance_schema, you can learn much more about your MySQL server's runtime status.

 

As many of you may already know, the following command in MySQL will return a long list of server metrics.

 

SHOW GLOBAL STATUS;

 

However, the challenges with the results of this command are:

  • The value of each metric is cumulative since the server started, which means every metric value keeps increasing until a restart is triggered. This makes it hard to identify the changes that reflect the current server state;

  • There are no historical values to compare against and draw conclusions from.

Therefore, it would be very handy if we could record historical data and data changes: that would let us analyze the data, understand what changed, and draw conclusions. Luckily, Azure Log Analytics can help us achieve this!

 

Here is the idea:

  1. Get an output of global status with change data captured

  2. Collect the data at a regular interval

  3. Automatically upload the data to an Azure Log Analytics workspace

  4. Use Kusto queries to generate reports based on those metrics for different purposes

Now let's get started!

Understand the data to collect for monitoring !!!

As mentioned, the metrics are returned by the SQL statement below.

 

SHOW GLOBAL STATUS;

 

Since the returned values are not historical, the idea applied here is:

  1. Get the current output and save the data into a table

  2. After some time, get another output and subtract the values stored in Step #1 from it, matching on metric name

  3. After the subtraction, update the values from Step #1 as the new baseline for later use

The first step is to create a table to store the data, like the sample below.

 

CREATE DATABASE globalstatus;
USE globalstatus;
/* 
-- metric_name matches VARIABLE_NAME in performance_schema.global_status
-- origin_metric_value is copied directly from performance_schema.global_status at collection time
*/
CREATE TABLE my_global_status (
  metric_name varchar(64) NOT NULL UNIQUE, 
  origin_metric_value varchar(1024)
); 

 

The data types I selected for each column are the same as those defined in the performance_schema global_status table, to ensure consistency (below is the default definition of the global_status table in performance_schema).

 

[Screenshot: default definition of the performance_schema.global_status table]

 

Next, we need to insert the current global status output into the table we created and use it to store the historical data. Because the output of global status comes from performance_schema, this can be achieved with the following SQL statement. (Please note that global status can be queried from information_schema as well; however, information_schema's global_status is deprecated, so you may want to use performance_schema instead.)

 

-- Copy metric value from performance_schema
INSERT INTO my_global_status (metric_name, origin_metric_value) SELECT * FROM  performance_schema.global_status; 

 

You can check whether the insert succeeded by querying both performance_schema and your newly created table and comparing the results.
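For example, a quick sanity check is to compare row counts between the source table and the copy; the two numbers should match:

```sql
-- Row counts should be identical right after the copy
SELECT COUNT(*) FROM performance_schema.global_status;
SELECT COUNT(*) FROM globalstatus.my_global_status;
```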

 

The next step is to extract the changed values. Before doing so, let's evaluate this approach from a performance perspective to ensure it doesn't negatively impact the MySQL server. We will call global status frequently (say, once per minute), and we will also need to update the historical metric table at a similar frequency. To minimize the performance impact, it is better to avoid frequent disk IO, so we will calculate the metric value changes on the fly and not store them anywhere on the MySQL server.

 

Below is the SQL query I used to do so. The idea is to take the currently queried output and subtract the stored values from it on the fly.

 

/*
-- To get the difference between the current metric value and the previously checked value:
-- 1. Get the current value from performance_schema
-- 2. Get the stored value in the origin_metric_value column
-- 3. If metric_name matches VARIABLE_NAME from performance_schema, do the subtraction
-- 4. Insert the subtracted value into the table we created above [OPTIONAL]
*/
-- 3. Get the subtraction
SELECT m.metric_name, g.VARIABLE_VALUE - m.origin_metric_value AS metric_value FROM 
-- 1. Get the current value from performance_schema
(SELECT VARIABLE_NAME, VARIABLE_VALUE FROM performance_schema.global_status) AS g, 
-- 2. Get the stored value in the origin_metric_value column
(SELECT metric_name, origin_metric_value FROM globalstatus.my_global_status) AS m
WHERE m.metric_name = g.VARIABLE_NAME;

 

Once the above is done, don't forget to update the historical table at the same time, because it will be used as the baseline for the next call.

 

UPDATE globalstatus.my_global_status m, performance_schema.global_status g SET m.origin_metric_value = g.VARIABLE_VALUE WHERE m.metric_name = g.VARIABLE_NAME;

 

Get the data changes on the fly and persist the changes into a log file for analysis !!! 

We now have the historical data collected and can compute the data change, so next we need to make this run automatically and save the change output.

The idea is 

  1. Query the global_status every minute and get the data change at the same pace (you can define your own preferred frequency) 
  2. Save the output into a log file

Let's first work out how to generate the log file manually once. If this can be done, the automation can then be achieved in multiple ways.

 

As explained above, we are trying to avoid disk IO on your Azure Database for MySQL. Therefore, I suggest saving the output into a log file stored on a VM. The first thought that came to my mind was a SQL clause like

 

SELECT ... INTO OUTFILE ...

 

Unfortunately, as a PaaS offering, this is not supported, because the host machine cannot be accessed to save any output in a PaaS environment. However, the mysql command-line client provides a similar feature that works smoothly with other file management commands in a terminal.

 

Taking Linux as an example, mysql with the -e option allows users to execute SQL queries and get the results in a terminal, like below.

 

x_xiao2_bnu@xixia-ubt18-orcas:~$ mysql -h servername.mysql.database.azure.com -u user@servername -pXXXXXX -e "SHOW GLOBAL STATUS;"
mysql: [Warning] Using a password on the command line interface can be insecure.
+-----------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Variable_name                                 | Value                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
+-----------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Aborted_clients                               | 5301                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
| Aborted_connects                              | 2                                                                                                                                                                                                                                                                                                                                                                                                                                                                            
| Binlog_cache_disk_use                         | 0                                                                                                                                                                                                                                                                                                                                                                                                                                                                            
| Binlog_cache_use                              | 666                                                                                                                                                                                                                                                                                                                                                                                                                                                                          
| Binlog_io_read                                | 24462                                                                                                                                                                                                                                                                                                                                                                                                                                                                        
| Binlog_io_reads                               | 5                                                                                                                                                                                                                                                                                                                                                                                                                                                                            
| Binlog_io_writes                              | 723                                                                                                                                                                                                                                                                                                                                                                                                                                                                        
| Binlog_io_written                             | 1833438         
...                                              

 

It also allows users to save the output to a preferred file at the same time as querying:

 

x_xiao2_bnu@xixia-ubt18-orcas:~$ mysql -h shawnx-db.mysql.database.azure.com -u x_xiao2_bnu@shawnx-db -pXXXXXXXX -e "SELECT * FROM performance_schema.global_status; " >> /home/x_xiao2_bnu/mysql_shawn_status/innodb_status.log

 

Please note that ">>" means the output will be appended to the log file, creating it if it does not exist. In contrast, ">" creates a new file, overwriting an existing one. Since we will be collecting output continuously, we should use ">>" here to append each historical metric value to the log. Using the same approach, you can capture the data changes in a log file using the command below:
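If the append-vs-overwrite distinction is new to you, it is easy to verify with plain shell commands in a throwaway temp directory before wiring up mysql:

```shell
# Demonstrate '>' (truncate) vs '>>' (append) redirection
tmpdir=$(mktemp -d)

echo "first"  > "$tmpdir/overwrite.log"
echo "second" > "$tmpdir/overwrite.log"    # '>' truncates: only "second" survives

echo "first"  >> "$tmpdir/append.log"
echo "second" >> "$tmpdir/append.log"      # '>>' appends: both lines survive

cat "$tmpdir/overwrite.log"       # second
grep -c '' "$tmpdir/append.log"   # 2
```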

 

x_xiao2_bnu@xixia-ubt18-orcas:~$ mysql \
  -h shawnx-db.mysql.database.azure.com \
  -u x_xiao2_bnu@shawnx-db \
  -pXXXXXXXXX \
  -e "\
    USE globalstatus;SELECT m.metric_name, g.VARIABLE_VALUE - m.origin_metric_value AS metric_value FROM (SELECT VARIABLE_NAME, VARIABLE_VALUE FROM performance_schema.global_status) AS g, (SELECT metric_name, origin_metric_value FROM globalstatus.my_global_status) AS m WHERE m.metric_name = g.VARIABLE_NAME ; UPDATE globalstatus.my_global_status m, performance_schema.global_status g SET m.origin_metric_value = g.VARIABLE_VALUE Where m.metric_name = g.VARIABLE_NAME;" >> /home/x_xiao2_bnu/mysql_shawn_status/mysql_global_status.log

 

Now the problem is that, although the metric values are logged, the generated file contains only the plain metric value changes. To make the values analyzable, we should add a timestamp to each collected metric.

 

Since the returned output has many rows (378, actually), to avoid confusion among the rows appended after each execution, I decided to prefix each row with a timestamp taken at the time the output was fetched. To achieve this, a script that iterates over each row is needed; a sample could be the one below:

 

while IFS= read -r line; do
    printf '%s %s\n' "$(date +%Y-%m-%dT%H:%M:%S)" "$line";
done

 

To make this work with mysql -e, I wrapped the above script in a function and call the function along with the mysql SQL execution:

 

querytime() {
  while IFS= read -r line; do
    printf '%s %s\n' "$(date +%Y-%m-%dT%H:%M:%S)" "$line";
  done
}

mysql \
  -h servername.mysql.database.azure.com \
  -u user@servername \
  -pXXXXX\
  -e "\
    USE globalstatus;SELECT m.metric_name, g.VARIABLE_VALUE - m.origin_metric_value AS metric_value FROM (SELECT VARIABLE_NAME, VARIABLE_VALUE FROM performance_schema.global_status) AS g, (SELECT metric_name, origin_metric_value FROM globalstatus.my_global_status) AS m WHERE m.metric_name = g.VARIABLE_NAME ; UPDATE globalstatus.my_global_status m, performance_schema.global_status g SET m.origin_metric_value = g.VARIABLE_VALUE Where m.metric_name = g.VARIABLE_NAME;" | querytime >> /home/x_xiao2_bnu/mysql_shawn_status/mysql_global_status.log
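To see what the querytime function does on its own, you can pipe any text through it; each line comes back prefixed with the current timestamp (the sample metric row below is illustrative):

```shell
# Prefix every line read from stdin with the current timestamp
querytime() {
  while IFS= read -r line; do
    printf '%s %s\n' "$(date +%Y-%m-%dT%H:%M:%S)" "$line"
  done
}

# Example: feed it one tab-separated metric row, as mysql -e would emit
printf 'Aborted_clients\t5301\n' | querytime
```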

 

The output should then look something like below

[Screenshot: timestamped global status output in the log file]

 

The last step of this part is to make this process repeat automatically every minute. In Linux, the most popular scheduler is crontab. If we ask crontab to run this command every minute, we will get the timestamped metric value changes in one log file that can be used for later analysis.

 

Here is what I did:

  • Created a bash file (.sh) with the following script:
    #!/bin/bash
    
    querytime() {
      while IFS= read -r line; do
        printf '%s %s\n' "$(date +%Y-%m-%dT%H:%M:%S)" "$line";
      done
    }
    
    mysql \
      -h servername.mysql.database.azure.com \
      -u user@servername \
      -pXXXXXX \
      -e "\
        USE globalstatus;SELECT m.metric_name, g.VARIABLE_VALUE - m.origin_metric_value AS metric_value FROM (SELECT VARIABLE_NAME, VARIABLE_VALUE FROM performance_schema.global_status) AS g, (SELECT metric_name, origin_metric_value FROM globalstatus.my_global_status) AS m WHERE m.metric_name = g.VARIABLE_NAME ; UPDATE globalstatus.my_global_status m, performance_schema.global_status g SET m.origin_metric_value = g.VARIABLE_VALUE Where m.metric_name = g.VARIABLE_NAME;" | querytime >> /home/x_xiao2_bnu/mysql_shawn_status/mysql_global_status.log

 

  • Made this file executable in Linux:

    [Screenshot: making the script file executable]
  • Scheduled the script in crontab to run every minute (this configuration is very easy, and there are plenty of detailed instructions online)

    [Screenshot: crontab entry scheduling the script]
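For reference, the crontab entry could look something like the line below (the script file name is illustrative; substitute the path where you saved your .sh file). Add it with `crontab -e`:

```shell
# m h dom mon dow command -- run the collection script every minute
* * * * * /home/x_xiao2_bnu/mysql_shawn_status/collect_status.sh
```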

Please note that if this script keeps running, the log file will grow constantly. To save storage space, I suggest having another process purge the log by deleting old data: for example, run a cleanup job every day that deletes the first 10k rows. That is one of many ways to do it, but let's not focus on it in this article.
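As one sketch of such a cleanup (the helper name trim_log is my own; adjust the retention to taste), a daily cron job could keep only the newest N lines of the log:

```shell
# Keep only the newest N lines of a log file; older lines at the top are dropped.
# Usage: trim_log <file> <lines-to-keep>
trim_log() {
  tail -n "$2" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Example: retain the last 3 lines of a 10-line file
log=$(mktemp)
seq 1 10 > "$log"
trim_log "$log" 3
cat "$log"   # prints lines 8, 9 and 10
```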

 

Use Azure Log Analytics for faster and easier analytics !!!

In this last part, it is time to continuously upload the log generated in the steps above to an Azure Log Analytics workspace. Before setting this up, the first step is to link the VM used for data collection (where the log is stored) with your Azure Log Analytics workspace.

  1. If you don't already have a Log Analytics workspace, create one in the Azure Portal.
  2. After creation, navigate to the Log Analytics workspace in the Azure Portal and select Virtual machines from the menu blade on the left. The right side should list your Azure VMs. Select the VM where the performance_schema global status change log is saved. A service called the OMS agent will then be installed on the selected VM, so this may take some time.

    [Screenshot: connecting the VM to the Log Analytics workspace]

Once the above is completed, refer to the document at https://docs.microsoft.com/en-us/azure/azure-monitor/platform/data-sources-custom-logs#step-1-open-t... to configure the custom log upload. It walks you through the process, and your log will then be uploaded automatically to your Log Analytics workspace.

 

Analyze the data !!!

By now, most of the steps are completed and we are reaching the final one: analyzing the data. If you have never had the chance to play around with Log Analytics, I recommend the document Log Analytics tutorial - Azure Monitor, which is a good start. For our task here, you can first verify whether your log has started to show up in the custom log section:

 

[Screenshot: custom log table listed in the Log Analytics workspace]

 

If the answer is yes, then congratulations! You are now ready to monitor and investigate your Azure MySQL performance. Get your hands dirty with data analysis!


Let's see what our log looks like in Azure Log Analytics:

 

[Screenshot: raw custom log rows in Log Analytics]

As seen above, if we query the custom log with no filter conditions, it returns all columns, but only the RawData column matters to us, because it holds our logged global metric values. So we need to clean this data into a more reader-friendly format. (The operators used below are documented in Overview - Azure Data Explorer | Microsoft Docs.)

 

MyGlobalSatus_CL
| extend originalEventTimestamp = todatetime(extract("([0-9-A-Za-z:]+)", 1, RawData, typeof(string)))
| extend metric_name = tostring(split(split(RawData," ",1)[0], '\t', 0)[0])
| extend metric_value = toreal(split(split(RawData," ",1)[0], '\t', 1)[0])
| project originalEventTimestamp, metric_name, metric_value

 

And the output should look like:
[Screenshot: parsed output with originalEventTimestamp, metric_name, and metric_value columns]

 

Now it looks much better and cleaner, and we can play around with the data.

First, let's try to get the InnoDB data IO requests; below is the query I used:

 

MyGlobalSatus_CL
| extend originalEventTimestamp = todatetime(extract("([0-9-A-Za-z:]+)", 1, RawData, typeof(string)))
| extend metric_name = tostring(split(split(RawData," ",1)[0], '\t', 0)[0])
| extend metric_value = toreal(split(split(RawData," ",1)[0], '\t', 1)[0])
| where  metric_name in ('Binlog_io_read', 'Binlog_io_written', 'Innodb_data_read', 'Innodb_data_reads', 'Innodb_data_writes', 'Innodb_data_written', 'Innodb_log_writes', 'Innodb_log_written')
| extend TIMESTAMP = bin(originalEventTimestamp, 30s)
| extend Binlog_io_read_tmp = iff(metric_name =~ 'Binlog_io_read', metric_value, 0.0),
         Binlog_io_written_tmp = iff(metric_name =~ 'Binlog_io_written', metric_value, 0.0),
         Innodb_data_read_tmp = iff(metric_name =~ 'Innodb_data_read', metric_value, 0.0),
         Innodb_data_reads_tmp = iff(metric_name =~ 'Innodb_data_reads', metric_value, 0.0),
         Innodb_data_writes_tmp = iff(metric_name =~ 'Innodb_data_writes', metric_value, 0.0),
         Innodb_data_written_tmp = iff(metric_name =~ 'Innodb_data_written', metric_value, 0.0),
         Innodb_log_writes_tmp = iff(metric_name =~ 'Innodb_log_writes', metric_value, 0.0),
         Innodb_log_written_tmp = iff(metric_name =~ 'Innodb_log_written', metric_value, 0.0)
| summarize Binlog_io_read_in_MB = max(Binlog_io_read_tmp)/1024/1024,
            Binlog_io_written_in_MB = max(Binlog_io_written_tmp)/1024/1024,
            Innodb_data_read_in_MB = max(Innodb_data_read_tmp)/1024/1024,
            Innodb_data_reads = max(Innodb_data_reads_tmp),
            Innodb_data_writes = max(Innodb_data_writes_tmp),
            Innodb_data_written_in_MB = max(Innodb_data_written_tmp)/1024/1024,
            Innodb_log_writes = max(Innodb_log_writes_tmp),
            Innodb_log_written_in_MB = max(Innodb_log_written_tmp)/1024/1024
             by TIMESTAMP
| order by TIMESTAMP asc
| render timechart

 

 The output generated a graph like:

[Screenshot: timechart of the InnoDB and binlog IO metrics]

At the bottom, you can select or unselect counters to display only the metrics you care about. This helps you effectively check how much IO hits your InnoDB engine.

 

Similarly, the query below can be used to check the number of DDL and DML queries:

 

 

 

[Screenshot: Kusto query and timechart for the DDL/DML counters]
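Since that query survives here only as a screenshot, below is a sketch of what it could look like, following the exact pattern of the InnoDB IO query above. The Com_* counters (Com_select, Com_insert, Com_update, Com_delete) are standard MySQL status variables, but the exact set used in the original image may differ:

```kusto
MyGlobalSatus_CL
| extend originalEventTimestamp = todatetime(extract("([0-9-A-Za-z:]+)", 1, RawData, typeof(string)))
| extend metric_name = tostring(split(split(RawData," ",1)[0], '\t', 0)[0])
| extend metric_value = toreal(split(split(RawData," ",1)[0], '\t', 1)[0])
| where metric_name in ('Com_select', 'Com_insert', 'Com_update', 'Com_delete')
| extend TIMESTAMP = bin(originalEventTimestamp, 30s)
| extend Com_select_tmp = iff(metric_name =~ 'Com_select', metric_value, 0.0),
         Com_insert_tmp = iff(metric_name =~ 'Com_insert', metric_value, 0.0),
         Com_update_tmp = iff(metric_name =~ 'Com_update', metric_value, 0.0),
         Com_delete_tmp = iff(metric_name =~ 'Com_delete', metric_value, 0.0)
| summarize Com_select = max(Com_select_tmp),
            Com_insert = max(Com_insert_tmp),
            Com_update = max(Com_update_tmp),
            Com_delete = max(Com_delete_tmp)
            by TIMESTAMP
| order by TIMESTAMP asc
| render timechart
```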

 

This tells you how your workload changed and which types of queries contributed the most load. If you observe a correlation with your CPU usage, those queries are likely what is driving it.

 

The query below checks logical reads for your MySQL server. If the value of Innodb_buffer_pool_read_requests is high, it suggests full table scans (or at least large table scans) or missing indexes on some tables, which can consume a lot of CPU.

 

MyGlobalSatus_CL
| extend originalEventTimestamp = todatetime(extract("([0-9-A-Za-z:]+)", 1, RawData, typeof(string)))
| extend metric_name = tolower(tostring(split(split(RawData," ",1)[0], '\t', 0)[0]))
| extend metric_value = toreal(split(split(RawData," ",1)[0], '\t', 1)[0])
| where metric_name in ('innodb_buffer_pool_read_requests', 'innodb_buffer_pool_reads')
| extend TIMESTAMP = bin(originalEventTimestamp, 1s)
| project TIMESTAMP, originalEventTimestamp, metric_name, metric_value
| extend Innodb_buffer_pool_read_requests_tmp = iff(metric_name =~ 'Innodb_buffer_pool_read_requests', metric_value, 0.0), 
         Innodb_buffer_pool_reads_tmp = iff(metric_name =~ 'Innodb_buffer_pool_reads', metric_value, 0.0)
| summarize Innodb_buffer_pool_read_requests = max(Innodb_buffer_pool_read_requests_tmp), 
            Innodb_buffer_pool_reads = max(Innodb_buffer_pool_reads_tmp)
 by TIMESTAMP
| order by TIMESTAMP asc
| render timechart 

 

Last But Not Least!

Please note that, depending on VM load, network latency, and MySQL load, there can be some skew between when the metric values are fetched and when the subtractions are taken, so in some scenarios the values may not reflect reality exactly. There are many ways to collect this data and put it to analysis.

 

I hope the ideas above inspire you to explore more.

 

Thank you!

Version history
Last update: Dec 20 2020 11:04 PM