Improving query response times on TEXT fields in Azure Database for MySQL
Published Mar 10 2021 11:48 AM 3,745 Views
Microsoft

With MySQL 5.7 and lower versions, you may have noticed that response times increase when querying TEXT columns in Azure Database for MySQL. In this blog post, we’ll take a deeper look at this behavior and some of the best practices to accommodate these fields.

 

While processing statements, MySQL server creates internal temporary tables that are stored in memory rather than on disk, so they are faster to query. These temporary tables store the intermediate results of a query or an aggregation of tables that include UPDATES or UNION statements.

 

Note: For more information about scenarios in which the MySQL engine creates temporary tables, see Internal Temporary Table Use in MySQL.

 

Users can also create temporary tables to gather results from complex searches that involve multiple queries, or to serve as staging tables.

 

However, MySQL memory tables don’t support BLOB or TEXT datatypes. If temporary tables have instances of TEXT column in the result of a query being processed, they get stored as a temporary table on disk rather than in memory. Every disk access is an expensive operation and incurs a performance penalty.

 

Azure Database for MySQL uses remote storage to provide the benefits of data-redundancy, high-availability, and reliability. The MySQL physical data and log files are stored on Azure Storage, independently from the database server.

 

Follow the steps below to determine if your queries are resulting in temporary tables, and if so, whether those tables are being created on disk or in memory.

 

Is my query resulting in a temporary table?

 

To determine whether a query is resulting a temporary table, run an Explain plan on the query:

 

EXPLAIN <query>;
EXPLAIN select name, count(*) from new_table group by name;

 

If the Explain plan includes an Extra column displaying the text Using temporary, then the query is using a temporary table to compute the result.

 

Picture1a.png

 

Is my query resulting in a disk table?

 

Let’s take a look at how to determine whether a temporary table is being created on disk or in memory. Consider a table called new_table created using the following schema and add some data:

 

CREATE TABLE `new_table` (
`id` int(10) unsigned NOT NULL,
  `name` text(500) NOT NULL,
  `address` text(500) NOT NULL
)

 

To obtain a table count of temporary tables, I’ll run the following query:

 

SHOW SESSION STATUS LIKE 'created_tmp%tables';
 

Picture2.png

 

Note that there are currently 45 disk and 64 in-memory temporary tables.

 

Next, I’ll run the query I want to assess. In this case, the query performs a full-table scan on new_table and aggregates the number of times a particular name is present, where name is a column of TEXT datatype.

 

select name, count(*) from new_table group by name;
 

Picture3.png

 

Now, I’ll rerun the query I used initially:

 

SHOW SESSION STATUS LIKE 'created_tmp%tables';
 

Picture4.png

 

After rerunning the query, you’ll see that there disk temporary table count has increased to 46, so the query on new_table resulted in the creation of a disk table.

 

The best solution to improve query response times in such cases is to avoid using TEXT types unless you really need them. If you can’t avoid it, here are some remedial measures that can help reduce disk reads and improve performance.

 

MySQL version upgrade

 

MySQL 8.0 has an additional storage engine, named TempTable, which is the default engine for handling in-memory temporary tables. As of MySQL 8.0.13, the TempTable storage engine supports columns with BLOB-like storage – TEXT, BLOB, JSON or geometry types.

 

Let’s create a table with a similar schema as earlier and then run the tests to confirm the in-memory operation. Let’s look at the count of temporary tables before running the query.

 

SHOW SESSION STATUS LIKE 'created_tmp%tables';
 

Picture5.png

 

Notice that there are currently 0 disk tables and 44 memory tables.

 

Now let’s run the query aggregating a TEXT column.

 

select name, count(*) from new_table group by name;
 

Picture6.png

 

Let’s again look at the count of temporary tables.

 

SHOW SESSION STATUS LIKE 'created_tmp%tables';
 

Picture7.png

 

As you can see, the memory table count increased by 1, but no disk tables were created. Eliminating disk access should greatly improve response times.

 

Migrating to 8.0.13 can remove the disk-access penalty with its support of in memory operations. Before migrating to version 8, though, be sure to evaluate the other effects of migrating via a detailed cost-benefit analysis. Azure Database for MySQL supports minor versions > 8.0.13 on both Single Server and Flexible Server. You can find the latest supported versions on this page.

 

Use of VARCHAR

 

CHAR and VARCHAR are the most common datatypes for string handling. The CHAR datatype supports up to 255 characters, while the maximum length supported by VARCHAR depends on the total row length, which includes all the columns in the table. A maximum row-size of 65,535 bytes is supported.

 

Unless most of your columns contain long strings, it will usually suffice to use VARCHAR instead of TEXT. VARCHAR data can get processed and temporary tables will be created in-memory, which should bring a significant improvement in query response times.

 

An easy way to determine whether all columns will fit within the row length is to convert all the TEXT columns to VARCHAR. Let’s reuse the schema from the “new_table” to create a table called new_table1, but this time we’ll use VARCHAR instead of TEXT. I’ve increased the width of name field to 50,000 characters and the address to 20,000 characters to analyzed the limits on the row-size.

 

CREATE TABLE `new_table1` (
  `id` int(10) unsigned,
  `name` varchar(50000),
  `address` varchar(20000)
) ENGINE=InnoDB CHARACTER SET latin1;
 

Picture8a.png

Picture9a.png

 

The query results show that the row size too large as it exceeds 65,535 characters.

 

If you encounter the above error, reassess each column length or look to convert some of the smaller TEXT fields to substrings, as discussed later in the article. To convert the fields from TEXT to VARCHAR, alter the datatype using the ALTER TABLE command, as shown below:

 

ALTER TABLE new_table MODIFY name VARCHAR(200);
 

Picture10.png

 

Using SUBSTRINGS

 

If changing the table structure from TEXT to VARCHAR is not a possible solution, you can also try the following method of using SUBSTR (column, position, length) to convert the values to character strings. This will enable the creation of in-memory temporary tables, reducing the performance overhead.

 

Ensure that the result set of the substring operation fits within the max_heap_table_size. Otherwise, the engine will push the result set to disk, negating the performance benefit.

 

Let’s verify the behavior of using substrings. As earlier, let’s obtain the count of temporary tables.

 

SHOW SESSION STATUS LIKE 'created_tmp%tables';
 

Picture11.png

 

There are 38 disk and 58 memory tables. I’ll run the same query that aggregates the number of times a particular name is available in the table where name is a TEXT field.

 

select name, count(*) from new_table group by name;
 

Picture12.png

 

Now that I’ve received the result of the query, I’ll re-run the temporary table count to look more closely at the newly created tables.

 

SHOW SESSION STATUS LIKE 'created_tmp%tables';
 

Picture13.png

 

Running this led to the creation of one disk table.

 

Next, I’ll run the query extracting a character string of the TEXT field “name”.

 

SELECT SUBSTR(name, 1, 500), COUNT(*) FROM new_table GROUP BY SUBSTR(name, 1, 500);
 

Picture14.png

 

Running the query provides the same result set as it did earlier.

 

Let’s check the temporary table counts again.

 

SHOW SESSION STATUS LIKE 'created_tmp%tables';
 

Picture15.png

 

The query computation did not create any disk table. As you’ll notice from the above queries, when I used substrings, the query was processes in-memory and didn’t require a DDL operation.

 

Conclusion

 

We hope you now have a better understanding of why temporary tables are created, their impact of query performance, and some work arounds that you can try to minimize the impact. If you have any feedback or questions, please leave a comment below or email us at AskAzureDBforMySQL@service.microsoft.com.

 

Thank you!

 
 
Co-Authors
Version history
Last update:
‎Mar 10 2021 11:13 AM
Updated by: