Efficiently Generating and Loading 1 Billion Rows into a Relational Database Content in just an hour
Published Mar 29 2023 11:31 PM 2,665 Views
Microsoft

The challenge:

  • To generate and load 1 billion rows of data into a relational database on Azure in a significantly shorter time frame than the previous 72 hours.

The problem:

  • The initial parallel threads were competing for IO and CPU resources, causing internal locking and slowing down the process even further.

Previous solution (scaling issue):

  • The application generated batches of 1000 rows.
  • To load 1B rows – the application should insert 1M batches!
  • Parallel threads generated locking and even deadlocks.

The solution:

  • Generate 30 million rows in each of 36 files with different first letter/digit in parallel with small modifications of the original application that generate the rows according to complex business logic.
  • Define 36 identical tables with a constraint on the custom GUID key according to the first letter.
  • Load the files in parallel to the relevant table according to the first letter and constraint and increase database resources if necessary.
  • Define one partitioned table according to the first letter with the same key.
  • Switch in the 36 tables (meta-data operation) into the partitioned table.

Result:

  • The solution enabled the generation and loading of 1 billion rows of data into a relational database within an hour, significantly improving the previous time frame of 72 hours.

In conclusion, the solution involves parallelizing the data generation process, partitioning the data by the first letter or digit of a custom GUID key, and optimizing the loading process by using parallel threads and increasing database resources if necessary. This approach can significantly reduce the time it takes to generate and load a large volume of data into a relational database and can be applied to all relational databases that support partitioning.

Co-Authors
Version history
Last update:
‎Mar 29 2023 11:30 PM
Updated by: