Healthcare and Life Sciences Blog

Simplify Data Migration to Azure Blob Storage and Ensure Data Integrity with AzCopy

Venkat_Malladi
Jun 01, 2023

Introduction:  

 

In today's digital landscape, genomic assays generate vast amounts of data, creating a need for scalable and reliable storage solutions. Azure Blob Storage, a massively scalable object storage service provided by Microsoft Azure, offers a cost-effective and secure platform for storing and managing unstructured data. When migrating data from current storage solutions to Azure Blob Storage, it is essential to ensure the integrity and accuracy of the transferred files. In this post, we explore how to migrate data to Azure Blob Storage using AzCopy and how to validate checksums to guarantee data integrity. 

 

What is AzCopy? 

 

AzCopy is a command-line tool provided by Microsoft that enables fast and secure data transfer to and from Azure Storage. It offers a straightforward and efficient way to move large amounts of data, including files, folders, and even entire virtual machine disk images. With AzCopy, you can migrate data between various Azure storage services, including Azure Blob Storage, Azure Files, and Azure Data Lake Storage. 

 

Migrating Data to Azure Blob Storage with AzCopy: 

 

  1. Install AzCopy: Begin by installing AzCopy on your local machine or the server from which you will perform the data migration. You can download the latest version of AzCopy from the official Microsoft website or use the Azure CLI. 
  2. Obtain Storage Account Credentials: To migrate data to Azure Blob Storage, you will need the storage account credentials. Retrieve the account name and account key from the Azure portal or use Azure Key Vault for securely storing and accessing the credentials. 
  3. Prepare the Source Data: Organize your source data in a local folder or a network share, ensuring that the data is ready for migration. AzCopy supports parallel transfers, making it efficient for handling large datasets. 
  4. Execute the AzCopy Command: Open a command prompt or terminal and navigate to the directory where AzCopy is installed. Use the following command syntax to initiate the data migration: 

     

    azcopy copy [source] [destination] --recursive

     

    Replace [source] and [destination] with the appropriate paths. Current versions of AzCopy (v10) do not accept account keys on the command line; instead, authenticate by appending a SAS token to the storage URL or by signing in first with azcopy login. A complete example is shown after this list. 

  5. Monitor the Data Transfer: AzCopy provides real-time progress updates during the data transfer, allowing you to monitor the migration process. You can track the overall progress, transfer rate, and estimated time remaining, ensuring transparency throughout the operation. 
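
For instance, here is a hedged example that uploads a local folder of sequencing results to a blob container, authenticating with a SAS token; the account name, container name, local folder, and token below are placeholders to replace with your own values: 

azcopy copy "./sequencing-runs" "https://mystorageaccount.blob.core.windows.net/genomics?<SAS-token>" --recursive 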

Adding Checksums for Data Integrity: 

 

To ensure data integrity during the migration process, AzCopy allows you to add checksums to transferred files. Checksums are unique values computed from the data content, serving as a digital fingerprint to detect any modifications or corruption. Take the following step to enable checksum calculation during file transfer: 

 

6. Include the --put-md5 flag: When executing the AzCopy command, include the --put-md5 flag to enable checksum calculation. This flag instructs AzCopy to create an MD5 hash of each file and save the hash as the Content-MD5 property of the destination blob or file, as shown in the example below. 
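
For example, the upload command from step 4 becomes (again with placeholder account, container, and SAS values): 

azcopy copy "./sequencing-runs" "https://mystorageaccount.blob.core.windows.net/genomics?<SAS-token>" --recursive --put-md5 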

 

Verification of local vs. cloud checksums: 

 

The checksum information can be used to validate the data in Azure against the local copy of the data to ensure complete and successful transfer.  

 

 

7. In this example on GitHub, we use a simple bash script to query the checksum details of each blob and perform the same computation against our local files to validate that the data matches. A sketch of the core logic follows the usage example below. 

 

./file-verification.sh -a storage_account -c "container/files" -f files.txt -o log.txt 
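
The following is a minimal sketch of what such a comparison loop might look like; the actual script in the repository may differ. It assumes the Azure CLI (az) is installed and authenticated, that STORAGE_ACCOUNT and CONTAINER are set, and that files.txt lists one file name per line: 

while read -r file; do
  # Content-MD5 is stored base64-encoded in the blob's properties.
  remote_md5=$(az storage blob show \
      --account-name "$STORAGE_ACCOUNT" \
      --container-name "$CONTAINER" \
      --name "$file" \
      --query "properties.contentSettings.contentMd5" --output tsv)
  # Compute the local MD5 as a raw digest and base64-encode it to match.
  local_md5=$(openssl md5 -binary "$file" | base64)
  if [ "$remote_md5" = "$local_md5" ]; then
    echo "MATCH    $file"
  else
    echo "MISMATCH $file"
  fi
done < files.txt 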

 
Error handling: 

 

If you get the following error response: 

  

"RESPONSE Status: 403 This request is not authorized to perform this operation using this permission." 

 

You don't have the right permissions; you will need to grant access to Azure blob and queue data with Azure RBAC, which you can do in the Azure portal, the Azure CLI, or Azure PowerShell. A hedged Azure CLI example is shown below. 
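
For instance, a minimal sketch using the Azure CLI; the assignee, subscription, resource group, and account names are placeholders, and Storage Blob Data Contributor is one built-in role that grants blob read and write access: 

az role assignment create \
    --assignee "<user-or-service-principal-id>" \
    --role "Storage Blob Data Contributor" \
    --scope "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account>" 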

 

 

Access Tiers for cost management: 

 

When uploading, you can set a blob's access tier using --block-blob-tier <blob-tier>. This is useful for managing the long-term cost of storing data in the cloud. The available tiers are listed below, followed by an example command. 

  • Hot tier - An online tier optimized for storing data that is accessed or modified frequently. The hot tier has the highest storage costs, but the lowest access costs. 
  • Cool tier - An online tier optimized for storing data that is infrequently accessed or modified. Data in the cool tier should be stored for a minimum of 30 days. The cool tier has lower storage costs and higher access costs compared to the hot tier. 
  • Cold tier - An online tier optimized for storing data that is infrequently accessed or modified. Data in the cold tier should be stored for a minimum of 90 days. The cold tier has lower storage costs and higher access costs compared to the cool tier. 
  • Archive tier - An offline tier optimized for storing data that is rarely accessed, and that has flexible latency requirements, on the order of hours. Data in the archive tier should be stored for a minimum of 180 days. 
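
For example, a hedged sketch of uploading data directly to the archive tier, with placeholder account, container, and SAS values as before: 

azcopy copy "./completed-runs" "https://mystorageaccount.blob.core.windows.net/genomics?<SAS-token>" --recursive --block-blob-tier=Archive 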

 

 

Conclusion:  

 

Migrating data to Azure Blob Storage is made simpler and more efficient with the help of AzCopy. Its robust capabilities, combined with the ability to validate checksums, ensure that your data remains intact and secure during the transfer process. By adopting AzCopy as your preferred data migration tool, you can leverage the scalability and durability of Azure Blob Storage while maintaining the integrity of your valuable data. 

 

Acknowledgments: 

 

We would like to acknowledge the University of Rochester, their Genomics Center, and the Wilmot Informatics Group for their discussions and inspiration in developing the above example. 

Updated Jun 14, 2023
Version 3.0