Introduction:
Genomic assays generate vast amounts of data, creating a need for scalable and reliable storage. Azure Blob Storage, a massively scalable object storage service from Microsoft Azure, offers a cost-effective and secure platform for storing and managing unstructured data. When migrating data from an existing storage solution to Azure Blob Storage, it is essential to verify the integrity of the transferred files. In this post we explore how to migrate data to Azure Blob Storage using AzCopy and how to validate checksums to guarantee data integrity.
What is AzCopy?
AzCopy is a command-line tool provided by Microsoft that enables fast and secure data transfer to and from Azure Storage. It offers a straightforward and efficient way to move large amounts of data, including individual files, entire folders, and virtual machine disk images (VHDs). With AzCopy, you can migrate data between various Azure storage services, including Azure Blob Storage, Azure Files, and Azure Data Lake Storage.
Migrating Data to Azure Blob Storage with AzCopy:
azcopy copy [source] [destination] --recursive
Replace [source] and [destination] with the appropriate local paths or storage URLs. AzCopy v10 does not accept account keys on the command line; instead, authenticate by appending a SAS token to the storage URL or by signing in with Azure AD credentials via azcopy login.
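For example, a local directory could be uploaded to a container with a SAS token appended to the destination URL. The account name, container name, and token below are placeholders, not values from a real deployment:

```shell
# Hypothetical upload of a local directory of sequencing data to a container.
# "mystorageaccount", "genomics-data", and <SAS-token> are placeholders.
azcopy copy "/data/fastq" \
    "https://mystorageaccount.blob.core.windows.net/genomics-data?<SAS-token>" \
    --recursive
```

The --recursive flag tells AzCopy to walk the source directory tree rather than copying only top-level files.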
Adding Checksums for Data Integrity:
To ensure data integrity during the migration process, AzCopy can record a checksum for each transferred file. A checksum is a value computed from the file's content, serving as a digital fingerprint that can detect modification or corruption. To enable checksum creation on file transfer:
Include the --put-md5 flag: when executing the AzCopy command, add --put-md5 to enable checksum creation. This flag instructs AzCopy to compute an MD5 hash of each file and save it as the Content-MD5 property of the destination blob or file.
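In practice the flag is simply appended to the upload command. As before, the account, container, and SAS token below are placeholders:

```shell
# Upload with a per-file MD5 hash stored as the Content-MD5 blob property.
# Account, container, and SAS token are placeholders.
azcopy copy "/data/fastq" \
    "https://mystorageaccount.blob.core.windows.net/genomics-data?<SAS-token>" \
    --recursive \
    --put-md5
```

Note that --put-md5 applies only at upload time: the hash is computed on the source machine as each file is read, then stored alongside the blob.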
Verification of Local vs. Cloud Checksums:
The checksum information can be used to validate the data in Azure against the local copy of the data to ensure complete and successful transfer.
In this example on GitHub, we use a simple bash script to query the checksum details stored in Blob Storage and perform the same computation against our local files to validate that the data matches.
./file-verification.sh -a storage_account -c "container/files" -f files.txt -o log.txt
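The core of such a check can be sketched as follows. This is not the script from the GitHub example, just a minimal illustration of the comparison logic; one way to fetch the stored hash is `az storage blob show --query properties.contentSettings.contentMd5`:

```shell
# Azure stores Content-MD5 as a base64-encoded binary digest, while md5sum
# prints hex, so the local hash must be converted before comparison.
# This sketch assumes openssl and base64 are available.

# Base64-encoded MD5 of a local file, matching the Content-MD5 format.
local_md5_base64() {
    openssl md5 -binary "$1" | base64
}

# Compare a local file against a Content-MD5 value retrieved from Azure.
verify_blob_md5() {
    local file="$1" blob_md5="$2"
    if [ "$(local_md5_base64 "$file")" = "$blob_md5" ]; then
        echo "MATCH: $file"
    else
        echo "MISMATCH: $file"
    fi
}
```

Looping this pair of functions over a file list reproduces the essential behavior of the verification script above.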
Error handling:
If you get the following error response:
"RESPONSE Status: 403 This request is not authorized to perform this operation using this permission."
you don't have the right permissions. You will need to grant access to Azure blob and queue data with an RBAC role assignment, which can be done in the Azure portal, Azure CLI, or Azure PowerShell.
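One way to grant the needed role with the Azure CLI is sketched below; the subscription, resource group, account, and assignee are placeholders you would replace with your own values:

```shell
# Assign the Storage Blob Data Contributor role to a user or service
# principal on a storage account. Angle-bracketed names are placeholders.
az role assignment create \
    --role "Storage Blob Data Contributor" \
    --assignee "<user-or-service-principal-id>" \
    --scope "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account>"
```

Role assignments can take a few minutes to propagate before AzCopy requests succeed.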
Access Tiers for cost management:
When uploading, you can set a blob's access tier using --block-blob-tier <blob-tier>. This is useful for managing the long-term cost of storing data in the cloud.
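For example, infrequently accessed raw data could be uploaded directly into the Cool tier. The account and container names below are placeholders:

```shell
# Upload directly into the Cool access tier to reduce long-term storage cost.
# Account, container, and SAS token are placeholders.
azcopy copy "/data/archive" \
    "https://mystorageaccount.blob.core.windows.net/genomics-archive?<SAS-token>" \
    --recursive \
    --block-blob-tier Cool
```

Hot, Cool, and Archive tiers trade storage price against access cost, so the right tier depends on how often the data will be read back.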
Conclusion:
Migrating data to Azure Blob Storage is made simpler and more efficient with the help of AzCopy. Its robust capabilities, combined with the ability to validate checksums, ensure that your data remains intact and secure during the transfer process. By adopting AzCopy as your preferred data migration tool, you can leverage the scalability and durability of Azure Blob Storage while maintaining the integrity of your valuable data.
Acknowledgments:
We would like to acknowledge the University of Rochester, their Genomics Center, and the Wilmot Informatics Group for their discussions and inspirations for developing the above example.