When migrating data to cloud storage, preserving file attributes is crucial for maintaining data integrity and accessibility. POSIX attributes, which include permissions, ownership, and timestamps, matter most in environments that rely on strict access controls and accurate metadata. When AzCopy or similar tools are used to migrate data directly into Azure Blob Storage, these POSIX attributes are often lost, which can lead to security issues and workflow disruptions. This blog post describes how to migrate data into Azure Blob Storage through Lustre and HSM so that these attributes are preserved and your data stays consistent and reliable.
This post walks through the manual steps for exporting data to blob storage while retaining specific POSIX attributes. The export uses the Lustre HSM interface. Before starting, make sure that HSM is already enabled and configured on your Azure Managed Lustre system. For more information on setting up automatic synchronization to Azure Blob Storage for Azure Managed Lustre, refer to this blog post: Automatic Synchronization to Azure BLOB Storage.
Client machines running Linux can access Azure Managed Lustre directly. Refer to the following article for client prerequisites: Connect client to the file system.
To mount Lustre:
sudo mount -t lustre -o noatime,flock <MGS_IP>@tcp:/lustrefs /<client_path>
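Before copying anything, it can help to confirm that the mount point is really backed by Lustre and not by the local disk. The following is a minimal sketch, assuming GNU coreutils `stat`, which reports Lustre mounts as `lustre`; the function names `fs_type` and `is_lustre_mount` are hypothetical helpers, not part of any Lustre tooling.

```shell
#!/bin/sh
# Print the filesystem type backing a path (relies on GNU coreutils stat).
fs_type() {
    stat -f -c %T "$1"
}

# Succeed only if the path sits on a Lustre mount.
# Assumption: GNU stat names the Lustre filesystem type "lustre".
is_lustre_mount() {
    [ "$(fs_type "$1")" = "lustre" ]
}
```

For example, `is_lustre_mount /<client_path> && echo "Lustre is mounted"` prints the message only when the mount succeeded.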
Once a client is connected to the file system, you can copy data directly into it.
To copy data into Lustre:
rsync -av /mydata /lustredata
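Since the point of this migration is to keep POSIX attributes, a quick spot check after the copy may be worthwhile. The sketch below compares mode, numeric owner, numeric group, and modification time of a source file and its copy. It assumes GNU `stat`; `posix_attrs` and `same_attrs` are hypothetical helper names. Note that preserving ownership with `rsync -a` generally requires running the copy as root.

```shell
#!/bin/sh
# Print octal mode, numeric uid, numeric gid, and mtime (epoch seconds) of a file.
posix_attrs() {
    stat -c '%a %u %g %Y' "$1"
}

# Succeed if the source and the copy carry identical attributes.
same_attrs() {
    [ "$(posix_attrs "$1")" = "$(posix_attrs "$2")" ]
}
```

A call such as `same_attrs /mydata/<file> /lustredata/mydata/<file>` returns success when the attributes match; the file name is a placeholder.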
Note: When migrating data to Azure Managed Lustre File System (AMLFS), ensure that the total storage used does not exceed the system’s allowed capacity. If the storage exceeds the file system capacity, files need to be archived and released to blob storage as needed before continuing the data migration.
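One way to act on this note is to check how full the file system is before each batch of the copy. The sketch below reads the used-capacity percentage from POSIX `df` output; the helper names and the default threshold of 80 percent are assumptions chosen for illustration, not values recommended by Azure. On a Lustre client, `lfs df -h` gives a more detailed per-target breakdown.

```shell
#!/bin/sh
# Print the used-capacity percentage (integer) of the filesystem holding a path.
fs_usage_pct() {
    df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Succeed while usage is below a threshold.
# The default of 80 is a hypothetical value; pick one that suits your system.
has_headroom() {
    [ "$(fs_usage_pct "$1")" -lt "${2:-80}" ]
}
```

For example, `has_headroom /<client_path> 90 || echo "Archive and release files before continuing"` warns when the file system is at least 90 percent full.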
Once the files have been copied into the Lustre file system, use an export job to write the files, together with their POSIX attributes as metadata, to the blob storage container. The export job relies on the archive process.
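For readers who prefer to drive the archive step from a client instead of an export job, the standard Lustre HSM commands `lfs hsm_archive` and `lfs hsm_state` can be wrapped as below. This is a sketch under assumptions: it presumes that your Azure Managed Lustre deployment accepts HSM requests from the Lustre CLI, which you should verify against the current Azure documentation. `archive_tree`, `archive_state`, and the `LFS_CMD` variable are hypothetical names introduced here.

```shell
#!/bin/sh
# LFS_CMD selects how the Lustre CLI is invoked (default: "sudo lfs").
# Setting LFS_CMD=echo gives a dry run that only prints the commands.

# Queue every regular file under a directory for HSM archive.
archive_tree() {
    find "$1" -type f -print0 | xargs -0 -r -n 1 ${LFS_CMD:-sudo lfs} hsm_archive
}

# Show the HSM state flags of a single file.
archive_state() {
    ${LFS_CMD:-sudo lfs} hsm_state "$1"
}
```

Running `LFS_CMD=echo archive_tree /lustredata` first lets you review the commands before any request is sent.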
When you export files from your Azure Managed Lustre system to blob storage, additional attributes are saved as blob metadata, as described in Metadata for exported files. Depending on the type of object, the following attributes may be written as metadata to each object in blob storage:
| Parameter | Description |
| --- | --- |
| modtime | The last modification time of the file |
| owner | The owner of the file |
| group | The group owner of the file |
| permissions | The existing permissions of the file |
| hdi_isfolder | If object is a folder, this value is set to true. Name corresponds with folder name. |
The metadata appears in the attributes of each blob in the storage container.
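The same metadata can also be read from the command line. The sketch below wraps the Azure CLI command `az storage blob metadata show`; it assumes the Azure CLI is installed and that you are signed in with an identity allowed to read the container. The function name `show_blob_metadata` and the `AZ_CMD` variable are hypothetical, and the three arguments (storage account, container, blob name) are placeholders you supply.

```shell
#!/bin/sh
# AZ_CMD selects the Azure CLI binary (default: "az").
# Setting AZ_CMD=echo gives a dry run that only prints the command.

# Print the metadata key/value pairs stored on one blob.
# $1 = storage account name, $2 = container name, $3 = blob name
show_blob_metadata() {
    ${AZ_CMD:-az} storage blob metadata show \
        --account-name "$1" \
        --container-name "$2" \
        --name "$3" \
        --auth-mode login
}
```

For an exported file, the output should include the keys listed in the table above, such as owner, group, and permissions.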
Because each blob now carries its attributes, including the permissions and ownership of each file and directory, the data can be imported into any new Azure Managed Lustre file system with those attributes intact. Follow these steps to import data using import jobs.
Note: This step is only required when setting up a new Azure Managed Lustre File System. It is not required if you continue to use the existing AMLFS to which the data was originally copied.