Migrate data to Azure Managed Lustre retaining POSIX attributes
Published May 30 2024 06:36 PM
Microsoft

Introduction

Preserving file attributes during a migration to cloud storage is crucial for maintaining data integrity and accessibility. POSIX attributes, which include permissions, ownership, and timestamps, play a vital role in managing file systems, especially in environments that rely on strict access controls and accurate metadata. When AzCopy or similar tools are used to migrate data directly into Azure Blob Storage, these attributes are often lost, which can lead to security issues and workflow disruptions. This blog post describes how to migrate data into Azure Blob Storage through Azure Managed Lustre and its HSM (Hierarchical Storage Management) interface so that these attributes are preserved.

 

Exporting Data to Blob Storage While Retaining POSIX Attributes

This post walks through the manual steps for exporting data to blob storage while retaining specific POSIX attributes. The export uses the Lustre HSM interface, so before starting, make sure HSM is enabled and configured on your Azure Managed Lustre file system. For more information on setting up automatic synchronization to Azure Blob Storage for Azure Managed Lustre, refer to this blog post: Automatic Synchronization to Azure BLOB Storage.

 

Connect client to the Lustre file system

Client machines running Linux can access Azure Managed Lustre directly. Refer to the following article for client prerequisites: Connect client to the file system.

 

 

To mount Lustre:

sudo mount -t lustre -o noatime,flock <MGS_IP>@tcp:/lustrefs /<client_path>
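Before copying any data, it can help to confirm that the mount succeeded. The following is a minimal sketch rather than part of the documented procedure: the helper name is_lustre_mount is hypothetical, and /lustredata is assumed to be the mount point (it is the destination used later in this post). It checks the Linux mount table in /proc/mounts.

```shell
# Hypothetical helper: succeed if the given path is a mounted Lustre file system.
# Reads a mount table in /proc/mounts format (device, mount point, fs type, ...).
is_lustre_mount() {
    path="$1"
    table="${2:-/proc/mounts}"   # optional second argument, mainly useful for testing
    awk -v p="$path" '$2 == p && $3 == "lustre" { found = 1 } END { exit !found }' "$table"
}

# Example (assumed mount point): show per-target usage only if the mount is present.
#   is_lustre_mount /lustredata && lfs df -h /lustredata
```

If the check fails, review the client prerequisites and the MGS IP address before retrying the mount.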
 

Migrate data retaining POSIX attributes

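Once a client is connected to the file system, you can copy data directly into it.

  • This example assumes the source directory is /mydata and the Lustre file system is mounted at /lustredata. Because the source path has no trailing slash, rsync creates /lustredata/mydata.
  • The -a (archive) parameter for rsync preserves ownership, permissions, timestamps, symlinks, and device files. It does not preserve hard links, ACLs, or extended attributes; add -H, -A, or -X if you need those. Preserving ownership requires running rsync as root. See the rsync manual page for more details.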

To copy data into Lustre:

rsync -av /mydata /lustredata
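After the copy completes, a spot check can confirm that attributes survived the transfer. The sketch below is one possible approach, not something the original procedure prescribes: posix_attrs and same_attrs are hypothetical helper names, the file name is a placeholder, and GNU stat (standard on most Linux clients) is assumed.

```shell
# Hypothetical helper: print mode, numeric owner, numeric group, and mtime (epoch seconds).
# Assumes GNU coreutils stat.
posix_attrs() {
    stat -c '%a %u %g %Y' "$1"
}

# Hypothetical helper: succeed if source and destination carry the same attributes.
same_attrs() {
    [ "$(posix_attrs "$1")" = "$(posix_attrs "$2")" ]
}

# Example (placeholder file name): with the rsync command above,
# /mydata/report.txt is copied to /lustredata/mydata/report.txt.
#   same_attrs /mydata/report.txt /lustredata/mydata/report.txt || echo "attributes differ"
```

To preview a transfer without copying anything, rsync also accepts -n (dry run), for example rsync -avn /mydata /lustredata.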
 

Note:  When migrating data to Azure Managed Lustre File System (AMLFS), ensure that the total storage used does not exceed the system’s allowed capacity. If the storage exceeds the file system capacity, files need to be archived and released to blob storage as needed before continuing the data migration.
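The capacity check described in the note can be scripted. The following is a rough sketch under stated assumptions: used_pct and over_threshold are hypothetical helpers, the 80 percent threshold is an arbitrary example value, and the file path is a placeholder. The lfs hsm_archive, lfs hsm_state, and lfs hsm_release commands are standard Lustre client commands, shown here only in comments because they need a live file system with HSM enabled.

```shell
# Hypothetical helper: print the used-capacity percentage (integer) of the
# file system that holds the given path, based on POSIX-format df output.
used_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical helper: succeed if usage is at or above a threshold
# (default 80, an arbitrary example value).
over_threshold() {
    pct="$(used_pct "$1")"
    [ "$pct" -ge "${2:-80}" ]
}

# Example (placeholder path): if the file system is filling up, archive a file
# to blob storage, confirm it is archived, then release its space in Lustre.
#   if over_threshold /lustredata 80; then
#       sudo lfs hsm_archive /lustredata/mydata/big.dat
#       sudo lfs hsm_state   /lustredata/mydata/big.dat   # repeat until "archived" appears
#       sudo lfs hsm_release /lustredata/mydata/big.dat
#   fi
```

A release only succeeds once the file has been archived, so checking the state between the two steps avoids a failed release.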

 

Export data and attributes to blob storage

Once the files have been copied into the Lustre file system, use an export job to write the files, along with their POSIX attributes as metadata, to the blob storage container. This step relies on export jobs together with the archive process.

 

jasonschuff_2-1717079946492.png

Which POSIX attributes are retained during an export job?   

When you export files from your Azure Managed Lustre file system to blob storage, additional attributes are saved as metadata on each blob, as described here: Metadata for exported files. Depending on the type of object, the following attributes may be written as metadata to each object in blob storage:

 

 

Parameter       Description
modtime         The last modification time of the file
owner           The owner of the file
group           The group owner of the file
permissions     The existing permissions of the file
hdi_isfolder    If the object is a folder, this value is set to true. Name corresponds with folder name.

 

The metadata will appear in the blob attributes in storage as shown here:

jasonschuff_3-1717079985593.png
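The same metadata can also be read from the command line. This is a minimal sketch assuming the Azure CLI is installed and signed in with an identity that has read access to the container; show_blob_metadata is a hypothetical wrapper, and the account, container, and blob names in the example are placeholders.

```shell
# Hypothetical helper: print the metadata key/value pairs stored on one exported blob.
# Assumes the Azure CLI (az) is installed and the caller is signed in.
show_blob_metadata() {
    account="$1"
    container="$2"
    blob="$3"
    az storage blob metadata show \
        --account-name "$account" \
        --container-name "$container" \
        --name "$blob" \
        --auth-mode login \
        --output json
}

# Example (all three names are placeholders):
#   show_blob_metadata mystorageacct lustre-export mydata/report.txt
```

The output should list whichever of the keys from the table above apply to that object.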

 

  

Restoring data into a new Azure Managed Lustre file system

 

Now that the blob storage contains the attributes for each blob object, including the permissions and ownership of each file and directory, the data can be imported into any new Azure Managed Lustre file system while retaining those attributes. Follow these steps to import data using import jobs.

 

Note:  This step is only required when setting up a new Azure Managed Lustre File System. It is not required for utilizing the existing AMLFS to which the data was originally copied.

 

Last update: Jun 11 2024 02:13 PM