ADLS Gen1 to ADLS Gen2 Migrations: Options and Best Practices
Published May 17 2022 01:45 PM
Microsoft

In case you are not aware, Azure Data Lake Storage (ADLS) Gen1 is being replaced by ADLS Gen2. Gen1 reaches end of life in February 2024 (see the Retirement Announcement). FastTrack, along with the Customer Support Unit (CSU) and the Azure Storage PG, is running a program to help our customers migrate successfully from ADLS Gen1 to ADLS Gen2 by that deadline. In this article we explain why ADLS Gen2 is the way forward, describe what an ADLS Gen1 migration life cycle looks like, detail the options you have for conducting a successful migration, and call out the nuances, best practices and concerns for each option. (Please visit the Azure Data Lake Storage migration guidelines and patterns Microsoft Doc and https://aka.ms/adlsmigrations for additional information.)

 


 

Why is ADLS Gen2 the way forward?

 

  • Gen2 offers lower total costs and more features
  • Premium tier with SSD-based storage
  • File system semantics (e.g., hierarchical namespace)
  • Tiered storage (hot, cool, archive)
  • File-level security
  • Additional redundancy options
  • Increased performance
  • Lower costs for storage and transactions
  • Available in all regions
  • Private endpoint connections
  • Seamless integration with Azure analytics offerings (Synapse Analytics and Databricks) 
 

(For additional details visit the Comparing Azure Data Lake Storage Gen1 and Azure Blob Storage Microsoft Doc)

 

ADLS Gen1 Migration Lifecycle

(please visit the ADLS Gen1 Migrations Mind Map for additional details)

Like other data migrations (e.g., SQL, Oracle), ADLS Gen1 migrations follow a life cycle with four phases: Assess, Prepare, Migrate, and Cut Over.

 

  1.  Assess Your Workload
    • Get familiar with Azure Data Lake Storage migration guidelines and patterns.
    • Read up on ADLS Gen2 features: Pricing, Feature Gaps, Access Control Model, Redundancy Options, Access Tiers, Life Cycle Management, Optimized Drivers and Best Practices for using ADLS Gen2.
    • Read up on the differences between ADLS Gen1 and ADLS Gen2 Capabilities: Data Organization, VNET Support, Access Control, etc. 
    • Read up on the differences between ADLS Gen1 and Azure Blob Storage.
    • Understand the RBAC and ACL behavioral differences between Gen1 and Gen2. The access control models for Data Lake Storage Gen1 and Gen2 behave differently, so make sure you understand those differences before you migrate.
    • Be aware of your ADLS Gen2 Scalability and Performance Best Practices: Premium Disks, Partitioning, File Formats, File Size, Query Acceleration, Driver Optimizations and Bandwidth and Operations Per Blob Limits.
    • If you have an ADLA (Azure Data Lake Analytics) Account be aware that ADLA EOL is February 2024 and ADLA does not support ADLS Gen2. You will need to migrate your ADLA workloads to Azure Synapse, Databricks or another supported compute platform before attempting to migrate your Gen1 account to ADLS Gen2.
    • Take Inventory of your ADLS Gen1 Environment:
      1. How many ADLS Gen1 Accounts do you have? What Subscription, Resource Groups and Regions are the ADLS Gen1 Lakes in?
      2. What is the size of your Data Lakes? Understanding the size and footprint of your ADLS Gen1 Accounts lets you estimate how long the migration will take. A bigger footprint takes longer to migrate, which means you will need a bigger migration window.
      3. Conduct an Inventory: File and Folder Count Analysis. Record the file and folder counts in your ADLS Gen1 Account and confirm they match what you expect. Knowing what is in ADLS Gen1 lets you verify that you get the same counts (files and folders) in Gen2 after you've migrated.
      4. Identify every place that connects to your ADLS Gen1 Account. How many applications are pointing to Gen1?
      5. Check for feature gaps between ADLS Gen1 and Gen2.
      6. Can you afford downtime? This question will more than likely dictate what option you use to migrate. If you can afford some downtime look towards the Portal Experience for your migration. If you can't afford any downtime then you will more than likely lean on Azure Data Factory to do your heavy lifting.
  2.  Prepare Your Environment and Workload
    • Clean up data in your ADLS Gen1 Account. Try not to migrate anything to ADLS Gen2 that you don't need. Unneeded data adds clutter and cost.
    • Analyze the Migration Options below (Portal Experience Migration vs Azure Data Factory Migrations). Choose the one that best fits your scenario.
    • Create and Configure your ADLS Gen2 Storage Account.
      • Make sure the account is completely empty and has never been used before. 
      • Configure the necessary redundancy, soft deletes, account key access, TLS version, Life Cycle Rules, access tiers.
    • Configure Security on your ADLS Gen2 Storage Account: RBAC, ACLs, VNET Support, Service Endpoints and/or Private Endpoints. 
    • Create an ADLS Migration Checklist. A checklist will help you juggle the numerous items and activities you need to accomplish during your Migration Week or Weekend and guide you towards your final cut over into an ADLS Gen2 Account.
    • Limit Increases. If your Dry Runs show that ingress/egress is hitting the account's throughput limits, you can contact Support for a limit increase to improve performance during your migration window.
    • Analyze your Migration Window length. Is it too tight? Or do you need more time?
    • Fix any Blocking Items.
    • Do you have a failback plan in place? What will you do if you have to failback?
    • Conduct a Migration Dry Run of a representative data set. This is probably the most important step. You need to practice, practice, practice to get an indication of how long it will take you to migrate, but also to validate that all dependencies and applications will work with ADLS Gen2.
    • Practice the "Go-Live"
  3.  Migrate (please visit the ADLS Migration Weekend Checklist for additional details)
    • Lock everyone out - Did you take away access from your users so they can't make any changes? Did you make sure people can't write to the new ADLS Gen2 Account?
    • Kick off your Migration
      • Migrate your folders
      • Migrate your files
      • Migrate your metadata
      • Migrate your ACLs
      • Migrate your RBAC
    • Monitor
    • Adjust connection strings inside your applications and external systems.
    • After the migration, can you connect to your ADLS Gen2 Account and see data? Is sensitive data protected? Is there any missing data?
  4.  Cut Over
    • Validate all systems are accurate
    • Conduct your Go/No Go Meeting
    • Cut over
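
The inventory step in the Assess phase and the "is there any missing data?" check in the Migrate phase both come down to comparing file and folder counts. Here is a minimal sketch of that comparison, assuming you have already exported a listing of each account as (path, is_directory) pairs. The listing format and the function names `count_inventory` and `compare_inventories` are hypothetical and are not part of any Azure SDK or of the migration tooling.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical listing format: one (path, is_directory) pair per item.
Entry = Tuple[str, bool]


def count_inventory(entries: Iterable[Entry]) -> Dict[str, int]:
    """Count the files and folders in an exported listing."""
    counts = {"files": 0, "folders": 0}
    for _path, is_directory in entries:
        counts["folders" if is_directory else "files"] += 1
    return counts


def compare_inventories(gen1: List[Entry], gen2: List[Entry]) -> dict:
    """Compare counts and list the Gen1 paths that are absent from Gen2."""
    gen1_counts = count_inventory(gen1)
    gen2_counts = count_inventory(gen2)
    missing = sorted({path for path, _ in gen1} - {path for path, _ in gen2})
    return {
        "gen1": gen1_counts,
        "gen2": gen2_counts,
        "counts_match": gen1_counts == gen2_counts,
        "missing_in_gen2": missing,
    }


# Made-up sample listings, for illustration only.
gen1_listing = [("/raw", True), ("/raw/a.csv", False), ("/raw/b.csv", False)]
gen2_listing = [("/raw", True), ("/raw/a.csv", False)]
report = compare_inventories(gen1_listing, gen2_listing)
```

Matching counts are necessary but not sufficient, which is why the sketch also reports the individual paths that did not arrive.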

 

Migration Options

 

1. Portal Experience

If you want an option that will simplify the migration experience from ADLS Gen1 to Gen2 this is your option. This option is called the Portal Experience because the Azure Portal does the heavy lifting for you.

  • Here are the Pros:
    • Ease of Use
    • Speed: roughly 7-10 terabytes per hour.
    • Portal Experience Tool will give you an estimate of how long it will take to migrate your data.
    • Validation - Portal Experience Tool will create an inventory of all the files and folders in your ADLS Gen1 Account and validate if they were migrated to your landing ADLS Gen2 Account.
    • Portal Tool is free. There is no cost to use the tool.
  • Things to Know:
    • Portal Experience Tool cannot filter data, so it's an all-or-nothing approach. You cannot pick and choose which files and folders to move. The tool moves everything in the account, so the data, the metadata and the ACLs all get moved into your landing ADLS Gen2 Account.
    • For a migration to work you need a brand new, empty, never been used before ADLS Gen2 Account.
    • Portal Experience Tool can only migrate accounts that have fewer than 250 million files.
    • Once you consent, and then press the Migrate Button, the tool will place your ADLS Gen1 Account into Read Only mode for the duration of the migration. 
    • Expect downtime during the migration period. The ADLS Gen1 Account will be in Read Only mode. (Remember, it's important to properly size YOUR Migration Window and plan for your account being in Read Only mode.)
    • Redundancy - For the Portal Experience you need to set your ADLS Gen2 Account to LRS for the migration. Post Migration you can adjust the redundancy for the account.
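
The throughput figure quoted in the list above gives a rough way to size the read-only window before you run the tool. The sketch below simply divides the account size by the 7 and 10 TB per hour bounds. The function name `estimate_window_hours` is hypothetical, and treating throughput as constant is an assumption, so use the estimate from the Portal Experience Tool and your own Dry Runs as the real guide.

```python
from typing import Tuple


def estimate_window_hours(size_tb: float,
                          slow_tb_per_hour: float = 7.0,
                          fast_tb_per_hour: float = 10.0) -> Tuple[float, float]:
    """Return (best_case_hours, worst_case_hours) for a given account size.

    The default rates are the 7-10 TB per hour range quoted in this article.
    Constant throughput is an assumption made for illustration.
    """
    if size_tb < 0 or slow_tb_per_hour <= 0 or fast_tb_per_hour <= 0:
        raise ValueError("size must be non-negative and rates must be positive")
    return size_tb / fast_tb_per_hour, size_tb / slow_tb_per_hour


best, worst = estimate_window_hours(140)
print(f"140 TB: plan for roughly {best:.0f} to {worst:.0f} hours of read-only time")
```

For a 140 TB account this works out to roughly 14 to 20 hours, before adding time for validation and cut-over activities.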

Customers have two options for migrating their workloads within the Portal Experience Tool:

  1. Copy data only (recommended): Data will be copied from Gen1 to Gen2, and the customer must update the applications and compute workloads to use the new ADLS Gen2 endpoint. This will be the preferred option for most customers with simple applications.
  2. Perform a complete migration: Data will be copied from Gen1 to Gen2 and the traffic from the Gen1 account will also be redirected to the Gen2 account. Redirected requests will use the Gen1 compatibility layer to translate Gen1 API calls to Gen2 equivalents. This option gives customers with complex workloads a stopgap solution until they update their applications to use the Gen2 endpoint.
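
With the "Copy data only" option, every application that points at Gen1 needs its paths rewritten. Here is a minimal sketch of that rewrite, assuming the common URI shapes `adl://<account>.azuredatalakestore.net/<path>` for Gen1 and `abfss://<container>@<account>.dfs.core.windows.net/<path>` for Gen2. The helper name `gen1_to_gen2_uri` and the account and container names in the example are hypothetical.

```python
from urllib.parse import urlparse

GEN1_HOST_SUFFIX = ".azuredatalakestore.net"


def gen1_to_gen2_uri(gen1_uri: str, gen2_account: str, container: str) -> str:
    """Rewrite an adl:// URI as an abfss:// URI with the same path.

    Gen2 paths live inside a container (file system), which Gen1 did not
    have, so the caller has to say which container the data landed in.
    """
    parsed = urlparse(gen1_uri)
    if parsed.scheme != "adl" or not parsed.netloc.endswith(GEN1_HOST_SUFFIX):
        raise ValueError(f"not an ADLS Gen1 URI: {gen1_uri}")
    return (
        f"abfss://{container}@{gen2_account}.dfs.core.windows.net{parsed.path}"
    )


# Hypothetical account and container names, for illustration only.
old_uri = "adl://contosogen1.azuredatalakestore.net/raw/sales/2021.csv"
new_uri = gen1_to_gen2_uri(old_uri, gen2_account="contosogen2", container="data")
print(new_uri)
# abfss://data@contosogen2.dfs.core.windows.net/raw/sales/2021.csv
```

Running a helper like this over the list of connection points gathered during the Assess phase gives you a concrete change list to verify during your Dry Runs.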

 


 

2. Azure Data Factory 

If you want an option that will give you the flexibility you need for your ADLS Gen1 to Gen2 migrations, then this is your option. Using Azure Data Factory to migrate your data gives you the following advantages:

  • Flexibility - You can pick and choose your files and folders. You can also choose whether or not to bring over your ACLs.
  • Shrinking the Data Set - With ADF you can clean out unnecessary files and folders and only bring over what you want.
  • Cross Region - With ADF you can migrate to an ADLS Gen2 Account in another Region and put the data closer to you or your clients.
  • There are a number of sample ADF Migration pipelines that you can build on to get started:
    • Lift and Shift - Ideal for pipelines that can afford downtime.
    • Incremental Copy - Ideal for pipelines where all applications are upgraded in one go, but data copy requires more time.
    • Dual Pipelines - Ideal for scenarios when dual ingestion is needed and no downtime can be afforded.
    • These samples also include a Data Validation component and a Data Comparison Report to help you validate the data made it over into your ADLS Gen2 account.
  • Things to Know:
    • Start with a single ADF Copy Activity in a single pipeline and then gradually increase the parallelism.
    • Leave the Degree of Copy Parallelism in the Copy Activity empty. ADF then handles scaling out for you, which gives better performance than specifying a Degree of Parallelism yourself.
    • DIU Setting: Start with 128 DIUs and increase from there to get better performance.
    • If your data size in Data Lake Storage Gen1 is less than 30 TB and the number of files is less than 1 million, you can copy all data in a single copy activity run.
    • ADF starts to run into issues if you have too many files, specifically anything over 700,000,000 files.
    • Using Private Link will reduce the performance of your ADF Pipeline.
    • ADF cannot preserve the LastModified date when copying from Gen1 to Gen2.
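
The tuning guidance above can be captured in a small planning helper. This is a sketch under stated assumptions: the thresholds are the ones quoted in the list, the function names are hypothetical, and `dataIntegrationUnits`, `parallelCopies` and `preserve` are the Copy activity JSON property names as I understand them, so verify them against the current Azure Data Factory documentation before relying on them.

```python
import json

# Thresholds quoted in the list above.
SINGLE_RUN_MAX_TB = 30
SINGLE_RUN_MAX_FILES = 1_000_000


def fits_single_copy_run(size_tb: float, file_count: int) -> bool:
    """True if the data set is small enough for one copy activity run."""
    return size_tb < SINGLE_RUN_MAX_TB and file_count < SINGLE_RUN_MAX_FILES


def copy_activity_settings(diu: int = 128, preserve_acls: bool = True) -> dict:
    """Build the tuning portion of a Copy activity's typeProperties.

    Property names are assumptions to check against the ADF documentation.
    """
    settings = {"dataIntegrationUnits": diu}
    # "parallelCopies" is deliberately left out so that ADF picks the
    # degree of copy parallelism itself, as recommended above.
    if preserve_acls:
        settings["preserve"] = ["ACL", "Owner", "Group"]
    return settings


# Made-up workload numbers, for illustration only.
single_run = fits_single_copy_run(size_tb=12, file_count=400_000)
print("single copy run" if single_run else "split into several runs")
print(json.dumps(copy_activity_settings(), indent=2))
```

If the data set does not fit in a single run, a common approach is to split the copy by top-level folder and run the pieces as separate copy activities, increasing parallelism gradually as described above.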

 


 

ADLA (Azure Data Lake Analytics)

Azure Data Lake Storage Gen2 doesn't support Azure Data Lake Analytics. Azure Data Lake Analytics will be retired on February 29, 2024. If you attempt to use the Azure portal to migrate an Azure Data Lake Storage Gen1 account that is used for Azure Data Lake Analytics, it's possible that you'll break your Azure Data Lake Analytics workloads. You must first migrate your Azure Data Lake Analytics workloads to Azure Synapse Analytics or another supported compute platform before attempting to migrate your Gen1 account. For more information, see

 

Final Thoughts

Whether you have a simple migration or a complex migration with many dependencies, the tools to handle your migration are available through the Portal Experience and/or Azure Data Factory. However, the most important tools are your assessment, your planning and your Dry Runs. You must practice. In the end, planning and practice are the keys to making sure your migration from ADLS Gen1 to ADLS Gen2 is a success.

 


 

Resources

FastTrack for Azure:  Move to Azure efficiently with customized guidance from Azure engineering. FastTrack for Azure – Benefits and FAQ | Microsoft Azure 

Last update: May 20 2022 05:31 AM