Restoring Soft-Deleted Blobs with Multithreading in Azure Storage Using C#
Published Sep 03 2024 09:02 AM
Microsoft

Blob soft delete safeguards your data against accidental deletion or overwrite. By retaining deleted data for a specified retention period, it keeps the data recoverable even after human error. However, restoring soft-deleted data can be labor-intensive, because the undelete API must be called for each deleted blob individually. Currently, there is no option to undelete all blobs in bulk.

 

In this blog, we provide sample C# code that helps you restore soft-deleted data efficiently. The code issues many undelete requests concurrently to speed up restoration, which makes it particularly effective when you have a large number of blobs to restore. The program can also be configured to undelete blobs within a specific container or directory rather than scanning the entire storage account.
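
The speed-up described above comes from keeping a bounded number of asynchronous calls in flight at once, not from creating dedicated threads. The following is a minimal sketch of that throttling pattern, assuming a hypothetical `SimulatedUndeleteAsync` helper that only waits briefly in place of a real undelete call. The helper names, the paths, and the limit of 4 are illustrative and are not part of the sample program.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

const int limit = 4;                       // plays the role of the Concurrency setting
using var throttler = new SemaphoreSlim(limit);
object gate = new();
int inFlight = 0, peak = 0, completed = 0;

// Hypothetical stand-in for one undelete request; it only waits briefly.
async Task SimulatedUndeleteAsync(string path)
{
    lock (gate) { inFlight++; peak = Math.Max(peak, inFlight); }
    await Task.Delay(20);
    lock (gate) { inFlight--; completed++; }
}

// Runs one operation and always returns its slot to the semaphore.
async Task RunOneAsync(string path)
{
    try { await SimulatedUndeleteAsync(path); }
    finally { throttler.Release(); }
}

var tasks = new List<Task>();
foreach (var item in Enumerable.Range(0, 40).Select(i => $"dir/file{i}.txt"))
{
    await throttler.WaitAsync();           // blocks the loop while all slots are taken
    tasks.Add(RunOneAsync(item));
}
await Task.WhenAll(tasks);

Console.WriteLine($"completed={completed}, peak in flight={peak}, limit={limit}");
```

Because each operation releases its slot in a `finally` block, a failed call cannot leak a slot and stall the loop.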

 

To run this program, follow these steps:

  • Install .NET SDK: Ensure you have the .NET SDK installed on your machine.
  • Connect to Azure Account (run in Azure PowerShell so that DefaultAzureCredential can sign in):

 

Connect-AzAccount

 

  • Add NuGet Source:

 

dotnet nuget add source https://api.nuget.org/v3/index.json -n nuget.org

 

  • Create a New Console Application:

 

dotnet new console --force

 

  • Add the following code to Program.cs.

 


using Azure.Core;
using Azure.Identity;
using Azure.Storage.Files.DataLake;
using Azure.Storage.Files.DataLake.Models;

var StorageAccountName = "xxxx";
var ContainerName = "xxxx";
var DirectoryPath = "";
var Concurrency = 500;
var BatchSize = 500;

static DataLakeServiceClient GetDatalakeClient(string accountName)
{
    DataLakeClientOptions clientOptions = new DataLakeClientOptions()
    {
        Retry = {
                    Delay = TimeSpan.FromMilliseconds(500),
                    MaxRetries = 5,
                    Mode = RetryMode.Fixed,
                    MaxDelay = TimeSpan.FromSeconds(5),
                    NetworkTimeout = TimeSpan.FromSeconds(30)
                },
    };

    // This endpoint suffix is for the public Azure cloud only.
    DataLakeServiceClient client = new(
        new Uri($"https://{accountName}.blob.core.windows.net"),
        new DefaultAzureCredential(),
        clientOptions);

    return client;
}

Console.WriteLine("Starting the program");

var client = GetDatalakeClient(StorageAccountName);
var throttler = new SemaphoreSlim(initialCount: Concurrency);

List<Task> tasks = new List<Task>();
List<string> containerNames = new List<string>();

if (string.IsNullOrEmpty(ContainerName))
{
    var containers = client.GetFileSystems();
    foreach (var container in containers)
    {
        containerNames.Add(container.Name);
    }
}
else
{
    containerNames.Add(ContainerName);
}

var totalSuccessCount = 0;
var totalFailedCount = 0;

foreach (var container in containerNames)
{
    Console.WriteLine($"Recovering container {container}");
    var fileSystem = client.GetFileSystemClient(container);

    var deletedItems = fileSystem.GetDeletedPaths(pathPrefix: DirectoryPath);
    var count = 0;
    var totalSuccessCountForContainer = 0;
    var totalFailedCountForContainer = 0;
    foreach (PathDeletedItem item in deletedItems)
    {
        await throttler.WaitAsync();
        count++;
        try
        {
            var task = (fileSystem.UndeletePathAsync(item.Path, item.DeletionId));
            var continuedTask = task.ContinueWith(t =>
            {
                throttler.Release();
                if (t.IsFaulted)
                {
                    Interlocked.Increment(ref totalFailedCount);
                    Interlocked.Increment(ref totalFailedCountForContainer);
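                    // item.Path is the path passed to the undelete call above, so it is logged without re-adding the prefix.
                    Console.WriteLine($"Failed count for container {totalFailedCountForContainer}, total failed count {totalFailedCount}, path {item.Path} due to {t.Exception?.GetBaseException().Message}");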
                }
                else
                {
                    Interlocked.Increment(ref totalSuccessCount);
                    Interlocked.Increment(ref totalSuccessCountForContainer);
                    Console.WriteLine($"Success count for container {totalSuccessCountForContainer}, total success count {totalSuccessCount}");
                }
            });
            tasks.Add(continuedTask);
        }
        catch (Exception ex)
        {
            // No continuation will run for this item, so free the throttler slot here.
            throttler.Release();
            Console.WriteLine("Failed to create task: " + ex.ToString());
        }
        finally
        {
            if (count == Math.Max(Concurrency, BatchSize))
            {
                count = 0;
                await Task.WhenAll(tasks);
                tasks.Clear();
            }
        }
    }

    await Task.WhenAll(tasks);
    Console.WriteLine($"Recovery finished for container {container}");
}

 

 

Replace xxxx with your storage account name and container name. To restore a particular directory, set DirectoryPath to the directory name; otherwise, leave it empty to scan the entire container. If ContainerName is left empty, the program scans every container in the account. By default, the code keeps up to 500 undelete requests in flight at once (the Concurrency setting), but you can adjust this number according to your needs.
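
If you would rather not manage a semaphore yourself, .NET 6 and later provide `Parallel.ForEachAsync`, whose `MaxDegreeOfParallelism` option plays a role similar to the concurrency setting. This is an alternative sketch, not the approach used in the sample above. `FakeUndeleteAsync`, the paths, and the limit of 8 are hypothetical stand-ins so that the snippet runs without a storage account.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

int maxConcurrency = 8;                    // hypothetical setting, analogous to Concurrency
var restored = new ConcurrentBag<string>();

// Hypothetical stand-in for a real undelete request.
async ValueTask FakeUndeleteAsync(string path, CancellationToken ct)
{
    await Task.Delay(10, ct);
    restored.Add(path);
}

var deleted = Enumerable.Range(0, 50).Select(i => $"logs/2024/file{i}.json");

// The runtime keeps at most maxConcurrency delegates running at any time.
var options = new ParallelOptions { MaxDegreeOfParallelism = maxConcurrency };
await Parallel.ForEachAsync(deleted, options, async (path, ct) =>
{
    await FakeUndeleteAsync(path, ct);
});

Console.WriteLine($"Restored {restored.Count} paths with at most {maxConcurrency} in flight");
```

Note that `Parallel.ForEachAsync` stops scheduling new items once a delegate throws, so a real implementation would likely catch and record per-item failures inside the delegate.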

 

  • Add Required Packages:

 

dotnet add package Azure.Identity
dotnet add package Azure.Storage.Files.DataLake

 

  • Build the Project:

 

dotnet build --configuration Release

 

 

  • Run the Program:

 

dotnet <path_to_dll>

 

 

Once the application is running, you can monitor the console window to track its progress and identify any potential issues or failures.
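
When many paths are being restored, individual failure messages can scroll past quickly. One possible way to make them easier to review is to collect them and print a summary at the end. The sketch below assumes hypothetical `FakeUndeleteAsync` and `RestoreOneAsync` helpers with simulated failures; it is not part of the sample program.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

var failures = new ConcurrentQueue<(string Path, string Error)>();
int succeeded = 0;

// Hypothetical stand-in for an undelete request; every 10th call fails.
async Task FakeUndeleteAsync(string path, int index)
{
    await Task.Delay(5);
    if (index % 10 == 0)
        throw new InvalidOperationException($"simulated failure for {path}");
}

// Records the outcome of one restore attempt instead of letting it fault.
async Task RestoreOneAsync(string path, int index)
{
    try
    {
        await FakeUndeleteAsync(path, index);
        Interlocked.Increment(ref succeeded);
    }
    catch (Exception ex)
    {
        failures.Enqueue((path, ex.Message));
    }
}

var work = Enumerable.Range(1, 30)
    .Select(i => RestoreOneAsync($"data/file{i}.bin", i))
    .ToList();
await Task.WhenAll(work);

// Print a compact summary, then one line per failed path.
Console.WriteLine($"Succeeded: {succeeded}, failed: {failures.Count}");
foreach (var (failedPath, error) in failures)
    Console.WriteLine($"FAILED {failedPath}: {error}");
```

The collected entries could equally be written to a file so that failed paths can be retried in a later run.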

Last update: Sep 02 2024 11:57 PM