Blob soft delete is an essential feature that safeguards your data against accidental deletions or overwrites. By retaining deleted data for a specified period, it keeps that data recoverable even after human error. However, restoring soft-deleted data can be labor-intensive, because the undelete API must be called for each deleted blob individually. Currently, there is no option to undelete all blobs in bulk.
In this blog, we provide sample C# code that will help you restore soft-deleted data efficiently. The code issues many undelete requests concurrently to speed up the restoration process, which makes it particularly effective if you have a large number of blobs to restore. Additionally, the program can be configured to undelete blobs within a specific container or directory, rather than scanning the entire storage account.
To run this program, follow these steps:
- Install .NET SDK: Ensure you have the .NET SDK installed on your machine.
- Connect to your Azure account (Azure PowerShell):
Connect-AzAccount
- Add NuGet Source:
dotnet nuget add source https://api.nuget.org/v3/index.json -n nuget.org
- Create a New Console Application:
dotnet new console --force
- Add the following code to Program.cs.
using Azure.Core;
using Azure.Identity;
using Azure.Storage.Files.DataLake;
using Azure.Storage.Files.DataLake.Models;

// Settings: replace the placeholders before running.
var StorageAccountName = "xxxx";
var ContainerName = "xxxx";   // leave empty to process every container
var DirectoryPath = "";       // leave empty to process the whole container
var Concurrency = 500;        // maximum number of undelete requests in flight
var BatchSize = 500;

static DataLakeServiceClient GetDatalakeClient(string accountName)
{
    // Retry failed requests with a fixed delay between attempts.
    DataLakeClientOptions clientOptions = new DataLakeClientOptions()
    {
        Retry = {
            Delay = TimeSpan.FromMilliseconds(500),
            MaxRetries = 5,
            Mode = RetryMode.Fixed,
            MaxDelay = TimeSpan.FromSeconds(5),
            NetworkTimeout = TimeSpan.FromSeconds(30)
        },
    };

    // only works for prod.
    DataLakeServiceClient client = new(
        new Uri($"https://{accountName}.blob.core.windows.net"),
        new DefaultAzureCredential(),
        clientOptions);
    return client;
}

Console.WriteLine("Starting the program");
var client = GetDatalakeClient(StorageAccountName);
var throttler = new SemaphoreSlim(initialCount: Concurrency);
List<Task> tasks = new List<Task>();
List<string> containerNames = new List<string>();
if (string.IsNullOrEmpty(ContainerName))
{
    var containers = client.GetFileSystems();
    foreach (var container in containers)
    {
        containerNames.Add(container.Name);
    }
}
else
{
    containerNames.Add(ContainerName);
}
var totalSuccessCount = 0;
var totalFailedCount = 0;
foreach (var container in containerNames)
{
    Console.WriteLine($"Recovering container {container}");
    var fileSystem = client.GetFileSystemClient(container);
    var deletedItems = fileSystem.GetDeletedPaths(pathPrefix: DirectoryPath);
    var count = 0;
    var totalSuccessCountForContainer = 0;
    var totalFailedCountForContainer = 0;
    foreach (PathDeletedItem item in deletedItems)
    {
        // Wait for a free slot so that at most Concurrency requests are in flight.
        await throttler.WaitAsync();
        count++;
        try
        {
            var task = fileSystem.UndeletePathAsync(item.Path, item.DeletionId);
            var continuedTask = task.ContinueWith(t =>
            {
                throttler.Release();
                if (t.IsFaulted)
                {
                    Interlocked.Increment(ref totalFailedCount);
                    Interlocked.Increment(ref totalFailedCountForContainer);
                    Console.WriteLine($"Failed count for container {totalFailedCountForContainer}, total failed count {totalFailedCount}, path {item.Path} due to {t.Exception?.GetBaseException().Message}");
                }
                else
                {
                    Interlocked.Increment(ref totalSuccessCount);
                    Interlocked.Increment(ref totalSuccessCountForContainer);
                    Console.WriteLine($"Success count for container {totalSuccessCountForContainer}, total success count {totalSuccessCount}");
                }
            });
            tasks.Add(continuedTask);
        }
        catch (Exception ex)
        {
            // The request never started, so its continuation will not run; free the slot here.
            throttler.Release();
            Console.WriteLine("Failed to create task: " + ex.ToString());
        }
        finally
        {
            // Drain the pending tasks after each batch so the task list stays bounded.
            if (count == Math.Max(Concurrency, BatchSize))
            {
                count = 0;
                await Task.WhenAll(tasks);
                tasks.Clear();
            }
        }
    }
    await Task.WhenAll(tasks);
    Console.WriteLine($"Recovery finished for container {container}");
}
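Replace xxxx with your storage account and container names. To restore a particular directory, set DirectoryPath to the directory name; otherwise, leave it empty to scan the entire container. If ContainerName is left empty, the program scans every container in the account. By default, the code keeps up to 500 undelete requests in flight at once (the Concurrency setting). These are asynchronous requests rather than 500 dedicated threads, and you can adjust the number according to your needs.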
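The throttling pattern the program relies on can be tried in isolation, without touching a storage account. The following is a minimal sketch, not part of the original program: FakeUndeleteAsync and RunThrottledAsync are hypothetical names introduced here for illustration, and the simulated 20 ms delay stands in for a real undelete request. It shows how a SemaphoreSlim caps the number of requests in flight.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for one undelete request; it only simulates latency.
static async Task FakeUndeleteAsync(string path)
{
    await Task.Delay(20);
}

// Runs `operation` for every item with at most `maxConcurrency` calls in flight
// and returns the highest number of simultaneous calls that was observed.
static async Task<int> RunThrottledAsync(
    IReadOnlyList<string> items, int maxConcurrency, Func<string, Task> operation)
{
    using var throttler = new SemaphoreSlim(maxConcurrency);
    var gate = new object();
    var tasks = new List<Task>();
    int inFlight = 0, peak = 0;

    async Task RunOneAsync(string item)
    {
        lock (gate) { inFlight++; peak = Math.Max(peak, inFlight); }
        try
        {
            await operation(item);
        }
        finally
        {
            lock (gate) { inFlight--; }
            throttler.Release(); // free the slot whether the call succeeded or failed
        }
    }

    foreach (var item in items)
    {
        await throttler.WaitAsync(); // waits asynchronously until a slot is free
        tasks.Add(RunOneAsync(item));
    }

    await Task.WhenAll(tasks);
    return peak;
}

var paths = Enumerable.Range(1, 20).Select(i => $"dir/file{i}.txt").ToList();
int observedPeak = await RunThrottledAsync(paths, 4, FakeUndeleteAsync);
Console.WriteLine($"Processed {paths.Count} items, peak concurrency {observedPeak}");
```

Because a slot is acquired before each call starts and released only after it finishes, the observed peak never exceeds the limit passed in (4 here). Raising the limit trades faster completion for more simultaneous load on the service, which is the same trade-off the Concurrency setting controls.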
- Add Required Packages:
dotnet add package Azure.Identity
dotnet add package Azure.Storage.Files.DataLake
- Build the Project:
dotnet build --configuration Release
- Run the Program:
dotnet <path_to_dll>
Once the application is running, you can monitor the console window to track its progress and identify any potential issues or failures.