Resiliency Best Practices You Need For your Blob Storage Data
Maintaining Resiliency in Azure Blob Storage: A Guide to Best Practices

Azure Blob Storage is a cornerstone of modern cloud storage, offering scalable and secure solutions for unstructured data. However, maintaining resiliency in Blob Storage requires careful planning and adherence to best practices. In this blog, I'll share practical strategies to ensure your data remains available, secure, and recoverable under all circumstances.

1. Enable Soft Delete for Accidental Recovery (Most Important)
Mistakes happen, but soft delete can be your safety net. It allows you to recover deleted blobs within a specified retention period:
Configure a soft delete retention period in Azure Storage (see the PowerShell sketch after these practices).
Regularly monitor your blob storage to ensure that critical data is not permanently removed by mistake.
Enabling soft delete in Azure Blob Storage does not incur any additional cost for the feature itself. However, it can affect your storage costs because deleted data is retained for the configured retention period, which means:
The retained data contributes to total storage consumption during the retention period.
You are charged according to the pricing tier of the data (Hot, Cool, or Archive) for the duration of retention.

2. Utilize Geo-Redundant Storage (GRS)
Geo-redundancy ensures your data is replicated across regions to protect against regional failures:
Choose RA-GRS (Read-Access Geo-Redundant Storage) for read access to secondary replicas in the event of a primary region outage.
Assess your workload's RPO (Recovery Point Objective) and RTO (Recovery Time Objective) needs to select the appropriate redundancy.

3. Implement Lifecycle Management Policies
Efficient storage management reduces costs and ensures long-term data availability:
Set up lifecycle policies to transition data between hot, cool, and archive tiers based on usage.
Automatically delete expired blobs to save on costs while keeping your storage organized.

4. Secure Your Data with Encryption and Access Controls
Resiliency is incomplete without robust security. Protect your blobs using:
Encryption at Rest: Azure automatically encrypts data using server-side encryption (SSE). Consider enabling customer-managed keys for additional control.
Access Policies: Implement Shared Access Signatures (SAS) and Stored Access Policies to restrict access and enforce expiration dates.

5. Monitor and Alert for Anomalies
Stay proactive by leveraging Azure's monitoring capabilities:
Use Azure Monitor and Log Analytics to track storage performance and usage patterns.
Set up alerts for unusual activities, such as sudden spikes in access or deletions, to detect potential issues early.

6. Plan for Disaster Recovery
Ensure your data remains accessible even during critical failures:
Create snapshots of critical blobs for point-in-time recovery.
Enable backup for blobs and turn on the immutability feature.
Test your recovery process regularly to ensure it meets your operational requirements.

7. Use Resource Locks
Adding Azure locks to your Blob Storage account provides an additional layer of protection by preventing accidental deletion or modification of critical resources.

8. Educate and Train Your Team
Operational resilience often hinges on user awareness:
Conduct regular training sessions on Blob Storage best practices.
Document and share a clear data recovery and management protocol with all stakeholders.
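As a minimal sketch of what enabling soft delete looks like with the Az.Storage PowerShell module (the resource group name, storage account name, and retention period below are placeholders; verify cmdlet availability against your module version):

# Minimal sketch: enable blob and container soft delete on a storage account (names are placeholders).
$rg      = 'rg-storage-prod'
$account = 'stcontosoprod001'

# Enable blob soft delete with a 14-day retention window
Enable-AzStorageBlobDeleteRetentionPolicy -ResourceGroupName $rg -StorageAccountName $account -RetentionDays 14

# Enable container soft delete as well, so deleted containers can also be recovered
Enable-AzStorageContainerDeleteRetentionPolicy -ResourceGroupName $rg -StorageAccountName $account -RetentionDays 14

# Verify the configuration
Get-AzStorageBlobServiceProperty -ResourceGroupName $rg -StorageAccountName $account |
    Select-Object DeleteRetentionPolicy, ContainerDeleteRetentionPolicy

Pick a retention period that matches your recovery objectives; longer windows give you more time to notice a mistake but keep deleted data billable for longer.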
"Critical Tip: Do Not Create New Containers with Deleted Names During Recovery" If a container or blob storage is deleted for any reason and recovery is being attempted, it’s crucial not to create a new container with the same name immediately. Doing so can significantly hinder the recovery process by overwriting backend pointers, which are essential for restoring the deleted data. Always ensure that no new containers are created using the same name during the recovery attempt to maximize the chances of successful restoration. Wrapping It Up Azure Blob Storage offers an exceptional platform for scalable and secure storage, but its resiliency depends on following best practices. By enabling features like soft delete, implementing redundancy, securing data, and proactively monitoring your storage environment, you can ensure that your data is resilient to failures and recoverable in any scenario. Protect your Azure resources with a lock - Azure Resource Manager | Microsoft Learn Data redundancy - Azure Storage | Microsoft Learn Overview of Azure Blobs backup - Azure Backup | Microsoft Learn Protect your Azure resources with a lock - Azure Resource Manager | Microsoft Learn1.1KViews1like1CommentMastering Azure Queries: Skip Token and Batching for Scale
Let's be honest. As a cloud engineer or DevOps professional managing a large Azure environment, running even a simple resource inventory query can feel like drinking from a firehose. You hit API limits, face slow performance, and struggle to get the complete picture of your estate, all because the data volume is overwhelming. But it doesn't have to be this way! This blog is your practical, hands-on guide to mastering two essential techniques for handling massive data volumes in Azure with PowerShell and Azure Resource Graph (ARG): Skip Token (for full data retrieval) and Batching (for blazing-fast performance).

📋 TABLE OF CONTENTS

🚀 GETTING STARTED
│
├─ Prerequisites: PowerShell 7+ & Az.ResourceGraph Module
└─ Introduction: Why Standard Queries Fail at Scale

📖 CORE CONCEPTS
│
├─ 📑 Skip Token: The Data Completeness Tool
│   ├─ What is a Skip Token?
│   ├─ The Bookmark Analogy
│   ├─ PowerShell Implementation
│   └─ 💻 Code Example: Pagination Loop
│
└─ ⚡ Batching: The Performance Booster
    ├─ What is Batching?
    ├─ Performance Benefits
    ├─ Batching vs. Pagination
    ├─ Parallel Processing in PowerShell
    └─ 💻 Code Example: Concurrent Queries

🔍 DEEP DIVE
│
├─ Skip Token: Generic vs. Azure-Specific
└─ Azure Resource Graph (ARG) at Scale
    ├─ ARG Overview
    ├─ Why ARG Needs These Techniques
    └─ 💻 Combined Example: Skip Token + Batching

✅ BEST PRACTICES
│
├─ When to Use Each Technique
└─ Quick Reference Guide

📚 RESOURCES
└─ Official Documentation & References

Prerequisites

PowerShell Version
Requirement: The batching examples use ForEach-Object -Parallel, which requires PowerShell 7.0 or later.
Check version: $PSVersionTable.PSVersion
Install PowerShell 7+: Install PowerShell on Windows, Linux, and macOS

Azure PowerShell Module
Requirement: The Az.ResourceGraph module must be installed.
Install module: Install-Module -Name Az.ResourceGraph -Scope CurrentUser

Introduction: Why Standard Queries Don't Work at Scale
When you query a service designed for big environments, like Azure Resource Graph, you face two limits:
Result Limits (Pagination): APIs won't send you millions of records at once. They cap the result size (often 1,000 items) and stop.
Efficiency Limits (Throttling): Sending a huge number of individual requests is slow and can cause the API to temporarily block you (throttling).
Skip Token helps you solve the first limit by making sure you retrieve all results. Batching solves the second by grouping your requests to improve performance.

Understanding Skip Token: The Continuation Pointer

What is a Skip Token?
A Skip Token (or continuation token) is a unique string value returned by an Azure API when a query result exceeds the maximum limit for a single response. Think of the Skip Token as a "bookmark" that tells Azure where your last page ended, so you can pick up exactly where you left off in the next API call. Instead of getting cut off after 1,000 records, the API gives you the first 1,000 results plus the Skip Token. You use this token in the next request to get the next page of data. This process is called pagination.

Skip Token in Practice with PowerShell
To get the complete dataset, you must use a loop that repeatedly calls the API, providing the token each time until the token is no longer returned.

PowerShell Example: Using Skip Token to Loop Pages

# Define the query
$Query = "Resources | project name, type, location"
$PageSize = 1000
$AllResults = @()
$SkipToken = $null   # Initialize the token

Write-Host "Starting ARG query..."

do {
    Write-Host "Fetching next page. (Token check: $($SkipToken -ne $null))"

    # 1. Execute the query, using the -SkipToken parameter
    $ResultPage = Search-AzGraph -Query $Query -First $PageSize -SkipToken $SkipToken

    # 2. Add the current page results to the main array
    $AllResults += $ResultPage

    # 3. Get the token for the next page, if it exists
    $SkipToken = $ResultPage.SkipToken

    Write-Host " -> Items in this page: $($ResultPage.Count). Total retrieved: $($AllResults.Count)"

} while ($SkipToken -ne $null) # Loop as long as a Skip Token is returned

Write-Host "Query finished. Total resources found: $($AllResults.Count)"

This do-while loop is the reliable way to ensure you retrieve every item in a large result set.

Understanding Batching: Grouping Requests

What is Batching?
Batching means taking several independent requests and combining them into a single API call. Instead of making N separate network requests for N pieces of data, you make one request containing all N sub-requests. Batching is primarily used for performance. It improves efficiency by:
Reducing Overhead: Fewer separate network connections are needed.
Lowering Throttling Risk: Fewer overall API calls are made, which helps you stay under rate limits.

Feature | Batching | Pagination (Skip Token)
Goal    | Improve efficiency/speed. | Retrieve all data completely.
Input   | Multiple different queries. | Single query, continuing from a marker.
Result  | One response with results for all grouped queries. | Partial results with a token for the next step.

Note: While Azure Resource Graph's REST API supports batch requests, the PowerShell Search-AzGraph cmdlet does not expose a -Batch parameter. Instead, we achieve batching by using PowerShell's ForEach-Object -Parallel (PowerShell 7+) to run multiple queries simultaneously.

Batching in Practice with PowerShell
Using parallel processing in PowerShell, you can efficiently execute multiple distinct Kusto queries targeting different scopes (like subscriptions) simultaneously.

Method                     | 5 Subscriptions | 20 Subscriptions
Sequential                 | ~50 seconds     | ~200 seconds
Parallel (ThrottleLimit 5) | ~15 seconds     | ~45 seconds

PowerShell Example: Running Multiple Queries in Parallel

# Define multiple queries to run together
$BatchQueries = @(
    @{
        Query = "Resources | where type =~ 'Microsoft.Compute/virtualMachines'"
        Subscriptions = @("SUB_A")          # Query 1 Scope
    },
    @{
        Query = "Resources | where type =~ 'Microsoft.Network/publicIPAddresses'"
        Subscriptions = @("SUB_B", "SUB_C") # Query 2 Scope
    }
)

Write-Host "Executing batch of $($BatchQueries.Count) queries in parallel..."

# Execute queries in parallel (true batching)
$BatchResults = $BatchQueries | ForEach-Object -Parallel {
    $QueryConfig = $_
    $Query = $QueryConfig.Query
    $Subs = $QueryConfig.Subscriptions

    Write-Host "[Batch Worker] Starting query: $($Query.Substring(0, [Math]::Min(50, $Query.Length)))..." -ForegroundColor Cyan

    $QueryResults = @()

    # Process each subscription in this query's scope
    foreach ($SubId in $Subs) {
        $SkipToken = $null
        do {
            $Params = @{
                Query = $Query
                Subscription = $SubId
                First = 1000
            }
            if ($SkipToken) {
                $Params['SkipToken'] = $SkipToken
            }

            $Result = Search-AzGraph @Params
            if ($Result) {
                $QueryResults += $Result
            }
            $SkipToken = $Result.SkipToken
        } while ($SkipToken)
    }

    Write-Host " [Batch Worker] ✅ Query completed - Retrieved $($QueryResults.Count) resources" -ForegroundColor Green

    # Return results with metadata
    [PSCustomObject]@{
        Query = $Query
        Subscriptions = $Subs
        Data = $QueryResults
        Count = $QueryResults.Count
    }
} -ThrottleLimit 5

Write-Host "`nBatch complete. Reviewing results..."
# The results are returned in the same order as the input array
$VMCount = $BatchResults[0].Data.Count
$IPCount = $BatchResults[1].Data.Count

Write-Host "Query 1 (VMs) returned: $VMCount results."
Write-Host "Query 2 (IPs) returned: $IPCount results."

# Optional: Display detailed results
Write-Host "`n--- Detailed Results ---"
for ($i = 0; $i -lt $BatchResults.Count; $i++) {
    $Result = $BatchResults[$i]
    Write-Host "`nQuery $($i + 1):"
    Write-Host " Query: $($Result.Query)"
    Write-Host " Subscriptions: $($Result.Subscriptions -join ', ')"
    Write-Host " Total Resources: $($Result.Count)"
    if ($Result.Data.Count -gt 0) {
        Write-Host " Sample (first 3):"
        $Result.Data | Select-Object -First 3 | Format-Table -AutoSize
    }
}

Azure Resource Graph (ARG) and Scale
Azure Resource Graph (ARG) is a service built for querying resource properties quickly across a large number of Azure subscriptions using the Kusto Query Language (KQL). Because ARG is designed for large scale, it fully supports Skip Token and Batching:
Skip Token: ARG automatically generates and returns the token when a query exceeds its result limit (e.g., 1,000 records).
Batching: ARG's REST API provides a batch endpoint for sending up to ten queries in a single request. In PowerShell, we achieve similar performance benefits using ForEach-Object -Parallel to process multiple queries concurrently.

Combined Example: Batching and Skip Token Together
This script shows how to use Batching to start a query across multiple subscriptions and then use Skip Token within the loop to ensure every subscription's data is fully retrieved.

$SubscriptionIDs = @("SUB_A")
$KQLQuery = "Resources | project id, name, type, subscriptionId"

Write-Host "Starting BATCHED query across $($SubscriptionIDs.Count) subscription(s)..."
Write-Host "Using parallel processing for true batching...`n"

# Process subscriptions in parallel (batching)
$AllResults = $SubscriptionIDs | ForEach-Object -Parallel {
    $SubId = $_
    $Query = $using:KQLQuery
    $SubResults = @()

    Write-Host "[Batch Worker] Processing Subscription: $SubId" -ForegroundColor Cyan

    $SkipToken = $null
    $PageCount = 0

    do {
        $PageCount++

        # Build parameters
        $Params = @{
            Query = $Query
            Subscription = $SubId
            First = 1000
        }
        if ($SkipToken) {
            $Params['SkipToken'] = $SkipToken
        }

        # Execute query
        $Result = Search-AzGraph @Params
        if ($Result) {
            $SubResults += $Result
            Write-Host " [Batch Worker] Sub: $SubId - Page $PageCount - Retrieved $($Result.Count) resources" -ForegroundColor Yellow
        }
        $SkipToken = $Result.SkipToken
    } while ($SkipToken)

    Write-Host " [Batch Worker] ✅ Completed $SubId - Total: $($SubResults.Count) resources" -ForegroundColor Green

    # Return results from this subscription
    $SubResults
} -ThrottleLimit 5 # Process up to 5 subscriptions simultaneously

Write-Host "`n--- Batch Processing Finished ---"
Write-Host "Final total resource count: $($AllResults.Count)"

# Optional: Display sample results
if ($AllResults.Count -gt 0) {
    Write-Host "`nFirst 5 resources:"
    $AllResults | Select-Object -First 5 | Format-Table -AutoSize
}

Technique   | Use When...                                                            | Common Mistake                                                        | Actionable Advice
Skip Token  | You must retrieve all data items, expecting more than 1,000 results.  | Forgetting to check for the token; you only get partial data.         | Always use a do-while loop to guarantee you get the complete set.
Batching    | You need to run several separate queries (max 10 in ARG) efficiently. | Putting too many queries in the batch, causing the request to fail.   | Group up to 10 logical queries or subscriptions into one fast request.
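The parallel examples above require PowerShell 7 or later. As the summary below notes, Windows PowerShell 5.1 can fall back to background jobs instead; here is a minimal sketch of that approach with Start-Job (the query text and subscription IDs are illustrative):

# Minimal sketch for Windows PowerShell 5.1, where ForEach-Object -Parallel is unavailable.
# Assumes the Az.ResourceGraph module is installed. Az sign-in state flows into job
# processes only if context autosave is enabled (Enable-AzContextAutosave).
$Subscriptions = @('SUB_A', 'SUB_B')   # illustrative subscription IDs
$Query = "Resources | project name, type, location"

# Start one background job per subscription
$Jobs = foreach ($Sub in $Subscriptions) {
    Start-Job -ScriptBlock {
        param($Query, $Sub)
        Import-Module Az.ResourceGraph
        $Results = @()
        $SkipToken = $null
        do {
            $Params = @{ Query = $Query; Subscription = $Sub; First = 1000 }
            if ($SkipToken) { $Params['SkipToken'] = $SkipToken }
            $Page = Search-AzGraph @Params
            if ($Page) { $Results += $Page }
            $SkipToken = $Page.SkipToken
        } while ($SkipToken)
        $Results
    } -ArgumentList $Query, $Sub
}

# Wait for all jobs and collect the combined output
$AllResults = $Jobs | Wait-Job | Receive-Job
Write-Host "Total resources retrieved: $($AllResults.Count)"

Background jobs run in separate processes, so they carry more startup overhead than parallel runspaces, but they keep the same pagination logic per subscription.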
By combining Skip Token for data completeness and Batching for efficiency, you can confidently query massive Azure estates without hitting limits or missing data. These two techniques, when used together, turn Azure Resource Graph from a "good tool" into a scalable discovery engine for your entire cloud footprint.

Summary: Skip Token and Batching in Azure Resource Graph
Goal: Efficiently query massive Azure environments using PowerShell and Azure Resource Graph (ARG).

1. Skip Token (The Data Completeness Tool)
What it does: A marker returned by Azure APIs when results hit the 1,000-item limit. It points to the next page of data.
Why it matters: Ensures you retrieve all records, avoiding incomplete data (pagination).
PowerShell use: Use a do-while loop with the -SkipToken parameter in Search-AzGraph until the token is no longer returned.

2. Batching (The Performance Booster)
What it does: Processes multiple independent queries simultaneously using parallel execution.
Why it matters: Drastically improves query speed by reducing overall execution time and helps avoid API throttling.
PowerShell use: Use ForEach-Object -Parallel (PowerShell 7+) with -ThrottleLimit to control concurrent queries. For PowerShell 5.1, use Start-Job with background jobs.

3. Best Practice: Combine Them
For maximum efficiency, combine Batching and Skip Token. Use batching to run queries across multiple subscriptions simultaneously and use the Skip Token logic within the loop to ensure every single subscription's data is fully paginated and retrieved.
Result: Fast, complete, and reliable data collection across your large Azure estate.

References:
Azure Resource Graph documentation
Search-AzGraph PowerShell reference

Operational Excellence In AI Infrastructure Fleets: Standardized Node Lifecycle Management
Co-authors: Choudary Maddukuri and Bhushan Mehendale

AI infrastructure is scaling at an unprecedented pace, and the complexity of managing it is growing just as quickly. Onboarding new hardware into hyperscale fleets can take months, slowed by fragmented tools, vendor-specific firmware, and inconsistent diagnostics. As hyperscalers expand with diverse accelerators and CPU architectures, operational friction has become a critical bottleneck. Microsoft, in collaboration with the Open Compute Project (OCP) and leading silicon partners, is addressing this challenge. By standardizing lifecycle management across heterogeneous fleets, we've dramatically reduced onboarding effort, improved reliability, and achieved >95% Nodes-in-Service on incredibly large fleet sizes. This blog explores how we are contributing to and leveraging open standards to transform fragmented infrastructure into scalable, vendor-neutral AI platforms.

Industry Context & Problem
The rapid growth of generative AI has accelerated the adoption of GPUs and accelerators from multiple vendors, alongside diverse CPU architectures such as Arm and x86. Each new hardware SKU introduces its own ecosystem of proprietary tools, firmware update processes, management interfaces, reliability mechanisms, and diagnostic workflows. This hardware diversity leads to engineering toil, delayed deployments, and inconsistent customer experiences. Without a unified approach to lifecycle management, hyperscalers face escalating operational costs, slower innovation, and reduced efficiency.

Node Lifecycle Standardization: Enabling Scalable, Reliable AI Infrastructure
Microsoft, through the Open Compute Project (OCP) in collaboration with AMD, Arm, Google, Intel, Meta, and NVIDIA, is leading an industry-wide initiative to standardize AI infrastructure lifecycle management across GPU and CPU hardware management workstreams. Historically, onboarding each new SKU was a highly resource-intensive effort due to custom implementations and vendor-specific behaviors that required extensive Azure integration. This slowed scalability, increased engineering overhead, and limited innovation. With standardized node lifecycle processes and compliance tooling, hyperscalers can now onboard new SKUs much faster, achieving over 70% reduction in effort while enhancing overall fleet operational excellence. These efforts also enable silicon vendors to ensure interoperability across multiple cloud providers.

Figure: How Standardization benefits both Hyperscalers & Suppliers.

Key Benefits and Capabilities
Firmware Updates: Firmware update mechanisms aligned with DMTF standards minimize downtime and streamline fleet-wide secure deployments.
Unified Manageability Interfaces: Standardized Redfish APIs and PLDM protocols create a consistent framework for out-of-band management, reducing integration overhead and ensuring predictable behavior across hardware vendors.
RAS (Reliability, Availability and Serviceability) Features: Standardization enforces minimum RAS requirements across all IP blocks, including CPER (Common Platform Error Record) based error logging, crash dumps, and error recovery flows to enhance system uptime.
Debug & Diagnostics: Unified APIs and standardized crash and debug dump formats reduce issue resolution time from months to days. Streamlined diagnostic workflows enable precise FRU isolation and clear service actions.
Compliance Tooling: Tool contributions such as CTAM (Compliance Tool for Accelerator Manageability) and CPACT (Cloud Processor Accessibility Compliance Tool) automate compliance and acceptance testing, ensuring suppliers meet hyperscaler requirements for seamless onboarding.

Technical Specifications & Contributions
Through deep collaboration within the Open Compute Project (OCP) community, Microsoft and its partners have published multiple specifications that streamline SKU development, validation, and fleet operations.

Summary of Key Contributions
Specification | Focus Area | Benefit
GPU Firmware Update requirements | Firmware Updates | Enables consistent firmware update processes across vendors
GPU Management Interfaces | Manageability | Standardizes telemetry and control via Redfish/PLDM
GPU RAS Requirements | Reliability and Availability | Reduces AI job interruptions caused by hardware errors
CPU Debug and RAS requirements | Debug and Diagnostics | Achieves >95% node serviceability through unified diagnostics and debug
CPU Impactless Updates requirements | Impactless Updates | Enables impactless firmware updates to address security and quality issues without workload interruptions
Compliance Tools | Validation | Automates specification compliance testing for faster hardware onboarding

Embracing Open Standards: A Collaborative Shift in AI Infrastructure Management
This standardized approach to lifecycle management represents a foundational shift in how AI infrastructure is maintained. By embracing open standards and collaborative innovation, the industry can scale AI deployments faster, with greater reliability and lower operational cost. Microsoft's leadership within the OCP community, and its deep partnerships with other hyperscalers and silicon vendors, is paving the way for scalable, interoperable, and vendor-neutral AI infrastructure across the global cloud ecosystem.

To learn more about Microsoft's datacenter innovations, check out the virtual datacenter tour at datacenters.microsoft.com.

Microsoft Azure Cloud HSM is now generally available
Microsoft Azure Cloud HSM is now generally available. Azure Cloud HSM is a highly available, FIPS 140-3 Level 3 validated single-tenant hardware security module (HSM) service designed to meet the highest security and compliance standards. With full administrative control over their HSM, customers can securely manage cryptographic keys and perform cryptographic operations within their own dedicated Cloud HSM cluster.

In today's digital landscape, organizations face an unprecedented volume of cyber threats, data breaches, and regulatory pressures. At the heart of securing sensitive information lies a robust key management and encryption strategy, which ensures that data remains confidential, tamper-proof, and accessible only to authorized users. However, encryption alone is not enough. How cryptographic keys are managed determines the true strength of security. Every interaction in the digital world, from processing financial transactions and securing applications like PKI, database encryption, and document signing to securing cloud workloads and authenticating users, relies on cryptographic keys. A poorly managed key is a security risk waiting to happen. Without a clear key management strategy, organizations face challenges such as data exposure, regulatory non-compliance, and operational complexity.

An HSM is a cornerstone of a strong key management strategy, providing physical and logical security to safeguard cryptographic keys. HSMs are purpose-built devices designed to generate, store, and manage encryption keys in a tamper-resistant environment, ensuring that even in the event of a data breach, protected data remains unreadable. As cyber threats evolve, organizations must take a proactive approach to securing data with enterprise-grade encryption and key management solutions. Microsoft Azure Cloud HSM empowers businesses to meet these challenges head-on, ensuring that security, compliance, and trust remain non-negotiable priorities in the digital age.

Key Features of Azure Cloud HSM
Azure Cloud HSM ensures high availability and redundancy by automatically clustering multiple HSMs and synchronizing cryptographic data across three instances, eliminating the need for complex configurations. It optimizes performance through load balancing of cryptographic operations, reducing latency. Periodic backups enhance security by safeguarding cryptographic assets and enabling seamless recovery. Designed to meet FIPS 140-3 Level 3, it provides robust security for enterprise applications.

Ideal use cases for Azure Cloud HSM
Azure Cloud HSM is ideal for organizations migrating security-sensitive applications from on-premises to Azure Virtual Machines or transitioning from Azure Dedicated HSM or AWS CloudHSM to a fully managed Azure-native solution. It supports applications requiring PKCS#11, OpenSSL, and JCE for seamless cryptographic integration and enables running shrink-wrapped software like Apache/Nginx SSL offload, Microsoft SQL Server/Oracle TDE, and ADCS on Azure VMs. Additionally, it supports tools and applications that require document and code signing.

Get started with Azure Cloud HSM
Ready to deploy Azure Cloud HSM?
Learn more and start building today: Get Started Deploying Azure Cloud HSM
Customers can download the Azure Cloud HSM SDK and Client Tools from GitHub: Microsoft Azure Cloud HSM SDK
Stay tuned for further updates as we continue to enhance Microsoft Azure Cloud HSM to support your most demanding security and compliance needs.

Enhancing Azure Private DNS Resiliency with Internet Fallback
Is your Azure environment prone to DNS resolution hiccups, especially when leveraging Private Link and multiple virtual networks? Dive into our latest blog post, "Enhancing Azure Private DNS Resiliency with Internet Fallback," and discover how to eliminate those frustrating NXDOMAIN errors and ensure seamless application availability. I break down the common challenges faced in complex Azure setups, including isolated Private DNS zones and hybrid environments, and reveal how the new internet fallback feature acts as a vital safety net. Learn how this powerful tool automatically switches to public DNS resolution when private resolution fails, minimizing downtime and simplifying management. Our tutorial walks you through the easy steps to enable internet fallback, empowering you to fortify your Azure networks and enhance application resilience. Whether you're dealing with multi-tenant deployments or intricate service dependencies, this feature is your key to uninterrupted connectivity. Don't let DNS resolution issues disrupt your operations. Read the full article to learn how to implement Azure Private DNS internet fallback and ensure your applications stay online, no matter what.
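As a rough illustration of what enabling the feature can look like with Azure PowerShell, here is a minimal sketch. It assumes a recent Az.PrivateDns module that exposes a -ResolutionPolicy parameter with the NxDomainRedirect value described in the feature documentation (treat the parameter name and value as assumptions to verify against your module version), and every resource name below is a placeholder.

# Minimal sketch: link a VNet to a private DNS zone with internet fallback enabled.
# Assumes a recent Az.PrivateDns module that supports -ResolutionPolicy; names are placeholders.
$rg   = 'rg-dns-prod'
$zone = 'privatelink.blob.core.windows.net'
$vnet = Get-AzVirtualNetwork -ResourceGroupName 'rg-network-prod' -Name 'vnet-spoke-01'

New-AzPrivateDnsVirtualNetworkLink -ResourceGroupName $rg `
    -ZoneName $zone `
    -Name 'link-spoke-01' `
    -VirtualNetworkId $vnet.Id `
    -ResolutionPolicy 'NxDomainRedirect'   # fall back to public DNS when private resolution returns NXDOMAIN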
Provisioning Azure Storage Containers and Folders Using Bicep and PowerShell

Overview: This blog demonstrates how to:
Deploy an Azure Storage Account and Blob containers using Bicep
Create a folder-like structure inside those containers using PowerShell
This approach is ideal for cloud engineers and DevOps professionals seeking end-to-end automation for structured storage provisioning.

Bicep natively supports the creation of:
Storage Accounts
Containers (via the blobServices resource)
However, folders (directories) inside containers are not first-class resources in ARM/Bicep; they're created by uploading a blob with a virtual path, e.g., folder1/blob.txt. So how can we automate the creation of these folder structures without manually uploading dummy blobs?

You can check out the blog "Designing Reusable Bicep Modules: A Databricks Example" for a good reference on how to structure the storage account pattern. It covers reusable module design and shows how to keep things clean and consistent.

1. Deploy an Azure Storage Account and Blob containers using Bicep
You can provision a Storage Account and its associated Blob Containers using a few lines of code, with the parameters for 'directory services' defined alongside. The process involves:
Defining the Microsoft.Storage/storageAccounts resource for the Storage Account.
Adding a nested blobServices/containers resource to create Blob containers within it.
Using parameters to dynamically assign names, access tiers, and network rules.

2. Create a folder-like structure inside those containers using PowerShell
To simulate a directory structure in Azure Data Lake Storage Gen2, use Bicep with deployment scripts that execute az storage fs directory create. This enables automation of folder creation inside blob containers at deployment time. In this setup:
A Microsoft.Resources/deploymentScripts resource is used.
The az storage fs directory create command creates virtual folders inside containers.
Access is authenticated using a secure account key fetched via storageAccount.listKeys().

Parameter Flow and Integration
The solution uses Bicep's module linking capabilities:
The main module outputs the Storage Account name and container names.
These outputs are passed as parameters to the deployment script module.
The script loops through each container and folder, uploading a dummy blob to create the folder.
Here's the final setup with the storage account and container ready, plus the directory created inside. Everything's all set!

Conclusion
This approach is especially useful in enterprise environments where storage structures must be provisioned consistently across environments. You can extend this pattern further to:
Tag blobs/folders
Assign RBAC roles
Handle folder-level metadata
Have you faced similar challenges with Azure Storage provisioning? Share your experience or drop a comment!
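The post creates the folder structure through a deploymentScripts resource running az storage fs directory create. If you prefer to create the same folder-like structure from PowerShell after the Bicep deployment finishes, a minimal sketch along the same lines is shown below. It assumes the Az.Storage module, a storage account with hierarchical namespace enabled, and a signed-in identity holding a data-plane role such as Storage Blob Data Contributor; all names are placeholders.

# Minimal sketch: create folder-like directories in an ADLS Gen2-enabled storage account.
# Assumes Az.Storage, hierarchical namespace enabled on the account, and a data-plane RBAC role; names are placeholders.
$account   = 'stdatalakedev001'
$container = 'raw'
$folders   = @('landing', 'landing/sales', 'landing/finance')

# Use the signed-in identity (OAuth) instead of an account key
$ctx = New-AzStorageContext -StorageAccountName $account -UseConnectedAccount

foreach ($folder in $folders) {
    # -Directory creates a directory (folder) rather than a file
    New-AzDataLakeGen2Item -Context $ctx -FileSystem $container -Path $folder -Directory | Out-Null
    Write-Host "Created directory: $container/$folder"
}

Running this from a release pipeline keeps the folder layout consistent across environments without uploading dummy blobs.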
Designing Reusable Bicep Modules: A Databricks Example

In this blog, I'll walk you through how to design a reusable Bicep module for deploying Azure Databricks, a popular analytics and machine learning platform. We'll focus on creating a parameterized and composable pattern using Bicep and Azure Verified Modules (AVM), enabling your team to replicate this setup across environments with minimal changes.

Why Reusability in Bicep Matters
As your Azure environment scales, manually copying and modifying Bicep files for every service or environment becomes error-prone and unmanageable. Reusable Bicep modules help:
Eliminate redundant code
Enforce naming, tagging, and networking standards
Accelerate onboarding of new services or teams
Enable self-service infrastructure in CI/CD pipelines

Here, we'll create a reusable module to deploy an Azure Databricks Workspace with:
Consistent naming conventions
Virtual network injection (VNet)
Private endpoint integration (UI, Blob, DFS)
Optional DNS zone configuration
Role assignments
AVM module integration

Module Inputs (Parameters)
Your Bicep pattern uses several key parameters.

Parameterizing the Pattern
These parameters allow the module to be flexible yet consistent across multiple environments.

Naming conventions
The nameObject structure is used to build consistent names for all resources:
var adbworkspaceName = toLower('${nameObject.client}-${nameObject.workloadIdentifier}-${nameObject.environment}-${nameObject.region}-adb-${nameObject.suffix}')

Configuring Private Endpoints and DNS
The module allows defining private endpoint configurations for both Databricks and storage. This logic ensures:
Private access to the Databricks UI
Optional DNS zone integration for custom resolution
You can extend this to include Blob and DFS storage private endpoints, which are essential for secure data lake integrations.

Plugging in the AVM Module
The actual deployment leverages an Azure Verified Module (AVM) stored in an Azure Container Registry (ACR).

Example usage
Using the module in your main Bicep deployment follows the same pattern: reference the module, pass in the nameObject and private endpoint parameters, and let the module handle the resource creation.

Conclusion
This Bicep-based pattern, like any well-designed reusable module, enables consistent, secure, and scalable deployments across your Azure environments. Whether you're deploying a single workspace or rolling out 50 across environments, this pattern helps ensure governance and simplicity.

Resources
Azure Bicep Documentation
Azure Verified Modules
Azure Databricks Docs
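Whatever the exact module interface looks like, deploying such a pattern from PowerShell typically follows the shape below. This is a hedged sketch only: the file name, parameter name, and nameObject values are illustrative assumptions rather than the post's actual interface, and deploying a .bicep file this way requires the Bicep CLI on the machine running the command.

# Minimal sketch: deploy the Databricks pattern into a resource group.
# Assumes Az.Resources plus the Bicep CLI; file name, parameter name, and values are illustrative.
$nameObject = @{
    client             = 'contoso'
    workloadIdentifier = 'analytics'
    environment        = 'dev'
    region             = 'weu'
    suffix             = '01'
}

New-AzResourceGroupDeployment -ResourceGroupName 'rg-contoso-analytics-dev-weu-01' `
    -TemplateFile './main.bicep' `
    -TemplateParameterObject @{ nameObject = $nameObject } `
    -Verbose

Passing the naming object as a single parameter keeps the naming convention enforced by the module itself rather than by each caller.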
Deploying a GitLab Runner on Azure: A Step-by-Step Guide

This guide walks you through the entire process, from VM setup to running your first successful job.

Step 1: Create an Azure VM
Log in to the Azure Portal.
Create a new VM with the following settings:
Image: Ubuntu 20.04 LTS (recommended)
Authentication: SSH Public Key (generate a .pem file for secure access)
Once created, note the public IP address. (A scripted alternative using Azure PowerShell is shown at the end of this guide.)

Connect to the VM
From your terminal:
ssh -i "/path/to/your/key.pem" admin_name@<YOUR_VM_PUBLIC_IP>
Note: Replace the path to the .pem file and the admin name in the command above with the values you used during VM deployment.

Step 2: Install Docker on the Azure VM
Run the following commands to install Docker:
sudo apt update && sudo apt upgrade -y
sudo apt install -y docker.io
sudo systemctl start docker
sudo systemctl enable docker   # Enable Docker to start automatically on boot
sudo usermod -aG docker $USER

Test Docker with:
docker run hello-world
A success message should appear. If you see permission denied, run:
newgrp docker
Note: Log out and log back in (or restart the VM) for group changes to apply.

Step 3: Install GitLab Runner
Download the GitLab Runner binary, assign execution permissions, then install and start the runner as a service:

# Step 1: Download the binary
sudo curl -L --output /usr/local/bin/gitlab-runner \
  https://gitlab-runner-downloads.s3.amazonaws.com/latest/binaries/gitlab-runner-linux-amd64

# Step 2: Assign execution permissions
sudo chmod +x /usr/local/bin/gitlab-runner

# Step 3: Install and start the runner as a service
sudo gitlab-runner install --user=azureuser
sudo gitlab-runner start
sudo systemctl enable gitlab-runner   # Enable GitLab Runner to start automatically on boot

Step 4: Register the GitLab Runner
Navigate to the runner section in GitLab to generate a registration token (GitLab -> Settings -> CI/CD -> Runners -> New Project Runner).
On your Azure VM, run:
sudo gitlab-runner register \
  --url https://gitlab.com/ \
  --registration-token <YOUR_TOKEN> \
  --executor docker \
  --docker-image ubuntu:22.04 \
  --description "Azure VM Runner" \
  --tag-list "gitlab-runner-vm" \
  --non-interactive
Note: Replace the registration token, description, and tag list as required.
After registration, restart the runner:
sudo gitlab-runner restart
Verify the runner's status with:
sudo gitlab-runner list
Your runner should appear in the list. If the runner does not appear, make sure to follow Step 4 as described.

Step 5: Add Runner Tags to Your Pipeline
In .gitlab-ci.yml:
default:
  tags:
    - gitlab-runner-vm

Step 6: Verify Pipeline Execution
Create a simple job to test the runner:
test-runner:
  tags:
    - gitlab-runner-vm
  script:
    - echo "Runner is working!"

Troubleshooting Common Issues

Permission Denied (Docker Error)
Error: docker: permission denied while trying to connect to the Docker daemon socket
Solution: Run newgrp docker
If unresolved, restart Docker: sudo systemctl restart docker

No Active Runners Online
Error: This job is stuck because there are no active runners online.
Solution: Check runner status: sudo gitlab-runner status
If inactive, restart the runner: sudo gitlab-runner restart
Ensure the runner tag in your pipeline matches the one you provided when creating the runner for the project.

Final Tips
Always restart the runner after making configuration changes: sudo gitlab-runner restart
Remember to periodically check the runner's status and update its configuration as needed to keep it running smoothly.
Happy coding and enjoy the enhanced capabilities of your new GitLab Runner setup!
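Step 1 above uses the Azure Portal. If you prefer to script the VM creation, a minimal Azure PowerShell sketch follows; the image alias and SSH key parameters assume a recent Az.Compute module (treat them as assumptions to verify against your module version), and all names are placeholders.

# Minimal sketch: create an Ubuntu VM for the runner with SSH (port 22) open.
# Assumes a recent Az.Compute module; image alias, size, and names are illustrative.
New-AzResourceGroup -Name 'rg-gitlab-runner' -Location 'westeurope'

# You may be prompted for an admin username/credential if one is not supplied explicitly.
New-AzVm -ResourceGroupName 'rg-gitlab-runner' `
    -Name 'vm-gitlab-runner' `
    -Location 'westeurope' `
    -Image 'Ubuntu2204' `
    -Size 'Standard_B2s' `
    -PublicIpAddressName 'vm-gitlab-runner-ip' `
    -OpenPorts 22 `
    -GenerateSshKey `
    -SshKeyName 'gitlab-runner-ssh'

# Retrieve the public IP address to use with: ssh -i <key> <admin_name>@<ip>
(Get-AzPublicIpAddress -ResourceGroupName 'rg-gitlab-runner' -Name 'vm-gitlab-runner-ip').IpAddress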
Creating an Application Landing Zone on Azure Using Bicep

🧩 What Is an Application Landing Zone?
An Application Landing Zone is a pre-configured Azure environment designed to host applications in a secure, scalable, and governed manner. It is a foundational component of the Azure Landing Zones framework, which supports enterprise-scale cloud adoption by providing a consistent and governed environment for deploying workloads.

🔍 Key Characteristics
Security and Compliance: Built-in policies and controls ensure that applications meet organizational and regulatory requirements.
Pre-configured Infrastructure: Includes networking, identity, security, monitoring, and governance components that are ready to support application workloads.
Scalability and Flexibility: Designed to scale with application demand, supporting both monolithic and microservices-based architectures.
Governance and Management: Integrated with Azure Policy, Azure Monitor, and Azure Security Center to enforce governance and provide operational insights.
Developer Enablement: Provides a consistent environment that accelerates development and deployment cycles.

🏗️ Core Components
An Application Landing Zone typically includes:
Networking with Virtual Networks (VNets), subnets, and NSGs
Azure Active Directory (AAD) integration
Role-Based Access Control (RBAC)
Azure Key Vault/Managed HSM for secrets management
Monitoring and Logging via Azure Monitor and Log Analytics
Application Gateway or Azure Front Door for traffic management
CI/CD Pipelines integrated with Azure DevOps or GitHub Actions

🛠️ Prerequisites
Before deploying the Application Landing Zone, please ensure the following:

✅ Access & Identity
Azure Subscription Access: You must have access to an active Azure subscription where the landing zone will be provisioned. This subscription should be part of a broader management group hierarchy if you're following enterprise-scale landing zone patterns.
A Service Principal (SPN): A Service Principal is required for automating deployments via CI/CD pipelines or Infrastructure as Code (IaC) tools. It should have at least the Contributor role at the subscription level to create and manage resources. (A short sign-in sketch using a service principal appears at the end of these prerequisites.) Explicit access to the following is required:
- Resource Groups (for deploying application components)
- Azure Policy (to assign and manage governance rules)
- Azure Key Vault (to retrieve secrets, certificates, or credentials)
Azure Active Directory (AAD): Ensure that AAD is properly configured for:
- Role-Based Access Control (RBAC)
- Group-based access assignments
- Conditional Access policies (if applicable)
Tip: Use Managed Identities where possible to reduce the need for credential management.

✅ Tooling
Azure CLI
- Required for scripting and executing deployment commands.
- Ensure you're authenticated using az login or a service principal.
- Recommended version: 2.55.0 or later for compatibility with the latest Bicep and Azure features.
Azure PowerShell
- Installed and authenticated (Connect-AzAccount)
- Recommended module: Az module version 11.0.0 or later
Visual Studio Code
Preferred IDE for working with Bicep and ARM templates. Install the following extensions:
- Bicep: for authoring and validating infrastructure templates.
- Azure Account: for managing Azure sessions and subscriptions.
Source Control & CI/CD Integration
Access to GitHub or Azure DevOps is required for:
- Storing IaC templates
- Automating deployments via pipelines
- Managing version control and collaboration
Tip: Use GitHub Actions or Azure Pipelines to automate validation, testing, and deployment of your landing zone templates.
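The service principal prerequisites above translate into a short sign-in step in any pipeline or script. A minimal sketch is shown below; the IDs are placeholders, and the client secret should come from Key Vault or a secure pipeline variable (or be replaced entirely by a federated or managed identity where possible).

# Minimal sketch: authenticate as the deployment service principal and select the target subscription.
# Tenant, app, and subscription IDs are placeholders; keep the client secret in Key Vault or a secret pipeline variable.
$tenantId     = '00000000-0000-0000-0000-000000000000'
$appId        = '11111111-1111-1111-1111-111111111111'
$clientSecret = ConvertTo-SecureString $env:AZURE_CLIENT_SECRET -AsPlainText -Force
$credential   = New-Object System.Management.Automation.PSCredential($appId, $clientSecret)

Connect-AzAccount -ServicePrincipal -TenantId $tenantId -Credential $credential
Set-AzContext -Subscription '22222222-2222-2222-2222-222222222222'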
✅ Environment Setup
Resource Naming Conventions: Define a naming standard that reflects resource type, environment, region, and application. Example: rg-app1-prod-weu for a production resource group in West Europe.
Tagging Strategy: Predefine tags for:
- Cost Management (e.g., CostCenter, Project)
- Ownership (e.g., Owner, Team)
- Environment (e.g., Dev, Test, Prod)
Networking Baseline: Ensure that required VNets, subnets, and DNS settings are in place. Plan for hybrid connectivity if integrating with on-premises networks (e.g., via VPN or ExpressRoute).
Security Baseline: Define and apply:
- RBAC roles for least-privilege access
- Azure built-in as well as custom Policies for compliance enforcement
- NSGs and ASGs for network security

🧱 Application Landing Zone Architecture Using Bicep
Bicep is a domain-specific language (DSL) for deploying Azure resources declaratively. It simplifies the authoring experience compared to traditional ARM templates and supports modular, reusable, and maintainable infrastructure-as-code (IaC) practices. The Application Landing Zone (App LZ) architecture leverages Bicep to define and deploy a secure, scalable, and governed environment for hosting applications. This architecture is structured into phases, each representing a logical grouping of resources. These phases align with enterprise cloud adoption frameworks and enable teams to deploy infrastructure incrementally and consistently.

🧱 Architectural Phases
The App LZ is typically divided into the following phases, each implemented using modular Bicep templates:

1. Foundation Phase
Establishes the core infrastructure and governance baseline:
Resource groups
Virtual networks and subnets
Network security groups (NSGs)
Diagnostic settings
Azure Policy assignments

2. Identity & Access Phase
Implements secure access and identity controls:
Role-Based Access Control (RBAC)
Azure Active Directory (AAD) integration
Managed identities
Key Vault access policies

3. Security & Monitoring Phase
Ensures observability and compliance:
Azure Monitor and Log Analytics
Security Center configuration
Alerts and action groups
Defender for Cloud settings

4. Application Infrastructure Phase
Deploys application-specific resources:
App Services, AKS, or Function Apps
Application Gateway or Azure Front Door
Storage accounts, databases, and messaging services
Private endpoints and service integrations

5. CI/CD Integration Phase
Automates deployment and lifecycle management:
GitHub Actions or Azure Pipelines
Deployment scripts and parameter files
Secrets management via Key Vault
Environment-specific configurations

🔁 Modular Bicep Templates
Each phase is implemented using modular Bicep templates, which offer:
Reusability: Templates can be reused across environments (Dev, Test, Prod).
Flexibility: Parameters allow customization without modifying core logic.
Incremental Deployment: Phases can be deployed independently or chained together.
Testability: Each module can be validated against test cases before full deployment.
💡 Example: A network.bicep module can be reused across multiple landing zones with different subnet configurations.
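To make the phased approach concrete, here is a minimal sketch of previewing and then deploying one phase at subscription scope with Azure PowerShell. The template and parameter file names are illustrative assumptions; the first command surfaces the what-if behaviour discussed later under "Why Choose Bicep Over Terraform?".

# Minimal sketch: preview and deploy a landing zone phase at subscription scope.
# Assumes Az.Resources with Bicep support; file names and location are illustrative.
$location = 'westeurope'

# Preview the changes the foundation phase would make (no resources are modified)
New-AzSubscriptionDeployment -Name 'lz-foundation-whatif' `
    -Location $location `
    -TemplateFile './phases/foundation.bicep' `
    -TemplateParameterFile './phases/foundation.prod.parameters.json' `
    -WhatIf

# Deploy the phase once the preview looks right
New-AzSubscriptionDeployment -Name 'lz-foundation' `
    -Location $location `
    -TemplateFile './phases/foundation.bicep' `
    -TemplateParameterFile './phases/foundation.prod.parameters.json' `
    -Verbose

Running each phase as its own named deployment keeps the incremental, per-phase history visible in the portal and in pipeline logs.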
To ensure a smooth and automated deployment experience, the complete flow runs from the initial setup above through the phased deployments and pipeline automation.

✅ Benefits of This Approach
Consistency & Compliance: Enforces Azure best practices and governance policies
Modularity: Reusable Bicep modules simplify maintenance and scaling
Automation: CI/CD pipelines reduce manual effort and errors
Security: Aligns with Microsoft's security baselines and CAF
Scalability: Easily extendable to support new workloads or environments
Native Azure Integration: Supports all Azure resources and features
Tooling Support: Integrated with Visual Studio Code, Azure CLI, and GitHub

🔄 Why Choose Bicep Over Terraform?
First-Party Integration: Bicep is a first-party solution maintained by Microsoft, ensuring day-one support for new Azure services and API changes. This means customers can immediately leverage the latest features and updates without waiting for third-party providers to catch up.
Azure-Specific Optimization: Bicep is deeply integrated with Azure services, offering a tailored experience for Azure resource management. This integration ensures that deployments are optimized for Azure, providing better performance and reliability.
Simplified Syntax: Bicep uses a domain-specific language (DSL) that is more concise and easier to read compared to Terraform's HCL (HashiCorp Configuration Language). This simplicity reduces the learning curve and makes it easier for teams to write and maintain infrastructure code.
Incremental Deployment: Unlike Terraform, Bicep does not store state. Instead, it relies on incremental deployment, which simplifies the deployment process and reduces the complexity associated with state management. This approach ensures that resources are deployed consistently without the need for managing state files.
Azure Policy Integration: Bicep integrates seamlessly with Azure Policy, allowing for preflight validation to ensure compliance with policies before deployment. This integration helps in maintaining governance and compliance across deployments.
What-If Analysis: Bicep offers a "what-if" operation that predicts the changes before deploying a Bicep file. This feature allows customers to preview the impact of their changes without making any modifications to the existing infrastructure.

🏁 Conclusion
Creating an Application Landing Zone using Bicep provides a robust, scalable, and secure foundation for deploying applications in Azure. By following a phased, modular approach and leveraging automation, organizations can accelerate their cloud adoption journey while maintaining governance and operational excellence.

Microsoft Fabric: Automate Artifact Deployment with Azure DevOps and Python
Microsoft Fabric is rapidly becoming the go-to platform for enterprise-grade analytics and reporting. However, deploying artifacts like dataflows, datasets, and reports across environments (Dev → Test → Prod) can be a manual and error-prone process. This blog walks you through a fully automated and secure CI/CD solution that uses Azure DevOps Pipelines, Python scripting, and Microsoft Fabric REST APIs to streamline artifact deployment across Fabric workspaces. Whether you're a DevOps engineer or a Fabric administrator, this setup brings speed, security, and consistency to your deployment pipeline.

✅ The Challenge
Microsoft Fabric currently provides a deployment pipeline console for promotion of artifacts. Manual promotion across environments introduces risks like:
Misconfiguration or broken dependencies
Lack of traceability or versioning
Security and audit concerns with manual artifact movement

🔧 The Solution: Python + Azure DevOps + Fabric API
This solution uses a tokenized YAML pipeline combined with a custom Python script to promote Fabric artifacts between environments using Fabric Deployment Pipelines and REST APIs.

🔑 Key Advantages & How This Helps You
✅ Zero-Touch Deployment – Automates Dev → Test artifact promotion using Fabric Deployment APIs
✅ Repeatable & Consistent – YAML pipelines enforce consistent promotion logic
✅ Secure Authentication – OAuth2 ROPC flow with service account credentials
✅ Deployment Visibility – Logs tracked via DevOps and Fabric API responses
✅ Low Overhead – Just a lightweight Python script; no external tools needed

🧩 Core Features
1️⃣ Fabric Deployment Pipeline Integration – Automates artifact promotion across Dev, Test, and Prod stages
2️⃣ Environment-Aware Deployment – Supports variable groups and environment-specific parameters
3️⃣ Flexible API Control – Granular stage control with REST API interactions
4️⃣ Real-Time Status Logging – Pipeline polls deployment status from Fabric
5️⃣ Modular YAML Architecture – Easy to plug into any existing DevOps pipeline
6️⃣ Secure Secrets Management – Credentials and sensitive info managed via DevOps variable groups

⚙️ How It Works
Define a Fabric Deployment Pipeline in your source workspace (Dev).
Configure an Azure DevOps pipeline with YAML and use Python to trigger the Fabric deployment stage.
Promote artifacts (notebooks, data pipelines, semantic models, reports, lakehouses, etc.) between environments using Fabric REST APIs.
Monitor deployment in DevOps logs and optionally via Fabric's deploymentPipelineRuns endpoint.

📌 Sample: Python API Trigger Logic
The full trigger script appears in Step 8 below.

Step-by-Step Setup

1. Create a Deployment Pipeline in Fabric
Go to the Microsoft Fabric Portal.
Navigate to Deployment Pipelines.
Click Create pipeline, and provide a name.
The pipeline will have default Development, Test, and Production stages.

2. Assign Workspaces to Stages
For each stage (Dev, Test), click Assign Workspace.
Choose the appropriate Fabric workspace.
Click Save and Apply.

3. Copy the Deployment Pipeline ID
Open the created pipeline.
In the browser URL, copy the ID and store it in a DevOps variable group.

4. Update Placeholder Values
Replace all placeholder values like: tenant_id, username, client_id, deployment_pipeline_id.
Use variable groups for security.

5. Create Variable Groups in Azure DevOps
a. Create a group: fabric-secrets
Store secrets like: tenant_id, client_id, service-acc-username, service-acc-key
b. Create a group: fabric-ids
Store: deployment_pipeline_id, dev_stage_id, test_stage_id

6. Get Stage IDs via API
Use the deployment pipeline stages API to list the stage IDs (a hedged PowerShell sketch follows), or use a helper script to extract dev_stage_id and test_stage_id, then update them in your fabric-ids variable group.
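A minimal PowerShell sketch of that call is shown below. It assumes the List Deployment Pipeline Stages endpoint documented for the Fabric REST API and a bearer token acquired the same way the Python script in Step 8 acquires one; the pipeline ID and token variable are placeholders, and the returned field names are taken from the API documentation at the time of writing.

# Minimal sketch: list deployment pipeline stages to find the dev/test stage IDs.
# Assumes a valid Fabric access token ($token) and your deployment pipeline ID; both are placeholders here.
$deploymentPipelineId = '<your-deployment-pipeline-id>'
$headers = @{ Authorization = "Bearer $token" }

$response = Invoke-RestMethod -Method Get `
    -Uri "https://api.fabric.microsoft.com/v1/deploymentPipelines/$deploymentPipelineId/stages" `
    -Headers $headers

# Each stage object includes an id, displayName, and order; copy the IDs into the fabric-ids variable group
$response.value | Select-Object id, displayName, order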
7. Azure DevOps YAML Pipeline
Save the below in a .yml file in your repo:

trigger: none

variables:
  - group: fabric-secrets
  - group: fabric-ids

stages:
  - stage: Generate_And_Deploy_Artifacts
    displayName: 'Generate and Deploy Artifacts'
    jobs:
      - job: Generate_And_Deploy_Artifacts_Job
        displayName: 'Generate and Deploy Artifacts'
        steps:
          - publish: $(System.DefaultWorkingDirectory)
            artifact: fabric-artifacts-$(System.StageName)
            displayName: 'Publish Configuration Files'
          - script: |
              echo "🚀 Running test.py..."
              export tenant_id=$(tenant_id)
              export username=$(service-acc-username)
              export password=$(service-acc-key)
              export client_id=$(client_id)
              export deployment_pipeline_id=$(deployment_pipeline_id)
              export dev_stage_id=$(dev_stage_id)
              export test_stage_id=$(test_stage_id)
              python fabric-artifacts-deploy/artifacts/test.py
            displayName: 'Run Deployment Script (Dev → Test)'

8. Python Script - test.py

import json
import os
import time

import requests  # required for the REST calls below

tenant_id = os.getenv('tenant_id')
client_id = os.getenv('client_id')
username = os.getenv('username')
password = os.getenv('password')
deployment_pipeline_id = os.getenv('deployment_pipeline_id')
dev_stage_id = os.getenv('dev_stage_id')
test_stage_id = os.getenv('test_stage_id')

deploy_url = f'https://api.fabric.microsoft.com/v1/deploymentPipelines/{deployment_pipeline_id}/deploy'
token_url = f'https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token'


def get_access_token():
    token_data = {
        'grant_type': 'password',
        'client_id': client_id,
        'scope': 'https://api.fabric.microsoft.com/.default offline_access',
        'username': username,
        'password': password,
    }
    response = requests.post(token_url, data=token_data)
    if response.status_code == 200:
        return response.json().get('access_token')
    else:
        print("❌ Failed to authenticate")
        print(response.text)
        exit(1)


def poll_operation_status(location_url, headers):
    while True:
        response = requests.get(location_url, headers=headers)
        if response.status_code in [200, 201]:
            status = response.json().get("status", "Unknown")
            print(f"⏳ Status: {status}")
            if status == "Succeeded":
                print("✅ Deployment successful!")
                break
            elif status == "Failed":
                print("❌ Deployment failed!")
                print(response.text)
                exit(1)
            time.sleep(5)
        elif response.status_code == 202:
            retry_after = int(response.headers.get("Retry-After", 5))
            print(f"Waiting {retry_after} seconds...")
            time.sleep(retry_after)
        else:
            print("❌ Unexpected response")
            print(response.text)
            exit(1)


def deploy(source_stage_id, target_stage_id, note, token):
    payload = {
        "sourceStageId": source_stage_id,
        "targetStageId": target_stage_id,
        "note": note
    }
    headers = {
        'Authorization': f'Bearer {token}',
        'Content-Type': 'application/json'
    }
    response = requests.post(deploy_url, headers=headers, data=json.dumps(payload))
    if response.status_code in [200, 201]:
        print("✅ Deployment completed")
    elif response.status_code == 202:
        location_url = response.headers.get('location')
        if location_url:
            poll_operation_status(location_url, headers)
        else:
            print("❌ Location header missing in 202 response")
    else:
        print("❌ Deployment failed")
        print(response.text)


if __name__ == "__main__":
    token = get_access_token()
    print("✅ Token acquired")
    deploy(dev_stage_id, test_stage_id, "Deploy Dev → Test", token)

✅ Result
After running the DevOps pipeline:
Artifacts from the Dev workspace are deployed to the Test workspace.
Logs are visible in the pipeline run.

💡 Why Python?
Python acts as the glue between Azure DevOps and Microsoft Fabric:
Retrieves authentication tokens
Triggers Fabric pipeline stages
Parses deployment responses
Easy to integrate with YAML via script tasks
This approach keeps your CI/CD stack clean, lightweight, and fully automatable.

🚀 Get Started Today
Use this solution to:
Accelerate delivery across environments
Eliminate manual promotion risk
Improve deployment visibility
Enable DevOps best practices within Microsoft Fabric