Defending the cloud: Azure neutralized a record-breaking 15 Tbps DDoS attack
On October 24, 2025, Azure DDoS Protection automatically detected and mitigated a multi-vector DDoS attack measuring 15.72 Tbps and nearly 3.64 billion packets per second (pps). This was the largest DDoS attack ever observed in the cloud, and it targeted a single endpoint in Australia. Azure's globally distributed DDoS Protection infrastructure and continuous detection capabilities initiated mitigation automatically: malicious traffic was filtered and redirected, maintaining uninterrupted service availability for customer workloads.

The attack originated from the Aisuru botnet. Aisuru is a Turbo Mirai-class IoT botnet that frequently causes record-breaking DDoS attacks by exploiting compromised home routers and cameras, mainly on residential ISPs in the United States and other countries. The attack involved extremely high-rate UDP floods targeting a specific public IP address, launched from over 500,000 source IPs across various regions. These sudden UDP bursts had minimal source spoofing and used random source ports, which helped simplify traceback and facilitated provider enforcement.

Attackers are scaling with the internet itself. As fiber-to-the-home speeds rise and IoT devices get more powerful, the baseline for attack size keeps climbing. As we approach the upcoming holiday season, it is essential to confirm that all internet-facing applications and workloads are adequately protected against DDoS attacks. Additionally, do not wait for an actual attack to assess your defensive capabilities or operational readiness; conduct regular simulations to identify and address potential issues proactively.

Learn more about Azure DDoS Protection at Azure DDoS Protection Overview | Microsoft Learn
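The post's closing advice is to confirm that internet-facing workloads are protected before the holiday season. As one hedged illustration (not from the original post), the following Azure PowerShell sketch creates a DDoS protection plan and enables it on an existing virtual network; it assumes the Az.Network module, and all resource names and the region are placeholders.

# Create a DDoS protection plan (placeholder names/region).
$plan = New-AzDdosProtectionPlan -ResourceGroupName "rg-network" -Name "ddos-plan" -Location "australiaeast"

# Associate the plan with an existing virtual network and enable protection.
$vnet = Get-AzVirtualNetwork -ResourceGroupName "rg-network" -Name "vnet-prod"
$vnet.DdosProtectionPlan = New-Object Microsoft.Azure.Commands.Network.Models.PSResourceId
$vnet.DdosProtectionPlan.Id = $plan.Id
$vnet.EnableDdosProtection = $true
$vnet | Set-AzVirtualNetwork

Pairing this with a scheduled DDoS simulation (as the post recommends) helps validate both the configuration and the team's operational readiness.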
Announcing Kubernetes Center (Preview) On Azure Portal

Today, we’re excited to introduce the Kubernetes Center in the Azure portal, a new experience that simplifies how customers manage, monitor, and optimize Azure Kubernetes Service environments at scale. The Kubernetes Center provides a unified view across all clusters, intelligent insights, and streamlined workflows that help platform teams stay in control while enabling developers to move fast.

As Kubernetes adoption accelerates, many teams face growing challenges in managing clusters and workloads at scale. Getting a quick snapshot of what needs attention across clusters and workloads can quickly become overwhelming. Kubernetes Center is designed to change that, offering a streamlined and intuitive experience that brings the most critical Kubernetes capabilities together in a single pane of glass for unified visibility and control.

What is Kubernetes Center?

Actionable insights from the start: Kubernetes Center surfaces key issues like security vulnerabilities, cluster alerts, compliance gaps, and upgrade recommendations in a single, unified view. This helps teams focus immediately on what matters most, leading to faster resolution times, improved security posture, and greater operational clarity.

Streamlined management experience: By bringing together AKS, AKS Automatic, Fleet Manager, and Managed Namespaces into a single experience, we’ve reduced the need to jump between services. Everything you need to manage Kubernetes on Azure is now organized in one consistent interface.

Centralized Quickstarts: Whether you’re getting started or troubleshooting advanced scenarios, Kubernetes Center brings relevant documentation, learning resources, and in-context help into one place so you can spend less time searching and more time building.

Azure Portal: from distinct landing experiences for AKS, Fleet Manager, and Managed Kubernetes Namespaces to a streamlined management experience. Get the big picture at a glance, then dive deeper with individual pages designed for effortless discovery.

Next Steps: Build on your momentum by exploring Kubernetes Center. Create your first AKS cluster or deploy your first application using the Deploy Your Application flow and track your progress in real time, or check out the new experience and instantly see your existing clusters in a streamlined management view. Your feedback will help shape what comes next. Start building today with Kubernetes Center on the Azure portal!

Learn more: Create and Manage Kubernetes resources in the Azure portal with Kubernetes Center (preview) - Azure Kubernetes Service | Microsoft Learn

FAQ

Q. What products from Azure are included in Kubernetes Center?
A. Kubernetes Center brings together all your Azure Kubernetes resources such as AKS, AKS Automatic, Fleet Manager, and Managed Namespaces into a single interface for simplified operations. Create new resources or view your existing resources in Kubernetes Center.

Q. Does Kubernetes Center handle multi-cluster management?
A. Kubernetes Center provides a unified interface (a single pane of glass) to view and monitor all your Kubernetes resources in one place. For multi-cluster operations such as upgrading the Kubernetes version, placing cluster resources on N clusters, policy management, and coordination across environments, Kubernetes Fleet Manager is the solution designed to handle that complexity at scale. It enables teams to manage clusters at scale with automation, consistency, and operational control.
Q. Does Kubernetes Center provide security and compliance insights?
A. Absolutely. When Microsoft Defender for Containers is enabled, Kubernetes Center surfaces critical security vulnerabilities and compliance gaps across your clusters.

Q. Where can I find help and documentation?
A. All relevant documentation, Quickstarts, and learning resources are available directly within Kubernetes Center, making it easier to get support without leaving the platform. For more information: Create and Manage Kubernetes resources in the Azure portal with Kubernetes Center (preview) - Azure Kubernetes Service | Microsoft Learn

Q. What is the status of this launch?
A. Kubernetes Center is currently in preview, offering core capabilities with more features planned for the general availability release.

Q. What is the roadmap for GA?
A. Our roadmap includes adding new features and introducing tailored views designed for Admins and Developers. We also plan to enhance support for multi-cluster capabilities in Azure Fleet Manager, enabling smoother and more efficient operations within the Kubernetes Center.
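As a companion to the "create your first AKS cluster" step mentioned in Next Steps, here is a minimal Azure PowerShell sketch (not from the original post). It assumes the Az.Aks module, and all names, the node count, and the region are placeholders.

# Create a resource group and a small AKS cluster, then fetch its kubeconfig (placeholder names).
New-AzResourceGroup -Name "rg-k8s-demo" -Location "eastus"
New-AzAksCluster -ResourceGroupName "rg-k8s-demo" -Name "aks-demo" -NodeCount 2

# Merge the cluster credentials into the local kubeconfig so kubectl can reach the cluster.
Import-AzAksCredential -ResourceGroupName "rg-k8s-demo" -Name "aks-demo"

Once the cluster exists, it appears automatically in the Kubernetes Center view alongside any other clusters in the subscription.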
Resiliency Best Practices You Need For your Blob Storage Data

Maintaining Resiliency in Azure Blob Storage: A Guide to Best Practices

Azure Blob Storage is a cornerstone of modern cloud storage, offering scalable and secure solutions for unstructured data. However, maintaining resiliency in Blob Storage requires careful planning and adherence to best practices. In this blog, I’ll share practical strategies to ensure your data remains available, secure, and recoverable under all circumstances.

1. Enable Soft Delete for Accidental Recovery (Most Important)
Mistakes happen, but soft delete can be your safety net. It allows you to recover deleted blobs within a specified retention period:
- Configure a soft delete retention period in Azure Storage.
- Regularly monitor your blob storage to ensure that critical data is not permanently removed by mistake.
Enabling soft delete in Azure Blob Storage does not come with any additional cost for simply enabling the feature itself. However, it can potentially impact your storage costs because the deleted data is retained for the configured retention period, which means:
- The retained data contributes to the total storage consumption during the retention period.
- You will be charged according to the pricing tier of the data (Hot, Cool, or Archive) for the duration of retention.
(A quick PowerShell sketch for enabling soft delete and adding a resource lock is included at the end of this post.)

2. Utilize Geo-Redundant Storage (GRS)
Geo-redundancy ensures your data is replicated across regions to protect against regional failures:
- Choose RA-GRS (Read-Access Geo-Redundant Storage) for read access to secondary replicas in the event of a primary region outage.
- Assess your workload’s RPO (Recovery Point Objective) and RTO (Recovery Time Objective) needs to select the appropriate redundancy.

3. Implement Lifecycle Management Policies
Efficient storage management reduces costs and ensures long-term data availability:
- Set up lifecycle policies to transition data between hot, cool, and archive tiers based on usage.
- Automatically delete expired blobs to save on costs while keeping your storage organized.

4. Secure Your Data with Encryption and Access Controls
Resiliency is incomplete without robust security. Protect your blobs using:
- Encryption at Rest: Azure automatically encrypts data using server-side encryption (SSE). Consider enabling customer-managed keys for additional control.
- Access Policies: Implement Shared Access Signatures (SAS) and Stored Access Policies to restrict access and enforce expiration dates.

5. Monitor and Alert for Anomalies
Stay proactive by leveraging Azure’s monitoring capabilities:
- Use Azure Monitor and Log Analytics to track storage performance and usage patterns.
- Set up alerts for unusual activities, such as sudden spikes in access or deletions, to detect potential issues early.

6. Plan for Disaster Recovery
Ensure your data remains accessible even during critical failures:
- Create snapshots of critical blobs for point-in-time recovery.
- Enable backup for blobs and turn on the immutability feature.
- Test your recovery process regularly to ensure it meets your operational requirements.

7. Apply Resource Locks
Adding Azure locks to your Blob Storage account provides an additional layer of protection by preventing accidental deletion or modification of critical resources.

8. Educate and Train Your Team
Operational resilience often hinges on user awareness:
- Conduct regular training sessions on Blob Storage best practices.
- Document and share a clear data recovery and management protocol with all stakeholders.
"Critical Tip: Do Not Create New Containers with Deleted Names During Recovery" If a container or blob storage is deleted for any reason and recovery is being attempted, it’s crucial not to create a new container with the same name immediately. Doing so can significantly hinder the recovery process by overwriting backend pointers, which are essential for restoring the deleted data. Always ensure that no new containers are created using the same name during the recovery attempt to maximize the chances of successful restoration. Wrapping It Up Azure Blob Storage offers an exceptional platform for scalable and secure storage, but its resiliency depends on following best practices. By enabling features like soft delete, implementing redundancy, securing data, and proactively monitoring your storage environment, you can ensure that your data is resilient to failures and recoverable in any scenario. Protect your Azure resources with a lock - Azure Resource Manager | Microsoft Learn Data redundancy - Azure Storage | Microsoft Learn Overview of Azure Blobs backup - Azure Backup | Microsoft Learn Protect your Azure resources with a lock - Azure Resource Manager | Microsoft Learn1.2KViews1like1CommentMastering Azure Queries: Skip Token and Batching for Scale
Let's be honest. As a cloud engineer or DevOps professional managing a large Azure environment, running even a simple resource inventory query can feel like drinking from a firehose. You hit API limits, face slow performance, and struggle to get the complete picture of your estate—all because the data volume is overwhelming. But it doesn't have to be this way! This blog is your practical, hands-on guide to mastering two essential techniques for handling massive data volumes in Azure: using PowerShell and Azure Resource Graph (ARG): Skip Token (for full data retrieval) and Batching (for blazing-fast performance). 📋 TABLE OF CONTENTS 🚀 GETTING STARTED │ ├─ Prerequisites: PowerShell 7+ & Az.ResourceGraph Module └─ Introduction: Why Standard Queries Fail at Scale 📖 CORE CONCEPTS │ ├─ 📑 Skip Token: The Data Completeness Tool │ ├─ What is a Skip Token? │ ├─ The Bookmark Analogy │ ├─ PowerShell Implementation │ └─ 💻 Code Example: Pagination Loop │ └─ ⚡ Batching: The Performance Booster ├─ What is Batching? ├─ Performance Benefits ├─ Batching vs. Pagination ├─ Parallel Processing in PowerShell └─ 💻 Code Example: Concurrent Queries 🔍 DEEP DIVE │ ├─ Skip Token: Generic vs. Azure-Specific └─ Azure Resource Graph (ARG) at Scale ├─ ARG Overview ├─ Why ARG Needs These Techniques └─ 💻 Combined Example: Skip Token + Batching ✅ BEST PRACTICES │ ├─ When to Use Each Technique └─ Quick Reference Guide 📚 RESOURCES └─ Official Documentation & References Prerequisites Component Requirement / Details Command / Reference PowerShell Version The batching examples use ForEach-Object -Parallel, which requires PowerShell 7.0 or later. Check version: $PSVersionTable.PSVersion Install PowerShell 7+: Install PowerShell on Windows, Linux, and macOS Azure PowerShell Module Az.ResourceGraph module must be installed. Install module: Install-Module -Name Az.ResourceGraph -Scope CurrentUser Introduction: Why Standard Queries Don't Work at Scale When you query a service designed for big environments, like Azure Resource Graph, you face two limits: Result Limits (Pagination): APIs won't send you millions of records at once. They cap the result size (often 1,000 items) and stop. Efficiency Limits (Throttling): Sending a huge number of individual requests is slow and can cause the API to temporarily block you (throttling). Skip Token helps you solve the first limit by making sure you retrieve all results. Batching solves the second by grouping your requests to improve performance. Understanding Skip Token: The Continuation Pointer What is a Skip Token? A Skip Token (or continuation token) is a unique string value returned by an Azure API when a query result exceeds the maximum limit for a single response. Think of the Skip Token as a “bookmark” that tells Azure where your last page ended — so you can pick up exactly where you left off in the next API call. Instead of getting cut off after 1,000 records, the API gives you the first 1,000 results plus the Skip Token. You use this token in the next request to get the next page of data. This process is called pagination. Skip Token in Practice with PowerShell To get the complete dataset, you must use a loop that repeatedly calls the API, providing the token each time until the token is no longer returned. PowerShell Example: Using Skip Token to Loop Pages # Define the query $Query = "Resources | project name, type, location" $PageSize = 1000 $AllResults = @() $SkipToken = $null # Initialize the token Write-Host "Starting ARG query..." do { Write-Host "Fetching next page. 
(Token check: $($SkipToken -ne $null))" # 1. Execute the query, using the -SkipToken parameter $ResultPage = Search-AzGraph -Query $Query -First $PageSize -SkipToken $SkipToken # 2. Add the current page results to the main array $AllResults += $ResultPage # 3. Get the token for the next page, if it exists $SkipToken = $ResultPage.SkipToken Write-Host " -> Items in this page: $($ResultPage.Count). Total retrieved: $($AllResults.Count)" } while ($SkipToken -ne $null) # Loop as long as a Skip Token is returned Write-Host "Query finished. Total resources found: $($AllResults.Count)" This do-while loop is the reliable way to ensure you retrieve every item in a large result set. Understanding Batching: Grouping Requests What is Batching? Batching means taking several independent requests and combining them into a single API call. Instead of making N separate network requests for N pieces of data, you make one request containing all N sub-requests. Batching is primarily used for performance. It improves efficiency by: Reducing Overhead: Fewer separate network connections are needed. Lowering Throttling Risk: Fewer overall API calls are made, which helps you stay under rate limits. Feature Batching Pagination (Skip Token) Goal Improve efficiency/speed. Retrieve all data completely. Input Multiple different queries. Single query, continuing from a marker. Result One response with results for all grouped queries. Partial results with a token for the next step. Note: While Azure Resource Graph's REST API supports batch requests, the PowerShell Search-AzGraph cmdlet does not expose a -Batch parameter. Instead, we achieve batching by using PowerShell's ForEach-Object -Parallel (PowerShell 7+) to run multiple queries simultaneously. Batching in Practice with PowerShell Using parallel processing in PowerShell, you can efficiently execute multiple distinct Kusto queries targeting different scopes (like subscriptions) simultaneously. Method 5 Subscriptions 20 Subscriptions Sequential ~50 seconds ~200 seconds Parallel (ThrottleLimit 5) ~15 seconds ~45 seconds PowerShell Example: Running Multiple Queries in Parallel # Define multiple queries to run together $BatchQueries = @( @{ Query = "Resources | where type =~ 'Microsoft.Compute/virtualMachines'" Subscriptions = @("SUB_A") # Query 1 Scope }, @{ Query = "Resources | where type =~ 'Microsoft.Network/publicIPAddresses'" Subscriptions = @("SUB_B", "SUB_C") # Query 2 Scope } ) Write-Host "Executing batch of $($BatchQueries.Count) queries in parallel..." # Execute queries in parallel (true batching) $BatchResults = $BatchQueries | ForEach-Object -Parallel { $QueryConfig = $_ $Query = $QueryConfig.Query $Subs = $QueryConfig.Subscriptions Write-Host "[Batch Worker] Starting query: $($Query.Substring(0, [Math]::Min(50, $Query.Length)))..." -ForegroundColor Cyan $QueryResults = @() # Process each subscription in this query's scope foreach ($SubId in $Subs) { $SkipToken = $null do { $Params = @{ Query = $Query Subscription = $SubId First = 1000 } if ($SkipToken) { $Params['SkipToken'] = $SkipToken } $Result = Search-AzGraph @Params if ($Result) { $QueryResults += $Result } $SkipToken = $Result.SkipToken } while ($SkipToken) } Write-Host " [Batch Worker] ✅ Query completed - Retrieved $($QueryResults.Count) resources" -ForegroundColor Green # Return results with metadata [PSCustomObject]@{ Query = $Query Subscriptions = $Subs Data = $QueryResults Count = $QueryResults.Count } } -ThrottleLimit 5 Write-Host "`nBatch complete. Reviewing results..." 
# The results are returned in the same order as the input array $VMCount = $BatchResults[0].Data.Count $IPCount = $BatchResults[1].Data.Count Write-Host "Query 1 (VMs) returned: $VMCount results." Write-Host "Query 2 (IPs) returned: $IPCount results." # Optional: Display detailed results Write-Host "`n--- Detailed Results ---" for ($i = 0; $i -lt $BatchResults.Count; $i++) { $Result = $BatchResults[$i] Write-Host "`nQuery $($i + 1):" Write-Host " Query: $($Result.Query)" Write-Host " Subscriptions: $($Result.Subscriptions -join ', ')" Write-Host " Total Resources: $($Result.Count)" if ($Result.Data.Count -gt 0) { Write-Host " Sample (first 3):" $Result.Data | Select-Object -First 3 | Format-Table -AutoSize } } Azure Resource Graph (ARG) and Scale Azure Resource Graph (ARG) is a service built for querying resource properties quickly across a large number of Azure subscriptions using the Kusto Query Language (KQL). Because ARG is designed for large scale, it fully supports Skip Token and Batching: Skip Token: ARG automatically generates and returns the token when a query exceeds its result limit (e.g., 1,000 records). Batching: ARG's REST API provides a batch endpoint for sending up to ten queries in a single request. In PowerShell, we achieve similar performance benefits using ForEach-Object -Parallel to process multiple queries concurrently. Combined Example: Batching and Skip Token Together This script shows how to use Batching to start a query across multiple subscriptions and then use Skip Token within the loop to ensure every subscription's data is fully retrieved. $SubscriptionIDs = @("SUB_A") $KQLQuery = "Resources | project id, name, type, subscriptionId" Write-Host "Starting BATCHED query across $($SubscriptionIDs.Count) subscription(s)..." Write-Host "Using parallel processing for true batching...`n" # Process subscriptions in parallel (batching) $AllResults = $SubscriptionIDs | ForEach-Object -Parallel { $SubId = $_ $Query = $using:KQLQuery $SubResults = @() Write-Host "[Batch Worker] Processing Subscription: $SubId" -ForegroundColor Cyan $SkipToken = $null $PageCount = 0 do { $PageCount++ # Build parameters $Params = @{ Query = $Query Subscription = $SubId First = 1000 } if ($SkipToken) { $Params['SkipToken'] = $SkipToken } # Execute query $Result = Search-AzGraph @Params if ($Result) { $SubResults += $Result Write-Host " [Batch Worker] Sub: $SubId - Page $PageCount - Retrieved $($Result.Count) resources" -ForegroundColor Yellow } $SkipToken = $Result.SkipToken } while ($SkipToken) Write-Host " [Batch Worker] ✅ Completed $SubId - Total: $($SubResults.Count) resources" -ForegroundColor Green # Return results from this subscription $SubResults } -ThrottleLimit 5 # Process up to 5 subscriptions simultaneously Write-Host "`n--- Batch Processing Finished ---" Write-Host "Final total resource count: $($AllResults.Count)" # Optional: Display sample results if ($AllResults.Count -gt 0) { Write-Host "`nFirst 5 resources:" $AllResults | Select-Object -First 5 | Format-Table -AutoSize } Technique Use When... Common Mistake Actionable Advice Skip Token You must retrieve all data items, expecting more than 1,000 results. Forgetting to check for the token; you only get partial data. Always use a do-while loop to guarantee you get the complete set. Batching You need to run several separate queries (max 10 in ARG) efficiently. Putting too many queries in the batch, causing the request to fail. Group up to 10 logical queries or subscriptions into one fast request. 
By combining Skip Token for data completeness and Batching for efficiency, you can confidently query massive Azure estates without hitting limits or missing data. These two techniques, when used together, turn Azure Resource Graph from a “good tool” into a scalable discovery engine for your entire cloud footprint.

Summary: Skip Token and Batching in Azure Resource Graph

Goal: Efficiently query massive Azure environments using PowerShell and Azure Resource Graph (ARG).

1. Skip Token (The Data Completeness Tool)
- What it does: A marker returned by Azure APIs when results hit the 1,000-item limit; it points to the next page of data.
- Why it matters: Ensures you retrieve all records, avoiding incomplete data (pagination).
- PowerShell use: Use a do-while loop with the -SkipToken parameter in Search-AzGraph until the token is no longer returned.

2. Batching (The Performance Booster)
- What it does: Processes multiple independent queries simultaneously using parallel execution.
- Why it matters: Drastically improves query speed by reducing overall execution time and helps avoid API throttling.
- PowerShell use: Use ForEach-Object -Parallel (PowerShell 7+) with -ThrottleLimit to control concurrent queries. For PowerShell 5.1, use Start-Job with background jobs (a short sketch follows the references below).

3. Best Practice: Combine Them
For maximum efficiency, combine Batching and Skip Token. Use batching to run queries across multiple subscriptions simultaneously and use the Skip Token logic within the loop to ensure every single subscription's data is fully paginated and retrieved.

Result: Fast, complete, and reliable data collection across your large Azure estate.

References:
Azure Resource Graph documentation
Search-AzGraph PowerShell reference
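For readers on Windows PowerShell 5.1, where ForEach-Object -Parallel is unavailable, a rough sketch of the Start-Job alternative mentioned in the summary might look like the following. The subscription IDs and query are placeholders, and it assumes the Az context is available inside background jobs (for example, after Enable-AzContextAutosave).

# Launch one background job per subscription (placeholder IDs), each doing its own pagination.
$SubscriptionIDs = @("SUB_A", "SUB_B")
$Jobs = foreach ($SubId in $SubscriptionIDs) {
    Start-Job -ArgumentList $SubId -ScriptBlock {
        param($SubId)
        Import-Module Az.ResourceGraph
        $Results = @()
        $SkipToken = $null
        do {
            $Params = @{ Query = "Resources | project id, name, type"; Subscription = $SubId; First = 1000 }
            if ($SkipToken) { $Params['SkipToken'] = $SkipToken }
            $Page = Search-AzGraph @Params
            if ($Page) { $Results += $Page }
            $SkipToken = $Page.SkipToken
        } while ($SkipToken)
        $Results
    }
}

# Wait for all jobs and collect the combined output.
$AllResults = $Jobs | Wait-Job | Receive-Job
Write-Host "Total resources: $($AllResults.Count)"

Background jobs carry more startup overhead than parallel runspaces, so expect this approach to be slower than the PowerShell 7+ examples above, though still far faster than a purely sequential loop.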
Operational Excellence In AI Infrastructure Fleets: Standardized Node Lifecycle Management

Co-authors: Choudary Maddukuri and Bhushan Mehendale

AI infrastructure is scaling at an unprecedented pace, and the complexity of managing it is growing just as quickly. Onboarding new hardware into hyperscale fleets can take months, slowed by fragmented tools, vendor-specific firmware, and inconsistent diagnostics. As hyperscalers expand with diverse accelerators and CPU architectures, operational friction has become a critical bottleneck. Microsoft, in collaboration with the Open Compute Project (OCP) and leading silicon partners, is addressing this challenge. By standardizing lifecycle management across heterogeneous fleets, we’ve dramatically reduced onboarding effort, improved reliability, and achieved >95% Nodes-in-Service on incredibly large fleet sizes. This blog explores how we are contributing to and leveraging open standards to transform fragmented infrastructure into scalable, vendor-neutral AI platforms.

Industry Context & Problem

The rapid growth of generative AI has accelerated the adoption of GPUs and accelerators from multiple vendors, alongside diverse CPU architectures such as Arm and x86. Each new hardware SKU introduces its own ecosystem of proprietary tools, firmware update processes, management interfaces, reliability mechanisms, and diagnostic workflows. This hardware diversity leads to engineering toil, delayed deployments, and inconsistent customer experiences. Without a unified approach to lifecycle management, hyperscalers face escalating operational costs, slower innovation, and reduced efficiency.

Node Lifecycle Standardization: Enabling Scalable, Reliable AI Infrastructure

Microsoft, through the Open Compute Project (OCP) in collaboration with AMD, Arm, Google, Intel, Meta, and NVIDIA, is leading an industry-wide initiative to standardize AI infrastructure lifecycle management across GPU and CPU hardware management workstreams. Historically, onboarding each new SKU was a highly resource-intensive effort due to custom implementations and vendor-specific behaviors that required extensive Azure integration. This slowed scalability, increased engineering overhead, and limited innovation. With standardized node lifecycle processes and compliance tooling, hyperscalers can now onboard new SKUs much faster, achieving over 70% reduction in effort while enhancing overall fleet operational excellence. These efforts also enable silicon vendors to ensure interoperability across multiple cloud providers.

Figure: How Standardization benefits both Hyperscalers & Suppliers.

Key Benefits and Capabilities

- Firmware Updates: Firmware update mechanisms aligned with DMTF standards minimize downtime and streamline fleet-wide secure deployments.
- Unified Manageability Interfaces: Standardized Redfish APIs and PLDM protocols create a consistent framework for out-of-band management, reducing integration overhead and ensuring predictable behavior across hardware vendors (a small Redfish query sketch is included at the end of this post).
- RAS (Reliability, Availability and Serviceability) Features: Standardization enforces minimum RAS requirements across all IP blocks, including CPER (Common Platform Error Record) based error logging, crash dumps, and error recovery flows to enhance system uptime.
- Debug & Diagnostics: Unified APIs and standardized crash and debug dump formats reduce issue resolution time from months to days. Streamlined diagnostic workflows enable precise FRU isolation and clear service actions.
- Compliance Tooling: Tool contributions such as CTAM (Compliance Tool for Accelerator Manageability) and CPACT (Cloud Processor Accessibility Compliance Tool) automate compliance and acceptance testing, ensuring suppliers meet hyperscaler requirements for seamless onboarding.

Technical Specifications & Contributions

Through deep collaboration within the Open Compute Project (OCP) community, Microsoft and its partners have published multiple specifications that streamline SKU development, validation, and fleet operations.

Summary of Key Contributions

Specification | Focus Area | Benefit
GPU Firmware Update Requirements | Firmware Updates | Enables consistent firmware update processes across vendors
GPU Management Interfaces | Manageability | Standardizes telemetry and control via Redfish/PLDM
GPU RAS Requirements | Reliability and Availability | Reduces AI job interruptions caused by hardware errors
CPU Debug and RAS Requirements | Debug and Diagnostics | Achieves >95% node serviceability through unified diagnostics and debug
CPU Impactless Updates Requirements | Impactless Updates | Enables impactless firmware updates to address security and quality issues without workload interruptions
Compliance Tools | Validation | Automates specification compliance testing for faster hardware onboarding

Embracing Open Standards: A Collaborative Shift in AI Infrastructure Management

This standardized approach to lifecycle management represents a foundational shift in how AI infrastructure is maintained. By embracing open standards and collaborative innovation, the industry can scale AI deployments faster, with greater reliability and lower operational cost. Microsoft’s leadership within the OCP community, and its deep partnerships with other hyperscalers and silicon vendors, are paving the way for scalable, interoperable, and vendor-neutral AI infrastructure across the global cloud ecosystem.

To learn more about Microsoft’s datacenter innovations, check out the virtual datacenter tour at datacenters.microsoft.com.
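To make the manageability point concrete, here is a rough illustration (not from the original post) of how a standardized Redfish interface lets one script read firmware inventory from any compliant BMC, regardless of vendor. The BMC address and credentials are placeholders, and the example assumes PowerShell 7+ for -Authentication and -SkipCertificateCheck.

# Query the standard Redfish firmware inventory collection on a node's BMC (placeholder address).
$bmc  = "https://10.0.0.10"
$cred = Get-Credential

$inventory = Invoke-RestMethod -Uri "$bmc/redfish/v1/UpdateService/FirmwareInventory" `
    -Credential $cred -Authentication Basic -SkipCertificateCheck

foreach ($member in $inventory.Members) {
    # Each member is a firmware component (GPU, BMC, NIC, etc.) exposed the same way by every compliant vendor.
    $fw = Invoke-RestMethod -Uri "$bmc$($member.'@odata.id')" -Credential $cred -Authentication Basic -SkipCertificateCheck
    "{0}: {1}" -f $fw.Name, $fw.Version
}

Because the schema is defined by DMTF rather than by each supplier, the same loop works across heterogeneous nodes, which is exactly the kind of integration overhead the standardization effort removes.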
Microsoft Azure Cloud HSM is now generally available

Microsoft Azure Cloud HSM is now generally available. Azure Cloud HSM is a highly available, FIPS 140-3 Level 3 validated single-tenant hardware security module (HSM) service designed to meet the highest security and compliance standards. With full administrative control over their HSM, customers can securely manage cryptographic keys and perform cryptographic operations within their own dedicated Cloud HSM cluster.

In today’s digital landscape, organizations face an unprecedented volume of cyber threats, data breaches, and regulatory pressures. At the heart of securing sensitive information lies a robust key management and encryption strategy, which ensures that data remains confidential, tamper-proof, and accessible only to authorized users. However, encryption alone is not enough; how cryptographic keys are managed determines the true strength of security. Every interaction in the digital world, from processing financial transactions and securing applications such as PKI, database encryption, and document signing to securing cloud workloads and authenticating users, relies on cryptographic keys. A poorly managed key is a security risk waiting to happen. Without a clear key management strategy, organizations face challenges such as data exposure, regulatory non-compliance, and operational complexity.

An HSM is a cornerstone of a strong key management strategy, providing physical and logical security to safeguard cryptographic keys. HSMs are purpose-built devices designed to generate, store, and manage encryption keys in a tamper-resistant environment, ensuring that even in the event of a data breach, protected data remains unreadable. As cyber threats evolve, organizations must take a proactive approach to securing data with enterprise-grade encryption and key management solutions. Microsoft Azure Cloud HSM empowers businesses to meet these challenges head-on, ensuring that security, compliance, and trust remain non-negotiable priorities in the digital age.

Key Features of Azure Cloud HSM

Azure Cloud HSM ensures high availability and redundancy by automatically clustering multiple HSMs and synchronizing cryptographic data across three instances, eliminating the need for complex configurations. It optimizes performance through load balancing of cryptographic operations, reducing latency. Periodic backups enhance security by safeguarding cryptographic assets and enabling seamless recovery. Designed to meet FIPS 140-3 Level 3, it provides robust security for enterprise applications.

Ideal use cases for Azure Cloud HSM

Azure Cloud HSM is ideal for organizations migrating security-sensitive applications from on-premises to Azure Virtual Machines or transitioning from Azure Dedicated HSM or AWS CloudHSM to a fully managed Azure-native solution. It supports applications requiring PKCS#11, OpenSSL, and JCE for seamless cryptographic integration and enables running shrink-wrapped software like Apache/Nginx SSL offload, Microsoft SQL Server/Oracle TDE, and ADCS on Azure VMs. Additionally, it supports tools and applications that require document and code signing.

Get started with Azure Cloud HSM

Ready to deploy Azure Cloud HSM?
Learn more and start building today: Get Started Deploying Azure Cloud HSM

Customers can download the Azure Cloud HSM SDK and Client Tools from GitHub: Microsoft Azure Cloud HSM SDK

Stay tuned for further updates as we continue to enhance Microsoft Azure Cloud HSM to support your most demanding security and compliance needs.
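Before deploying a Cloud HSM cluster, the subscription needs the relevant resource provider registered. The sketch below is not from the original post, and the provider namespace "Microsoft.HardwareSecurityModules" is an assumption based on Azure's HSM resource family; verify it against the Get Started guide linked above.

# Register the resource provider that hosts HSM resources (namespace is an assumption, see note above).
Register-AzResourceProvider -ProviderNamespace "Microsoft.HardwareSecurityModules"

# Confirm the registration state before creating a Cloud HSM cluster.
Get-AzResourceProvider -ProviderNamespace "Microsoft.HardwareSecurityModules" |
    Select-Object ProviderNamespace, RegistrationState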
Enhancing Azure Private DNS Resiliency with Internet Fallback

Is your Azure environment prone to DNS resolution hiccups, especially when leveraging Private Link and multiple virtual networks? Dive into our latest blog post, "Enhancing Azure Private DNS Resiliency with Internet Fallback," and discover how to eliminate those frustrating NXDOMAIN errors and ensure seamless application availability.

I break down the common challenges faced in complex Azure setups, including isolated Private DNS zones and hybrid environments, and reveal how the new internet fallback feature acts as a vital safety net. Learn how this powerful tool automatically switches to public DNS resolution when private resolution fails, minimizing downtime and simplifying management. Our tutorial walks you through the easy steps to enable internet fallback, empowering you to fortify your Azure networks and enhance application resilience. Whether you're dealing with multi-tenant deployments or intricate service dependencies, this feature is your key to uninterrupted connectivity.

Don't let DNS resolution issues disrupt your operations. Read the full article to learn how to implement Azure Private DNS internet fallback and ensure your applications stay online, no matter what.
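For orientation only, here is a hedged sketch (not from the article) of what enabling internet fallback on a private DNS zone's virtual network link can look like with Azure PowerShell. The resource names are placeholders, and the -ResolutionPolicy parameter with the NxDomainRedirect value is an assumption based on the zone link's underlying resolution-policy setting; confirm the exact syntax in the full article and the Az.PrivateDns documentation.

# Link a VNet to a private DNS zone with internet fallback enabled (parameter name is an assumption).
New-AzPrivateDnsVirtualNetworkLink `
    -ResourceGroupName "rg-dns" `
    -ZoneName "privatelink.blob.core.windows.net" `
    -Name "link-vnet-prod" `
    -VirtualNetworkId "/subscriptions/<sub-id>/resourceGroups/rg-network/providers/Microsoft.Network/virtualNetworks/vnet-prod" `
    -ResolutionPolicy "NxDomainRedirect"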
Provisioning Azure Storage Containers and Folders Using Bicep and PowerShell

Overview: This blog demonstrates how to:
- Deploy an Azure Storage Account and Blob containers using Bicep
- Create a folder-like structure inside those containers using PowerShell

This approach is ideal for cloud engineers and DevOps professionals seeking end-to-end automation for structured storage provisioning.

Bicep natively supports the creation of:
- Storage Accounts
- Containers (via the blobServices resource)

However, folders (directories) inside containers are not first-class resources in ARM/Bicep: they're created by uploading a blob with a virtual path, e.g., folder1/blob.txt. So how can we automate the creation of these folder structures without manually uploading dummy blobs?

You can check out the blog "Designing Reusable Bicep Modules: A Databricks Example" for a good reference on how to structure the storage account pattern. It covers reusable module design and shows how to keep things clean and consistent.

1. Deploy an Azure Storage Account and Blob containers using Bicep

You can provision a Storage Account and its associated Blob containers with a few lines of Bicep, along with parameters for the 'directory services' (the folder structure), as shown in the original post. The process involves:
- Defining the Microsoft.Storage/storageAccounts resource for the Storage Account.
- Adding a nested blobServices/containers resource to create Blob containers within it.
- Using parameters to dynamically assign names, access tiers, and network rules.

2. Create a folder-like structure inside those containers using PowerShell

To simulate a directory structure in Azure Data Lake Storage Gen2, use Bicep with deployment scripts that execute az storage fs directory create. This enables automation of folder creation inside blob containers at deployment time. In this setup:
- A Microsoft.Resources/deploymentScripts resource is used.
- The az storage fs directory create command creates virtual folders inside containers.
- Access is authenticated using a secure account key fetched via storageAccount.listKeys().

Parameter Flow and Integration

The solution uses Bicep's module linking capabilities:
- The main module outputs the Storage Account name and container names.
- These outputs are passed as parameters to the deployment script module.
- The script loops through each container and folder, uploading a dummy blob to create the folder.

Here's the final setup with the storage account and container ready, plus the directory created inside: everything's all set!

Conclusion

This approach is especially useful in enterprise environments where storage structures must be provisioned consistently across environments. You can extend this pattern further to:
- Tag blobs/folders
- Assign RBAC roles
- Handle folder-level metadata

Have you faced similar challenges with Azure Storage provisioning? Share your experience or drop a comment!
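As a complement to the deployment-script approach described above, here is a small hedged sketch (not from the original post) of creating the same folder structure interactively with the Az.Storage module. Account, container, and folder names are placeholders, and it assumes a storage account with hierarchical namespace (ADLS Gen2) enabled.

# Create directories inside a container using Azure PowerShell (placeholder names).
$ctx = New-AzStorageContext -StorageAccountName "stfolderdemo" -UseConnectedAccount

foreach ($folder in @("raw/2024", "curated/2024")) {
    # -Directory creates a real directory when hierarchical namespace is enabled.
    New-AzDataLakeGen2Item -Context $ctx -FileSystem "data" -Path $folder -Directory
}

This is handy for ad-hoc fixes or validation after a deployment, while the Bicep deployment-script route described in the post keeps the folder layout declarative and repeatable.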
Designing Reusable Bicep Modules: A Databricks Example

In this blog, I’ll walk you through how to design a reusable Bicep module for deploying Azure Databricks, a popular analytics and machine learning platform. We'll focus on creating a parameterized and composable pattern using Bicep and Azure Verified Modules (AVM), enabling your team to replicate this setup across environments with minimal changes.

Why Reusability in Bicep Matters

As your Azure environment scales, manually copying and modifying Bicep files for every service or environment becomes error-prone and unmanageable. Reusable Bicep modules help:
- Eliminate redundant code
- Enforce naming, tagging, and networking standards
- Accelerate onboarding of new services or teams
- Enable self-service infrastructure in CI/CD pipelines

Here, we’ll create a reusable module to deploy an Azure Databricks Workspace with:
- Consistent naming conventions
- Virtual network injection (VNet)
- Private endpoint integration (UI, Blob, DFS)
- Optional DNS zone configuration
- Role assignments
- AVM module integration

Module Inputs (Parameters)

Your Bicep pattern uses several key parameters (listed in the original post). These parameters allow the module to be flexible yet consistent across multiple environments.

Naming conventions: The nameObject structure is used to build consistent names for all resources:

var adbworkspaceName = toLower('${nameObject.client}-${nameObject.workloadIdentifier}-${nameObject.environment}-${nameObject.region}-adb-${nameObject.suffix}')

Configuring Private Endpoints and DNS

The module allows defining private endpoint configurations for both Databricks and storage. This logic ensures:
- Private access to the Databricks UI
- Optional DNS zone integration for custom resolution

You can extend this to include Blob and DFS storage private endpoints, which are essential for secure data lake integrations.

Plugging in the AVM Module

The actual deployment leverages an Azure Verified Module (AVM) stored in an Azure Container Registry (ACR), which you reference from your main Bicep deployment (example usage is shown in the original post).

Conclusion

This Bicep pattern, like any well-designed reusable module, enables consistent, secure, and scalable deployments across your Azure environments. Whether you're deploying a single workspace or rolling out 50 across environments, this pattern helps ensure governance and simplicity.

Resources
Azure Bicep Documentation
Azure Verified Modules
Azure Databricks Docs
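To round this out, here is a hedged sketch (not from the original post) of deploying a main Bicep file that consumes such a reusable module, using Azure PowerShell. The file names, resource group, and region are placeholders, and the Bicep CLI must be available for .bicep templates to compile.

# Deploy the main Bicep template that consumes the reusable Databricks module (placeholder names).
New-AzResourceGroup -Name "rg-databricks-dev" -Location "westeurope" -Force

New-AzResourceGroupDeployment `
    -Name "adb-pattern-deployment" `
    -ResourceGroupName "rg-databricks-dev" `
    -TemplateFile "./main.bicep" `
    -TemplateParameterFile "./main.dev.parameters.json"

Keeping one parameter file per environment (dev, test, prod) is what lets the same module roll out many workspaces with nothing but parameter changes.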