azure virtual machines
Announcing Cobalt 200: Azure’s next cloud-native CPU
By Selim Bilgin, Corporate Vice President, Silicon Engineering, and Pat Stemen, Vice President, Azure Cobalt

Today, we’re thrilled to announce Azure Cobalt 200, our next-generation Arm-based CPU designed for cloud-native workloads. Cobalt 200 is a milestone in our continued approach to optimize every layer of the cloud stack from silicon to software. Our design goals were to deliver full compatibility for workloads using our existing Azure Cobalt CPUs, deliver up to 50% performance improvement over Cobalt 100, and integrate with the latest Microsoft security, networking and storage technologies. Like its predecessor, Cobalt 200 is optimized for common customer workloads and delivers unique capabilities for our own Microsoft cloud products. Our first production Cobalt 200 servers are now live in our datacenters, with wider rollout and customer availability coming in 2026.

[Image: Azure Cobalt 200 SoC and platform]

Building on Cobalt 100: Leading Price-Performance

Our Azure Cobalt journey began with Cobalt 100, our first custom-built processor for cloud-native workloads. Cobalt 100 VMs have been Generally Available (GA) since October 2024, and availability has expanded rapidly to 32 Azure datacenter regions around the world. In just one year, we have been blown away by the pace at which customers have adopted the new platform and migrated their most critical workloads to Cobalt 100 for the performance, efficiency, and price-performance benefits.

Cloud analytics leaders like Databricks and Snowflake are adopting Cobalt 100 to optimize their cloud footprint. The compute performance and energy-efficiency balance of Cobalt 100-based virtual machines and containers has proven ideal for large-scale data processing workloads. Microsoft’s own cloud services have also rapidly adopted Azure Cobalt for similar benefits. Microsoft Teams achieved up to 45% better performance using Cobalt 100 than their previous compute platform.
This increased performance means fewer servers are needed for the same task. For instance, Microsoft Teams media processing uses 35% fewer compute cores with Cobalt 100.

Designing Compute Infrastructure for Real Workloads

With this solid foundation, we set out to design a worthy successor: Cobalt 200. We faced a key challenge: traditional compute benchmarks do not represent the diversity of our customer workloads. Our telemetry from the wide range of workloads running in Azure (small microservices to globally available SaaS products) did not match common hardware performance benchmarks. Existing benchmarks tend to skew toward CPU core-focused compute patterns, leaving gaps in how real-world cloud applications behave at scale when using network and storage resources. Optimizing Azure Cobalt for customer workloads requires us to expand beyond these CPU core benchmarks to truly understand and model the diversity of customer workloads in Azure.

As a result, we created a portfolio of benchmarks drawn directly from the usage patterns we see in Azure, including databases, web servers, storage caches, network transactions, and data analytics. Each of our benchmark workloads includes multiple variants for performance evaluation based on the ways our customers may use the underlying database, storage, or web serving technology. In total, we built and refined over 140 individual benchmark variants as part of our internal evaluation suite.

With the help of our software teams, we created a complete digital twin simulation from the silicon up: beginning with the CPU core microarchitecture, fabric, and memory IP blocks in Cobalt 200, all the way through the server design and rack topology. Then, we used AI, statistical modelling and the power of Azure to model the performance and power consumption of the 140 benchmarks against 2,800 combinations of SoC and system design parameters: core count, cache size, memory speed, server topology, SoC power, and rack configuration.
This resulted in the evaluation of over 350,000 configuration candidates of the Cobalt 200 system as part of our design process. This extensive modelling and simulation helped us to quickly iterate to find the optimal design point for Cobalt 200, delivering over 50% increased performance compared to Cobalt 100, all while continuing to deliver our most power-efficient platform in Azure.

Cobalt 200: Delivering Performance and Efficiency

At the heart of every Cobalt 200 server is the most advanced compute silicon in Azure: the Cobalt 200 System-on-Chip (SoC). The Cobalt 200 SoC is built around the Arm Neoverse Compute Subsystems V3 (CSS V3), the latest performance-optimized core and fabric from Arm. Each Cobalt 200 SoC includes 132 active cores with 3 MB of L2 cache per core and 192 MB of L3 system cache to deliver exceptional performance for customer workloads.

Power efficiency is just as important as raw performance. Energy consumption represents a significant portion of the lifetime operating cost of a cloud server. One of the unique innovations in our Azure Cobalt CPUs is individual per-core Dynamic Voltage and Frequency Scaling (DVFS). In Cobalt 200 this allows each of the 132 cores to run at a different performance level, delivering optimal power consumption no matter the workload. We are also taking advantage of the latest TSMC 3nm process, further improving power efficiency.

Security is top-of-mind for all of our customers and a key part of the unique innovation in Cobalt 200. We designed and built a custom memory controller for Cobalt 200, so that memory encryption is on by default with negligible performance impact. Cobalt 200 also implements Arm’s Confidential Compute Architecture (CCA), which supports hardware-based isolation of VM memory from the hypervisor and host OS.
When designing Cobalt 200, our benchmark workloads and design simulations revealed an interesting trend: several universal compute patterns emerged, namely compression, decompression, and encryption. Over 30% of cloud workloads had significant use of one of these common operations. Optimizing for these common operations required a different approach than just cache sizing and CPU core selection. We designed custom compression and cryptography accelerators (dedicated blocks of silicon on each Cobalt 200 SoC) solely for the purpose of accelerating these operations without sacrificing CPU cycles. These accelerators help reduce workload CPU consumption and overall costs. For example, by offloading compression and encryption tasks to the Cobalt 200 accelerator, Azure SQL is able to reduce use of critical compute resources, prioritizing them for customer workloads.

Leading Infrastructure Innovation with Cobalt 200

Azure Cobalt is more than just an SoC, and we are constantly optimizing and accelerating every layer in the infrastructure. The latest Azure Boost capabilities are built into the new Cobalt 200 system, which significantly improves networking and remote storage performance. Azure Boost delivers increased network bandwidth and offloads remote storage and networking tasks to custom hardware, improving overall workload performance and reducing latency.

Cobalt 200 systems also embed the Azure Integrated HSM (Hardware Security Module), providing customers with top-tier cryptographic key protection within Azure’s infrastructure, ensuring sensitive data stays secure. The Azure Integrated HSM works with Azure Key Vault for simplified management of encryption keys, offering high availability and scalability as well as meeting FIPS 140-3 Level 3 compliance.
[Image: An Azure Cobalt 200 server in a validation lab]

Looking Forward to 2026

We are excited about the innovation and advanced technology in Cobalt 200 and look forward to seeing how our customers create breakthrough products and services. We’re busy racking and stacking Cobalt 200 servers around the world and look forward to sharing more as we get closer to wider availability next year.

- Check out Microsoft Ignite opening keynote
- Read more on what's new in Azure at Ignite
- Learn more about Microsoft's global infrastructure

(Part-1) Leverage Bicep: Standard model to Automate Azure IaaS deployment
Intended audience
- Those deeply interested in IaC on Azure.
- Those who understand the basics of Azure Resource Manager templates and want to work in depth with Bicep.
- Those who know the names of the services and features used in Azure IaaS and have experience building automation.

Agenda
- About Bicep
- Difference between ARM templates and Bicep
- Basic functionality
- Bicep Development Environment
- Sample Code and Explanation
- Pitfalls and how to avoid them

Notes
Azure services are evolving every day. This content is based on what we have confirmed as of April 2023.

Deploying a GitLab Runner on Azure: A Step-by-Step Guide
This guide walks you through the entire process, from VM setup to running your first successful job.

Step 1: Create an Azure VM

Log in to the Azure Portal. Create a new VM with the following settings:
- Image: Ubuntu 20.04 LTS (recommended)
- Authentication: SSH Public Key (generate a .pem file for secure access)

Once created, note the public IP address.

Connect to the VM

From your terminal:

    ssh -i "/path/to/your/key.pem" admin_name@<YOUR_VM_PUBLIC_IP>

Note: Replace the key path with the path to your .pem file, and admin_name with the admin user name you chose during VM deployment.

Step 2: Install Docker on the Azure VM

Run the following commands to install Docker:

    sudo apt update && sudo apt upgrade -y
    sudo apt install -y docker.io
    sudo systemctl start docker
    sudo systemctl enable docker   # Enable Docker to start automatically on boot
    sudo usermod -aG docker $USER

Test Docker with:

    docker run hello-world

A success message should appear. If you see permission denied, run:

    newgrp docker

Note: Log out and log back in (or restart the VM) for group changes to apply.
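The group-membership note above can also be checked from a script before running any Docker commands. The following is a minimal Python sketch, not part of the original guide: the helper names `in_docker_group` and `current_groups` are hypothetical, and the only external command it relies on is the standard `id -nG`, which prints the group names of the current session.

```python
import subprocess


def in_docker_group(id_output: str, group: str = "docker") -> bool:
    """Return True if `group` is one of the space-separated names printed by `id -nG`."""
    return group in id_output.split()


def current_groups() -> str:
    """Run `id -nG` for the current session and return its raw output."""
    result = subprocess.run(
        ["id", "-nG"], capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    if in_docker_group(current_groups()):
        print("docker group is active in this session")
    else:
        print("docker group not active yet: run `newgrp docker` or log in again")
```

Because `id -nG` reports the groups of the running session, the check stays negative right after `usermod -aG` until a new login (or `newgrp docker`) picks up the change, which is the situation the note describes.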
Step 3: Install GitLab Runner

Download the GitLab Runner binary:

    sudo curl -L --output /usr/local/bin/gitlab-runner \
      https://gitlab-runner-downloads.s3.amazonaws.com/latest/binaries/gitlab-runner-linux-amd64

Assign execution permissions (the binary must be downloaded first):

    sudo chmod +x /usr/local/bin/gitlab-runner

Install and start the runner as a service (replace azureuser with the admin user name of your VM):

    sudo gitlab-runner install --user=azureuser
    sudo gitlab-runner start
    sudo systemctl enable gitlab-runner   # Enable GitLab Runner to start automatically on boot

Step 4: Register the GitLab Runner

Navigate to the Runners section of your GitLab project to generate a registration token (GitLab -> Settings -> CI/CD -> Runners -> New Project Runner).

On your Azure VM, run:

    sudo gitlab-runner register \
      --url https://gitlab.com/ \
      --registration-token <YOUR_TOKEN> \
      --executor docker \
      --docker-image ubuntu:22.04 \
      --description "Azure VM Runner" \
      --tag-list "gitlab-runner-vm" \
      --non-interactive

Note: Replace the registration token, description, and tag list as required. Docker image names must be lowercase (ubuntu:22.04, not Ubuntu:22.04).

After registration, restart the runner:

    sudo gitlab-runner restart

Verify the runner’s status with:

    sudo gitlab-runner list

Your runner should appear in the list. If the runner does not appear, repeat Step 4 and check each value.

Step 5: Add Runner Tags to Your Pipeline

In .gitlab-ci.yml:

    default:
      tags:
        - gitlab-runner-vm

Step 6: Verify Pipeline Execution

Create a simple job to test the runner:

    test-runner:
      tags:
        - gitlab-runner-vm
      script:
        - echo "Runner is working!"

Troubleshooting Common Issues

Permission Denied (Docker Error)

Error: docker: permission denied while trying to connect to the Docker daemon socket

Solution: Run newgrp docker. If unresolved, restart Docker:

    sudo systemctl restart docker

No Active Runners Online

Error: This job is stuck because there are no active runners online.
Solution: Check runner status:

    sudo gitlab-runner status

If inactive, restart the runner:

    sudo gitlab-runner restart

Ensure the runner tag in your pipeline matches the tag you provided when creating the runner for the project.

Final Tips

Always restart the runner after making configuration changes:

    sudo gitlab-runner restart

Remember to periodically check the runner’s status and update its configuration as needed to keep it running smoothly. Happy coding and enjoy the enhanced capabilities of your new GitLab Runner setup!

Azure Extended Zones: Optimizing Performance, Compliance, and Accessibility
Azure Extended Zones are small-scale Azure extensions located in specific metros or jurisdictions to support low-latency and data residency workloads. They enable users to run latency-sensitive applications close to end users while maintaining compliance with data residency requirements, all within the Azure ecosystem.

Azure VNet Flow Logs with Terraform: The Complete Migration and Traffic Analytics Guide
Migrating from NSG Flow Logs to VNet Flow Logs in Azure: Implementation with Terraform

Author: Ibrahim Baig (Consultant)

Executive Summary

Microsoft is retiring Network Security Group (NSG) flow logs and recommends migrating to Virtual Network (VNet) flow logs. After June 30, 2025, new NSG flow logs cannot be created, and all NSG flow logs will be retired by September 30, 2027. Migrating to VNet flow logs ensures continued support and provides broader, simpler network visibility.

What Changed & Key Dates
- June 30, 2025: Creation of new NSG flow logs is blocked.
- September 30, 2027: NSG flow logs are retired (resources deleted; historical blobs remain per retention policy).
- Microsoft provides migration scripts and policy guidance for NSG→VNet flow logs.

Why Migrate? (Benefits)

Operational Simplicity & Coverage
- Enable logging at the VNet, subnet, or NIC scope, with no dependency on NSG.
- Broader visibility across all workloads inside a VNet, not just NSG-governed traffic.

Security & Analytics
- Native integration with Traffic Analytics for enriched insights.
- Monitor Azure Virtual Network Manager (AVNM) security admin rules.

Continuity & Cost Parity
- VNet flow logs are priced the same as NSG flow logs (with 5 GB/month free).

What’s New in VNet Flow Logs
- Scopes: Enable at VNet, subnet, or NIC level.
- Storage: JSON logs to Azure Storage.
- At-scale enablement: Built-in Azure Policy for auditing and auto-deployment.
- Analytics: Traffic Analytics add-on for deep insights.
- AVNM awareness: Observe centrally managed security admin rules.

Traffic Analytics: Capabilities & Value

Traffic Analytics (TA) is a powerful add-on for VNet flow logs, providing:
- Automated Traffic Insights: Visualize traffic flows, identify top talkers, and detect anomalous patterns.
- Threat Detection: Surface suspicious flows, lateral movement, and communication with malicious IPs.
- Network Segmentation Validation: Confirm that segmentation policies are effective and spot unintended access.
- Performance Monitoring: Analyze bandwidth usage, latency, and flow volumes for troubleshooting.
- Customizable Dashboards: Drill down by subnet, region, or workload for targeted investigations.
- Integration: Seamless with Azure Monitor and Log Analytics for alerting and automation.

For practical recipes and advanced use cases, see https://blog.cloudtrooper.net/2024/05/08/vnet-flow-logs-recipes/.

Gap in the Terraform documentation

The Terraform Registry page for azurerm_network_watcher_flow_log does not yet provide an explicit VNet flow logs example. In practice, you use the same resource and set target_resource_id to the ID of the VNet (or Subnet/NIC).

Registry page (latest): https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/network_watcher_flow_log

Important notes:
- Same resource block: azurerm_network_watcher_flow_log
- Use target_resource_id = <resource ID of VNet/Subnet/NIC> (instead of the legacy network_security_group_id).
- As of 30 June 2025, creating new NSG flow logs is no longer possible (provider notes); migrate to VNet/Subnet/NIC targets.
- Keep your azurerm provider up to date: earlier builds had validation gaps for subnet/NIC IDs, which were tracked and addressed in provider issues.

Implementation Guide

Option A: Terraform (Recommended for IaC)

Note: Use a dedicated Storage account for flow logs, as lifecycle rules may be overwritten.
    # Assumes var.region, var.vnet_name, var.env, azurerm_virtual_network.vnet,
    # azurerm_storage_account.flowlogs_sa and azurerm_log_analytics_workspace.law
    # are defined elsewhere in the configuration.
    terraform {
      required_version = ">= 1.5"
      required_providers {
        azurerm = {
          source  = "hashicorp/azurerm"
          version = ">= 3.110.0" # or latest
        }
      }
    }

    provider "azurerm" {
      features {}
    }

    data "azurerm_network_watcher" "this" {
      name                = "NetworkWatcher_${var.region}"
      resource_group_name = "NetworkWatcherRG"
    }

    resource "azurerm_network_watcher_flow_log" "vnet_flow_log" {
      name                 = "${var.vnet_name}-flowlog"
      network_watcher_name = data.azurerm_network_watcher.this.name
      resource_group_name  = data.azurerm_network_watcher.this.resource_group_name
      target_resource_id   = azurerm_virtual_network.vnet.id
      storage_account_id   = azurerm_storage_account.flowlogs_sa.id
      enabled              = true

      retention_policy {
        enabled = true
        days    = 30
      }

      traffic_analytics {
        enabled               = true
        workspace_id          = azurerm_log_analytics_workspace.law.workspace_id
        workspace_region      = azurerm_log_analytics_workspace.law.location
        workspace_resource_id = azurerm_log_analytics_workspace.law.id
        interval_in_minutes   = 60
      }

      tags = {
        owner       = "network-platform"
        environment = var.env
      }
    }

Option B: Azure CLI

    az network watcher flow-log create \
      --location westus \
      --resource-group MyResourceGroup \
      --name myVNetFlowLog \
      --vnet MyVNetName \
      --storage-account mystorageaccount \
      --workspace "/subscriptions/<subId>/resourceGroups/<rg>/providers/Microsoft.OperationalInsights/workspaces/<LAWName>" \
      --traffic-analytics true \
      --interval 60

Option C: Azure Portal
- Go to Network Watcher → Flow logs → + Create.
- Choose Flow log type = Virtual network; select VNet/Subnet/NIC, Storage account, and optionally enable Traffic Analytics.

Option D: At Scale via Azure Policy
- Use built-in policies to audit and auto-deploy VNet flow logs (DeployIfNotExists).

Migration Approach (NSG → VNet Flow Logs)
1. Inventory existing NSG flow logs.
2. Choose a migration method: Microsoft script or Azure Policy.
3. Run both in parallel temporarily to validate.
4. Disable NSG flow logs before retirement.
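To make the target_resource_id values and the inventory step more concrete, here is a small Python sketch. It is illustrative only and not part of the original guide: the helper names (`vnet_id`, `subnet_id`, `nic_id`, `flow_log_scope`) are hypothetical, and it assumes resource-group-scoped Azure Resource Manager IDs of the standard form /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Network/<type>/<name>.

```python
def vnet_id(subscription_id: str, resource_group: str, vnet: str) -> str:
    """Build the ARM resource ID of a virtual network."""
    return (
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Network/virtualNetworks/{vnet}"
    )


def subnet_id(subscription_id: str, resource_group: str, vnet: str, subnet: str) -> str:
    """Build the ARM resource ID of a subnet (a child resource of the VNet)."""
    return f"{vnet_id(subscription_id, resource_group, vnet)}/subnets/{subnet}"


def nic_id(subscription_id: str, resource_group: str, nic: str) -> str:
    """Build the ARM resource ID of a network interface."""
    return (
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Network/networkInterfaces/{nic}"
    )


# Maps the last resource-type segment of an ID to a flow log scope.
_SCOPES = {
    "virtualnetworks": "vnet",
    "subnets": "subnet",
    "networkinterfaces": "nic",
    "networksecuritygroups": "nsg",  # legacy target that still needs migrating
}


def flow_log_scope(resource_id: str) -> str:
    """Classify a flow log target ID as 'vnet', 'subnet', 'nic', or 'nsg'."""
    parts = [p.lower() for p in resource_id.strip("/").split("/")]
    well_formed = (
        len(parts) >= 8
        and parts[0] == "subscriptions"
        and parts[2] == "resourcegroups"
        and parts[4] == "providers"
    )
    if not well_formed:
        raise ValueError(f"not a resource-group-scoped ARM ID: {resource_id}")
    # After the provider namespace, segments alternate type/name/type/name.
    resource_types = parts[6::2]
    if resource_types[-1] not in _SCOPES:
        raise ValueError(f"unsupported flow log target: {resource_id}")
    return _SCOPES[resource_types[-1]]
```

During the inventory step, any existing flow log whose target classifies as "nsg" is one that still has to be migrated; the comparison is case-insensitive because ARM resource IDs are.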
Challenges & Mitigations
- Permissions: Ensure required roles on Log Analytics workspace.
- Terraform lifecycle: Use a dedicated Storage account.
- Tooling compatibility: Verify SIEM/NDR support.
- Provider/API maturity: Use current azurerm provider.

Validation Checklist
- Storage: New blobs appear in the configured Storage account.
- Traffic Analytics: Data visible in Log Analytics workspace.
- AVNM: Confirm traffic allowed/denied states appear in logs.

Cost Considerations
- VNet flow logs ingestion: $0.50/GB after 5 GB free/month.
- Traffic Analytics processing: $2.30/GB (60-min) or $3.50/GB (10-min).

Traffic Analytics Deep Dive

VNet Flow Logs are stored in Azure Blob Storage. Optionally, you can enable Traffic Analytics, which does two things: it enriches the flow logs with additional information, and it sends everything to a Log Analytics workspace for easy querying. This “enrich and forward to Log Analytics” operation happens at intervals, either every 10 minutes or every hour.

Table Structure: NTAIpDetails

This table contains enrichment data about public IP addresses, including whether they belong to Azure services and their region, and geolocation information for other public IPs. The following query returns a sample of what the table contains:

    NTAIpDetails
    | distinct FlowType, PublicIpDetails, Location

Table Structure: NTATopologyDetails

This table contains information about different elements of your topology, including VNets, subnets, route tables, routes, NSGs, Application Gateways and much more.

Table Structure: NTANetAnalytics

Alright, now we are coming to more interesting things: this table is the one containing the flows we are looking for. Records in this table will contain the usual attributes you would expect such as source and destination IP, protocol, and destination port.
Additionally, data will be enriched with information such as:
- Source and destination VM
- Source and destination NIC
- Source and destination subnet
- Source and destination load balancer
- Flow encryption (yes/no)
- Whether the flow is going over ExpressRoute
- And many more

Further below you can read some scenarios with detailed queries that show ways to extract information from VNet Flow Logs and Traffic Analytics. Of course, these are just some of the scenarios that came to mind on my topology; the idea is that you can take inspiration from these queries to support your individual use case.

Example Scenario

Imagine you want to see which IP addresses a given virtual machine has been talking to in the last few days:

    NTANetAnalytics
    | where TimeGenerated > ago(10d)
    | where SrcIp == "10.10.1.4" and strlen(DestIp) > 0
    | summarize TotalBytes = sum(BytesDestToSrc + BytesSrcToDest) by SrcIp, DestIp

Similarly, you can experiment with such KQL queries in the workspace to dig deeper into the flow logs.

References & Further Reading
- https://learn.microsoft.com/en-us/azure/network-watcher/nsg-flow-logs-overview
- https://learn.microsoft.com/en-us/azure/network-watcher/nsg-flow-logs-migrate
- https://learn.microsoft.com/en-us/azure/network-watcher/vnet-flow-logs-overview
- https://learn.microsoft.com/en-us/azure/network-watcher/vnet-flow-logs-manage
- https://learn.microsoft.com/en-us/cli/azure/network/watcher/flow-log?view=azure-cli-latest
- https://learn.microsoft.com/en-us/azure/network-watcher/vnet-flow-logs-policy
- https://azure.microsoft.com/en-us/pricing/details/network-watcher/
- https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/network_watcher_flow_log
- https://blog.cloudtrooper.net/2024/05/08/vnet-flow-logs-recipes/

Resiliency Best Practices You Need For your Blob Storage Data
Maintaining Resiliency in Azure Blob Storage: A Guide to Best Practices

Azure Blob Storage is a cornerstone of modern cloud storage, offering scalable and secure solutions for unstructured data. However, maintaining resiliency in Blob Storage requires careful planning and adherence to best practices. In this blog, I’ll share practical strategies to ensure your data remains available, secure, and recoverable under all circumstances.

1. Enable Soft Delete for Accidental Recovery (Most Important)

Mistakes happen, but soft delete can be your safety net. It allows you to recover deleted blobs within a specified retention period:
- Configure a soft delete retention period in Azure Storage.
- Regularly monitor your blob storage to ensure that critical data is not permanently removed by mistake.

Enabling soft delete in Azure Blob Storage carries no additional charge for the feature itself. However, it can increase your storage costs because deleted data is retained for the configured retention period, which means:
- The retained data contributes to the total storage consumption during the retention period.
- You will be charged according to the pricing tier of the data (Hot, Cool, or Archive) for the duration of retention.

2. Utilize Geo-Redundant Storage (GRS)

Geo-redundancy ensures your data is replicated across regions to protect against regional failures:
- Choose RA-GRS (Read-Access Geo-Redundant Storage) for read access to secondary replicas in the event of a primary region outage.
- Assess your workload’s RPO (Recovery Point Objective) and RTO (Recovery Time Objective) needs to select the appropriate redundancy.

3. Implement Lifecycle Management Policies

Efficient storage management reduces costs and ensures long-term data availability:
- Set up lifecycle policies to transition data between hot, cool, and archive tiers based on usage.
- Automatically delete expired blobs to save on costs while keeping your storage organized.

4. Secure Your Data with Encryption and Access Controls

Resiliency is incomplete without robust security. Protect your blobs using:
- Encryption at Rest: Azure automatically encrypts data using server-side encryption (SSE). Consider enabling customer-managed keys for additional control.
- Access Policies: Implement Shared Access Signatures (SAS) and Stored Access Policies to restrict access and enforce expiration dates.

5. Monitor and Alert for Anomalies

Stay proactive by leveraging Azure’s monitoring capabilities:
- Use Azure Monitor and Log Analytics to track storage performance and usage patterns.
- Set up alerts for unusual activities, such as sudden spikes in access or deletions, to detect potential issues early.

6. Plan for Disaster Recovery

Ensure your data remains accessible even during critical failures:
- Create snapshots of critical blobs for point-in-time recovery.
- Enable backup for blobs and turn on the immutability feature.
- Test your recovery process regularly to ensure it meets your operational requirements.

7. Resource Lock

Adding Azure locks to your Blob Storage account provides an additional layer of protection by preventing accidental deletion or modification of critical resources.

8. Educate and Train Your Team

Operational resilience often hinges on user awareness:
- Conduct regular training sessions on Blob Storage best practices.
- Document and share a clear data recovery and management protocol with all stakeholders.

9. Critical Tip: Do Not Create New Containers with Deleted Names During Recovery

If a container or blob storage is deleted for any reason and recovery is being attempted, it’s crucial not to create a new container with the same name immediately. Doing so can significantly hinder the recovery process by overwriting backend pointers, which are essential for restoring the deleted data. Always ensure that no new containers are created using the same name during the recovery attempt to maximize the chances of successful restoration.
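The lifecycle rules from section 3 can be sketched as a policy document. The Python snippet below is a minimal, hedged example and not from the original post: the function name `lifecycle_policy`, the rule name, and the 30/90/365-day thresholds are illustrative assumptions, while the JSON layout follows the rules/definition/actions/filters structure used by Azure Storage lifecycle management policies.

```python
import json
from typing import Optional


def lifecycle_policy(
    cool_after_days: int = 30,
    archive_after_days: int = 90,
    delete_after_days: int = 365,
    prefix: Optional[str] = None,
) -> dict:
    """Build a lifecycle policy that tiers block blobs to cool, then archive, then deletes them.

    The day thresholds are illustrative defaults, not recommendations.
    """
    if not 0 < cool_after_days < archive_after_days < delete_after_days:
        raise ValueError("thresholds must satisfy 0 < cool < archive < delete")

    filters = {"blobTypes": ["blockBlob"]}
    if prefix:
        # Limit the rule to blobs whose path starts with "<container>/<prefix>".
        filters["prefixMatch"] = [prefix]

    return {
        "rules": [
            {
                "enabled": True,
                "name": "tier-then-expire",  # hypothetical rule name
                "type": "Lifecycle",
                "definition": {
                    "actions": {
                        "baseBlob": {
                            "tierToCool": {
                                "daysAfterModificationGreaterThan": cool_after_days
                            },
                            "tierToArchive": {
                                "daysAfterModificationGreaterThan": archive_after_days
                            },
                            "delete": {
                                "daysAfterModificationGreaterThan": delete_after_days
                            },
                        }
                    },
                    "filters": filters,
                },
            }
        ]
    }


if __name__ == "__main__":
    # Print the policy as JSON so it can be saved to a file and reviewed.
    print(json.dumps(lifecycle_policy(prefix="logs/"), indent=2))
```

The printed JSON could be saved to a file and reviewed before being applied through the portal, an IaC template, or the CLI. Keeping the delete threshold longer than your soft delete and backup retention windows avoids expiring data you still expect to be able to restore.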
Wrapping It Up

Azure Blob Storage offers an exceptional platform for scalable and secure storage, but its resiliency depends on following best practices. By enabling features like soft delete, implementing redundancy, securing data, and proactively monitoring your storage environment, you can ensure that your data is resilient to failures and recoverable in any scenario.

- Protect your Azure resources with a lock - Azure Resource Manager | Microsoft Learn
- Data redundancy - Azure Storage | Microsoft Learn
- Overview of Azure Blobs backup - Azure Backup | Microsoft Learn