Azure Virtual Machines
Deploying a GitLab Runner on Azure: A Step-by-Step Guide
This guide walks you through the entire process, from VM setup to running your first successful job.

Step 1: Create an Azure VM

Log in to the Azure Portal and create a new VM with the following settings:
- Image: Ubuntu 20.04 LTS (recommended)
- Authentication: SSH public key (generate a .pem file for secure access)

Once created, note the public IP address.

Connect to the VM from your terminal:

ssh -i "/path/to/your/key.pem" admin_name@<YOUR_VM_PUBLIC_IP>

Note: Replace the key path and admin name in the command above with the values you chose during VM deployment.

Step 2: Install Docker on the Azure VM

Run the following commands to install Docker:

sudo apt update && sudo apt upgrade -y
sudo apt install -y docker.io
sudo systemctl start docker
sudo systemctl enable docker   # Enable Docker to start automatically on boot
sudo usermod -aG docker $USER

Test Docker with:

docker run hello-world

A success message should appear. If you see "permission denied", run:

newgrp docker

Note: Log out and log back in (or restart the VM) for the group changes to apply.

Step 3: Install GitLab Runner

Download the GitLab Runner binary, assign execution permissions, then install and start the runner as a service:

# Step 1: download the binary
sudo curl -L --output /usr/local/bin/gitlab-runner \
  https://gitlab-runner-downloads.s3.amazonaws.com/latest/binaries/gitlab-runner-linux-amd64

# Step 2: make it executable
sudo chmod +x /usr/local/bin/gitlab-runner

# Step 3: install and start the service (use the admin username you set when creating the VM)
sudo gitlab-runner install --user=azureuser
sudo gitlab-runner start
sudo systemctl enable gitlab-runner   # Enable GitLab Runner to start automatically on boot

Step 4: Register the GitLab Runner

Navigate to the Runners section in GitLab to generate a registration token (GitLab -> Settings -> CI/CD -> Runners -> New Project Runner). On your Azure VM, run:

sudo gitlab-runner register \
  --url https://gitlab.com/ \
  --registration-token <YOUR_TOKEN> \
  --executor docker \
  --docker-image ubuntu:22.04 \
  --description "Azure VM Runner" \
  --tag-list "gitlab-runner-vm" \
  --non-interactive

Note: Replace the registration token, description, and tag list as required.

After registration, restart the runner:

sudo gitlab-runner restart

Verify the runner's status with:

sudo gitlab-runner list

Your runner should appear in the list. If it does not, make sure you followed Step 4 as described.

Step 5: Add Runner Tags to Your Pipeline

In .gitlab-ci.yml:

default:
  tags:
    - gitlab-runner-vm

Step 6: Verify Pipeline Execution

Create a simple job to test the runner:

test-runner:
  tags:
    - gitlab-runner-vm
  script:
    - echo "Runner is working!"

Troubleshooting Common Issues

Permission Denied (Docker Error)
- Error: docker: permission denied while trying to connect to the Docker daemon socket
- Solution: Run newgrp docker. If unresolved, restart Docker with sudo systemctl restart docker.

No Active Runners Online
- Error: This job is stuck because there are no active runners online.
- Solution: Check the runner status with sudo gitlab-runner status. If it is inactive, restart it with sudo gitlab-runner restart. Also ensure the runner tag in your pipeline matches the one you provided when creating the runner for the project.
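Before relying on the runner for real pipelines, a quick health check on the VM can save a debugging round trip. This is a minimal sketch that assumes the default service install from Step 3 (the systemd unit name and the job image match the commands shown earlier):

# Confirm the runner is registered and can reach your GitLab instance
sudo gitlab-runner verify

# Tail the runner service logs while you trigger a pipeline from GitLab
sudo journalctl -u gitlab-runner -f

# Confirm the VM can pull the job image used during registration
docker pull ubuntu:22.04

If verify reports the runner as alive and the image pulls cleanly, most remaining job failures come down to tag mismatches in .gitlab-ci.yml.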
Final Tips

Always restart the runner after making configuration changes:

sudo gitlab-runner restart

Remember to periodically check the runner's status and update its configuration as needed to keep it running smoothly. Happy coding, and enjoy the enhanced capabilities of your new GitLab Runner setup!

Monitoring Time Drift in Azure Kubernetes Service for Regulated Industries

In this blog post, I will share how customers can monitor their Azure Kubernetes Service (AKS) clusters for time drift using a custom container image, Azure Managed Prometheus, and Grafana.

Understanding Time Sync in Cloud Environments

Azure's underlying infrastructure uses Microsoft-managed Stratum 1 time servers connected to GPS-based atomic clocks to ensure a highly accurate reference time. Linux VMs in Azure can synchronize either with their Azure host via Precision Time Protocol (PTP) devices like /dev/ptp0, or with external NTP servers over the public internet. The Azure host, being physically closer and more stable, provides a lower-latency and more reliable time source.

On Azure, Linux VMs use chrony, a Linux time synchronization service. It provides superior performance under varying network conditions and includes advanced capabilities for handling drift and jitter. Terminology like "last offset" (the difference between system and reference time), "skew" (the drift rate), and "root dispersion" (the uncertainty of the time measurement) helps quantify how well a system's clock is aligned.

Solution Overview

At the time of writing this article, it is not possible to monitor clock errors on Azure Kubernetes Service nodes directly, since node images cannot be customized and are managed by Azure. Customers may ask, "How do we prove our AKS workloads are keeping time accurately?" To address this, I've developed a solution consisting of a custom container image running as a DaemonSet, which generates Prometheus metrics that can be visualized on Grafana dashboards, to continuously monitor time drift across Kubernetes nodes.

This solution deploys a containerized Prometheus exporter to every node in the Azure Kubernetes Service (AKS) cluster. It exposes a metric representing the node's time drift, allowing Prometheus to scrape the data and Azure Managed Grafana to visualize it. The design emphasizes security and simplicity: the container runs as a non-root user with minimal privileges, and it securely accesses the chrony socket on the host to extract time synchronization metrics. As we walk through the solution, it is recommended that you follow along with the code on GitHub.

Technical Deep Dive: From Image Build to Pod Execution

The custom container image is built around a Python script (chrony_exporter.py) that runs the chronyc tracking command, parses its output, and calculates a "clock error" value. This value is calculated in the following way:

clock_error = |last_offset| + root_dispersion + (0.5 × root_delay)

The script then exports the result via a Prometheus-compatible HTTP endpoint. The only dependency it requires is the prometheus_client library, defined in the requirements.txt file.
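For context, the fields the exporter parses come straight from chronyc tracking. The abridged output below is an illustrative sketch (the values will differ on your nodes) showing where last offset, root delay, and root dispersion appear when the node is synced to the Azure host via the PTP reference clock:

$ chronyc tracking
Reference ID    : 50484330 (PHC0)
Stratum         : 1
Last offset     : -0.000012000 seconds
RMS offset      : 0.000015000 seconds
Skew            : 0.005 ppm
Root delay      : 0.000050000 seconds
Root dispersion : 0.000420000 seconds
Leap status     : Normal

Plugging these sample numbers into the formula above gives |−0.000012| + 0.000420 + (0.5 × 0.000050) = 0.000457 seconds, i.e. roughly 0.46 ms of clock error.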
Secure Entrypoint with Limited Root Access

The container is designed to run as a non-root user. The entrypoint.sh script launches the Python exporter using sudo, which is the only command this user is allowed to run with elevated privileges. This ensures that while root is required to query chronyc, the rest of the container operates under a strict least-privilege model:

#!/bin/bash
echo "Executing as non-root user: $(whoami)"
sudo /app/venv/bin/python /app/chrony_exporter.py

By restricting the sudoers file to a single command, this approach allows safe execution of privileged operations without exposing the container to unnecessary risk.

DaemonSet with Pod Hardening and Host Socket Access

The deployment is defined as a Kubernetes DaemonSet (chrony-ds.yaml), ensuring one pod runs on each AKS node. The pod has the following hardening and configuration settings:

- Runs as non-root (runAsUser: 1001, runAsNonRoot: true)
- Read-only root filesystem to minimize the risk of tampering and script alteration
- HostPath volume mount for /run/chrony so it can query the chrony daemon on the node
- Prometheus annotations for automated metric scraping

Example DaemonSet snippet:

securityContext:
  runAsUser: 1001
  runAsGroup: 1001
  runAsNonRoot: true
containers:
  - name: chrony-monitor
    image: <chrony-image>
    command: ["/bin/sh", "-c", "/app/entrypoint.sh"]
    securityContext:
      readOnlyRootFilesystem: true
    volumeMounts:
      - name: chrony-socket
        mountPath: /run/chrony
volumes:
  - name: chrony-socket
    hostPath:
      path: /run/chrony
      type: Directory

This setup gives the container controlled access to the chrony Unix socket on the host while preventing any broader filesystem access.

Configuration: Using the Azure Host as a Time Source

The underlying AKS node's (Linux VM) chrony.conf file is configured to sync time from the Azure host through the PTP device (/dev/ptp0). This configuration is optimized for cloud environments and includes:

- refclock PHC /dev/ptp0 for direct PTP sync
- makestep 1.0 -1 to immediately correct large drifts on startup

This ensures that time metrics reflect highly accurate local synchronization, avoiding public NTP network variability. With these layers combined (secure container build, restricted execution model, and Kubernetes-native deployment), you gain a powerful yet minimalistic time accuracy monitoring solution tailored for financial and regulated environments.

Setup Instructions

Prerequisites:
- An existing AKS cluster
- Azure Monitor with Managed Prometheus and Grafana enabled
- An Azure Container Registry (ACR) to host your image

Steps:

1. Clone the project repository:
   git clone https://github.com/Azure/chrony-tracker.git
2. Build the Docker image locally:
   docker build --platform=linux/amd64 -t chrony-tracker:1.0 .
3. Tag the image for your ACR:
   docker tag chrony-tracker:1.0 <youracr>.azurecr.io/chrony-tracker:1.0
4. Push the image to ACR:
   docker push <youracr>.azurecr.io/chrony-tracker:1.0
5. Update the DaemonSet YAML (chrony-ds.yaml) to use your ACR image:
   image: <youracr>.azurecr.io/chrony-tracker:1.0
6. Apply the DaemonSet:
   kubectl apply -f chrony-ds.yaml
7. Apply the Prometheus scrape config (ConfigMap):
   kubectl apply -f ama-metrics-prometheus-config-configmap.yaml
8. Delete the "ama-metrics-xxx" pods from the kube-system namespace so the new configuration is picked up.

After these steps, your AKS nodes will be monitored for clock drift.
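To confirm the rollout before moving on to dashboards, a couple of kubectl checks are useful. This is a sketch only: the DaemonSet name, label, and exporter port below are assumptions, so check chrony-ds.yaml in the repository for the values your deployment actually uses.

# One pod per node should be running and ready
kubectl get daemonset chrony-tracker -o wide
kubectl get pods -l app=chrony-tracker -o wide

# Spot-check the exported metric from a single pod (port is an assumption)
kubectl port-forward pod/<chrony-pod-name> 8000:8000 &
curl -s http://localhost:8000/metrics | grep chrony_clock_error_ms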
Viewing the Metric in Managed Grafana

Once the DaemonSet and ConfigMap are deployed and metrics are being scraped by Managed Prometheus, you can visualize the chrony_clock_error_ms metric in Azure Managed Grafana by following these steps:

1. Open the Azure portal and navigate to your Azure Managed Grafana resource.
2. Select the Grafana workspace and navigate to the endpoint by clicking the URL under Overview.
3. From the left-hand menu, select Metrics and then click + New metric exploration.
4. Enter the metric name "chrony_clock_error_ms" under Search metrics and click Select.
5. You should now be able to view the metric.
6. To customize it and view all sources, click the Open in explorer button.

Optional: Secure the Metrics Endpoint

To enhance the security of the /metrics endpoint exposed by each pod, you can enable basic authentication on the exporter. This requires configuring an HTTP server inside the container with basic authentication. You would also need to update your Prometheus ConfigMap to include the authentication credentials. For detailed guidance on securing scrape targets, refer to the Prometheus documentation on authentication and TLS settings. In addition, it is recommended to use Private Link for Kubernetes monitoring with Azure Monitor and Azure Managed Prometheus.

Learn More

If you'd like to explore this solution further or integrate it into your production workloads, the following resources provide valuable guidance:
- Microsoft Learn: Time sync in Linux VMs
- chrony-tracker GitHub repo
- Azure Monitor and Prometheus Integration

Author: Dotan Paz, Sr. Cloud Solutions Architect, Microsoft
Boosting Performance with the Latest Generations of Virtual Machines in Azure

Microsoft Azure recently announced the availability of the new generation of VMs (v6), including the Dl/Dv6 (general purpose) and El/Ev6 (memory-optimized) series. These VMs are powered by the latest Intel Xeon processors and are engineered to deliver:

- Up to 30% higher per-core performance compared to previous generations.
- Greater scalability, with options of up to 128 vCPUs (Dv6) and 192 vCPUs (Ev6).
- Significant enhancements in CPU cache (up to 5x larger), memory bandwidth, and NVMe-enabled storage.
- Improved security with features like Intel Total Memory Encryption (TME) and enhanced networking via the new Microsoft Azure Network Adaptor (MANA).

Evaluated Virtual Machines and Geekbench Results

The summary below shows the configuration and Geekbench results for the two VMs we tested. VM1 represents a previous-generation machine with twice the memory, while VM2 is from the new Dld e6 series and shows superior performance despite having half the RAM.

- VM1: D16s v5 (16 vCPUs, 64 GB RAM)
- VM2: D16ls v6 (16 vCPUs, 32 GB RAM)

Key Observations:

- Single-core performance: VM2 scores 2013 compared to VM1's 1570, a 28.2% improvement. Even with half the memory, the new Dld e6 series provides significantly better performance per core.
- Multi-core performance: VM2 achieves a multi-core score of 12,566 versus 9,454 for VM1, a 32.9% increase.

Enhanced throughput in specific workloads:

- File compression: 1909 MB/s (VM2) vs. 1654 MB/s (VM1), a 15.4% improvement.
- Object detection: 2851 images/s (VM2) vs. 1592 images/s (VM1), a remarkable 79.2% improvement.
- Ray tracing: 1798 Kpixels/s (VM2) vs. 1512 Kpixels/s (VM1), an 18.9% boost.

These results reflect the significant advancements enabled by the new generation of Intel processors.

Evolution of Hardware in Azure: From Ice Lake-SP to Emerald Rapids

Technical Specifications of the Processors Evaluated

Understanding the dramatic performance improvements begins with a look at the processor specifications.

Intel Xeon Platinum 8370C (Ice Lake-SP), used by VM1:
- Architecture: Ice Lake-SP
- Base frequency: 2.79 GHz
- Max frequency: 3.5 GHz
- L3 cache: 48 MB
- Supported instructions: AVX-512, VNNI, DL Boost

Intel Xeon Platinum 8573C (Emerald Rapids), used by VM2:
- Architecture: Emerald Rapids
- Base frequency: 2.3 GHz
- Max frequency: 4.2 GHz
- L3 cache: 260 MB
- Supported instructions: AVX-512, AMX, VNNI, DL Boost
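If you want to confirm which of these processors your own VM actually landed on, you can read it from inside the guest. A quick check on a Linux VM (output will vary by region and underlying host hardware):

# Show the CPU model and L3 cache size reported to the guest
lscpu | grep -E "Model name|L3 cache"

# Alternatively, read the raw model string
grep -m1 "model name" /proc/cpuinfo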
Impact on Performance

- Cache size increase: The jump from 48 MB to 260 MB of L3 cache is a key factor. A larger cache reduces dependency on RAM accesses, lowering latency and significantly boosting performance in memory-intensive workloads such as AI, big data, and scientific simulations.
- Enhanced frequency dynamics: While the base frequency of the Emerald Rapids processor is slightly lower, its higher maximum frequency (4.2 GHz vs. 3.5 GHz) means that performance-critical tasks can benefit from this burst capability under load.
- Advanced instruction support: The introduction of AMX (Advanced Matrix Extensions) in Emerald Rapids, along with robust AVX-512 support, optimizes the execution of complex mathematical and AI workloads.
- Efficiency gains: These processors also offer improved energy efficiency, reducing the energy consumed per compute unit. This efficiency translates into lower operational costs and a more sustainable cloud environment.

Beyond Our Tests: Overview of the New v6 Series

While our tests focused on the Dld e6 series, Azure's new v6 generation includes several families designed for different workloads:

1. Dlsv6 and Dldsv6-series
   - Segment: General purpose with NVMe local storage (where applicable)
   - vCPU range: 2 - 128
   - Memory: 4 - 256 GiB
   - Local disk: Up to 7,040 GiB (Dldsv6)
   - Highlights: 5x larger CPU cache (up to 300 MB) and higher network bandwidth (up to 54 Gbps)

2. Dsv6 and Ddsv6-series
   - Segment: General purpose
   - vCPU range: 2 - 128
   - Memory: Up to 512 GiB
   - Local disk: Up to 7,040 GiB in Ddsv6
   - Highlights: Up to 30% improved performance over the previous Dv5 generation and Azure Boost for enhanced IOPS and network performance

3. Esv6 and Edsv6-series
   - Segment: Memory-optimized
   - vCPU range: 2 - 192* (with larger sizes available in Q2)
   - Memory: Up to 1.8 TiB (1832 GiB)
   - Local disk: Up to 10,560 GiB in Edsv6
   - Highlights: Ideal for in-memory analytics, relational databases, and enterprise applications requiring vast amounts of RAM

Note: Sizes with higher vCPU counts and memory (e.g., E128/E192) will be generally available in Q2 of this year.

Key Innovations in the v6 Generation

- Increased CPU cache: Up to 5x more cache (from 60 MB to 300 MB) dramatically improves data access speeds.
- NVMe for storage: Enhanced local and remote storage performance, with up to 3x more IOPS locally and the capability to reach 400k IOPS remotely via Azure Boost.
- Azure Boost: Delivers higher throughput (up to 12 GB/s remote disk throughput) and improved network bandwidth (up to 200 Gbps for larger sizes).
- Microsoft Azure Network Adaptor (MANA): Provides improved network stability and performance for both Windows and Linux environments.
- Intel Total Memory Encryption (TME): Enhances data security by encrypting system memory.
- Scalability: Options ranging from 128 vCPUs / 512 GiB RAM in the Dv6 family to 192 vCPUs / 1.8 TiB RAM in the Ev6 family.
- Performance gains: Benchmarks and internal tests (such as SPEC CPU Integer) indicate improvements of 15%-30% across various workloads, including web applications, databases, analytics, and generative AI tasks.

My Personal Perspective

The new Azure v6 VMs mark a significant advancement in cloud computing performance, scalability, and security. Our Geekbench tests clearly show that the Dld e6 series, powered by the latest Intel Xeon Platinum 8573C (Emerald Rapids), delivers up to 30% better performance than a previous-generation machine with twice the memory. Coupled with the hardware evolution from Ice Lake-SP to Emerald Rapids, which brings a dramatic increase in cache size, improved frequency dynamics, and advanced instruction support, the new v6 generation sets a new standard for high-performance workloads. Whether you're running critical enterprise applications, data-intensive analytics, or next-generation AI models, the enhanced capabilities of these VMs offer significant benefits in performance, efficiency, and cost-effectiveness.
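As a practical next step, the v6 sizes can be targeted directly from the Azure CLI once they are available in your region. A minimal sketch follows; the resource names are placeholders, and the exact size string (assumed here to be Standard_D16ls_v6) should be checked against regional availability first:

# List sizes offered in a region and filter for v6 families
az vm list-sizes --location westeurope --output table | grep -i "v6"

# Create a VM on one of the new sizes (placeholder names; size string is an assumption)
az vm create \
  --resource-group my-rg \
  --name vm-d16lsv6-test \
  --image Ubuntu2204 \
  --size Standard_D16ls_v6 \
  --admin-username azureuser \
  --generate-ssh-keys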
References and Further Reading:
- Microsoft's official announcement: Azure Dld e6 VMs
- Internal tests performed with Geekbench 6.4.0 (AVX2) in the Germany West Central Azure region.

Resiliency Best Practices You Need for Your Blob Storage Data

Maintaining Resiliency in Azure Blob Storage: A Guide to Best Practices

Azure Blob Storage is a cornerstone of modern cloud storage, offering scalable and secure solutions for unstructured data. However, maintaining resiliency in Blob Storage requires careful planning and adherence to best practices. In this blog, I'll share practical strategies to ensure your data remains available, secure, and recoverable under all circumstances. (A short Azure CLI sketch mapping several of these settings to commands follows the list below.)

1. Enable Soft Delete for Accidental Recovery (Most Important)

Mistakes happen, and soft delete can be your safety net. It allows you to recover deleted blobs within a specified retention period:
- Configure a soft delete retention period in Azure Storage.
- Regularly monitor your blob storage to ensure that critical data is not permanently removed by mistake.

Enabling soft delete in Azure Blob Storage does not come with any additional cost for simply enabling the feature itself. However, it can impact your storage costs because the deleted data is retained for the configured retention period, which means:
- The retained data contributes to the total storage consumption during the retention period.
- You will be charged according to the pricing tier of the data (Hot, Cool, or Archive) for the duration of retention.

2. Utilize Geo-Redundant Storage (GRS)

Geo-redundancy ensures your data is replicated across regions to protect against regional failures:
- Choose RA-GRS (Read-Access Geo-Redundant Storage) for read access to secondary replicas in the event of a primary region outage.
- Assess your workload's RPO (Recovery Point Objective) and RTO (Recovery Time Objective) needs to select the appropriate redundancy.

3. Implement Lifecycle Management Policies

Efficient storage management reduces costs and ensures long-term data availability:
- Set up lifecycle policies to transition data between hot, cool, and archive tiers based on usage.
- Automatically delete expired blobs to save on costs while keeping your storage organized.

4. Secure Your Data with Encryption and Access Controls

Resiliency is incomplete without robust security. Protect your blobs using:
- Encryption at rest: Azure automatically encrypts data using server-side encryption (SSE). Consider enabling customer-managed keys for additional control.
- Access policies: Implement Shared Access Signatures (SAS) and Stored Access Policies to restrict access and enforce expiration dates.

5. Monitor and Alert for Anomalies

Stay proactive by leveraging Azure's monitoring capabilities:
- Use Azure Monitor and Log Analytics to track storage performance and usage patterns.
- Set up alerts for unusual activities, such as sudden spikes in access or deletions, to detect potential issues early.

6. Plan for Disaster Recovery

Ensure your data remains accessible even during critical failures:
- Create snapshots of critical blobs for point-in-time recovery.
- Enable backup for blobs and turn on the immutability feature where appropriate.
- Test your recovery process regularly to ensure it meets your operational requirements.

7. Apply Resource Locks

Adding Azure locks to your Blob Storage account provides an additional layer of protection by preventing accidental deletion or modification of critical resources.

8. Educate and Train Your Team

Operational resilience often hinges on user awareness:
- Conduct regular training sessions on Blob Storage best practices.
- Document and share a clear data recovery and management protocol with all stakeholders.
"Critical Tip: Do Not Create New Containers with Deleted Names During Recovery" If a container or blob storage is deleted for any reason and recovery is being attempted, it’s crucial not to create a new container with the same name immediately. Doing so can significantly hinder the recovery process by overwriting backend pointers, which are essential for restoring the deleted data. Always ensure that no new containers are created using the same name during the recovery attempt to maximize the chances of successful restoration. Wrapping It Up Azure Blob Storage offers an exceptional platform for scalable and secure storage, but its resiliency depends on following best practices. By enabling features like soft delete, implementing redundancy, securing data, and proactively monitoring your storage environment, you can ensure that your data is resilient to failures and recoverable in any scenario. Protect your Azure resources with a lock - Azure Resource Manager | Microsoft Learn Data redundancy - Azure Storage | Microsoft Learn Overview of Azure Blobs backup - Azure Backup | Microsoft Learn Protect your Azure resources with a lock - Azure Resource Manager | Microsoft Learn878Views1like0CommentsAzure VMs Not Applying GPOs Correctly
Wrapping It Up

Azure Blob Storage offers an exceptional platform for scalable and secure storage, but its resiliency depends on following best practices. By enabling features like soft delete, implementing redundancy, securing data, and proactively monitoring your storage environment, you can ensure that your data is resilient to failures and recoverable in any scenario.

Further reading:
- Protect your Azure resources with a lock - Azure Resource Manager | Microsoft Learn
- Data redundancy - Azure Storage | Microsoft Learn
- Overview of Azure Blobs backup - Azure Backup | Microsoft Learn

Azure VMs Not Applying GPOs Correctly

Hi everyone, quick question: if my Azure VMs are joined to my domain, they should be applying all my configured GPOs, right? For some reason, my VMs are not applying the GPOs, even after running gpupdate /force. At the moment, I am testing some simple GPOs like:
- Creating a folder on the desktop
- Setting the time format to Brazilian (dd/mm/yyyy)
- Adjusting the timezone to Brasília

When I run gpresult /r, it shows that the GPOs are being applied, but for some reason the VMs just don't reflect them. Any idea what might be causing this?
Azure Extended Zones are small-scale Azure extensions located in specific metros or jurisdictions to support low-latency and data residency workloads. They enable users to run latency-sensitive applications close to end users while maintaining compliance with data residency requirements, all within the Azure ecosystem.2.8KViews2likes0Comments