azure virtual machines
12 Topics

Deploying a GitLab Runner on Azure: A Step-by-Step Guide
This guide walks you through the entire process, from VM setup to running your first successful job.

Step 1: Create an Azure VM
Log in to the Azure Portal.
Create a new VM with the following settings:
  Image: Ubuntu 20.04 LTS (recommended)
  Authentication: SSH Public Key (generate a .pem file for secure access)
Once created, note the public IP address.

Connect to the VM
From your terminal:
    ssh -i "/path/to/your/key.pem" admin_name@<YOUR_VM_PUBLIC_IP>
Note: Replace the key path and admin_name with the path to your .pem file and the admin username you chose during VM deployment.

Step 2: Install Docker on the Azure VM
Run the following commands to install Docker:
    sudo apt update && sudo apt upgrade -y
    sudo apt install -y docker.io
    sudo systemctl start docker
    sudo systemctl enable docker    # Enable Docker to start automatically on boot
    sudo usermod -aG docker $USER
Test Docker with:
    docker run hello-world
A success message should appear. If you see "permission denied", run:
    newgrp docker
Note: Log out and log back in (or restart the VM) for the group change to apply to new sessions.
Step 3: Install GitLab Runner
Download the GitLab Runner binary:
    # Step 1
    sudo curl -L --output /usr/local/bin/gitlab-runner \
      https://gitlab-runner-downloads.s3.amazonaws.com/latest/binaries/gitlab-runner-linux-amd64
Assign execution permissions (the binary must be downloaded first):
    # Step 2
    sudo chmod +x /usr/local/bin/gitlab-runner
Install and start the runner as a service (replace azureuser with your VM admin username):
    # Step 3
    sudo gitlab-runner install --user=azureuser
    sudo gitlab-runner start
    sudo systemctl enable gitlab-runner    # Enable GitLab Runner to start automatically on boot

Step 4: Register the GitLab Runner
In GitLab, open the Runners section of your project to generate a registration token (GitLab -> Settings -> CI/CD -> Runners -> New Project Runner).
On your Azure VM, run:
    sudo gitlab-runner register \
      --url https://gitlab.com/ \
      --registration-token <YOUR_TOKEN> \
      --executor docker \
      --docker-image ubuntu:22.04 \
      --description "Azure VM Runner" \
      --tag-list "gitlab-runner-vm" \
      --non-interactive
Note: Replace the registration token, description, and tag list as required. Docker image names are lowercase, so use ubuntu:22.04 rather than Ubuntu:22.04. On recent GitLab versions, the New Project Runner page issues a runner authentication token (prefix glrt-) that is passed with --token instead of --registration-token; in that flow, tags are set in the GitLab UI rather than on the command line.
After registration, restart the runner:
    sudo gitlab-runner restart
Verify the runner's status with:
    sudo gitlab-runner list
Your runner should appear in the list. If it does not, repeat Step 4 and check the token and URL.

Step 5: Add Runner Tags to Your Pipeline
In .gitlab-ci.yml:
    default:
      tags:
        - gitlab-runner-vm

Step 6: Verify Pipeline Execution
Create a simple job to test the runner:
    test-runner:
      tags:
        - gitlab-runner-vm
      script:
        - echo "Runner is working!"

Troubleshooting Common Issues

Permission Denied (Docker Error)
Error: docker: permission denied while trying to connect to the Docker daemon socket
Solution: Run
    newgrp docker
If unresolved, restart Docker:
    sudo systemctl restart docker

No Active Runners Online
Error: This job is stuck because there are no active runners online.
Solution: Check the runner status:
    sudo gitlab-runner status
If inactive, restart the runner:
    sudo gitlab-runner restart
Ensure the tag used in your pipeline matches the tag you assigned when creating the runner for the project.

Final Tips
Always restart the runner after making configuration changes:
    sudo gitlab-runner restart
Periodically check the runner's status and update its configuration as needed to keep it running smoothly.
Happy coding and enjoy the enhanced capabilities of your new GitLab Runner setup!

793 Views, 2 likes, 2 Comments

Monitoring Time Drift in Azure Kubernetes Service for Regulated Industries
In this blog post, I will share how customers can monitor their Azure Kubernetes Service (AKS) clusters for time drift using a custom container image, Azure Managed Prometheus and Grafana.

Understanding Time Sync in Cloud Environments
Azure's underlying infrastructure uses Microsoft-managed Stratum 1 time servers connected to GPS-based atomic clocks to ensure a highly accurate reference time. Linux VMs in Azure can synchronize either with their Azure host via Precision Time Protocol (PTP) devices like /dev/ptp0, or with external NTP servers over the public internet. The Azure host, being physically closer and more stable, provides a lower-latency and more reliable time source.
On Azure, Linux VMs use chrony, a Linux time synchronization service. It performs well under varying network conditions and includes advanced capabilities for handling drift and jitter. Terms like "Last offset" (difference between system and reference time), "Skew" (drift rate), and "Root dispersion" (uncertainty of the time measurement) help quantify how well a system's clock is aligned.

Solution Overview
At the time of writing this article, it is not possible to monitor clock errors on Azure Kubernetes Service nodes directly, since node images cannot be customized and are managed by Azure. Customers may ask "How do we prove our AKS workloads are keeping time accurately?" To address this, I've developed a solution that consists of a custom container image running as a DaemonSet, which generates Prometheus metrics that can be visualized on Grafana dashboards, to continuously monitor time drift across Kubernetes nodes.
This solution deploys a containerized Prometheus exporter to every node in the AKS cluster. It exposes a metric representing the node's time drift, allowing Prometheus to scrape the data and Azure Managed Grafana to visualize it.
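To make the terminology above concrete, here is a minimal Python sketch that parses sample `chronyc tracking` output and combines the last offset, root dispersion and root delay into a single clock error figure, using the formula this post describes for the exporter: |last_offset| + root_dispersion + 0.5 × root_delay. It is an illustration only, not the project's chrony_exporter.py; the sample values and the function names (parse_tracking, leading_seconds, clock_error_ms) are assumptions made up for this example.

```python
import re

# Sample text in the layout printed by `chronyc tracking` (values are invented).
SAMPLE_TRACKING = """\
Reference ID    : 50484330 (PHC0)
Stratum         : 1
Ref time (UTC)  : Thu Jan 02 10:00:00 2025
System time     : 0.000001000 seconds fast of NTP time
Last offset     : -0.000002000 seconds
RMS offset      : 0.000003000 seconds
Frequency       : 1.234 ppm slow
Residual freq   : +0.001 ppm
Skew            : 0.012 ppm
Root delay      : 0.000100000 seconds
Root dispersion : 0.000020000 seconds
Update interval : 16.0 seconds
Leap status     : Normal
"""


def parse_tracking(text):
    """Map each 'Name : value' line to a dict entry (split on the first colon)."""
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            fields[name.strip()] = value.strip()
    return fields


def leading_seconds(value):
    """Extract the leading signed number from a value such as '-0.000002000 seconds'."""
    match = re.match(r"[+-]?\d+(?:\.\d+)?", value)
    if match is None:
        raise ValueError(f"no numeric value in {value!r}")
    return float(match.group())


def clock_error_ms(text):
    """Return |last_offset| + root_dispersion + 0.5 * root_delay, in milliseconds."""
    fields = parse_tracking(text)
    last_offset = leading_seconds(fields["Last offset"])
    root_dispersion = leading_seconds(fields["Root dispersion"])
    root_delay = leading_seconds(fields["Root delay"])
    error_seconds = abs(last_offset) + root_dispersion + 0.5 * root_delay
    return error_seconds * 1000.0


print(f"clock error: {clock_error_ms(SAMPLE_TRACKING):.3f} ms")
```

In a real exporter the text would come from running chronyc on the node rather than from a constant, and the result would be published as a Prometheus gauge.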
The design emphasizes security and simplicity: the container runs as a non-root user with minimal privileges, and it securely accesses the Chrony socket on the host to extract time synchronization metrics. As we walk through the solution, it is recommended that you follow along with the code on GitHub.

Technical Deep Dive: From Image Build to Pod Execution
The custom container image is built around a Python script (chrony_exporter.py) that runs the chronyc tracking command, parses its output, and calculates a "clock error" value. This value is calculated in the following way:
    clock_error = |last_offset| + root_dispersion + (0.5 × root_delay)
The script then exports the result via a Prometheus-compatible HTTP endpoint. The only dependency it requires is the prometheus_client library, defined in the requirements.txt file.

Secure Entrypoint with Limited Root Access
The container is designed to run as a non-root user. The entrypoint.sh script launches the Python exporter using sudo, which is the only command that this user is allowed to run with elevated privileges. This ensures that while root is required to query chronyc, the rest of the container operates with a strict least-privilege model:
    #!/bin/bash
    echo "Executing as non-root user: $(whoami)"
    sudo /app/venv/bin/python /app/chrony_exporter.py
By restricting the sudoers file to a single command, this approach allows safe execution of privileged operations without exposing the container to unnecessary risk.

DaemonSet with Pod Hardening and Host Socket Access
The deployment is defined as a Kubernetes DaemonSet (chrony-ds.yaml), ensuring one pod runs on each AKS node.
The pod has the following hardening and configuration settings:
  Runs as non-root (runAsUser: 1001, runAsNonRoot: true)
  Read-only root filesystem to minimize the risk of tampering with or altering scripts
  HostPath volume mount for /run/chrony so it can query the Chrony daemon on the node
  Prometheus annotations for automated metric scraping
Example DaemonSet snippet:
    securityContext:
      runAsUser: 1001
      runAsGroup: 1001
      runAsNonRoot: true
    containers:
      - name: chrony-monitor
        image: <chrony-image>
        command: ["/bin/sh", "-c", "/app/entrypoint.sh"]
        securityContext:
          readOnlyRootFilesystem: true
        volumeMounts:
          - name: chrony-socket
            mountPath: /run/chrony
    volumes:
      - name: chrony-socket
        hostPath:
          path: /run/chrony
          type: Directory
This setup gives the container controlled access to the Chrony Unix socket on the host while preventing any broader filesystem access.

Configuration: Using the Azure Host as a Time Source
The chrony.conf file on the underlying AKS node (a Linux VM) is configured to sync time from the Azure host through the PTP device (/dev/ptp0). This configuration is optimized for cloud environments and includes:
  refclock PHC /dev/ptp0 for direct PTP sync
  makestep 1.0 -1 to step the clock whenever the offset exceeds 1 second (the -1 removes the limit on how many updates may step, so this is not restricted to startup)
This ensures that time metrics reflect highly accurate local synchronization, avoiding public NTP network variability.
With these layers combined (secure container build, restricted execution model, and Kubernetes-native deployment), you gain a powerful yet minimalistic time accuracy monitoring solution tailored for financial and regulated environments.

Setup Instructions

Prerequisites
  An existing AKS cluster
  Azure Monitor with Managed Prometheus and Grafana enabled
  An Azure Container Registry (ACR) to host your image

Steps
Clone the project repository:
    git clone https://github.com/Azure/chrony-tracker.git
Build the Docker image locally:
    docker build --platform=linux/amd64 -t chrony-tracker:1.0 .
Tag the image for your ACR:
    docker tag chrony-tracker:1.0 <youracr>.azurecr.io/chrony-tracker:1.0
Push the image to ACR:
    docker push <youracr>.azurecr.io/chrony-tracker:1.0
Update the DaemonSet YAML (chrony-ds.yaml) to use your ACR image:
    image: <youracr>.azurecr.io/chrony-tracker:1.0
Apply the DaemonSet:
    kubectl apply -f chrony-ds.yaml
Apply the Prometheus scrape config (ConfigMap):
    kubectl apply -f ama-metrics-prometheus-config-configmap.yaml
Delete the "ama-metrics-xxx" pods from the kube-system namespace so that they restart with the new configuration.
After these steps, your AKS nodes will be monitored for clock drift.

Viewing the Metric in Managed Grafana
Once the DaemonSet and ConfigMap are deployed and metrics are being scraped by Managed Prometheus, you can visualize the chrony_clock_error_ms metric in Azure Managed Grafana by following these steps:
  Open the Azure Portal and navigate to your Azure Managed Grafana resource.
  Select the Grafana workspace and open the Endpoint by clicking the URL under Overview.
  From the left-hand menu, select Metrics and then click + New metric exploration.
  Enter the metric name "chrony_clock_error_ms" under Search metrics and click Select.
  You should now be able to view the metric.
  To customize it and view all sources, click the Open in explorer button.

Optional: Secure the Metrics Endpoint
To enhance the security of the /metrics endpoint exposed by each pod, you can enable basic authentication on the exporter. This requires configuring an HTTP server inside the container with basic authentication. You would also need to update your Prometheus ConfigMap to include authentication credentials. For detailed guidance on securing scrape targets, refer to the Prometheus documentation on authentication and TLS settings.
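As a rough illustration of what basic authentication on a /metrics endpoint involves, the following Python sketch uses only the standard library. It is not how the chrony-tracker exporter is implemented; the names is_authorized and make_handler, the realm string, and the credentials shown in the usage comment are all hypothetical. Basic authentication sends credentials base64-encoded rather than encrypted, so this would normally sit behind TLS.

```python
import base64
import hmac
from http.server import BaseHTTPRequestHandler, HTTPServer


def is_authorized(auth_header, username, password):
    """Return True if the Authorization header carries the expected Basic credentials."""
    if not auth_header or not auth_header.startswith("Basic "):
        return False
    try:
        raw = base64.b64decode(auth_header[len("Basic "):], validate=True)
        decoded = raw.decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        # Malformed base64 or non-UTF-8 credentials are treated as unauthorized.
        return False
    expected = f"{username}:{password}"
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(decoded.encode("utf-8"), expected.encode("utf-8"))


def make_handler(username, password, render_metrics):
    """Build a request handler that serves render_metrics() on /metrics after auth."""

    class MetricsHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/metrics":
                self.send_error(404)
                return
            if not is_authorized(self.headers.get("Authorization"), username, password):
                self.send_response(401)
                self.send_header("WWW-Authenticate", 'Basic realm="metrics"')
                self.end_headers()
                return
            body = render_metrics().encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return MetricsHandler


# Hypothetical usage (port, credentials and metric text are placeholders):
# handler = make_handler("prom", "s3cret", lambda: "chrony_clock_error_ms 0.072\n")
# HTTPServer(("0.0.0.0", 9100), handler).serve_forever()
```

In practice the username and password would be read from a Kubernetes Secret rather than hard-coded, and the same credentials would be supplied to the Prometheus scrape configuration.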
In addition, it is recommended to use Private Link for Kubernetes monitoring with Azure Monitor and Azure Managed Prometheus.

Learn More
If you'd like to explore this solution further or integrate it into your production workloads, the following resources provide valuable guidance:
  Microsoft Learn: Time sync in Linux VMs
  chrony-tracker GitHub repo
  Azure Monitor and Prometheus Integration

Author
Dotan Paz
Sr. Cloud Solutions Architect, Microsoft

513 Views, 0 likes, 0 Comments

Resiliency Best Practices You Need For your Blob Storage Data
Maintaining Resiliency in Azure Blob Storage: A Guide to Best Practices
Azure Blob Storage is a cornerstone of modern cloud storage, offering scalable and secure solutions for unstructured data. However, maintaining resiliency in Blob Storage requires careful planning and adherence to best practices. In this blog, I'll share practical strategies to help keep your data available, secure, and recoverable.

1. Enable Soft Delete for Accidental Recovery (Most Important)
Mistakes happen, but soft delete can be your safety net. It allows you to recover deleted blobs within a specified retention period:
  Configure a soft delete retention period in Azure Storage.
  Regularly monitor your blob storage to ensure that critical data is not permanently removed by mistake.
Enabling soft delete in Azure Blob Storage carries no additional charge for the feature itself. However, it can increase your storage costs because deleted data is retained for the configured retention period, which means:
  The retained data contributes to the total storage consumption during the retention period.
  You will be charged according to the pricing tier of the data (Hot, Cool, or Archive) for the duration of retention.

2. Utilize Geo-Redundant Storage (GRS)
Geo-redundancy ensures your data is replicated across regions to protect against regional failures:
  Choose RA-GRS (Read-Access Geo-Redundant Storage) for read access to secondary replicas in the event of a primary region outage.
  Assess your workload's RPO (Recovery Point Objective) and RTO (Recovery Time Objective) needs to select the appropriate redundancy.

3. Implement Lifecycle Management Policies
Efficient storage management reduces costs and ensures long-term data availability:
  Set up lifecycle policies to transition data between hot, cool, and archive tiers based on usage.
  Automatically delete expired blobs to save on costs while keeping your storage organized.

4. Secure Your Data with Encryption and Access Controls
Resiliency is incomplete without robust security. Protect your blobs using:
  Encryption at Rest: Azure automatically encrypts data using server-side encryption (SSE). Consider enabling customer-managed keys for additional control.
  Access Policies: Implement Shared Access Signatures (SAS) and Stored Access Policies to restrict access and enforce expiration dates.

5. Monitor and Alert for Anomalies
Stay proactive by leveraging Azure's monitoring capabilities:
  Use Azure Monitor and Log Analytics to track storage performance and usage patterns.
  Set up alerts for unusual activities, such as sudden spikes in access or deletions, to detect potential issues early.

6. Plan for Disaster Recovery
Ensure your data remains accessible even during critical failures:
  Create snapshots of critical blobs for point-in-time recovery.
  Enable backup for blobs and turn on the immutability feature.
  Test your recovery process regularly to ensure it meets your operational requirements.

7. Resource Lock
Adding Azure locks to your Blob Storage account provides an additional layer of protection by preventing accidental deletion or modification of critical resources.

8. Educate and Train Your Team
Operational resilience often hinges on user awareness:
  Conduct regular training sessions on Blob Storage best practices.
  Document and share a clear data recovery and management protocol with all stakeholders.

9. Critical Tip: Do Not Create New Containers with Deleted Names During Recovery
If a container or blob storage is deleted for any reason and recovery is being attempted, it's crucial not to create a new container with the same name immediately. Doing so can significantly hinder the recovery process by overwriting backend pointers, which are essential for restoring the deleted data. Make sure no new container is created under the same name during the recovery attempt to maximize the chances of successful restoration.
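The lifecycle rules described in tip 3 are written as a JSON policy document. As a hedged sketch, the Python below builds one such policy that moves block blobs to the cool tier, then to archive, then deletes them. The rule name, the "logs/" prefix, the day thresholds and the helper name build_lifecycle_policy are illustrative assumptions; check the field names against the current Azure Storage lifecycle management documentation before relying on them.

```python
import json


def build_lifecycle_policy(prefix, cool_after_days, archive_after_days, delete_after_days):
    """Build a lifecycle policy dict that tiers and then expires block blobs under a prefix."""
    if not (0 <= cool_after_days < archive_after_days < delete_after_days):
        raise ValueError("expected cool < archive < delete thresholds")
    return {
        "rules": [
            {
                "enabled": True,
                "name": "tier-and-expire",  # hypothetical rule name
                "type": "Lifecycle",
                "definition": {
                    # Only block blobs whose names start with the given prefix are affected.
                    "filters": {"blobTypes": ["blockBlob"], "prefixMatch": [prefix]},
                    "actions": {
                        "baseBlob": {
                            "tierToCool": {
                                "daysAfterModificationGreaterThan": cool_after_days
                            },
                            "tierToArchive": {
                                "daysAfterModificationGreaterThan": archive_after_days
                            },
                            "delete": {
                                "daysAfterModificationGreaterThan": delete_after_days
                            },
                        }
                    },
                },
            }
        ]
    }


# Example values only: cool after 30 days, archive after 90, delete after 365.
policy = build_lifecycle_policy("logs/", 30, 90, 365)
print(json.dumps(policy, indent=2))
```

The printed JSON could be saved to a file and applied to a storage account, for example through the portal's lifecycle management view or the Azure CLI.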
Wrapping It Up
Azure Blob Storage offers an exceptional platform for scalable and secure storage, but its resiliency depends on following best practices. By enabling features like soft delete, implementing redundancy, securing data, and proactively monitoring your storage environment, you can ensure that your data is resilient to failures and recoverable in any scenario.

Protect your Azure resources with a lock - Azure Resource Manager | Microsoft Learn
Data redundancy - Azure Storage | Microsoft Learn
Overview of Azure Blobs backup - Azure Backup | Microsoft Learn

975 Views, 1 like, 0 Comments

Azure Extended Zones: Optimizing Performance, Compliance, and Accessibility
Azure Extended Zones are small-scale Azure extensions located in specific metros or jurisdictions to support low-latency and data residency workloads. They enable users to run latency-sensitive applications close to end users while maintaining compliance with data residency requirements, all within the Azure ecosystem.

2.9K Views, 2 likes, 0 Comments

(Part-2) Leverage Bicep: Standard model to Automate Azure IaaS deployment
Subjects.
  Those deeply interested in IaC using Azure.
  Those who understand the basics of Azure Resource Manager Templates and want to work deeply with Bicep.
  Those who understand the names of services and functions used in Azure IaaS and have experience in building automation.
Agenda.
  About Bicep
  Difference between ARM templates and Bicep
  Basic functionality
  Bicep Development Environment
  Sample Code and Explanation
  Traps and Avoidance
Notes.
  Azure services are evolving every day. This content is based on what we have confirmed as of April 2023.

7K Views, 1 like, 0 Comments

(Part-1) Leverage Bicep: Standard model to Automate Azure IaaS deployment
Subjects.
  Those deeply interested in IaC using Azure.
  Those who understand the basics of Azure Resource Manager Templates and want to work deeply with Bicep.
  Those who understand the names of services and functions used in Azure IaaS and have experience in building automation.
Agenda.
  About Bicep
  Difference between ARM templates and Bicep
  Basic functionality
  Bicep Development Environment
  Sample Code and Explanation
  Traps and Avoidance
Notes.
  Azure services are evolving every day. This content is based on what we have confirmed as of April 2023.

8.5K Views, 4 likes, 0 Comments

(Part-3) Leverage Bicep: Standard model to Automate Azure IaaS deployment
Subjects.
  Those deeply interested in IaC using Azure.
  Those who understand the basics of Azure Resource Manager Templates and want to work deeply with Bicep.
  Those who understand the names of services and functions used in Azure IaaS and have experience in building automation.
Agenda.
  About Bicep
  Difference between ARM templates and Bicep
  Basic functionality
  Bicep Development Environment
  Sample Code and Explanation
  Traps and Avoidance
Notes.
  Azure services are evolving every day. This content is based on what we have confirmed as of April 2023.

7.5K Views, 1 like, 1 Comment