Many organizations running workloads in cloud-native environments, particularly in financial services such as trading firms and banks, may be required to monitor their workloads for accurate and consistent timekeeping across distributed systems. Whether it's validating trades, issuing security tokens, or correlating logs for incident response, drifting clocks introduce uncertainty that customers cannot afford.
In this blog post, I will share how customers can monitor their Azure Kubernetes Service (AKS) clusters for time drift using a custom container image, Azure Managed Prometheus, and Azure Managed Grafana.
Understanding Time Sync in Cloud Environments
Azure’s underlying infrastructure uses Microsoft-managed Stratum 1 time servers connected to GPS-based atomic clocks to ensure a highly accurate reference time. Linux VMs in Azure can synchronize either with their Azure host via Precision Time Protocol (PTP) devices like /dev/ptp0, or with external NTP servers over the public internet. The Azure host, being physically closer and more stable, provides a lower-latency and more reliable time source.
On Azure, Linux VMs use chrony, a modern Linux time synchronization service that performs well under varying network conditions and includes advanced capabilities for handling drift and jitter. Terms from its chronyc tracking output, such as "Last offset" (the difference between system and reference time), "Skew" (the estimated error bound on the clock's frequency, or drift rate), and "Root dispersion" (the uncertainty of the time measurement), help quantify how well a system's clock is aligned.
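If you want to see these values for yourself, they come from the chronyc command-line tool, which you can run on any Linux machine where chrony is the active time service:
# Show the current reference source, last offset, skew, root delay and root dispersion
chronyc tracking
# List the configured time sources; -v adds a legend explaining each column
chronyc sources -v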
Solution Overview
At the time of writing this article, it is not possible to monitor clock errors on Azure Kubernetes Service nodes directly, since node images cannot be customized and are managed by Azure.
Customers may ask "How do we prove our AKS workloads are keeping time accurately?" To address this, I've developed a solution: a custom container image running as a DaemonSet that generates Prometheus metrics, which can be visualized on Grafana dashboards to continuously monitor time drift across Kubernetes nodes.
This solution deploys a containerized Prometheus exporter to every node in the Azure Kubernetes Service (AKS) cluster. It exposes a metric representing the node's time drift, allowing Prometheus to scrape the data and Azure Managed Grafana to visualize it. The design emphasizes security and simplicity: the container runs as a non-root user with minimal privileges, and it securely accesses the Chrony socket on the host to extract time synchronization metrics.
As we walk through the solution, it is recommended that you follow along with the code in the chrony-tracker repository on GitHub.
Technical Deep Dive: From Image Build to Pod Execution
The custom container image is built around a Python script (chrony_exporter.py) that runs the chronyc tracking command, parses its output, and calculates a clock error value as follows:
clock_error = |last_offset| + root_dispersion + (0.5 × root_delay)
The script then exports the result via a Prometheus-compatible HTTP endpoint. Its only dependency is the prometheus_client library, defined in the requirements.txt file.
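The sketch below illustrates the core idea. It is a simplified approximation rather than the actual chrony_exporter.py; the listen port, polling interval, and parsing details are assumptions, while the metric name matches the one used later in this post:
import re
import subprocess
import time

from prometheus_client import Gauge, start_http_server

# Metric name taken from the dashboards later in this post; the real exporter may add labels.
CLOCK_ERROR_MS = Gauge("chrony_clock_error_ms", "Estimated clock error in milliseconds")

def read_tracking():
    """Run `chronyc tracking` and return its numeric fields (in seconds) as a dict."""
    output = subprocess.run(
        ["chronyc", "tracking"], capture_output=True, text=True, check=True
    ).stdout
    fields = {}
    for line in output.splitlines():
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        match = re.search(r"[-+]?\d+\.\d+", value)
        if match:
            fields[key.strip()] = float(match.group())
    return fields

def clock_error_ms(fields):
    """clock_error = |last_offset| + root_dispersion + 0.5 * root_delay, converted to ms."""
    seconds = (
        abs(fields["Last offset"])
        + fields["Root dispersion"]
        + 0.5 * fields["Root delay"]
    )
    return seconds * 1000.0

if __name__ == "__main__":
    start_http_server(8000)  # port 8000 is an assumption; match it to your scrape config
    while True:
        CLOCK_ERROR_MS.set(clock_error_ms(read_tracking()))
        time.sleep(15)  # polling interval chosen for illustration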
Secure Entrypoint with Limited Root Access
The container is designed to run as a non-root user. The entrypoint.sh script launches the Python exporter using sudo, which is the only command that this user is allowed to run with elevated privileges. This ensures that while root is required to query chronyc, the rest of the container operates with a strict least-privilege model:
#!/bin/bash
echo "Executing as non-root user: $(whoami)"
sudo /app/venv/bin/python /app/chrony_exporter.py
By restricting the sudoers file to a single command, this approach allows safe execution of privileged operations without exposing the container to unnecessary risk.
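For reference, the sudoers rule that enables this looks roughly like the following; the user name and drop-in file path are illustrative and may differ from what the image in the repository actually uses:
# /etc/sudoers.d/chrony-exporter (illustrative; user name is a placeholder)
appuser ALL=(root) NOPASSWD: /app/venv/bin/python /app/chrony_exporter.py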
DaemonSet with Pod Hardening and Host Socket Access
The deployment is defined as a Kubernetes DaemonSet (chrony-ds.yaml), ensuring one pod runs on each AKS node. The pod has the following hardening and configuration settings:
- Runs as non-root (runAsUser: 1001, runAsNonRoot: true)
- Read-only root filesystem to reduce the risk of tampering with or altering the scripts
- HostPath volume mount for /run/chrony so it can query the Chrony daemon on the node
- Prometheus annotations for automated metric scraping (an example follows the snippet below)
Example DaemonSet snippet:
securityContext:
  runAsUser: 1001
  runAsGroup: 1001
  runAsNonRoot: true
containers:
  - name: chrony-monitor
    image: <chrony-image>
    command: ["/bin/sh", "-c", "/app/entrypoint.sh"]
    securityContext:
      readOnlyRootFilesystem: true
    volumeMounts:
      - name: chrony-socket
        mountPath: /run/chrony
volumes:
  - name: chrony-socket
    hostPath:
      path: /run/chrony
      type: Directory
This setup gives the container controlled access to the Chrony Unix socket on the host while preventing any broader filesystem access.
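The scrape annotations mentioned in the list above live in the DaemonSet's pod template metadata. The keys below follow the common prometheus.io annotation convention, and the port value is an assumption that must match whichever port the exporter listens on:
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: "/metrics"
    prometheus.io/port: "8000"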
Configuration: Using the Azure Host as a Time Source
The chrony.conf file on the underlying AKS node (a Linux VM) is configured to sync time from the Azure host through the PTP device (/dev/ptp0). This configuration is optimized for cloud environments and includes:
- refclock PHC /dev/ptp0 for direct PTP sync
- makestep 1.0 -1 to step the clock immediately whenever the offset exceeds one second, instead of slowly slewing it
This ensures that time metrics reflect highly accurate local synchronization, avoiding public NTP network variability.
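A representative excerpt of such a chrony.conf is shown below; the exact directives and tuning values on AKS node images may differ:
# Use the Azure host's PTP clock as the reference time source (tuning options vary by image)
refclock PHC /dev/ptp0 poll 3 dpoll -2 offset 0
# Step the clock whenever the offset exceeds 1 second, at any point (not only at startup)
makestep 1.0 -1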
With these layers combined—secure container build, restricted execution model, and Kubernetes-native deployment—you gain a powerful yet minimalistic time accuracy monitoring solution tailored for financial and regulated environments.
Setup Instructions
Prerequisites
- An existing AKS cluster
- Azure Monitor with Managed Prometheus and Grafana enabled
- An Azure Container Registry (ACR) to host your image
Steps
- Clone the project repository:
git clone https://github.com/Azure/chrony-tracker.git
- Build the Docker image locally:
docker build --platform=linux/amd64 -t chrony-tracker:1.0 .
- Tag the image for your ACR:
docker tag chrony-tracker:1.0 <youracr>.azurecr.io/chrony-tracker:1.0
- Push the image to ACR:
docker push <youracr>.azurecr.io/chrony-tracker:1.0
- Update the DaemonSet YAML (chrony-ds.yaml) to use your ACR image:
image: <youracr>.azurecr.io/chrony-tracker:1.0
- Apply the DaemonSet:
kubectl apply -f chrony-ds.yaml
- Apply the Prometheus scrape config (ConfigMap):
kubectl apply -f ama-metrics-prometheus-config-configmap.yaml
- Delete the "ama-metrics-xxx" pods from the kube-system namespace so that the new configuration is picked up (see the commands below)
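One way to do this is to list the metrics agent pods and delete them so their controllers recreate them with the new ConfigMap; the pod name below is a placeholder:
kubectl get pods -n kube-system | grep ama-metrics
kubectl delete pod <ama-metrics-pod-name> -n kube-system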
After these steps, your AKS nodes will be monitored for clock drift.
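As an optional spot check before moving to Grafana, you can port-forward to one of the exporter pods and confirm the metric is being served. The pod name and port below are placeholders that depend on how you built and configured the image:
# In one terminal (pod name is a placeholder):
kubectl port-forward <chrony-monitor-pod> 8000:8000
# In another terminal:
curl -s http://localhost:8000/metrics | grep chrony_clock_error_ms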
Viewing the Metric in Managed Grafana
Once the DaemonSet and ConfigMap are deployed and metrics are being scraped by Managed Prometheus, you can visualize the chrony_clock_error_ms metric in Azure Managed Grafana by following these steps:
- Open the Azure Portal and navigate to your Azure Managed Grafana resource.
- Select the Grafana workspace and open the Grafana endpoint by clicking the URL under Overview
- From the left-hand menu, select Metrics and then click on + New metric exploration
- Enter the name of the metric "chrony_clock_error_ms" under Search metrics and click Select
- You should now be able to view the metric
- To customize it and view all sources, click on the Open in explorer button
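If you prefer to query the metric directly in Explore or in a dashboard panel, a simple PromQL expression such as the one below charts the reported clock error per node; the grouping label ("instance" here) is an assumption that depends on your scrape configuration:
max by (instance) (chrony_clock_error_ms)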
Optional: Secure the Metrics Endpoint
To enhance the security of the /metrics endpoint exposed by each pod, you can enable basic authentication on the exporter. This requires configuring an HTTP server inside the container that enforces basic authentication, and updating your Prometheus ConfigMap to include the matching credentials.
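On the scraping side, the relevant part of the scrape configuration would look roughly like the excerpt below; the keys follow the standard Prometheus scrape_config schema, and the job name and credentials are placeholders:
scrape_configs:
  - job_name: chrony-tracker
    basic_auth:
      username: <metrics-user>
      password: <metrics-password>
    kubernetes_sd_configs:
      - role: pod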
For detailed guidance on securing scrape targets, refer to the Prometheus documentation on authentication and TLS settings.
In addition, it is recommended to use Azure Private Link for Kubernetes monitoring with Azure Monitor and Azure Managed Prometheus.
Learn More
If you'd like to explore this solution further or integrate it into your production workloads, the following resources provide valuable guidance:
- Microsoft Learn: Time sync in Linux VMs
- chrony-tracker GitHub repo
- Azure Monitor and Prometheus Integration
Author
Dotan Paz
Sr. Cloud Solutions Architect, Microsoft
Updated Apr 21, 2025
Version 1.0