Collecting and displaying niping network latency measurements with Azure Monitor
Published Oct 29 2020 11:33 AM
Microsoft

Introduction

In this document we describe a method of setting up periodic niping latency monitoring within an SAP landscape, driven by the Telegraf monitoring agent and reported via Azure Monitor.

Disclaimer

We do not guarantee that the documented configurations are correct or the best way to implement a function; the intention of this document is to describe a configuration that the AGCE team has created and that customers have found useful.

Overview

In an SAP environment it is useful to monitor various aspects of the VMs in the landscape, and to collect and alert on that monitored data. This is an example of setting up SAP niping to periodically report the inter-VM latency, specifically between the application servers and the ASCS, as well as between the application servers and the database. Here is a diagram of the configuration that we describe in this document:

 

[Diagram: application servers running niping and the Telegraf agent measure latency to the ASCS and database VMs and report the results to Azure Monitor]

 

SAP provides the niping application that is useful for network testing - there is a blog post describing the use of niping here. It can test latency, throughput and stability - in this document we will focus on its latency testing capability. niping is typically used for initial validation of an environment (i.e. not under any significant loading) and is not generally used continuously. However, this does provide a great example for this document, and a similar architecture can be used to monitor many other OS and SAP metrics. Please keep in mind that the measured latency between servers will vary based on the network load on the servers at the time.

 

In this solution we will use the InfluxData Telegraf agent described in the Azure documentation. This agent runs on the monitored VMs and sends the collected data to Azure Monitor. If you don't want to use Telegraf, you can collect the test data from niping and send the results to Azure Monitor yourself, most likely with a shell script scheduled with cron; however, that configuration won't be described in this document.

Requirements for this configuration

This solution has the following technical requirements:

  • niping application from SAP for network latency testing.

  • telegraf monitoring agent for invoking niping and sending the data to the Azure Monitor database.

  • system-assigned identity for the virtual machine

  • the virtual machine must be able to make requests to the Azure API, which is "Internet facing". This will typically require allowing this traffic through any applicable firewalls or network appliances (a quick connectivity check is sketched just below this list).
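
If the system-assigned identity is already in place, one quick way to check that the VM's managed identity can obtain a token for the Azure Monitor custom metrics resource is to query the instance metadata service. This is a minimal sketch; a working setup returns a JSON document containing an access_token:

# Request a managed-identity token for the Azure Monitor custom metrics resource
curl -s -H "Metadata: true" "http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fmonitoring.azure.com%2F"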

Installation & Configuration of NIPING on NIPING servers

The NIPING application is provided by SAP, as described in SAP Note 500235 Network Diagnosis with NIPING and is delivered in SAP kernel releases, as well as via SAPROUTER.

 

Download the latest version of the SAPROUTER SAR file, and place it onto all VMs to be tested. Use the sapcar program to unpack the SAR file, and place the niping program in a logical place. In this example, we will put all executable files in /usr/local/bin.

# ./sapcar -xf ./saprouter_710-80003478.sar
# cp niping /usr/local/bin

On all VMs that will be niping servers (i.e. the HANA database and the ASCS), we need to create a systemd service that runs niping in server mode. Create the file /usr/lib/systemd/system/niping-server.service with the following contents:

[Unit]
Description=SAP niping server
After=network.target

[Service]
Type=simple
Restart=always
ExecStart=/usr/local/bin/niping -s -I 0
ExecReload=/bin/kill -HUP $MAINPID

[Install]
WantedBy=multi-user.target
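
If systemd was already running when you created this unit file, you may need to tell it to re-read its unit files before enabling the service:

# systemctl daemon-reload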

Run the following commands to make the niping service start on reboot, and to start it immediately:

# systemctl enable niping-server
# systemctl start niping-server

You can now verify that the niping program is running by doing the following:

#ps -eaf | grep niping
root      49966      1  0 14:54 ?        00:00:00 /usr/local/bin/niping -s -I 0
root      49970  47606  0 14:54 pts/0    00:00:00 grep --color=auto niping

Installation of NIPING on other servers

On the other servers in the environment, simply install the NIPING program. Use the sapcar program to unpack the SAR file, and place the niping program in a logical place. In this example, we will put all executable files in /usr/local/bin.

# ./sapcar -xf ./saprouter_710-80003478.sar
# cp niping /usr/local/bin

To test niping connectivity to one of the servers (e.g. hana1), you can run the following:

# /usr/local/bin/niping -c -H hana1

Thu Oct  1 15:19:57 2020
connect to server o.k.
send and receive 10 messages (len 1000)

------- times -----
avg     0.883 ms
max     1.108 ms
min     0.725 ms
tr   2210.668 kB/s
excluding max and min:
av2     0.875 ms
tr2  2231.505 kB/s

Note for clustered instances

In Azure, we use an Azure Basic or Standard load balancer to provide a "floating IP" for backend instances in a cluster. When testing niping latency, you may want to test the latency to the individual instances behind the load balancer (i.e. use the actual hostname or IP address), or to the floating IP address, which is the frontend address of the load balancer. If you want to do the latter, make sure that the load balancer has the HA Ports configuration enabled (available on the Standard Load Balancer), or that you have a load balancer rule for the port that niping uses (default 3298).
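
As an illustration, such a rule could be created with the Azure CLI roughly as follows; the resource group, load balancer, frontend IP configuration, and backend pool names are placeholders to replace with your own values, and for SAP cluster configurations floating IP is typically enabled on the rule:

az network lb rule create \
    --resource-group MyResourceGroup \
    --lb-name MyLoadBalancer \
    --name niping-3298 \
    --protocol Tcp \
    --frontend-port 3298 \
    --backend-port 3298 \
    --frontend-ip-name MyFrontendIP \
    --backend-pool-name MyBackendPool \
    --floating-ip true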

Creation of niping shell script

On the monitored instances, we will use a shell script to execute niping and process the output into a Comma Separated Value (CSV) format. Create a file /usr/local/bin/niping_csv.sh with the following text:

#!/bin/bash
# set -x
# Run a niping latency test against the host given as the first argument, keep only
# the timing summary lines, and transpose them into a two-line CSV (header + values).
/usr/local/bin/niping -c -H "$1" -B 10 -L 60  -D 15 | tail -n 8  | head -n 7 | grep -v excluding | awk '
{
    # remember every field of every line
    for (i=1; i<=NF; i++)  {
        a[NR,i] = $i
    }
}
NF>p { p = NF }
END {
    # print the fields column by column, i.e. transpose rows and columns
    for(j=1; j<=p; j++) {
        str=a[1,j]
        for(i=2; i<=NR; i++){
            str=str","a[i,j];
        }
        print str
    }
}' | head -n2

Then, make the shell script executable by running chmod a+x /usr/local/bin/niping_csv.sh. You can test it by doing:

# /usr/local/bin/niping_csv.sh hana1
avg,max,min,tr,av2,tr2
1.217,5.405,0.833,16.053,1.151,16.967

This script runs niping with the parameters -c -H $1 -B 10 -L 60 -D 15, which are (in order):

  • -c: run as the client side of NIPING

  • -H $1: run the niping tests against the host specified as the first parameter; we will pass this in from the telegraf configuration

  • -B 10: use a 10 byte buffer size, which is appropriate for latency testing

  • -L 60: run 60 iterations of the test

  • -D 15: delay 15 milliseconds between iterations

 

Of course you can modify these parameters as you see fit for your testing scenario. The rest of the shell script takes the output from niping and formats it as a CSV (comma separated value) file for easy ingestion into telegraf.
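
For example, a slightly longer test with a shorter delay between iterations (illustrative values only) would look like this; if you change the values, edit them inside /usr/local/bin/niping_csv.sh so that telegraf picks up the new settings:

# 120 iterations, 10 ms apart, still with the small 10 byte buffer used for latency testing
/usr/local/bin/niping -c -H hana1 -B 10 -L 120 -D 10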

Installation of Telegraf monitoring agent

Before installing telegraf, the monitored VMs must have credentials to upload data into Azure Monitor. There are several ways to provide this identity (as described in the Telegraf source GitHub repo), but the simplest is to use the "system-assigned identity" of the VM, which we will configure next. In the Azure portal, open the VM, navigate to the Identity tab, and ensure that the system-assigned identity is set to On.

 

[Screenshot: the VM Identity blade in the Azure portal with the system-assigned identity set to On]
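
If you prefer the command line to the portal, the same setting can be made with the Azure CLI (the resource group and VM name below are placeholders):

# Enable the system-assigned managed identity on the VM
az vm identity assign --resource-group MyResourceGroup --name MyVM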

To install telegraf, refer to the InfluxData documentation. On SLES 15 SP4, the telegraf package is available from the devel:languages:go repository, so add that repository, refresh, and install the package:

zypper addrepo https://download.opensuse.org/repositories/devel:languages:go/SLE_15_SP4/devel:languages:go.repo
zypper refresh
zypper install telegraf

Telegraf is configured via the file /etc/telegraf/telegraf.conf, which is installed with some default configurations. We are going to have a custom configuration though, that will invoke our shell script via the Telegraf "exec" input plugin. We'll create this configuration with the following step:

telegraf --input-filter exec --output-filter azure_monitor config > azm-telegraf.conf

Then open the azm-telegraf.conf with your editor, and find the section starting with

###############################################################################
#                            INPUT PLUGINS                                    #
###############################################################################

Replace the text from [[inputs.exec]] to the end of the file with the following:

[[inputs.exec]]
  commands = [
    "/usr/local/bin/niping_csv.sh hana1"
  ]

  timeout = "5s"
  name_suffix = "_niping_hana"
  data_format = "csv"
  csv_header_row_count = 1

Of course, replace "hana1" with the name or IP address of the server you are testing the latency to. You can find more information on the exec input handler in the telegraf documentation.

You can test this by running the following:

# telegraf --config ./azm-telegraf.conf --test
2020-10-02T00:47:26Z I! Starting Telegraf
> exec_niping_hana,host=pas av2=0.958,avg=0.961,max=1.315,min=0.776,tr=20.328,tr2=20.39 1601599648000000000

Copy and activate the telegraf.conf file to the configuration directory:

# sudo cp azm-telegraf.conf /etc/telegraf/telegraf.conf
# systemctl restart telegraf

At this point, telegraf should be regularly executing the niping program, converting its output to CSV, and uploading the resulting data to Azure Monitor. Check the /var/log/messages file for any errors, using grep telegraf /var/log/messages.

Configuration of the telegraf agent is via the /etc/telegraf/telegraf.conf file. Looking at that file, you should see something like this near the beginning:

[agent]
  ## Default data collection interval for all inputs
  interval = "10s"

This sets the telegraf agent to run every 10 seconds, and it will run the above shell script and niping every 10 seconds as well. You can adjust this interval within practical limits. More configuration options are documented in the telegraf github repo.
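
As a minimal sketch, you could switch the agent to a 60 second collection interval and restart it; the sed pattern below is anchored to the indented agent setting so that it does not also change flush_interval:

sudo sed -i 's/^  interval = "10s"/  interval = "60s"/' /etc/telegraf/telegraf.conf
sudo systemctl restart telegraf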

Plot your niping data in the Azure portal

Open the Azure Portal, and navigate to the Monitor tab, then select Metrics, and select the VM you installed telegraf on in the resource selector.

 

[Screenshot: the Azure Monitor Metrics blade with the monitored VM selected in the resource selector]

Select the Telegraf/exec_niping_hana namespace, and select the av2 metric.

 

[Screenshot: the av2 metric from the Telegraf/exec_niping_hana namespace plotted in Azure Monitor]

Additional monitoring configuration

You can have multiple [[inputs.exec]] sections in the telegraf.conf file if you want to run niping tests against multiple servers (i.e. the database and the ASCS). You can also configure niping to test latency through the floating IP/load balancer frontend IP address for clustered instances, as described in the note for clustered instances above.
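
As an illustration of the first point, a second [[inputs.exec]] section for an ASCS host (using a hypothetical hostname ascs1 and a matching name suffix) could be appended to the configuration and the agent restarted:

sudo tee -a /etc/telegraf/telegraf.conf > /dev/null <<'EOF'

[[inputs.exec]]
  commands = [
    "/usr/local/bin/niping_csv.sh ascs1"
  ]

  timeout = "5s"
  name_suffix = "_niping_ascs"
  data_format = "csv"
  csv_header_row_count = 1
EOF
sudo systemctl restart telegraf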

 

There are also over 150 input plugins available in the Telegraf agent that you can find in the InfluxData list of supported plugins, and creative use of the exec plugin (as we've used above) can monitor virtually anything on the VM.

 

You can create alerts on the data you are collecting. For example, you could create an alert that fires whenever the average of the av2 metric goes over a certain value, using a rolling 5 minute average evaluated once per minute.

2 Comments
Microsoft

While this post describes a completely Azure-based monitoring situation, it is possible to have the telegraf agent monitor things on-prem and send that monitoring data to Azure Monitor. I tested this out, and there were only a couple of additional configuration steps necessary:

 

In the blog post we use a 'managed service identity' for the telegraf agent to authenticate with Azure for the data upload. This works great and requires minimal configuration. However, for machines that are not in Azure, you will have to use another method of authentication. I created a service principal (using the instructions here: https://docs.microsoft.com/en-us/azure/active-directory/develop/app-objects-and-service-principals#s...). I then used the Azure portal to add the "Monitoring Metrics Publisher" role for my service principal - you can grant this role on the resource group, Log Analytics workspace, or virtual machine that you are using for the metrics.

 

Then I had to pass the credentials to the Telegraf agent. This is done via Linux environment variables, so I had to arrange for these to be passed to the telegraf system service. First, the file /usr/lib/telegraf/telegraf.service should have the following line:

 

EnvironmentFile=/etc/default/telegraf

This tells systemd to load the file /etc/default/telegraf into the environment before invoking the telegraf agent. Next, I added the following to the /etc/default/telegraf file:

AZURE_TENANT_ID="<your tenant id>"
AZURE_CLIENT_ID="<your client id>"
AZURE_CLIENT_SECRET="<your client secret>"

Of course the values were replaced with the values I got when I created the service principal.

I changed the /etc/telegraf/telegraf.conf file, making the following changes, filling in the values from my deployment:

[[outputs.azure_monitor]]
region = "<your azure region>"
resource_id = "<your log analytics resource id>"
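
Since the telegraf.service unit file itself was edited, systemd may also need to re-read its unit files before the restart picks up the EnvironmentFile setting:

sudo systemctl daemon-reload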

Finally, I restarted the telegraf agent using systemctl restart telegraf. I checked the /var/log/messages file to make sure there were no telegraf errors, and I looked at the Azure portal, and found my new metrics under the resource id I used above.

Microsoft

I was recently involved in a project where we needed iostat metrics to be surfaced within the Azure portal, and also needed the ability to create alerts based on this data. Here are my instructions on doing this:

First, I created a shell script that runs iostat and collects the data in the proper format. Here is my script which I put in /usr/local/bin/iostat_sh.sh:

#!/bin/bash

# Read one line of extended device statistics per disk from iostat and emit it
# as a metric in influx line protocol format (measurement "iostat", tagged by device).
while read -r device read_queued_sec write_queued_sec read_completed_sec write_completed_sec read_kbytes_sec write_kbytes_sec avg_sector_size avg_queue_len await_ms read_await_ms write_await_ms svctime_ms util_pct; do
    [ ! -z "$device" ] && echo "iostat,device=$device read_queued_sec=$read_queued_sec,write_queued_sec=$write_queued_sec,read_completed_sec=$read_completed_sec,write_completed_sec=$write_completed_sec,read_kbytes_sec=$read_kbytes_sec,write_kbytes_sec=$write_kbytes_sec,avg_sector_size=$avg_sector_size,avg_queue_len=$avg_queue_len,await_ms=$await_ms,read_await_ms=$read_await_ms,write_await_ms=$write_await_ms,svctime_ms=$svctime_ms,util_pct=$util_pct"
done < <(iostat -dxyk 5 1 |  tail -n +4)
exit 0

Remember that the two number parameters for iostat are the interval and the count. In my case I used 5 seconds for the interval and 1 for the count – you may want to adjust the interval that you are measuring over. Then I did chmod a+x /usr/local/bin/iostat_sh.sh to make the shell script executable.

Next I modified my /etc/telegraf/telegraf.conf to have this input section:

[[inputs.exec]]
  commands = [ "/usr/local/bin/iostat_sh.sh" ]
  timeout = "10s"
  data_format = "influx"
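
Before restarting the service, you can sanity-check the new input as shown for the niping configuration earlier in this post; telegraf's --test flag gathers the inputs once and prints the metrics to the console without sending anything to Azure Monitor:

telegraf --config /etc/telegraf/telegraf.conf --test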

To make the new telegraf.conf take effect, you have to stop and restart the telegraf service:

systemctl stop telegraf
systemctl start telegraf

After a few minutes, the metrics should appear under the virtual machine metrics; in my example below I split the chart by device:

[Screenshot: iostat metrics displayed in Azure Monitor, split by device]
