Azure Virtual Desktop Blog

Connection Reliability in Azure Virtual Desktop Insights

Rachelle_Cheung
Jul 01, 2024

We are thrilled to announce that the Connection Reliability tab in Azure Virtual Desktop Insights is now generally available. IT administrators can now monitor the connection resilience between users and Azure Virtual Desktop host pools. This makes it simpler to understand disconnection events and to correlate the errors that affect end users.

The Connection Reliability tab provides two primary visuals.

The first is a graph that plots the number of disconnections against the concurrent connections during a given time range. This allows administrators to detect clusters of disconnects that are impacting connection reliability. Administrators can also analyze connection errors by different pivots, such as client version and IP range, to determine the root cause of disconnects and improve connection reliability.

The second visual provides a table of the top 20 disconnection events and lists the top 20 specific time intervals where the most disconnections occurred. Administrators can select a row in the table to highlight specific segments of the chart to view the disconnections that occurred during those time segments.

To experience the benefits of the Azure Virtual Desktop Insights Connection Reliability tab, sign in to Azure Virtual Desktop Insights and navigate to the Connection Reliability tab. More information can be found here.

Our team is dedicated to enhancing Azure Virtual Desktop Insights and expanding its capabilities to address the evolving needs of our users. We encourage you to explore the features of the Connection Reliability tab and share your experiences to help us guide future development of this and other Azure Virtual Desktop Insights features.


Stay up to date! Bookmark the Azure Virtual Desktop Tech Community.

Updated Jun 28, 2024
Version 1.0
  • Hi MarekS,

    Check out this query and see if it works for you:

     

    WVDConnections
    | as ConnectionData
    | where State == "Started"
    | join kind = leftsemi
    (
        // Only include connections that actually reached the host to prevent short (failed) attempts from skewing the data
        WVDCheckpoints
        | where Source == "RDStack" and Name == "RdpStackConnectionEstablished"
    ) on CorrelationId
    | join kind = leftsemi
    (
        WVDCheckpoints // New sessions only
        | where Name == "LoadBalancedNewConnection"
        | extend LoadBalanceOutcome=Parameters.LoadBalanceOutcome
        | where (LoadBalanceOutcome == "NewSession") 
    ) on CorrelationId
    | join kind=innerunique  // remove connections that do not have ShellStart
    (
        WVDCheckpoints
        | where Name == "ShellStart"
        | project ShellStart= TimeGenerated, CorrelationId
    ) on CorrelationId
    | project-away CorrelationId1
    | join kind=leftanti // remove connections that have ShellReady
    (
        WVDCheckpoints
        | where Name == "ShellReady"
        | project ShellReady = TimeGenerated, CorrelationId // Only CorrelationId matters for the leftanti join
    ) on CorrelationId 
    | join kind=inner
    (
        WVDConnections  
        | where State == "Completed"  
        | project EndTime=TimeGenerated, CorrelationId
    ) on CorrelationId
    | project-away CorrelationId1
    | project CorrelationId, UserName, TimeGenerated, Hostname = trim_end("[.].*", SessionHostName), Duration = EndTime - ShellStart

     

  • MarekS
    Copper Contributor

    Hello, we are currently struggling with an issue caused by KB5043064. People are waiting for hours to log in. Some of them just give up and cancel the connection. In Azure Virtual Desktop Insights I can see the maximum connection time, but only for people who waited until the desktop loaded:

    If somebody complains, I can check per user, but I would like a dashboard showing the maximum time users waited before disconnecting without the desktop ever loading.

    This is the case when the ShellStart event is triggered but ShellReady is not there, only OnClientDisconnected:

     

     

  • AstraDoc
    Copper Contributor

    Rachelle_Cheung Thanks for providing the query. I tried running it both as provided and after removing all the whitespace, and it didn't run. It keeps throwing the error message: Top-level expressions must return tabular results. (Hint: If you're trying to output a scalar value, use the 'print' operator.)

  • Thank you for reaching out ZackB903!

     

    At a high level, Azure Virtual Desktop Insights is designed to help IT administrators monitor and manage their Azure Virtual Desktop environments. By setting up Log Analytics, you can pipe Azure Virtual Desktop diagnostics data into Azure Virtual Desktop Insights. This setup will provide you with a comprehensive view of your deployment, making it easier to find and troubleshoot problems, and understand resource utilization. If you haven’t already set up Log Analytics, please refer to our documentation here 

     

    In general, all data available in Azure Virtual Desktop Insights can be accessed in a tabular format. You can customize our charts, graphs and queries to suit your specific use cases. For more information, check out this link 

    To collect client disconnect events as time-series data for your monitoring system, consider trying the query below.  

     

    The query takes two parameters at the top: 

    • WindowStart: The beginning of the time frame being analyzed. For example, you might set this to be 12 hours ago. 
    • WindowDuration: The length of the time frame being analyzed. For example, if starting 12 hours ago, to cover even the most recent events, you would set it to 12 hours. 
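The query also derives its time-slice size from WindowDuration: it aims for 144 data points but never goes below one minute per slice. The following is a minimal Python sketch of that arithmetic, not part of the query itself; the function and constant names are made up for illustration.

```python
from datetime import timedelta

POINTS = 144                      # number of data points the query aims for
MIN_STEP = timedelta(minutes=1)   # no sub-minute granularity

def window_resolution(window_duration):
    """Mirror max_of(WindowDuration / 144, 1m) from the query."""
    return max(window_duration / POINTS, MIN_STEP)

print(window_resolution(timedelta(days=1)))    # 0:10:00
print(window_resolution(timedelta(hours=12)))  # 0:05:00
print(window_resolution(timedelta(hours=1)))   # 0:01:00 (clamped to one minute)
```

So a one-day window is sliced into 10-minute steps, while any window shorter than 144 minutes falls back to one-minute steps and therefore yields fewer than 144 points.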

    The query will output four time series: 

    • TimeStamp: The slices of timestamps mapping to the results reported in the other time series data. 
    • ConcurrentConnections: The peak number of active concurrent connections to the Azure Virtual Desktop session hosts in this time slice. 
    • Disconnects: The number of times a connection is unexpectedly terminated. 
    • PercentageDrop: The portion of connections that disconnected, expressed as a percentage of all concurrent connections at that time.
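To make the relationship between the last three series concrete, here is a small Python sketch of how PercentageDrop follows from the other two, assuming the definition used at the end of the query: disconnects divided by the sum of active connections and disconnects, with zero for slices that had no connections. The function name is hypothetical.

```python
def percentage_drop(disconnects, concurrent):
    """Per-slice drop ratio; 0.0 where there were no connections at all."""
    result = []
    for d, c in zip(disconnects, concurrent):
        total = d + c  # active connections plus those that dropped in this slice
        result.append(d / total if total else 0.0)
    return result

print(percentage_drop([1, 0, 2], [9, 0, 2]))  # [0.1, 0.0, 0.5]
```

Note that, computed this way, the value is a ratio between 0 and 1; multiply by 100 if your monitoring system expects a percentage.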

     

    let WindowStart = ago(3d); // Set this to the start of the analysis window 
    let WindowDuration = 1d;   // Set this to the length of the window 
    let WindowEnd = WindowStart + WindowDuration; 
    let WindowResolution = max_of(WindowDuration / 144, 1m); // Provide 144 data points, but no sub-minute granularity 
    WVDConnections 
    | where (State == "Started" and TimeGenerated between(WindowStart .. WindowEnd))  // All connections that started within the time window 
        or (State == "Completed" and TimeGenerated > WindowStart)                     // All connections that ended after the start of the time window 
        or (State == "Connected" and TimeGenerated between(WindowStart .. WindowEnd)) // All connections established within the time window 
    | summarize 
        StartDate = minif(TimeGenerated, State == "Started"), // Bring the main properties of the connection into a single row 
        EndDate = minif(TimeGenerated, State == "Completed"), 
        ConnectedTime = minif(TimeGenerated, State == "Connected"), 
        UserName = take_any(UserName), 
        SessionHostName = take_any(SessionHostName) 
        by CorrelationId 
    | where isnotempty(ConnectedTime) // Only keep connections that actually got connected to avoid noise from authentication errors, etc. 
    | project CorrelationId, UserName, StartDate, EndDate = coalesce(EndDate, now()), // If the connection does not have a termination event, assume it terminates now 
        SessionHostName, ConnectedTime 
    | where EndDate > WindowStart and ConnectedTime < WindowEnd // Only keep connections that have some overlap with the time window selected 
    | join kind=leftouter 
    ( 
        WVDErrors 
        | where TimeGenerated > WindowStart  
        | summarize FirstError = min(TimeGenerated) by CorrelationId 
    ) 
    on CorrelationId 
    | extend 
        Probe=range(WindowStart, WindowEnd, WindowResolution), 
        EndDate=coalesce(FirstError, EndDate), // For connections that have an error, the earliest error is the disconnection point 
        HasErrors=isnotempty(FirstError) 
    | mv-apply Probe to typeof(datetime) on 
    ( 
        where (Probe between (StartDate .. EndDate) or (EndDate between(Probe .. (WindowResolution + Probe)))) 
        | extend 
            ConnectionContainsProbe = Probe between(StartDate .. EndDate), 
            ConnectionEndedHere=EndDate between(Probe .. (WindowResolution + Probe)) 
    ) 
    | where ConnectionContainsProbe 
    | extend ErrorMarker = (ConnectionEndedHere and HasErrors)               // This connection will be rendered as an error in this time probe 
    | extend ActiveConnection = ConnectionContainsProbe and not(ErrorMarker) // This connection will be rendered as an active connection in this time probe 
    | make-series 
        Disconnects=dcountif(CorrelationId, ErrorMarker), 
        ConcurrentConnections=dcountif(CorrelationId, ActiveConnection), 
        default=0 
        on Probe 
        from WindowStart to WindowEnd step WindowResolution 
    | extend PercentageDrop = series_fill_const(series_divide(Disconnects, series_add(ConcurrentConnections, Disconnects)), 0, todouble("NaN")) // For times with no connections, percentage drop should be zero 
    | project TimeStamp=Probe, ConcurrentConnections, Disconnects, PercentageDrop 
    | render timechart 

     

    If you’d rather handle the results as individual rows, you can use the mv-expand operator. 
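If the series are exported to another system before being expanded, the same reshaping can be done client-side. The sketch below is only an illustration in Python, assuming the result arrives as a single record holding four equal-length arrays under the column names the query projects; the function name is hypothetical.

```python
COLUMNS = ["TimeStamp", "ConcurrentConnections", "Disconnects", "PercentageDrop"]

def expand_rows(series_row):
    """Turn one record of parallel arrays into one dict per time slice."""
    arrays = [series_row[name] for name in COLUMNS]
    return [dict(zip(COLUMNS, values)) for values in zip(*arrays)]

# Made-up sample data, not real query output
sample = {
    "TimeStamp": ["09:00", "09:10"],
    "ConcurrentConnections": [10, 12],
    "Disconnects": [0, 3],
    "PercentageDrop": [0.0, 0.2],
}
for row in expand_rows(sample):
    print(row)
```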

     

    With these outputs, you might define an alert based on an absolute number of disconnects happening in a single time slice. You might also set one based on the percentage drop, but only allow it to trigger when there are more than a certain number of other connections (for example, more than 10), to avoid alert noise from a single user disconnecting.
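As a rough sketch of that alerting logic in Python: the function name and the thresholds of 5 disconnects and a 0.2 drop ratio are placeholders chosen for illustration, and only the minimum of 10 connections comes from the example above. It assumes one dict per time slice with the column names the query projects.

```python
def slices_to_alert(rows, max_disconnects=5, max_drop=0.2, min_connections=10):
    """Return the timestamps of slices that should raise an alert."""
    flagged = []
    for row in rows:
        # Rule 1: absolute number of disconnects in a single slice
        too_many = row["Disconnects"] >= max_disconnects
        # Rule 2: relative drop, ignored when few other connections exist
        big_drop = (row["PercentageDrop"] >= max_drop
                    and row["ConcurrentConnections"] > min_connections)
        if too_many or big_drop:
            flagged.append(row["TimeStamp"])
    return flagged

# Made-up sample data, not real query output
rows = [
    {"TimeStamp": "09:00", "ConcurrentConnections": 1, "Disconnects": 1, "PercentageDrop": 0.5},
    {"TimeStamp": "09:10", "ConcurrentConnections": 40, "Disconnects": 12, "PercentageDrop": 12 / 52},
    {"TimeStamp": "09:20", "ConcurrentConnections": 50, "Disconnects": 1, "PercentageDrop": 1 / 51},
    {"TimeStamp": "09:30", "ConcurrentConnections": 12, "Disconnects": 4, "PercentageDrop": 0.25},
]
print(slices_to_alert(rows))  # ['09:10', '09:30']
```

The 09:00 slice shows a 50% drop but is skipped because only one other connection was active, which is the single-user noise the connection threshold is meant to suppress.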

     

    The query shared is based on the visualizations from the Connection Reliability tab in Azure Virtual Desktop Insights, and this new feature will give you a starting point to investigate any disconnection alerts that trigger in your deployments. Let us know how it works out for you, and we would love to hear more about how you monitor your environment in general. Your feedback is incredibly valuable to us, so please feel free to share any insights and questions you have! 

  • ZackB903
    Copper Contributor

    Is this data available in tabular format?  It would be very helpful if we could collect client disconnect events as time-series data into our monitoring system so that we can alert on exceptions and proactively handle disruptions.