Failover Clustering issues

Andrew Baker
Hi guys, I have just been reading the May 2018 patch notes for WS2016 and noticed these two points:

Improves resiliency in handling network issues that may cause highly available VMs to be turned off because of I/O timeouts or Cluster Shared Volumes dismounted messages.

Addresses an issue that causes the Drain Manager Cluster service to sometimes be stuck in the draining state.

Ever since a network storm, we have been having issues whenever we drain nodes or the Cluster Shared Volumes move between nodes. These issues include disks getting stuck in "Online Pending" for close to five minutes, which kills the virtual machines running on the affected volume.

We have installed the latest updates on the nodes (bar the May 2018 patches) and updated the firmware on the SAN and the SAN disks. However, doing so left us with a few corrupt Cluster Shared Volumes, which we managed to recover using chkdsk. Even after the chkdsk, the volumes still took over five minutes to go from "Online Pending" to fully online.