Failover Clustering issues

Andrew Baker
Hi guys, I have just been reading the May 2018 patch notes for WS2016 and noticed these two points:

Improves resiliency in handling network issues that may cause highly available VMs to be turned off because of I/O timeouts or Cluster Shared Volumes dismounted messages.

Addresses an issue that causes the Drain Manager Cluster service to sometimes be stuck in the draining state.

Ever since a network storm, we have been having issues whenever we drain nodes or the Cluster Shared Volumes move between nodes. These issues include disks getting stuck in "Online Pending" for close to five minutes, which kills the virtual machines running on the affected volume.

We have installed the latest updates on the nodes (bar the May 2018 patches) and updated the firmware on the SAN and the SAN disks. However, doing so left us with a few corrupt Cluster Shared Volumes, which we managed to recover using chkdsk. Even after the chkdsk, the volumes still took over five minutes to go from "Online Pending" to fully online.