Oct 01 2023 06:43 AM
Hello cluster lovers and specialists,
I know there are roughly 1,000 threads on the net about this issue, but everything I click on and read right now seems to be turning purple (i.e. links I have already visited), and I'm getting stuck. I have invested a lot of hours and could have reinstalled my system from scratch several times, but I want to find the problem and improve my FCI troubleshooting skills.
My setup:
New Azure VMs: 2 cores, 8 GB RAM
Windows Server 2022 + SQL Server 2022, fully updated
80 GB + 5 GB disks
2-node FCI + Azure Load Balancer
Ping works between the nodes and the DC (do I need Accelerated Networking?)
Firewall turned off
NSG rules allow traffic between the servers
Everything worked fine until one day I restarted the whole environment (the only changes before that were application-specific and had absolutely nothing to do with the FCI or AD).
The cluster service on the nodes keeps shutting down, and the nodes keep switching, starting and stopping all the time. If I start the cluster, after roughly 30 seconds it shows as up in the GUI for only 1 or 2 seconds.
The 2 cluster-managed disks are offline all the time!
It doesn't work either if I use only one node, e.g. by stopping the other one.
Cluster validation initially reported problems bringing the disks online. Automount is on. I was finally able to solve the disk problems by adding an AllowBusTypeRAID entry in the registry (I'm very confused why it worked before without it... a Windows Update?).
Right now, full validation with both nodes only says that a second NIC should probably be installed (and that node2 is missing Windows updates, which is nonsense). Everything else is green.
Event ID 1146 usually appears 4 times, then 1795.
I searched the whole log (now 700 MB) for IDs 1230 and 1068 but couldn't find them. Then I restarted all machines, generated a log covering the last few minutes, and tried to interpret what happens line by line, but it seems to be too specialized for my knowledge.
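To cut a long cluster.log down to something that can actually be read line by line, one option is to keep only error and warning entries from a short time window, such as the first minutes after the restart. The following is a minimal Python sketch, assuming each entry starts with a header like `00000a1c.00000b2c::2023/10/01-06:43:12.345 ERR [RCM] ...` (process and thread ID, timestamp, level, then component and message). That layout and the helper name `filter_cluster_log` are assumptions for illustration, so check them against your own log first.

```python
import re
from datetime import datetime

# ASSUMPTION: cluster.log header layout, e.g.
# "00000a1c.00000b2c::2023/10/01-06:43:12.345 ERR   [RCM] message"
# (hex process ID, hex thread ID, timestamp, level, then component and text).
LINE_RE = re.compile(
    r"^[0-9a-fA-F]+\.[0-9a-fA-F]+::"
    r"(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+(?P<rest>.*)$"
)


def filter_cluster_log(lines, start, end, levels=("ERR", "WARN")):
    """Yield (timestamp, level, rest_of_line) for entries between start and end
    (inclusive) whose level is in `levels`. Hypothetical helper for this sketch."""
    for line in lines:
        match = LINE_RE.match(line.rstrip("\r\n"))
        if not match:
            # Lines without the assumed header (continuations, banners) are skipped.
            continue
        ts = datetime.strptime(match.group("ts"), "%Y/%m/%d-%H:%M:%S.%f")
        if start <= ts <= end and match.group("level") in levels:
            yield ts, match.group("level"), match.group("rest")
```

For example, open the log with `open("cluster.log", encoding="utf-8", errors="replace")` and pass the file object together with a window such as `datetime(2023, 10, 1, 6, 40)` to `datetime(2023, 10, 1, 6, 50)`. Lines that don't match the assumed header are silently skipped, so an empty result may simply mean the format assumption (or the encoding) is wrong for your file.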
May I post the log from the first minutes after startup here, in case someone spots the problem right away?
The event log shows 7024 (and of course lots of 7028) for the cluster service itself, together with one 1177 and one 1135. I found articles about rebuilding CLUSDB.BLF, but the events still occur.
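Because the event IDs of interest are scattered through a very large log, a small script can stream the export and list which of them occur and in what order (e.g. whether 1146 really repeats four times before each 1795). This sketch assumes the System log was exported to plain text with one `Event ID: NNNN` line per event, for example via `wevtutil qe System /f:text`; the regex and the names `event_sequence`, `summarize` and `IDS_OF_INTEREST` are hypothetical and should be adapted to the actual export.

```python
import re
from collections import Counter

# ASSUMPTION: plain-text export with one "Event ID: NNNN" line per event.
EVENT_ID_RE = re.compile(r"^\s*Event ID:\s*(\d+)\s*$")

# Event IDs mentioned in the post (hypothetical selection, adjust as needed).
IDS_OF_INTEREST = frozenset({1068, 1135, 1146, 1177, 1230, 1795, 7024, 7028})


def event_sequence(lines, wanted=IDS_OF_INTEREST):
    """Return the wanted event IDs in the order they appear in the file.

    Reads line by line, so a large (e.g. 700 MB) file is never loaded at once."""
    sequence = []
    for line in lines:
        match = EVENT_ID_RE.match(line)
        if match:
            event_id = int(match.group(1))
            if event_id in wanted:
                sequence.append(event_id)
    return sequence


def summarize(sequence, wanted=IDS_OF_INTEREST):
    """Map every wanted ID to its count, including IDs that never occur (0)."""
    counts = Counter(sequence)
    return {event_id: counts[event_id] for event_id in sorted(wanted)}
```

Passing an open file object to `event_sequence` and the result to `summarize` shows at a glance that IDs such as 1230 or 1068 have a count of 0, while the sequence itself makes repeating patterns like 1146, 1146, 1146, 1146, 1795 easy to spot.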
There is also an Event ID 4 for SPNs, but all my resources are registered and show up.
To me it looks like there is a core problem with the disks, or with a step before that. That leads me to a chicken-and-egg problem, and I'm going round in circles. Quite a lot of the possible solutions need the cluster service to be running, so I'm limited there. Once I force-started the cluster service without quorum and tried to fix the quorum, but when I last tried this yesterday it didn't work: same failure, with the nodes always stopping.
I'm aware this is a big field, but any help or hints are appreciated.
Cheers, Klaus