Forum Discussion

DanielF1395
Copper Contributor
Aug 28, 2024

HCI Stack 23H2 WMI - RPC Server is Unavailable

After updating to Solution 10.2405.2.7, I am unable to manage my cluster through Failover Cluster Manager or run cluster validation because of the error "RPC Server is unavailable". I have restarted the nodes, confirmed that the firewall rules allow WMI and RPC traffic, and verified DNS. Azure and Windows Admin Center (WAC) report no issues with the cluster, but I am worried about applying the next solution update in case this WMI issue causes problems with it.

Any guidance on resolving this fault would be much appreciated.

  • FrankKeunen
    Copper Contributor

    DanielF1395 

    We are encountering the same issue on a two-node Azure Stack HCI cluster (Dell MC-760) belonging to one of our customers, running the same version. We are working with Dell and Microsoft to resolve it.

    "

    That error comes straight out of COM's inspection of firewall behavior. The firewall (or something else?) is preventing the automatically generated DCOM code in our service from opening a TCP port. The call is not even reaching our custom code; it fails in the DCOM machinery.

    Let me set expectations: this investigation may take time, as it is getting a bit complicated. It will involve some modifications to the RPCSS process, and iDNA tracing may be required along with network stack debugging, depending on how far we need to go. Therefore, it won't be a straightforward solution.

    "

    Once I have an update, I will provide you with the details.

    • DanielF1395
      Copper Contributor
      I was going to post an update today; thank you for the reminder.

      Please find below the fix from Microsoft Support:

      They found that this is a known issue since the latest update: a firewall rule appears to be causing the problem. The workaround is to run the following command on all nodes:

      Disable-NetFirewallRule AzsHci-ImdsAttestation-Block-TCP-In
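
      On clusters with several nodes, one way to run this on every node from a single session is a PowerShell remoting wrapper like the following. It is a minimal sketch, not Microsoft's documented procedure. It assumes WinRM-based remoting (Invoke-Command) still works while DCOM/RPC is failing, that you have administrator rights on each node, and that the rule name above is the rule's -Name (the original command passes it positionally, which binds to -Name). The function name and node names are hypothetical.

      ```powershell
      # Sketch: apply the Microsoft Support workaround on each listed node.
      # Assumption: WinRM remoting is reachable even though DCOM/RPC calls fail.
      function Disable-ImdsAttestationBlockRule {
          [CmdletBinding()]
          param(
              # Hypothetical node names, e.g. 'hci-node01','hci-node02'
              [Parameter(Mandatory)]
              [string[]]$ComputerName
          )

          Invoke-Command -ComputerName $ComputerName -ScriptBlock {
              # Same command as the workaround, run locally on each node
              Disable-NetFirewallRule -Name 'AzsHci-ImdsAttestation-Block-TCP-In'
              # Return the resulting state so you can confirm Enabled is now False
              Get-NetFirewallRule -Name 'AzsHci-ImdsAttestation-Block-TCP-In' |
                  Select-Object Name, Enabled
          }
      }

      # Usage (placeholder node names):
      # Disable-ImdsAttestationBlockRule -ComputerName 'hci-node01','hci-node02'
      ```

      Invoke-Command adds a PSComputerName property to each returned object, so the output shows which node each result came from.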
      • FrankKeunen
        Copper Contributor

        DanielF1395 

        That's correct; we identified the same fix during a troubleshooting session last night. We enabled Drift Control, which allowed us to disable the Windows Firewall on one of the nodes. With the firewall disabled, we were able to perform a CUA scan on the affected node.

        We also compared the firewall rule set with that of a working cluster running an older version of Azure Stack HCI, and disabled the new rules added in Microsoft's latest release. We concluded that the issue was indeed caused by the firewall rule "AzsHci-ImdsAttestation-Block-TCP-In".
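
        To repeat the rule-set comparison described above, a small helper could list rule names present on an updated node but missing from a node on the older, working cluster. This is a hedged sketch, not a tool from Microsoft or Dell; the function and file names are hypothetical, and it compares only rule names, not rule settings.

        ```powershell
        # Sketch: find firewall rule names that exist only on the updated node.
        function Get-NewFirewallRuleName {
            param(
                # Rule names captured on a node of the older, working cluster
                [Parameter(Mandatory)][string[]]$Reference,
                # Rule names captured on a node of the updated, affected cluster
                [Parameter(Mandatory)][string[]]$Difference
            )

            # '=>' marks items that appear only in -Difference (rules added by the update)
            Compare-Object -ReferenceObject $Reference -DifferenceObject $Difference |
                Where-Object SideIndicator -eq '=>' |
                Select-Object -ExpandProperty InputObject |
                Sort-Object
        }

        # Usage: save the rule names on each node, then compare the two files:
        # (Get-NetFirewallRule).Name | Set-Content .\rules-old.txt   # on the working node
        # (Get-NetFirewallRule).Name | Set-Content .\rules-new.txt   # on the affected node
        # Get-NewFirewallRuleName -Reference (Get-Content .\rules-old.txt) -Difference (Get-Content .\rules-new.txt)
        ```

        Each name it returns is a candidate to inspect or temporarily disable while troubleshooting, which is how the suspect rule can be narrowed down.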

         

         
