shotime
Brass Contributor
Jan 15, 2024
Solved

About HPC run MPI Service

Hi All, I am a beginner in MS HPC. In my environment, the HA Head Nodes and the Compute Nodes are in different network segments, and each node has only one network adapter. My diagnostic tests for MPI will fail,...
  • kyazaferr
    Sep 02, 2024
    Different Network Segments: Your Head Nodes (in a High Availability setup) and Compute Nodes are on different network segments and have only one network adapter each. This setup can cause MPI communication issues if the nodes cannot reach each other across the segments.
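
To confirm which nodes actually sit on different segments, a quick subnet comparison can help. This is only a minimal sketch: the addresses and the /24 prefix length are made-up examples, and `same_segment` is a hypothetical helper name, so substitute the values from your own nodes.

```python
import ipaddress


def same_segment(addr_a: str, addr_b: str, prefix_len: int) -> bool:
    """Return True if both addresses fall in the same subnet of the given prefix length."""
    net_a = ipaddress.ip_interface(f"{addr_a}/{prefix_len}").network
    net_b = ipaddress.ip_interface(f"{addr_b}/{prefix_len}").network
    return net_a == net_b


# Hypothetical addresses, not taken from the original post.
head_node = "10.0.1.10"
compute_node = "10.0.2.20"

print(same_segment(head_node, compute_node, 24))  # False: different /24 segments
print(same_segment(head_node, "10.0.1.77", 24))   # True: same /24 segment
```

If this reports different segments, traffic between those nodes depends on routing and firewall rules between the segments rather than on a shared local network.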

    Diagnostic Test Failure: The error message indicates that the diagnostic tests for MPI are failing. The reasons mentioned include:

    Network issues between nodes.
    Missing files required for running the test on Compute Nodes.
    Firewall rules blocking the necessary communication ports or types.

    MPI Configuration:

    Verify that your MPI settings are correct. Ensure that all nodes are properly configured in the MPI environment and can resolve each other's hostnames.
    You may need to configure the MPI environment to use a specific interface or network adapter that has access across segments.
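
One way to check the hostname resolution part is to run a small lookup script on each node. This is a rough sketch, not an HPC Pack tool: `resolve_all` is a hypothetical helper, and the node names in the usage note are placeholders for your own.

```python
import socket


def resolve_all(hostnames):
    """Map each hostname to its sorted resolved addresses, or None if the lookup fails."""
    results = {}
    for name in hostnames:
        try:
            infos = socket.getaddrinfo(name, None)
        except socket.gaierror:
            # Name could not be resolved from this node.
            results[name] = None
        else:
            # info[4] is the socket address tuple; its first element is the IP.
            results[name] = sorted({info[4][0] for info in infos})
    return results
```

For example, `resolve_all(["HEADNODE1", "HEADNODE2", "COMPUTE01"])` (placeholder names) returns `None` for any node the current machine cannot resolve. Running it on a Head Node and on a Compute Node shows whether resolution works in both directions.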

    Diagnostics and Logs:

    Review the diagnostic logs from the HPC cluster to identify any specific errors or messages related to network or file access issues.
    Use tools like ping or tracert to check the network connectivity between nodes. Additionally, try running simpler tests or diagnostics to isolate network-related issues.
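
Note that ping only exercises ICMP, so a node can answer ping while a firewall still blocks the TCP ports the cluster needs. A simple TCP connect test can complement it. The sketch below assumes you already know which ports to test from your HPC Pack and firewall configuration; `can_connect` is a hypothetical helper name.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Calling `can_connect("COMPUTE01", port)` from the Head Node, with a placeholder node name and a port from your own configuration, tells you whether something is listening and reachable there. A `False` result alongside a successful ping usually points at a firewall rule or a service that is not running.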

    File Availability:

    Ensure that any files needed for the MPI diagnostic tests are present on all nodes. If this is a custom diagnostic test, double-check that all required files have been deployed to every Compute Node, especially new ones.
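
If the test files live at the same relative path on every node, a short script can report which nodes are missing them. This is a sketch under assumptions: the share paths and file names in the usage note are invented for illustration, and `missing_files` is a hypothetical helper.

```python
from pathlib import Path


def missing_files(node_roots, required):
    """Return {node: [missing relative paths]} for nodes that lack any required file."""
    report = {}
    for node, root in node_roots.items():
        missing = [rel for rel in required if not (Path(root) / rel).is_file()]
        if missing:
            report[node] = missing
    return report
```

For example, `missing_files({"COMPUTE01": r"\\COMPUTE01\share\tests"}, ["mytest.exe"])` uses a placeholder UNC path and file name. An empty result means every listed node has every file, and any node that appears in the result needs the files redeployed.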
