Would these same instructions work for a node pool with Standard_NC24rs_v3 nodes? I have tried this setup on such a node pool, and it does not seem to work. The rdma pod logs the following:
rdma-shared-dp 2023/12/12 09:46:36 error creating new device: "error getting driver info for device 32ea:00:02.0 readlink /sys/bus/pci/devices/32ea:00:02.0/driver: no such file or directory"
rdma-shared-dp 2023/12/12 09:46:36 Warning: no devices in device pool, creating empty resource server for rdma_shared_device_a
rdma-shared-dp 2023/12/12 09:46:36 Warning: no Rdma Devices were found for resource rdma_shared_device_a
rdma-shared-dp 2023/12/12 09:46:36 Starting all servers...
rdma-shared-dp 2023/12/12 09:46:36 starting rdma/rdma_shared_device_a device plugin endpoint at: rdma_shared_device_a.sock
rdma-shared-dp 2023/12/12 09:46:36 rdma/rdma_shared_device_a device plugin endpoint started serving
rdma-shared-dp 2023/12/12 09:46:36 All servers started.
rdma-shared-dp 2023/12/12 09:46:36 Listening for term signals
rdma-shared-dp 2023/12/12 09:46:36 Starting OS watcher.
rdma-shared-dp 2023/12/12 09:46:37 rdma_shared_device_a.sock gets registered successfully at Kubelet
rdma-shared-dp 2023/12/12 09:46:37 ListAndWatch called by kubelet for: rdma/rdma_shared_device_a
rdma-shared-dp 2023/12/12 09:46:37 Updating "rdma/rdma_shared_device_a" devices
rdma-shared-dp 2023/12/12 09:46:37 exposing "0" devices
Edit: It seems you need an older OFED driver: according to https://network.nvidia.com/products/infiniband-drivers/linux/mlnx_ofed/ the NVIDIA ConnectX-3 Pro requires version 4.9. There doesn't seem to be a container image that ships 4.9.
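The first log line is the key one: the plugin tries to readlink the sysfs `driver` symlink of PCI device 32ea:00:02.0, and that symlink does not exist, which means no kernel driver is bound to the adapter on the node. Presumably that is because the installed driver stack doesn't support ConnectX-3 Pro. A quick way to confirm this on a node is to list the Mellanox PCI devices and the driver bound to each. The sketch below assumes a Linux host with sysfs mounted at /sys and filters on the Mellanox PCI vendor ID 0x15b3; the function names are hypothetical helpers, not part of the device plugin.

```python
import os
from typing import Dict, Optional

SYSFS_PCI = "/sys/bus/pci/devices"
# PCI vendor ID assigned to Mellanox (now NVIDIA networking).
MELLANOX_VENDOR_ID = "0x15b3"


def bound_driver(dev_path: str) -> Optional[str]:
    """Return the name of the kernel driver bound to a PCI device, or None.

    This checks the same `driver` symlink that the device plugin fails
    to readlink() in the log above.
    """
    link = os.path.join(dev_path, "driver")
    if not os.path.islink(link):
        return None
    # The symlink points at /sys/bus/pci/drivers/<name>; keep only <name>.
    return os.path.basename(os.readlink(link))


def mellanox_driver_status(root: str = SYSFS_PCI) -> Dict[str, Optional[str]]:
    """Map each Mellanox PCI address under `root` to its bound driver (or None)."""
    status: Dict[str, Optional[str]] = {}
    if not os.path.isdir(root):
        return status
    for addr in sorted(os.listdir(root)):
        dev = os.path.join(root, addr)
        try:
            with open(os.path.join(dev, "vendor")) as f:
                vendor = f.read().strip()
        except OSError:
            continue  # not a readable PCI device entry
        if vendor == MELLANOX_VENDOR_ID:
            status[addr] = bound_driver(dev)
    return status


if __name__ == "__main__":
    for addr, drv in mellanox_driver_status().items():
        print(f"{addr}: {drv or 'NO DRIVER BOUND'}")
```

If 32ea:00:02.0 shows up as "NO DRIVER BOUND", the device plugin has nothing to expose, which matches the `exposing "0" devices` line. For a ConnectX-3 Pro one would expect the mlx4 driver (mlx4_core) to be bound once a supporting driver version is installed.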