Super helpful, good stuff!
Commenting with a couple of issues I encountered that may help others:
- The driver installation script tries to install azcopy from a domain that no longer exists.
  In `azhpc-images/common/install_azcopy.sh`, change `AZCOPY_DOWNLOAD_URL="https://azcopyvnext.azureedge.net/releases/release-${azcopy_release}/${TARBALL}"` to `AZCOPY_DOWNLOAD_URL="https://azcopyvnext-awgzd8g7aagqhzhe.b02.azurefd.net/releases/release-${azcopy_release}/${TARBALL}"`.
- I followed the steps to create and save a custom VMI to my gallery, but every VM I created from the image had a lot of issues (different ones each time). I was never able to get SGLang running on a VM bootstrapped with the custom VMI.
  Going back to provisioning the VM manually resolved all of these issues.
- I hit OOM issues with the docker image `rocm/sglang-staging:20250212` that couldn't be resolved by limiting mem block sizes; switching to the newer `rocm/sglang-staging:20250303` resolved them.
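
For anyone who would rather script the azcopy URL fix than edit the file by hand, here is a rough sketch of how I'd do it with `sed`. The helper name `patch_azcopy_url` is made up for this comment (it is not part of azhpc-images), and the path assumes you run it from the directory that contains your `azhpc-images` checkout.

```shell
#!/usr/bin/env bash
# Sketch: replace the retired azcopy download host with the new one, in place.
# patch_azcopy_url is a hypothetical helper name, not something from azhpc-images.
patch_azcopy_url() {
    local script="$1"
    # Dots are escaped so sed matches the old host name literally.
    local old_host='azcopyvnext\.azureedge\.net'
    local new_host='azcopyvnext-awgzd8g7aagqhzhe.b02.azurefd.net'
    # -i.bak edits the file in place and keeps a backup next to it.
    sed -i.bak "s|${old_host}|${new_host}|g" "$script"
}

# Assumed location: run from the parent directory of the azhpc-images checkout.
target="azhpc-images/common/install_azcopy.sh"
if [ -f "$target" ]; then
    patch_azcopy_url "$target"
    # Show the patched line so the change can be checked by eye.
    grep -n 'AZCOPY_DOWNLOAD_URL=' "$target"
fi
```

The original script is kept as `install_azcopy.sh.bak`, so reverting is just a matter of moving that file back.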