Thank you for this great post. The following documentation has additional information on backends in APIM, including the circuit breaker and load-balanced pools (both features are currently in preview): https://learn.microsoft.com/en-us/azure/api-management/backends.
One thing to consider when load-balancing OpenAI instances: the model deployment needs to be set up identically in every resource, including its name. The deployment name is a parameter of every REST operation, so a request routed to a resource where the name differs will fail. Alternatively, the name could be injected or overwritten by APIM policies so that each backend receives the name it expects.
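To illustrate the overwrite option, here is a minimal Python sketch of the rewrite step. It assumes the Azure OpenAI URL layout, where the deployment name is a path segment (`/openai/deployments/<name>/...`). The backend IDs, deployment names, and the `rewrite_deployment` helper are all hypothetical. In APIM this logic would live in a policy rather than in Python, so treat it as a description of the transformation, not as something to deploy.

```python
import re

# Matches the deployment segment of an Azure OpenAI request path,
# e.g. /openai/deployments/<name>/chat/completions
_DEPLOYMENT_SEGMENT = re.compile(r"(/openai/deployments/)[^/?]+")

# Hypothetical mapping: backend id -> deployment name used in that resource.
BACKEND_DEPLOYMENTS = {
    "aoai-east": "gpt-4-east",
    "aoai-west": "gpt-4-west",
}


def rewrite_deployment(path: str, backend: str) -> str:
    """Replace the deployment name in `path` with the one `backend` expects."""
    deployment = BACKEND_DEPLOYMENTS[backend]
    # A function replacement avoids any escaping issues in the new name.
    return _DEPLOYMENT_SEGMENT.sub(
        lambda m: m.group(1) + deployment, path, count=1
    )


if __name__ == "__main__":
    original = "/openai/deployments/gpt-4/chat/completions?api-version=2024-02-01"
    print(rewrite_deployment(original, "aoai-west"))
```

If every resource uses the same deployment name, no rewrite is needed and the load-balanced pool can forward the path unchanged, which is the simpler setup.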
Published Feb 02, 2024
Version 1.0