
Azure Infrastructure Blog

From ServiceNow to Self-Healing Infrastructure: A Multi-Repo Azure AI Platform

Valini_Sunthwal
Apr 29, 2026

A question before we begin

Quick — what version of infrastructure is running in your production subscription right now?

Not what your pipeline last deployed. Not what your tracking spreadsheet says. What's actually running — right now, this second — and can you prove it?

If you had to think about it, you're in good company. So did we. That's basically why we ended up building everything I'm about to describe.


The Friday night that started all this

So here's what happened. Production had a broken NSG rule. One of our engineers woke up, logged into the Azure Portal, edited the rule, hit Save. Done. Went back to bed.

Fair enough — production was healthy again.

The problem? Nobody noticed until Monday that someone had changed live infrastructure directly in the portal. No pipeline involved. No Terraform. No tracking. Our deployment registry still showed the old config. For three days, our "source of truth" was just... wrong.

And that's the thing people don't talk about enough with cloud infrastructure. It's not just Terraform drift. It's anyone making a change through the Portal, CLI, PowerShell, REST API — basically any path that isn't your pipeline. Our system catches all of it now, but I'll get to that.


Why this is harder than it sounds

Let me paint the picture. A single Azure environment has easily 20+ resources. VNets with subnets and NSGs. Key Vaults. Storage Accounts with private endpoints. Container App Environments. AI services — OpenAI, AI Foundry. DNS zones, route tables, firewalls, managed identities, RBAC bindings tying everything together.

Each resource has dozens of properties. And each property can be changed through the Portal, CLI, PowerShell, ARM, Bicep, Terraform, or raw REST calls. Any of those changes can happen without your pipeline having a clue.

Now multiply that across lab, non-live, and live. Across multiple subscriptions. Across multiple markets. Yeah.

 


What we mean by "market"

We use the word market to mean a country or regional business unit. Each one runs in its own Azure region with its own subscription. UK deploys to UK South, Czech to Germany West Central, and so on. Same platform blueprint, different geography, different network space, different secrets, different deployment lifecycle.

Think of it as completely isolated copies of the same thing.


The actual problem we had

We run Azure AI infrastructure across multiple markets. Each market has three environments — lab, non-live, live. Each environment has 20+ resources. So we're talking hundreds of resources across dozens of subscriptions, all deployed through Terraform, all expected to stay in sync.

They don't. Pipelines fail halfway through. Someone fixes something manually in the portal because it's 2 AM and they just want it to work. A registry update fails silently. And slowly, what you think is deployed and what's actually deployed start to diverge. One quiet failure at a time.

We needed something different. Not just a deployment pipeline — we needed a system that knows what it deployed, can check whether that's still true, and notices when someone changes something outside the pipeline.


Can it actually detect portal changes?

Yes. But there's a nuance.

When someone edits a resource through the Azure Portal — modifies an NSG rule, changes a Key Vault policy, scales a Container App — the actual Azure resource changes. But Terraform's state file doesn't update because Terraform wasn't involved. The state serial stays the same. The version tracker stays the same. The registry stays the same.

So the daily reconciliation pipeline, which compares serials and versions, would not catch this. All three tracking files still agree — they're just all wrong.

The portal change gets caught the next time the pipeline runs terraform plan. Terraform talks to Azure, compares what's in state versus what's actually there, and shows the diff:

~ resource "azurerm_network_security_rule" "example" {
    ~ access = "Allow" -> "Deny"   # changed outside of Terraform
  }

Then terraform apply reverts the change — brings Azure back in line with the code. That's the self-healing bit.

So in short:

  • The reconciliation pipeline catches pipeline-level drift — someone ran terraform outside the pipeline, or a registry write failed.
  • The next terraform plan/apply catches resource-level drift — portal changes, CLI changes, anything outside Terraform.

Between the two, nothing stays hidden for long.


Three repos, one platform

We split things into three repositories. Each does one thing well.

 

1. Platform Repo

Deploys the foundation — VNets, DNS, Key Vaults, Container App Environments, AI Foundry, managed identities, firewall rules. Everything a market needs before any application team can deploy anything. Triggered by ServiceNow tickets or code merges.

2. Modules Repo

37 reusable Terraform modules, all built on top of Azure Verified Modules. Both the Platform Repo and Use-Case Repo pull from this. The whole point is that if the same Key Vault definition lives in two places, it will diverge within three months. This makes that impossible.

Everything is version-pinned. No silent updates:

source = "git::https://...//avm_modules/keyvault?ref=avm-res-keyvault-vault/v0.10.2"

3. Use-Case Repo

This is where application teams deploy their stuff — storage, databases, function apps, AI Search, container apps — on top of what the Platform Repo already set up. The repo has pre-written Terraform for 20+ resources. Teams don't write Terraform. They uncomment what they need. Push to a feature branch, it deploys to lab. Merge to staging, it goes to non-live. Merge to main, it goes to live. Approval gates at each step.
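The branch-to-environment promotion can be sketched as a simple mapping. This is an illustrative simplification, not the platform's actual pipeline code; the branch names and environment names come from the workflow described above, the function itself is hypothetical:

```python
# Sketch of the Use-Case Repo's branch-to-environment promotion.
# Branch names (feature/*, staging, main) and environments (lab, non-live,
# live) are from the post; this function is a hypothetical simplification.

def target_environment(branch: str) -> str:
    """Map a Git branch to the environment a push or merge deploys to."""
    if branch == "main":
        return "live"      # approval gate applies before this deploy
    if branch == "staging":
        return "non-live"
    return "lab"           # any feature branch deploys to the lab sandbox

print(target_environment("feature/add-ai-search"))  # lab
```

The useful property is that the mapping is total: there is no branch that deploys nowhere, and no way to reach live except through main.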


The five pipelines

Release pipeline. Fires on merge to main or from a ServiceNow trigger. Runs semantic-release, creates a version tag (like v7.0.0), then deploys to lab → non-live → live with an approval gate before live.

Feature pipeline. Lab-only sandbox. Test your changes without touching anything real. Creates soft tags like lab-v7.0.0.1 so you can track what was deployed without cluttering the main version history.

Rollback pipeline. Pick any previous tag (say v6.1.0) and it restores it. The version tracker marks this as a deliberate rollback so reconciliation doesn't flag it as drift.

Reconciliation pipeline. Runs at 6 AM every day. Compares what the registry says is deployed against what's actually in the Terraform state. Catches pipeline-level drift — someone ran terraform apply outside the pipeline, or a registry write failed. Portal changes get caught on the next plan/apply run instead.

Commitlint pipeline. Enforces conventional commit messages. That's what makes semantic versioning work — fix: bumps patch, feat: bumps minor, BREAKING CHANGE: bumps major.


The three tracking files

This is where most platforms stop too early. We keep three independent records for every deployment.

 

1. Terraform State File

Terraform's own record of what it manages. Stored in Azure Blob Storage. Has a serial number that increments every time Terraform runs. If someone runs terraform apply outside the pipeline, this serial changes but nothing else does.

2. Version Tracker

A JSON file sitting next to the state file. Written by the pipeline after every successful deploy. Records the version tag, commit SHA, run ID, timestamp. If someone deploys outside the pipeline, this file doesn't update — that's how we spot the mismatch.
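A tracker entry might look something like this. The recorded fields (version tag, commit SHA, run ID, timestamp) are from the description above; the exact key names, file name, and layout are my assumptions for illustration, not the platform's actual schema:

```python
import datetime
import json

# Hypothetical version-tracker payload. The fields recorded (version tag,
# commit SHA, run ID, timestamp) come from the post; key names and file
# layout are illustrative assumptions.

def write_tracker(path: str, version: str, commit_sha: str, run_id: str) -> dict:
    entry = {
        "version": version,
        "commit_sha": commit_sha,
        "run_id": run_id,
        "deployed_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }
    with open(path, "w") as f:
        json.dump(entry, f, indent=2)
    return entry

entry = write_tracker("version-tracker.json", "v7.0.0", "a1b2c3d", "12345")
```

Because only the pipeline ever writes this file, an out-of-band deploy leaves it stale by construction.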

3. Subscription Registry

A central blob container where each market gets a subscription.json. Records what the platform thinks is deployed in each environment. But if the blob write fails (network blip), it can be wrong. That's why we cross-check against the other two.

When all three agree, we're good. When they don't, we know exactly what drifted, where, and why.


How drift detection works in practice

Every morning at 6, the reconciliation pipeline checks every market, subscription, and environment.

It uses three tiers:

Tier 1 (best case): Version tracker exists. Compare its serial against the actual state serial and the registry. If anything's off, flag it.

Tier 2 (no tracker): Older environments without a version tracker. Fall back to comparing the registry's recorded serial against the actual state serial.

Tier 3 (no serials at all): Compare timestamps. If the state was modified more than five minutes after the last registry update, flag it.
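The three tiers can be sketched as a single fall-through check. The tier logic and the five-minute window are from the description above; the function and field names are hypothetical, and tier 1 is simplified to the serial comparison only:

```python
from datetime import datetime, timedelta
from typing import Optional

# Illustrative sketch of the three-tier reconciliation check. Tier order
# and the five-minute window come from the post; names are hypothetical.

def check_drift(state_serial: int,
                state_modified: datetime,
                registry_serial: Optional[int],
                registry_updated: datetime,
                tracker_serial: Optional[int]) -> str:
    if tracker_serial is not None:       # Tier 1: version tracker exists
        return "DRIFT" if tracker_serial != state_serial else "OK"
    if registry_serial is not None:      # Tier 2: fall back to registry serial
        return "DRIFT" if registry_serial != state_serial else "OK"
    # Tier 3: timestamps only - state touched >5 min after last registry write
    if state_modified > registry_updated + timedelta(minutes=5):
        return "DRIFT"
    return "OK"
```

Note what this can and cannot see: a manual terraform apply bumps the state serial and trips tier 1 or 2, but a portal edit changes none of these inputs, which is exactly why it only surfaces at the next plan.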

When it finds drift, it does not auto-fix. It generates a report:

Customer     | Environment | Registry  | Tracker   | State Serial | Status
customer-cz  | non-live    | v6.1.0    | v7.0.0    | 45           | DRIFT
customer-cz  | live        | v7.0.0    | v7.0.0    | 41           | OK
customer-uk  | lab         | v7.0.0    | v7.0.0    | 22           | OK

A human reviews and approves before anything gets corrected. No silent overwrites.


Did it work?

Within the first week, the reconciliation pipeline caught something real. Someone had run terraform apply manually — outside the pipeline. The state serial changed, the version tracker didn't. Flagged at 6 AM the next morning.

Separately, the next scheduled deployment caught a portal change. An NSG rule had been modified directly in the Azure Portal. terraform plan showed the unexpected diff and terraform apply reverted it.

Between the two mechanisms, nothing went undetected for more than a day.


Security

No secrets stored anywhere. Everything is OIDC. GitHub App creds, state backend config — all fetched at runtime from Key Vault. Nothing in repo secrets or pipeline variables.

Private endpoints on everything. Storage, Key Vault, AI Search, SQL, PostgreSQL, Container Registry — all deployed with public access disabled.

Quality gates on every PR. Terraform fmt, validate, Checkov, tfsec, commitlint. Doesn't pass, doesn't merge.

Rollback is a first-class thing. Restore any previous version. The tracker records it as intentional so reconciliation doesn't freak out.

Here's the pattern every pipeline job follows:

# Every job:
- name: "Azure Login"
  uses: ./.github/actions/azure-login

- name: "Fetch Secrets from Key Vault"
  id: kv-secrets
  uses: ./.github/actions/keyvault-secrets
  with:
    keyvault_name: ${{ env.KEYVAULT_NAME }}

- name: "Generate GitHub App Token"
  uses: ./.github/actions/github-app-token
  with:
    app_id: ${{ steps.kv-secrets.outputs.app_id }}
    private_key: ${{ steps.kv-secrets.outputs.private_key }}

What we'd tell you

Separate platform from application on day one. The repo boundary is the governance boundary. Shared repo means shared blast radius.

Pin everything. Modules, providers, Terraform versions. If today's deploy can produce a different result tomorrow because something upstream changed, you don't have reproducible infrastructure.

Assume every component will fail on its own. The pipeline will succeed but the registry write will fail. Build verification loops before you need them, not after.

Make the right path the easy path. If uncommenting a Terraform block is easier than writing one from scratch, people will uncomment. If deploying through a form is easier than asking the platform team, they'll use the form. Design for that.

Understand the two types of drift. Pipeline drift (someone ran terraform outside the pipeline) gets caught by metadata comparison. Resource drift (portal changes) gets caught by the next terraform plan. You need both.


The bottom line

We started with: can you prove what's deployed in production right now?

Now we can. And the platform checks it every morning at 6 AM.

Three repos. Five pipelines. 37 modules. SemVer on every deployment. Three tracking files that cross-check each other. One daily alarm clock asking: is everything still true?

It usually is. And when it isn't, we know within hours.


If you're managing multi-subscription Azure infrastructure and dealing with drift or visibility problems — we'd genuinely like to hear how you're handling it.

Updated Apr 29, 2026
Version 1.0