AI-Ready Infrastructure Design - A pattern for Enterprise Scale
Published Sep 04 2024 06:15 AM
Microsoft

I. Introduction

As artificial intelligence (AI) spreads across industries, enterprises increasingly want to integrate AI capabilities into their operations. However, adopting AI at enterprise scale brings its own challenges. Organizations often struggle to manage multiple AI services, enforce consistent governance, and maintain security across business units.

II. The Problem: Challenges in Enterprise AI Adoption

Before diving into the solution, let's examine the key challenges that enterprises face when adopting AI at scale:

  1. Decentralized AI service management: As different departments adopt AI solutions independently, organizations often end up with a fragmented landscape of AI services, making it difficult to maintain consistency and control.
  2. Inconsistent governance and security practices: With decentralized AI adoption, ensuring uniform governance policies and security measures across all AI implementations becomes a significant challenge.
  3. Difficulty in tracking usage and implementing charge-back mechanisms: Without a centralized system, it's challenging to monitor AI service usage across the organization and implement fair charge-back processes for different business units.
  4. Scalability and resilience concerns: As AI usage grows within an organization, ensuring that the infrastructure can scale accordingly and remain resilient becomes increasingly important.
  5. Cost management of AI services: Unmonitored and inefficient API usage can lead to budget overruns and misaligned spending. It also makes it hard to allocate costs fairly across departments and to confirm that API expenses contribute directly to business objectives.

III. The Solution: AI Hub Gateway Landing Zone

The AI Hub Gateway Landing Zone is a solution accelerator that addresses these challenges by providing a centralized architecture for managing AI services within an organization. It serves as a single point of entry for AI services, enabling consistent management and governance. 

Key features and benefits:

  1. Centralized AI API Gateway: Acts as a hub for all AI services, providing a unified entry point that can be shared across multiple use-cases in a secure and governed manner.
  2. Seamless integration with Azure AI services: Existing applications switch to the AI Hub Gateway by updating their endpoints and keys.
  3. AI routing and orchestration: Provides mechanisms to route and orchestrate AI services based on priority and target models, ensuring consistent management and governance.
  4. Granular access control: The gateway accesses AI services with managed identities instead of master keys, while consumers authenticate to the gateway with gateway keys.
  5. Private connectivity: Designed to be deployed in a private network, utilizing private endpoints to access AI services securely.
  6. Capacity management: Offers mechanisms to manage capacity based on requests and tokens, ensuring optimal resource utilization.
  7. Usage & charge-back: Implements tracking of usage and charge-back to respective business units, with flexible integration options for existing charge-back and data platforms.
  8. Resilient and scalable: Utilizes Azure API Management with zonal redundancy and regional gateways to provide a scalable and resilient solution.
  9. Full observability: Integrates with Azure Monitor, Application Insights, and Log Analytics to provide detailed insights into performance, usage, and errors.
  10. Hybrid support: Supports deployment of backends and gateways on Azure, on-premises, or other clouds, offering flexibility in infrastructure choices.
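
To make the capacity management feature more concrete, the following is a minimal conceptual sketch in Python of a tokens-per-minute quota for a single consumer. In the accelerator this kind of limit is handled by the gateway; the class name `TokenQuota`, its fields, and the fixed 60-second window are illustrative assumptions, not the accelerator's implementation.

```python
from dataclasses import dataclass


@dataclass
class TokenQuota:
    """Fixed-window tokens-per-minute quota for one consumer (hypothetical model)."""

    tokens_per_minute: int
    window_start: float = 0.0
    used: int = 0

    def try_consume(self, tokens: int, now: float) -> bool:
        """Record the request and return True if it fits in the current window."""
        # Start a fresh window once 60 seconds have elapsed.
        if now - self.window_start >= 60.0:
            self.window_start = now
            self.used = 0
        if self.used + tokens > self.tokens_per_minute:
            return False  # a gateway would typically answer with HTTP 429 here
        self.used += tokens
        return True


# Timestamps are passed in explicitly so the example is deterministic.
quota = TokenQuota(tokens_per_minute=1000)
results = [
    quota.try_consume(600, now=0.0),   # fits in the first window
    quota.try_consume(500, now=10.0),  # rejected: 600 + 500 exceeds 1000
    quota.try_consume(400, now=20.0),  # fits: 600 + 400 equals the limit
    quota.try_consume(500, now=61.0),  # accepted: a new window has started
]
print(results)
```

A fixed window keeps the sketch short. A sliding window or token bucket would smooth out bursts at window boundaries.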

IV. How It Works: Architecture and Components

The AI Hub Gateway Landing Zone leverages several Azure components to create a robust and scalable solution:

  1. Azure API Management: Powers most of the GenAI gateway capabilities, serving as the central hub for managing API requests.
  2. Application Insights: Provides critical insights on the gateway's operational performance, including a dashboard for key metrics.
  3. Event Hub: Used for streaming usage and charge-back data to target data and charge-back platforms.
  4. Azure OpenAI: Deploys instances across multiple regions to provide access to cutting-edge generative models.
  5. Cosmos DB: Stores usage and charge-back data in a fully managed NoSQL database.
  6. Azure Function App: Supports real-time event processing for usage and charge-back data.
  7. User-assigned managed identity: Enables secure access between different Azure services without exposing credentials.
  8. Virtual Network: Hosts the Azure API Management and other Azure resources in a secure network environment.
  9. Private Endpoints & Private DNS Zones: Enable private connectivity for various Azure services, enhancing security.

[Figure: AI Hub Gateway architecture diagram (architecture-1-0-6.png)]


AI Hub Gateway Primary Components:

  • Hub Performance Monitoring:
    • Monitors the performance and health of the AI services and infrastructure.
    • Connected to other components via private links, ensuring secure and dedicated communication.
  • AI Usage Metrics (Chargeback) Event Hub:
    • Gathers and processes usage data for AI services.
    • Supports chargeback and cost management by showing how resources are consumed.
    • Data is processed and sent to the Data Platform for visualization and reporting.
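
As an illustration of how usage data could turn into charge-back figures, here is a small Python sketch that aggregates token usage per business unit. The event fields, the model name, and the prices are all hypothetical; the real event schema and rates depend on your deployment and contract.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical prices in USD per 1,000 tokens.
PRICE_PER_1K = {
    "gpt-4o": {"prompt": Decimal("0.005"), "completion": Decimal("0.015")},
}

# Hypothetical usage events, shaped like records a gateway might stream out.
usage_events = [
    {"business_unit": "retail", "model": "gpt-4o",
     "prompt_tokens": 1000, "completion_tokens": 2000},
    {"business_unit": "finance", "model": "gpt-4o",
     "prompt_tokens": 4000, "completion_tokens": 1000},
    {"business_unit": "retail", "model": "gpt-4o",
     "prompt_tokens": 3000, "completion_tokens": 500},
]


def charge_back(events, prices):
    """Sum the cost of all events per business unit."""
    totals = defaultdict(Decimal)
    for event in events:
        rate = prices[event["model"]]
        cost = (
            Decimal(event["prompt_tokens"]) / 1000 * rate["prompt"]
            + Decimal(event["completion_tokens"]) / 1000 * rate["completion"]
        )
        totals[event["business_unit"]] += cost
    return dict(totals)


totals = charge_back(usage_events, PRICE_PER_1K)
for unit, cost in sorted(totals.items()):
    print(f"{unit}: ${cost}")
```

`Decimal` is used instead of `float` so that monetary sums do not pick up binary rounding errors.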

API Management (AI API Gateway):

  • Centralizes the management of API calls related to AI services.
  • Routes API calls for AI services such as OpenAI, cognitive services, and other 3rd party models.
  • Facilitates secure access to various AI/ML services, ensuring consistent and controlled API usage.
  • Configured with auto-failover so that requests can continue when a backend or region becomes unavailable.
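
The routing role described above can be pictured as a lookup from request path to backend. The Python sketch below is a simplified illustration only: the path prefixes and backend names are invented for the example, and a real gateway would express this through its own API and backend configuration.

```python
from typing import Optional

# Hypothetical route table: request path prefix -> backend name.
ROUTES = {
    "/openai/": "azure-openai",
    "/search/": "azure-ai-search",
    "/models/": "custom-ml",
    "/models/thirdparty/": "third-party-llm",
}


def resolve_backend(path: str) -> Optional[str]:
    """Return the backend whose prefix is the longest match for the path."""
    matches = [prefix for prefix in ROUTES if path.startswith(prefix)]
    if not matches:
        return None  # no route: a gateway would typically reply with 404
    return ROUTES[max(matches, key=len)]


print(resolve_backend("/openai/deployments/gpt-4o/chat/completions"))
print(resolve_backend("/models/thirdparty/chat"))
print(resolve_backend("/unknown"))
```

Choosing the longest matching prefix lets a specific route such as `/models/thirdparty/` override the more general `/models/` entry.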

Data Platform:

  • Handles Visualization & Reports for AI usage metrics.
  • Connected through a private link, ensuring secure data transport from the Event Hub.

Public and Private Traffic:

  • Public Traffic:
    • External applications such as the Retail Smart Shopping App, Customer Care Chat, and Finance Smart Analysis call the API Gateway over the internet.
    • These API calls are handled by DNS and DMZ network appliances before being routed to the API management layer.
  • Private Traffic:
    • Used for communication between internal components (e.g., Hub Performance Monitoring, AI Usage Metrics, etc.) and for secure access to backend AI services.

AI Services:

  • Backend Systems:
    • Includes services like Azure App Services, AKS (Azure Kubernetes Service), ACA (Azure Container Apps), which host and orchestrate AI workloads.
  • AI/ML Services:
    • Cognitive services, Azure AI Search, 3rd Party LLMs (Large Language Models), and custom Machine Learning models.

Integration with OpenAI and Other AI Providers:

  • Primary and Secondary Service Regions:
    • Endpoint 1 (primary region, e.g., provisioned throughput units, PTU) and Endpoint 2 (secondary region, e.g., pay-as-you-go, PAYG) show how AI services are distributed geographically to ensure availability and redundancy.
    • API calls to OpenAI services are routed via the API Management layer.
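
One plausible reading of the primary/secondary layout is priority-based routing with fallback: try the PTU endpoint first and fall back to PAYG when it is throttled or failing. The Python sketch below simulates that idea with stand-in functions. The `Backend` type, the endpoint names, and the retry conditions (HTTP 429 and 5xx) are assumptions for illustration, not the accelerator's actual policy.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Backend:
    name: str
    priority: int  # lower number is tried first
    send: Callable[[str], Tuple[int, str]]  # returns (HTTP status, body)


def route(backends: List[Backend], prompt: str) -> Tuple[Optional[str], int, str]:
    """Try backends in priority order, skipping throttled or failing ones."""
    last = (503, "no backend available")
    for backend in sorted(backends, key=lambda b: b.priority):
        status, body = backend.send(prompt)
        if status == 429 or status >= 500:
            last = (status, body)  # remember the failure and try the next one
            continue
        return backend.name, status, body
    return None, last[0], last[1]


# Simulated endpoints: the PTU deployment is saturated, PAYG still has room.
def ptu_send(prompt: str) -> Tuple[int, str]:
    return 429, "throttled"


def payg_send(prompt: str) -> Tuple[int, str]:
    return 200, f"echo: {prompt}"


backends = [
    Backend("payg-secondary", priority=2, send=payg_send),
    Backend("ptu-primary", priority=1, send=ptu_send),
]
served_by, status, body = route(backends, "hello")
print(served_by, status, body)
```

Preferring PTU first makes use of capacity that is already paid for, while PAYG absorbs overflow.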

Security & Networking:

  • DMZ Network Appliances:
    • These are likely firewalls, intrusion detection/prevention systems (IDS/IPS), or other network security appliances, safeguarding the entry point of the AI hub.
  • Private Links:
    • Emphasize secure communication between internal components, reducing exposure to external threats.

AI Orchestrators:

  • Likely responsible for managing the deployment, scaling, and lifecycle of AI models and services.

Overall Architecture Insights:

  • The architecture is designed for centralized governance of AI services, with emphasis on security, performance monitoring, and cost management.
  • The API Gateway plays a pivotal role in standardizing and securing access to AI services, whether internal or third-party.
  • Redundancy and failover mechanisms ensure high availability and business continuity.
  • The use of private links highlights the priority on secure communication within the infrastructure.

The deployment process is streamlined using Azure Developer CLI (azd) or Bicep (IaC), allowing for a one-click deploy option that sets up all necessary components in your Azure subscription.

V. Getting Started

To get started with the AI Hub Gateway Solution Accelerator, you'll need:

Prerequisites

  • An Azure Account (new users can get free credits to start)
  • Azure subscription with access to Azure OpenAI service
  • Appropriate Azure account permissions (e.g., User Access Administrator or Owner)

For local development:

  • Azure CLI
  • Azure Developer CLI (azd)
  • VS Code

Deployment Options

The solution can be deployed with either Azure Developer CLI (azd) or Bicep (IaC). A basic deployment with azd looks like this:

  1. Clone the repository
  2. Review and adjust the main.bicep file for your configuration needs
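  3. Run the following commands:

    # Sign in to Azure
    azd auth login
    # Create a new azd environment named ai-hub-gateway-dev
    azd env new ai-hub-gateway-dev
    # Provision the Azure resources and deploy the solution
    azd up
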
Basic Configuration and Customization

The main.bicep file allows you to customize various aspects of the deployment, including:

  • OpenAI instances and their locations
  • Model deployments and capacities
  • Network configuration

Always ensure you have sufficient OpenAI capacity in the selected regions before deployment.
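
A quick way to reason about this check is to compare the capacity you plan to deploy with what is available per region and model. The Python sketch below does that with hard-coded, hypothetical numbers. The regions, the model name, and the capacity figures are placeholders, and looking up your actual quota is outside the scope of the sketch.

```python
from collections import defaultdict

# Hypothetical planned deployments, in the spirit of what main.bicep configures.
planned = [
    {"region": "eastus", "model": "gpt-4o", "capacity": 20},
    {"region": "swedencentral", "model": "gpt-4o", "capacity": 30},
    {"region": "swedencentral", "model": "gpt-4o", "capacity": 20},
]

# Hypothetical available quota per (region, model), in the same capacity units.
available = {
    ("eastus", "gpt-4o"): 30,
    ("swedencentral", "gpt-4o"): 40,
}


def find_shortfalls(planned, available):
    """Return the missing capacity for every (region, model) that is over quota."""
    requested = defaultdict(int)
    for deployment in planned:
        requested[(deployment["region"], deployment["model"])] += deployment["capacity"]
    return {
        key: need - available.get(key, 0)
        for key, need in requested.items()
        if need > available.get(key, 0)
    }


shortfalls = find_shortfalls(planned, available)
for (region, model), missing in shortfalls.items():
    print(f"{region}/{model}: short by {missing} capacity units")
```

An empty result means the planned deployments fit within the assumed quota.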

VI. Conclusion

The AI Hub Gateway Solution Accelerator gives enterprises a centralized, secure, and scalable architecture for managing AI services. It addresses the challenges described earlier: fragmented service management, inconsistent governance and security, usage tracking and charge-back, scalability and resilience, and cost management.

We encourage you to try out the AI Hub Gateway Solution Accelerator and experience firsthand how it can transform your organization's approach to AI service management. Whether you're just starting your AI journey or looking to optimize your existing AI infrastructure, this solution provides a solid foundation for growth and innovation.

Last update: Sep 04 2024 06:19 AM