Created by Chafik Belhaoues

# Azure AI Model Routing and Circuit Breaker (APIM)

Tags: Azure, OpenAI, Gateway, AI Load Balancing, Fault Isolation, Intelligent Failover, Multi-Region, Request Distribution, API Gateway Resilience, AI Service Availability, Model Endpoint Balancing, AI Service Mesh, AI Availability Zones
## Description

![image](https://github.com/Azure-Samples/AI-Gateway/raw/main/images/model-routing.gif)

This solution provides an enterprise-grade Azure OpenAI implementation with built-in redundancy, scalability, and monitoring. It features three separate OpenAI service pools distributed across multiple regions, all unified behind an API Management gateway that handles load balancing and access control. A comprehensive monitoring stack with Application Insights, Log Analytics, and custom dashboards offers detailed visibility into system performance and usage patterns.

Key advantages include:

- High availability through multi-region deployment
- Enhanced token throughput via distributed model instances
- Centralized API management and security controls
- Complete observability with pre-configured analytics
- Flexible scaling options to accommodate growing demands

This architecture is designed to support production workloads with the reliability, security, and governance features required for business-critical AI applications.

**N.B:**

- The Terraform code is automatically generated with best practices and contains variables that you can customize to fit your needs.
- You have full control to change, add, or delete resources or their configuration. The newly generated code will reflect these changes.
- You can replace some resources with Terraform modules.

> terraform apply status: successful

## Architecture components

| Component | Description | Purpose |
|-----------|-------------|---------|
| **Azure OpenAI Services** | | |
| OpenAI Pool 1 | Multiple Azure OpenAI instances in different regions | Provides primary model deployment with geographic redundancy |
| OpenAI Pool 2 | Multiple Azure OpenAI instances in different regions | Provides secondary model deployment with geographic redundancy |
| OpenAI Pool 3 | Multiple Azure OpenAI instances in different regions | Provides tertiary model deployment with geographic redundancy |
| **API Management** | | |
| APIM Instance | Central API Management service | Acts as gateway, providing a single endpoint for all OpenAI interactions |
| APIM Logger | Logging component for API Management | Captures detailed information about API requests and responses |
| APIM Subscription | API key management | Controls access to the OpenAI APIs |
| **Backend Pools** | | |
| Backend Pool 1 | Configuration for OpenAI Pool 1 endpoints | Manages load balancing across Pool 1 instances |
| Backend Pool 2 | Configuration for OpenAI Pool 2 endpoints | Manages load balancing across Pool 2 instances |
| Backend Pool 3 | Configuration for OpenAI Pool 3 endpoints | Manages load balancing across Pool 3 instances |
| **Monitoring** | | |
| Log Analytics Workspace | Centralized logging repository | Stores and indexes all system logs for analysis |
| Application Insights | Application monitoring service | Provides real-time telemetry and performance monitoring |
| Usage Analysis Workbook | Custom dashboard | Visualizes OpenAI usage patterns and metrics |
| Diagnostic Settings | Resource logging configuration | Captures detailed logs from all Azure resources |
| Alerts | Automated notifications | Triggers notifications based on predefined thresholds |
| Azure Monitor | Global monitoring platform | Provides a unified view of all infrastructure metrics |
| Resource Health | Service health monitoring | Monitors the health status of Azure OpenAI services |
| **Security** | | |
| IP Restrictions | Network security rules | Limits API access to specified IP addresses |
| Subscription Keys | API authentication | Secure access control for the OpenAI endpoints |
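To give a sense of how the routing and circuit-breaker layer is wired, the sketch below shows one way an individual OpenAI backend (with a circuit-breaker rule) can be grouped into a load-balanced backend pool. APIM backend pools and circuit breakers are typically configured through the `azapi` provider (which is why it appears in the requirements). The resource labels (`azurerm_api_management.apim`, `azurerm_cognitive_account.openai_1a`), the API version, and the thresholds below are illustrative assumptions, not the exact generated code.

```hcl
# Sketch only: one regional OpenAI backend with a circuit breaker, grouped into a pool.
# Resource labels, API version, and thresholds are illustrative assumptions.

resource "azapi_resource" "openai_backend_1a" {
  type      = "Microsoft.ApiManagement/service/backends@2023-05-01-preview"
  name      = "openai-1a"
  parent_id = azurerm_api_management.apim.id   # assumed label for the APIM instance

  # With azapi 1.x the body is a JSON string, hence jsonencode.
  body = jsonencode({
    properties = {
      protocol = "http"
      url      = "${azurerm_cognitive_account.openai_1a.endpoint}openai"  # assumed label for one pool member
      circuitBreaker = {
        rules = [{
          name = "openai-1a-breaker"
          failureCondition = {
            count            = 3        # trip after 3 failures...
            interval         = "PT1M"   # ...within a 1-minute window
            statusCodeRanges = [{ min = 429, max = 429 }, { min = 500, max = 599 }]
          }
          tripDuration = "PT1M"         # keep this backend out of rotation for 1 minute
        }]
      }
    }
  })
}

# The pool APIM load-balances across; tripped members are skipped until they recover.
resource "azapi_resource" "openai_backend_pool_1" {
  type      = "Microsoft.ApiManagement/service/backends@2023-05-01-preview"
  name      = var.openai_backend_pool_name_1
  parent_id = azurerm_api_management.apim.id

  body = jsonencode({
    properties = {
      description = var.openai_backend_pool_description
      type        = "Pool"
      pool = {
        services = [{ id = azapi_resource.openai_backend_1a.id }]
      }
    }
  })
}
```

The API's `set-backend-service` policy then points requests at the pool name rather than at any single OpenAI endpoint, which is what makes the failover transparent to callers.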
## Requirements

| Name | Configuration |
| --- | --- |
| Terraform | All versions |
| Provider: azurerm | >= 5.33.0 |
| Provider: azapi | >= 1.11.0 |
| Provider: random | >= 3.5.1 |
| Provider: time | >= 0.10.0 |
| Access | Admin access |
| Azure CLI | Required for authentication |
| Azure OpenAI | Resource provider registration required |
| API Management | Resource provider registration required |
| Log Analytics | Resource provider registration required |
| Application Insights | Resource provider registration required |
| Subscription | Access to create Cognitive Services |
| Regions | Support for Azure OpenAI service |
| Resource quotas | Sufficient OpenAI deployment quotas |
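As a quick check that your workspace matches these requirements, a `terraform` block along the following lines could sit in `versions.tf`. This is a minimal sketch that mirrors the version constraints in the table above; pin the versions to whatever the generated code actually declares.

```hcl
# Sketch only: provider constraints mirroring the requirements table above.
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = ">= 5.33.0"
    }
    azapi = {
      source  = "Azure/azapi"
      version = ">= 1.11.0"
    }
    random = {
      source  = "hashicorp/random"
      version = ">= 3.5.1"
    }
    time = {
      source  = "hashicorp/time"
      version = ">= 0.10.0"
    }
  }
}

provider "azurerm" {
  features {}   # required empty block; authentication is done via the Azure CLI (az login)
}
```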
## How to use the architecture

Clone the architecture and modify the following variables according to your needs (an illustrative `terraform.tfvars` sketch follows the notes below):

| Variable | Description |
|---------------|-------------|
| **Log Analytics & Application Insights** | |
| `log_analytics_name` | Name of the Log Analytics resource |
| `log_analytics_location` | Location of the Log Analytics resource |
| `application_insights_name` | Name of the Application Insights resource |
| `application_insights_location` | Location of the Application Insights resource |
| **API Management Logger** | |
| `apim_logger_name` | Name of the APIM Logger |
| `apim_logger_description` | Description of the APIM Logger |
| `api_diagnostics_log_bytes` | Number of bytes to log for API diagnostics |
| **Workbook** | |
| `workbook_name` | Name of the Workbook |
| `workbook_location` | Location of the Workbook |
| `workbook_display_name` | Display name of the Workbook |
| **OpenAI Configuration** | |
| `index` | Index used to generate unique names |
| `openai_config_1` | List of OpenAI resources to create for Pool 1 |
| `openai_config_2` | List of OpenAI resources to create for Pool 2 |
| `openai_config_3` | List of OpenAI resources to create for Pool 3 |
| `openai_deployment_name_1` | Name of deployment 1 |
| `openai_deployment_name_2` | Name of deployment 2 |
| `openai_deployment_name_3` | Name of deployment 3 |
| `openai_sku` | Azure OpenAI SKU |
| `openai_model_name_1` | Name of model 1 |
| `openai_model_version_1` | Version of model 1 |
| `openai_model_name_2` | Name of model 2 |
| `openai_model_version_2` | Version of model 2 |
| `openai_model_name_3` | Name of model 3 |
| `openai_model_version_3` | Version of model 3 |
| `openai_model_capacity` | Model capacity |
| **API Management** | |
| `apim_resource_name` | Name of the API Management resource |
| `apim_resource_location` | Location of the APIM resource |
| `apim_sku` | The pricing tier of this API Management service |
| `apim_sku_count` | The instance size of this API Management service |
| `apim_publisher_email` | The email address of the service owner |
| `apim_publisher_name` | The name of the service owner |
| **OpenAI API Configuration** | |
| `openai_api_name` | The name of the APIM API for the OpenAI API |
| `openai_api_path` | The relative path of the APIM API for the OpenAI API |
| `openai_api_display_name` | The display name of the APIM API for the OpenAI API |
| `openai_api_description` | The description of the APIM API for the OpenAI API |
| `openai_api_spec_url` | Complete URL for the OpenAI API specification |
| **OpenAI Subscription** | |
| `openai_subscription_name` | The name of the APIM subscription for the OpenAI API |
| `openai_subscription_description` | The description of the APIM subscription for the OpenAI API |
| **OpenAI Backend Pools** | |
| `openai_backend_pool_name_1` | The name of the OpenAI backend pool 1 |
| `openai_backend_pool_name_2` | The name of the OpenAI backend pool 2 |
| `openai_backend_pool_name_3` | The name of the OpenAI backend pool 3 |
| `openai_backend_pool_description` | The description of the OpenAI backend pool |
| **Additional Infrastructure** | |
| `tags` | Default tags to apply to all resources |

**N.B:**

- Feel free to remove the resources that are not relevant to your use case.
- Some variables have default values; change them if they don't fit your deployment.
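The exact object shape of `openai_config_*` is defined in the generated `variables.tf`, so the sketch below is only a rough illustration of a `terraform.tfvars` override: all values, and in particular the assumed per-region object shape, are placeholders to adapt to the generated definitions.

```hcl
# Sketch only: illustrative overrides; match names and object shapes to the
# generated variables.tf before use.

apim_resource_name     = "apim-ai-gateway"
apim_resource_location = "westeurope"
apim_publisher_name    = "Platform Team"
apim_publisher_email   = "platform@example.com"

# Assumed shape: one entry per regional OpenAI instance in Pool 1.
openai_config_1 = [
  { name = "openai-pool1-eastus",  location = "eastus" },
  { name = "openai-pool1-sweden",  location = "swedencentral" },
]

openai_deployment_name_1 = "gpt-4o"
openai_model_name_1      = "gpt-4o"
openai_model_version_1   = "2024-05-13"
openai_model_capacity    = 30

tags = {
  environment = "production"
  workload    = "ai-gateway"
}
```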

## Maintainer(s)

You can reach out to these maintainers if you need help or assistance:

- [Brainboard team](mailto:support@brainboard.co)

It’s up to you now to build great things.