# Azure AI Resilient Priority Load Balancing Architecture
## Description
This architecture establishes a robust infrastructure for serving OpenAI's GPT-4o models on Azure with high availability and resilience. It features:
1. **Reliable AI Model Access**: Deploys multiple instances of the GPT-4o model across different Azure AI services, enabling applications to access these models via API.
2. **Intelligent Load Balancing**: Distributes requests across multiple OpenAI instances through a backend pool, with each instance having defined priority and weight settings.
3. **Fault Management**: Implements a "circuit breaker" mechanism that temporarily halts calls to failing backends to prevent cascade failures.
4. **Rate Limit Handling**: Automatically manages 429 errors (too many requests) by retrying after the recommended delay.
5. **Enhanced Security**: Uses managed identity authentication, eliminating the need to store secrets or API keys.
6. **API Abstraction**: API Management serves as a facade, providing a unified interface and allowing backend modifications without affecting API consumers.
This architecture is typically employed by enterprise applications that require reliable, scalable access to AI models, with availability guarantees and protection against demand spikes. Specifically, it implements a prioritized PTU (Provisioned Throughput Unit) deployment with consumption fallback: higher-priority backends are fully utilized before traffic gracefully falls back to equally weighted lower-priority backends.
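As a rough illustration of this prioritized-fallback pattern, a backend with a circuit breaker and the backend pool that references it could be declared with the AzAPI provider along the lines of the sketch below. The resource names, API version, and thresholds are assumptions for illustration, not the generated code.

```hcl
# Minimal sketch (not the generated code): one Azure OpenAI backend with a
# circuit breaker, and a backend pool that prefers priority 1 before falling
# back to priority 2. API version, names, and thresholds are assumptions.

resource "azapi_resource" "openai_backend_uks" {
  type      = "Microsoft.ApiManagement/service/backends@2023-09-01-preview"
  name      = "openai-uksouth"
  parent_id = azurerm_api_management.apim.id

  body = {
    properties = {
      url      = "https://my-openai-uksouth.openai.azure.com/openai"
      protocol = "http"
      circuitBreaker = {
        rules = [{
          name         = "openai-breaker"
          tripDuration = "PT1M" # stop calling this backend for 1 minute once tripped
          failureCondition = {
            count            = 3      # trip after 3 failures...
            interval         = "PT5M" # ...within a 5-minute window
            statusCodeRanges = [{ min = 429, max = 429 }, { min = 500, max = 599 }]
          }
        }]
      }
    }
  }
}

resource "azapi_resource" "openai_backend_pool" {
  type      = "Microsoft.ApiManagement/service/backends@2023-09-01-preview"
  name      = "openai-backend-pool"
  parent_id = azurerm_api_management.apim.id

  body = {
    properties = {
      type = "Pool"
      pool = {
        services = [
          { id = azapi_resource.openai_backend_uks.id, priority = 1, weight = 1 }, # PTU-style, used first
          # Lower-priority, equally weighted fallback backends (assumed to be defined elsewhere):
          # { id = azapi_resource.openai_backend_swc.id, priority = 2, weight = 50 },
          # { id = azapi_resource.openai_backend_frc.id, priority = 2, weight = 50 },
        ]
      }
    }
  }
}
```

Backends that share a priority split traffic according to their weights; the pool only moves on to priority 2 when the priority 1 backend is unavailable or its circuit breaker has tripped.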
**N.B:**
- The Terraform code is automatically generated with best practices and contains variables that you can customize to fit your needs.
- You have full control to change, add, or delete resources and their configuration. The newly generated code will reflect these changes.
- You can replace some resources with Terraform modules.
> terraform apply status: successful
>
## Architecture components
| **Component** | **Description** |
|---------------|-----------------|
| Resource Group | Container for all Azure resources |
| Azure OpenAI Service (Priority 1) | GPT-4o deployment in UK South |
| Azure OpenAI Service (Priority 2) | GPT-4o deployments in Sweden Central and France Central |
| API Management | Gateway service with BasicV2 SKU |
| Backend Pool | Collection of OpenAI backends |
| Circuit Breaker | Failure detection mechanism |
| Retry Policy | Request retry mechanism |
| Managed Identity | Authentication mechanism |
| API Management Policy | XML configuration for request handling |
| API Management Subscription | Access control for API consumers |
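Several of these components (the retry policy, managed identity authentication, and routing to the backend pool) come together in the API Management policy XML. A minimal sketch of such a policy is shown below, assuming a backend pool named `openai-backend-pool` and a system-assigned managed identity; the resource references, retry counts, and names are illustrative, not the generated policy.

```hcl
# Illustrative policy only; the generated policy and resource names may differ.
resource "azurerm_api_management_api_policy" "openai" {
  api_name            = azurerm_api_management_api.openai.name
  api_management_name = azurerm_api_management.apim.name
  resource_group_name = azurerm_resource_group.main.name

  xml_content = <<-XML
    <policies>
      <inbound>
        <base />
        <!-- Authenticate to Azure OpenAI with the APIM managed identity (no API keys) -->
        <authentication-managed-identity resource="https://cognitiveservices.azure.com"
            output-token-variable-name="msi-token" ignore-error="false" />
        <set-header name="Authorization" exists-action="override">
          <value>@("Bearer " + (string)context.Variables["msi-token"])</value>
        </set-header>
        <!-- Route requests to the load-balanced backend pool -->
        <set-backend-service backend-id="openai-backend-pool" />
      </inbound>
      <backend>
        <!-- Retry throttled requests (429) against the pool -->
        <retry condition="@(context.Response.StatusCode == 429)" count="2" interval="1" first-fast-retry="true">
          <forward-request buffer-request-body="true" />
        </retry>
      </backend>
      <outbound>
        <base />
      </outbound>
      <on-error>
        <base />
      </on-error>
    </policies>
  XML
}
```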
## Requirements
| Name | Configuration |
| --- | --- |
| Terraform | all versions |
| AzureRM | = 4.17.0 |
| AzAPI | >= 2.2.0 |
| Random | >= 3.6.3 |
| Access | Admin access |
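These constraints correspond to a `terraform` block roughly like the following (provider sources are the usual HashiCorp/Azure registries; AzureRM is pinned while the others use minimum versions):

```hcl
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "= 4.17.0"
    }
    azapi = {
      source  = "azure/azapi"
      version = ">= 2.2.0"
    }
    random = {
      source  = "hashicorp/random"
      version = ">= 3.6.3"
    }
  }
}

provider "azurerm" {
  features {}
}
```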
## How to use the architecture
Clone the architecture and modify the following variables according to your needs:
### Azure AI Architecture - Variables and Descriptions
| Variable Name | Description |
|---------------|-------------|
| `api_management_subscription_name` | Name of the API Management subscription for OpenAI |
| `apim_api_openai_desc` | Description for the OpenAI API |
| `apim_api_openai_displayname` | Display name for the OpenAI API |
| `apim_api_openai_format` | Format of the OpenAI API definition |
| `apim_api_openai_name` | Name of the OpenAI API |
| `apim_api_openai_path` | Path for the OpenAI API |
| `apim_api_sub_openai_displayname` | Display name for the OpenAI API subscription |
| `apim_backend_pool_openai_name` | Name of the backend pool for OpenAI |
| `apim_publisher_email` | Publisher email for API Management |
| `apim_publisher_name` | Publisher name for API Management |
| `apim_resource_location` | Location for the API Management resource |
| `apim_resource_name` | Name of the API Management resource |
| `apim_sku` | SKU for the API Management service |
| `openai_api_spec_url` | URL for the OpenAI API specification |
| `openai_api_version` | Version of the OpenAI API |
| `openai_backend_pool_name` | Name of the OpenAI backend pool |
| `openai_config` | Configuration for OpenAI services with priorities and weights |
| `openai_deployment_name` | Name of the OpenAI deployment |
| `openai_model_capacity` | Capacity for the OpenAI model |
| `openai_model_name` | Name of the OpenAI model |
| `openai_model_version` | Version of the OpenAI model |
| `openai_sku` | SKU for the OpenAI service |
| `prefix` | Prefix for resource names |
| `resource_group_location` | Location for the resource group |
| `resource_group_name` | Name of the resource group |
| `tags` | Tags to apply to resources |
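For example, a `terraform.tfvars` that overrides a handful of these variables might look like the sketch below. The exact shape of `openai_config` depends on the generated variable definition, so treat its keys and values as assumptions.

```hcl
# terraform.tfvars -- illustrative values only
prefix                  = "aiplatform"
resource_group_name     = "rg-openai-lb"
resource_group_location = "uksouth"

apim_resource_name = "apim-openai-lb"
apim_sku           = "BasicV2_1" # value format depends on the variable definition

openai_deployment_name = "gpt-4o"
openai_model_name      = "gpt-4o"
openai_model_version   = "2024-08-06"
openai_model_capacity  = 10

# Assumed shape: one priority 1 backend, two equally weighted priority 2 fallbacks.
openai_config = {
  uksouth = {
    location = "uksouth"
    priority = 1
    weight   = 100
  }
  swedencentral = {
    location = "swedencentral"
    priority = 2
    weight   = 50
  }
  francecentral = {
    location = "francecentral"
    priority = 2
    weight   = 50
  }
}

tags = {
  environment = "production"
  workload    = "azure-openai"
}
```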
**N.B:**
- Feel free to remove the resources that are not relevant to your use case.
- Some variables have default values; change them if they don't fit your deployment.
## Maintainer(s)
You can reach out to these maintainers if you need help or assistance:
- [Brainboard team](mailto:support@brainboard.co)
Brainboard is an AI-driven platform to visually design and manage cloud infrastructure, collaboratively. It's the only solution that automatically generates IaC code for any cloud provider, with an embedded CI/CD.