Azure Blue-Green AKS Deployment

This architecture implements a comprehensive Blue-Green deployment strategy for Azure Kubernetes Service (AKS) clusters, following Microsoft's architectural guidance for high-availability deployments.
Chafik Belhaoues
Updated: September 14, 2025
# Azure AKS Blue-Green Deployment Architecture

This architecture implements a comprehensive Blue-Green deployment strategy for Azure Kubernetes Service (AKS) clusters, following Microsoft's architectural guidance for high-availability deployments.

## Prerequisites

- **Azure CLI** configured and authenticated (`az login`)
- **Terraform** installed (>= 1.3)
- **Azure subscription** with sufficient permissions to create resources
- **Kubernetes CLI (kubectl)** for cluster management
- **Helm** (optional, for application deployments)

## Architecture Overview

This implementation creates a complete Blue-Green deployment infrastructure for AKS, including:

### Core Components

- **Two AKS Clusters** (Blue and Green) for zero-downtime deployments
- **Azure Application Gateway** (separate instances for each cluster)
- **Azure Load Balancer** (Standard SKU for each cluster with health probes)
- **Azure VPN Gateway** (dedicated Point-to-Site VPN for each environment)
- **Azure Front Door** for global load balancing and traffic management
- **Azure DNS** for DNS-based traffic switching
- **Azure Container Registry** for container image storage
- **Azure Key Vault** for secrets and certificate management
- **Azure Monitor** with Log Analytics for comprehensive monitoring
- **Azure Functions** for sync state management and automation
- **Virtual Network** with dedicated subnets for isolation

### Traffic Flow

1. **Azure Front Door** provides the public endpoint
2. **Azure DNS** manages traffic switching between clusters
3. **Application Gateways** handle HTTP/HTTPS load balancing
4. **Azure Load Balancers** distribute traffic within each cluster
5. **AKS Clusters** run containerized applications
6. **Azure Functions** manage synchronization between blue and green environments
7. **VPN Gateways** provide secure Point-to-Site access to each environment

## Project Structure

```
├── provider.tf       # Terraform and Azure provider configuration
├── locals.tf         # Local values and naming conventions
├── variables.tf      # Input variable definitions
├── terraform.tfvars  # Variable values (customize for your environment)
├── main.tf           # Core Azure resources and infrastructure
└── readme.md         # This documentation file
```

## Usage

### 1. Initial Setup

```bash
# Clone or download the Terraform files
# Customize terraform.tfvars with your specific values

# Initialize Terraform
terraform init

# Review the planned infrastructure
terraform plan -var-file="terraform.tfvars"

# Deploy the infrastructure
terraform apply -var-file="terraform.tfvars"
```

### 2. Post-Deployment Configuration

After successful deployment, configure kubectl access to both clusters:

```bash
# Blue cluster
az aks get-credentials --resource-group mycompany-aks-production-rg --name mycompany-aks-production-aks-blue

# Green cluster
az aks get-credentials --resource-group mycompany-aks-production-rg --name mycompany-aks-production-aks-green
```

### 3. VPN Gateway Configuration

After deployment, configure VPN access for each environment:

```bash
# Generate and download VPN client configuration for Blue environment
az network vnet-gateway vpn-client generate \
  --resource-group mycompany-aks-production-rg \
  --name mycompany-aks-production-vpngw-blue \
  --authentication-method EAPTLS

# Generate and download VPN client configuration for Green environment
az network vnet-gateway vpn-client generate \
  --resource-group mycompany-aks-production-rg \
  --name mycompany-aks-production-vpngw-green \
  --authentication-method EAPTLS
```

**Important**: Replace the placeholder certificate data in the VPN Gateway configuration with your actual root certificate:

1. Generate a root certificate for P2S authentication
2. Update the `public_cert_data` in both VPN Gateway resources
3. Deploy client certificates to user devices

### 4. Blue-Green Deployment Workflow

#### Stage T0: Blue Cluster Active

- Blue cluster serves production traffic
- Green cluster is either off or running the previous version

#### Stage T1: Deploy Green Cluster

- Deploy the new version to the green cluster
- Validate platform health using Azure Monitor
- Test applications without production traffic

#### Stage T2: Sync and Validate

- Sync Kubernetes state between clusters
- Deploy applications to the green cluster
- Perform functional and load testing
- Validate using the dedicated green DNS endpoint

#### Stage T3: Traffic Switch

Update the `active_cluster` variable and apply:

```bash
# Switch to green cluster
terraform apply -var="active_cluster=green" -var-file="terraform.tfvars"

# Switch back to blue cluster if needed
terraform apply -var="active_cluster=blue" -var-file="terraform.tfvars"
```

#### Stage T4: Cleanup

After validating the switch, optionally scale down or destroy the inactive cluster to optimize costs.
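
The Stage T3 switch can be wrapped in a small helper that validates the target colour before Terraform is invoked. The sketch below is not part of this repository: `other_cluster` and `switch_command` are hypothetical names, and the helper only prints the `terraform apply` command from Stage T3 rather than running it.

```bash
#!/usr/bin/env bash
# Hypothetical helper functions (not part of this repository).

# Print the colour opposite to the one given; fail on anything else.
other_cluster() {
  case "$1" in
    blue)  echo "green" ;;
    green) echo "blue" ;;
    *)
      echo "error: expected 'blue' or 'green', got '$1'" >&2
      return 1
      ;;
  esac
}

# Print (do not run) the Stage T3 command that makes the given cluster active.
switch_command() {
  # Reuse the validation above; discard the colour it prints.
  other_cluster "$1" > /dev/null || return 1
  printf 'terraform apply -var="active_cluster=%s" -var-file="terraform.tfvars"\n' "$1"
}
```

For example, with blue currently active, `switch_command "$(other_cluster blue)"` prints the command that moves traffic to green. Printing instead of executing leaves room to review the command before applying it.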
## Resource Connections and Dependencies

### Network Architecture

- **Virtual Network** provides isolated networking
- **Blue Subnet** hosts blue AKS cluster nodes and load balancer
- **Green Subnet** hosts green AKS cluster nodes and load balancer
- **Application Gateway Subnet** hosts both Application Gateway instances
- **VPN Gateway Subnet** (GatewaySubnet) hosts VPN Gateways for secure access

### Load Balancing Architecture

- **Azure Front Door** → **Azure DNS** → **Application Gateway** → **Azure Load Balancer** → **AKS Pods**
- **Application Gateway** provides L7 load balancing with SSL termination and WAF capabilities
- **Azure Load Balancer** provides L4 load balancing within each cluster subnet
- **Health Probes** at both Application Gateway and Load Balancer levels ensure traffic is routed only to healthy endpoints

### VPN Access Architecture

- **Blue VPN Gateway** provides Point-to-Site VPN access to the blue environment
- **Green VPN Gateway** provides Point-to-Site VPN access to the green environment
- **Dedicated VPN client pools** (192.168.0.0/24) for secure remote access
- **OpenVPN and IKEv2** protocols supported for cross-platform compatibility

### Identity and Access

- **AKS Managed Identity** provides secure Azure service access
- **Role Assignments** grant AKS clusters pull access to Container Registry
- **Key Vault Access Policies** secure secret and certificate access

### Monitoring and Logging

- **Log Analytics Workspace** centralizes logs from both clusters
- **Azure Monitor** provides platform and application insights
- **Container Insights** enabled for both AKS clusters

### Automation and Synchronization

- **Azure Functions** provide serverless sync state management
- **Function App** handles blue-green cluster synchronization
- **Event-driven architecture** for automated deployment workflows

### DNS and Traffic Management

- **DNS A Records** for `aks-blue.{domain}` and `aks-green.{domain}`
- **Active DNS Record** for `aks.{domain}` points to the active cluster
- **Front Door Origin** dynamically routes to the active cluster endpoint
- **Short TTL** (60 seconds) enables fast traffic switching

## Configuration Variables

### Essential Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `project_name` | Project name for resource naming | `"aks-bluegreen"` |
| `environment` | Environment (dev/staging/prod) | `"dev"` |
| `location` | Azure region | `"East US"` |
| `dns_zone_name` | DNS zone for the application | `"contoso.com"` |
| `active_cluster` | Active cluster (blue/green) | `"blue"` |

### AKS Configuration

| Variable | Description | Default |
|----------|-------------|---------|
| `kubernetes_version` | Kubernetes version | `"1.27.3"` |
| `aks_node_count` | Initial node count | `3` |
| `aks_node_vm_size` | VM size for nodes | `"Standard_D2s_v3"` |
| `enable_private_cluster` | Enable private clusters | `false` |

### Load Balancer Configuration

| Variable | Description | Default |
|----------|-------------|---------|
| `lb_sku` | Azure Load Balancer SKU | `"Standard"` |
| `health_probe_path` | Health probe endpoint path | `"/healthz"` |

### Application Gateway Configuration

| Variable | Description | Default |
|----------|-------------|---------|
| `appgw_sku` | Application Gateway SKU | `"Standard_v2"` |
| `appgw_tier` | Application Gateway tier | `"Standard_v2"` |
| `appgw_capacity` | Application Gateway capacity | `2` |

### Azure Functions Configuration

| Variable | Description | Default |
|----------|-------------|---------|
| `function_app_sku` | Function App service plan SKU | `"Y1"` |

### VPN Gateway Configuration

| Variable | Description | Default |
|----------|-------------|---------|
| `vpn_gateway_sku` | VPN Gateway SKU | `"VpnGw1"` |
| `vpn_gateway_generation` | VPN Gateway generation | `"Generation1"` |
| `vpn_client_address_space` | VPN client IP range | `["192.168.0.0/24"]` |
| `enable_point_to_site_vpn` | Enable P2S VPN | `true` |
| `vpn_gateway_subnet_address_prefix` | VPN Gateway subnet CIDR | `"10.0.4.0/24"` |

### Networking

| Variable | Description | Default |
|----------|-------------|---------|
| `vnet_address_space` | VNet address space | `["10.0.0.0/16"]` |
| `blue_subnet_address_prefix` | Blue subnet CIDR | `"10.0.1.0/24"` |
| `green_subnet_address_prefix` | Green subnet CIDR | `"10.0.2.0/24"` |

## Monitoring and Observability

### Azure Monitor Integration

- **Container Insights** automatically enabled for both clusters
- **Log Analytics** centralized logging with configurable retention
- **Platform Metrics** for infrastructure monitoring
- **Application Insights** (configure separately for applications)

### Health Checks

- **Application Gateway** health probes to AKS services
- **Front Door** health probes with configurable intervals
- **Kubernetes** native health checks and readiness probes

### Recommended Monitoring Setup

1. Configure **Azure Monitor Workbooks** for AKS
2. Set up **Azure Alerts** for critical metrics
3. Implement **Application Performance Monitoring** (APM)
4. Create **Custom Dashboards** for operational visibility

## Security Considerations

### Network Security

- **Private AKS clusters** option for enhanced security
- **Network Security Groups** for subnet-level protection
- **Azure CNI** networking with Azure Network Policy
- **Application Gateway WAF** (upgrade to WAF_v2 tier)

### Identity and Access Management

- **Managed Identity** for AKS cluster identity
- **RBAC** enabled by default on AKS clusters
- **Key Vault** integration for secrets management
- **Azure AD** integration for user authentication

### VPN Security

- **Point-to-Site VPN** with certificate-based authentication
- **Separate VPN Gateways** for blue and green environment isolation
- **OpenVPN and IKEv2** protocols for secure connectivity
- **Certificate Management** via Azure Key Vault integration
- **Client IP Pool Isolation** (192.168.0.0/24) for VPN clients

### Best Practices

- Use **Azure Policy** for governance
- Implement **Pod Security Standards**
- Configure **Network Policies** for micro-segmentation
- Enable **Azure Defender** for container security

## Cost Optimization

### Resource Scaling

- **AKS Node Pool Autoscaling** based on demand
- **Application Gateway** autoscaling capabilities
- **Spot Instances** for non-production workloads
- **Reserved Instances** for predictable workloads

### Blue-Green Cost Management

- **Temporary Dual Clusters** only during deployments
- **Automated Cleanup** of inactive cluster resources
- **Shared Services** (ACR, Key Vault, DNS) across clusters
- **Azure Cost Management** monitoring and alerts

## Troubleshooting

### Common Issues

1. **DNS Propagation Delays** - Check TTL settings and DNS propagation
2. **Application Gateway Backend Health** - Verify AKS service endpoints
3. **AKS Node Connectivity** - Check subnet routing and NSG rules
4. **Front Door Origin Health** - Validate Application Gateway accessibility
5. **VPN Gateway Connectivity** - Verify certificate configuration and client setup
6. **VPN Client Authentication** - Check root and client certificate validity

### Debugging Commands

```bash
# Check AKS cluster status
az aks show --resource-group {rg-name} --name {cluster-name}

# Verify Application Gateway backend health
az network application-gateway show-backend-health --resource-group {rg-name} --name {appgw-name}

# Check DNS resolution
nslookup aks.{your-domain}

# Front Door endpoint status (Front Door Standard/Premium uses the `az afd` command group)
az afd endpoint show --resource-group {rg-name} --profile-name {profile-name} --endpoint-name {endpoint-name}

# VPN Gateway status and connections
az network vnet-gateway show --resource-group {rg-name} --name {vpn-gateway-name}
az network vnet-gateway list-vpn-connections --resource-group {rg-name} --name {vpn-gateway-name}

# Test VPN connectivity
ping 10.0.1.1  # Blue cluster subnet gateway
ping 10.0.2.1  # Green cluster subnet gateway
```

## Disaster Recovery

### Multi-Region Deployment

- Deploy blue-green clusters across **multiple Azure regions**
- Configure **global DNS** load balancing
- Implement **cross-region** data replication
- Plan for **region failover** scenarios

### Backup Strategy

- **AKS cluster backup** using Velero or similar tools
- **Application data backup** to Azure Storage
- **Infrastructure as Code** for rapid reconstruction
- **Regular DR testing** procedures

## Contributing

To extend or modify this configuration:

1. **Test changes** in a development environment first
2. **Follow naming conventions** established in `locals.tf`
3. **Update documentation** for any new variables or resources
4. **Validate with** `terraform plan` before applying changes

## Support

For issues related to:

- **Terraform Configuration**: Review variable definitions and resource dependencies
- **AKS Cluster Issues**: Check Azure Monitor logs and AKS troubleshooting guides
- **Network Connectivity**: Verify DNS, NSG rules, and routing tables
- **Application Deployment**: Consult Kubernetes and application-specific documentation

## Version History

- **v1.0**: Initial blue-green AKS architecture implementation
  - Support for both public and private cluster scenarios
  - Integrated monitoring and DNS-based traffic switching
  - Complete Infrastructure as Code deployment

---

**Important Notes:**

- This configuration creates production-ready infrastructure but requires application-specific customization
- Review and adjust security settings based on your compliance requirements
- Monitor costs closely during initial deployment and testing phases
- Test disaster recovery procedures regularly in non-production environments