# Azure AKS Blue-Green Deployment Architecture
This Terraform configuration implements a blue-green deployment strategy for Azure Kubernetes Service (AKS) clusters, following Microsoft's architectural guidance for high-availability deployments.
## Prerequisites
- **Azure CLI** configured and authenticated (`az login`)
- **Terraform** installed (>= 1.3)
- **Azure subscription** with sufficient permissions to create resources
- **Kubernetes CLI (kubectl)** for cluster management
- **Helm** (optional, for application deployments)
## Architecture Overview
This implementation creates a complete Blue-Green deployment infrastructure for AKS including:
### Core Components
- **Two AKS Clusters** (Blue and Green) for zero-downtime deployments
- **Azure Application Gateway** (separate instances for each cluster)
- **Azure Load Balancer** (Standard SKU for each cluster with health probes)
- **Azure VPN Gateway** (dedicated Point-to-Site VPN for each environment)
- **Azure Front Door** for global load balancing and traffic management
- **Azure DNS** for DNS-based traffic switching
- **Azure Container Registry** for container image storage
- **Azure Key Vault** for secrets and certificate management
- **Azure Monitor** with Log Analytics for comprehensive monitoring
- **Azure Functions** for sync state management and automation
- **Virtual Network** with dedicated subnets for isolation
### Traffic Flow
1. **Azure Front Door** provides the public endpoint
2. **Azure DNS** manages traffic switching between clusters
3. **Application Gateways** handle HTTP/HTTPS load balancing
4. **Azure Load Balancers** distribute traffic within each cluster
5. **AKS Clusters** run containerized applications
6. **Azure Functions** manage synchronization between blue and green environments
7. **VPN Gateways** provide secure Point-to-Site access to each environment
## Project Structure
```
├── provider.tf # Terraform and Azure provider configuration
├── locals.tf # Local values and naming conventions
├── variables.tf # Input variable definitions
├── terraform.tfvars # Variable values (customize for your environment)
├── main.tf # Core Azure resources and infrastructure
└── readme.md # This documentation file
```
## Usage
### 1. Initial Setup
```bash
# Clone or download the Terraform files
# Customize terraform.tfvars with your specific values
# Initialize Terraform
terraform init
# Review the planned infrastructure
terraform plan -var-file="terraform.tfvars"
# Deploy the infrastructure
terraform apply -var-file="terraform.tfvars"
```
### 2. Post-Deployment Configuration
After successful deployment, configure kubectl access to both clusters (replace the example resource group and cluster names with your own):
```bash
# Blue cluster
az aks get-credentials --resource-group mycompany-aks-production-rg --name mycompany-aks-production-aks-blue
# Green cluster
az aks get-credentials --resource-group mycompany-aks-production-rg --name mycompany-aks-production-aks-green
```
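Both clusters end up in the same kubeconfig, so it can help to give each one a short context name. The following is a minimal sketch, assuming the example resource names used above; the `fetch_credentials` helper, the `blue`/`green` context aliases, and the `DRY_RUN` switch are illustrative and not part of this repository.

```bash
# Hypothetical helper: fetch kubeconfig entries for both clusters.
# RG and PREFIX mirror the example names used above; adjust to your deployment.
RG="${RG:-mycompany-aks-production-rg}"
PREFIX="${PREFIX:-mycompany-aks-production-aks}"

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

fetch_credentials() {
  for color in blue green; do
    # --context sets a short context name; --overwrite-existing replaces stale entries.
    run az aks get-credentials \
      --resource-group "$RG" \
      --name "${PREFIX}-${color}" \
      --context "$color" \
      --overwrite-existing || return 1
  done
}
```

After running `fetch_credentials`, commands such as `kubectl --context blue get nodes` and `kubectl --context green get nodes` target each cluster explicitly.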
### 3. VPN Gateway Configuration
After deployment, configure VPN access for each environment:
```bash
# Generate and download VPN client configuration for Blue environment
az network vnet-gateway vpn-client generate \
--resource-group mycompany-aks-production-rg \
--name mycompany-aks-production-vpngw-blue \
--authentication-method EAPTLS
# Generate and download VPN client configuration for Green environment
az network vnet-gateway vpn-client generate \
--resource-group mycompany-aks-production-rg \
--name mycompany-aks-production-vpngw-green \
--authentication-method EAPTLS
```
**Important**: Replace the placeholder certificate data in the VPN Gateway configuration with your actual root certificate:
1. Generate a root certificate for P2S authentication
2. Update the `public_cert_data` in both VPN Gateway resources
3. Deploy client certificates to user devices
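
Steps 1 and 2 can be sketched with OpenSSL. This is an example under stated assumptions: the file names, the `P2SRootCert` common name, and both helper functions are hypothetical, and your organization may require a managed CA rather than a self-signed root. The `public_cert_data` argument expects the certificate body as a single Base64 line, without the PEM header, footer, or newlines.

```bash
# Hypothetical helper: create a self-signed root certificate for P2S authentication.
make_root_cert() {
  openssl req -x509 -newkey rsa:2048 -nodes \
    -keyout p2s-root.key -out p2s-root.pem \
    -days 365 -subj "/CN=P2SRootCert"
}

# Strip the PEM header/footer lines and join the Base64 body into one line.
pem_to_public_cert_data() {
  grep -v -- '-----' "$1" | tr -d '\r\n'
}
```

Running `make_root_cert && pem_to_public_cert_data p2s-root.pem` prints a value suitable for `public_cert_data`. Keep `p2s-root.key` out of version control.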
### 4. Blue-Green Deployment Workflow
#### Stage T0: Blue Cluster Active
- Blue cluster serves production traffic
- Green cluster is either stopped or running the previous version
#### Stage T1: Deploy Green Cluster
- Deploy new version to green cluster
- Validate platform health using Azure Monitor
- Test applications without production traffic
#### Stage T2: Sync and Validate
- Sync Kubernetes state between clusters
- Deploy applications to green cluster
- Perform functional and load testing
- Validate using dedicated green DNS endpoint
#### Stage T3: Traffic Switch
Update the `active_cluster` variable and apply:
```bash
# Switch to green cluster
terraform apply -var="active_cluster=green" -var-file="terraform.tfvars"
# Switch back to blue cluster if needed
terraform apply -var="active_cluster=blue" -var-file="terraform.tfvars"
```
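The switch can be wrapped in a small guard that rejects anything other than `blue` or `green` before Terraform runs. This is only a sketch: `switch_traffic`, `run`, and the `DRY_RUN` switch are hypothetical helpers, not part of this repository.

```bash
# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Hypothetical wrapper around the Stage T3 apply.
switch_traffic() {
  case "$1" in
    blue|green) ;;
    *) echo "usage: switch_traffic blue|green" >&2; return 2 ;;
  esac
  run terraform apply -var="active_cluster=$1" -var-file="terraform.tfvars"
}
```

Calling `DRY_RUN=1 switch_traffic green` prints the command for review; dropping `DRY_RUN` runs it.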
#### Stage T4: Cleanup
After validating the switch, optionally scale down or destroy the inactive cluster to optimize costs.
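
One conservative way to act on this is to stop, rather than destroy, the cluster that no longer serves traffic. The sketch below assumes the example resource names from the post-deployment step; `inactive_color` and `stop_inactive` are illustrative helpers, not part of this repository.

```bash
RG="${RG:-mycompany-aks-production-rg}"
PREFIX="${PREFIX:-mycompany-aks-production-aks}"

# Map the active color to the one that no longer serves traffic.
inactive_color() {
  case "$1" in
    blue)  echo green ;;
    green) echo blue ;;
    *) echo "unknown color: $1" >&2; return 2 ;;
  esac
}

# Stop the inactive cluster; `az aks start` brings it back.
stop_inactive() {
  target="$(inactive_color "$1")" || return 2
  az aks stop --resource-group "$RG" --name "${PREFIX}-${target}"
}
```

For example, `stop_inactive green` stops the blue cluster after a switch to green.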
## Resource Connections and Dependencies
### Network Architecture
- **Virtual Network** provides isolated networking
- **Blue Subnet** hosts blue AKS cluster nodes and load balancer
- **Green Subnet** hosts green AKS cluster nodes and load balancer
- **Application Gateway Subnet** hosts both Application Gateway instances
- **VPN Gateway Subnet** (GatewaySubnet) hosts VPN Gateways for secure access
### Load Balancing Architecture
- **Azure Front Door** → **Azure DNS** → **Application Gateway** → **Azure Load Balancer** → **AKS Pods**
- **Application Gateway** provides L7 load balancing with SSL termination and WAF capabilities
- **Azure Load Balancer** provides L4 load balancing within each cluster subnet
- **Health Probes** at both Application Gateway and Load Balancer levels ensure traffic routing to healthy endpoints
### VPN Access Architecture
- **Blue VPN Gateway** provides Point-to-Site VPN access to blue environment
- **Green VPN Gateway** provides Point-to-Site VPN access to green environment
- **Dedicated VPN client pools** (192.168.0.0/24) for secure remote access
- **OpenVPN and IKEv2** protocols supported for cross-platform compatibility
### Identity and Access
- **AKS Managed Identity** provides secure Azure service access
- **Role Assignments** grant AKS clusters pull access to Container Registry
- **Key Vault Access Policies** secure secret and certificate access
### Monitoring and Logging
- **Log Analytics Workspace** centralizes logs from both clusters
- **Azure Monitor** provides platform and application insights
- **Container Insights** enabled for both AKS clusters
### Automation and Synchronization
- **Azure Functions** provide serverless sync state management
- **Function App** handles blue-green cluster synchronization
- **Event-driven architecture** for automated deployment workflows
### DNS and Traffic Management
- **DNS A Records** for `aks-blue.{domain}` and `aks-green.{domain}`
- **Active DNS Record** for `aks.{domain}` points to active cluster
- **Front Door Origin** dynamically routes to active cluster endpoint
- **Short TTL** (60 seconds) enables fast traffic switching
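
To confirm which cluster the active record currently resolves to, compare it with the two per-cluster records. This is a minimal sketch, assuming `dig` is installed and the record names follow the pattern above; `active_color` and `check_dns` are hypothetical helpers.

```bash
# Compare the active address with the blue and green addresses.
active_color() {
  active="$1"; blue="$2"; green="$3"
  if [ -n "$active" ] && [ "$active" = "$blue" ]; then
    echo blue
  elif [ -n "$active" ] && [ "$active" = "$green" ]; then
    echo green
  else
    echo unknown
    return 1
  fi
}

# Resolve all three records for a domain, e.g. `check_dns contoso.com`.
# `tail -n 1` keeps the last line of the answer, which is an address
# even when the name resolves through a CNAME.
check_dns() {
  domain="$1"
  active_color \
    "$(dig +short "aks.${domain}" A | tail -n 1)" \
    "$(dig +short "aks-blue.${domain}" A | tail -n 1)" \
    "$(dig +short "aks-green.${domain}" A | tail -n 1)"
}
```

After a switch, allow for the 60-second TTL before expecting `check_dns` to report the new color.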
## Configuration Variables
### Essential Variables
| Variable | Description | Default |
|----------|-------------|---------|
| `project_name` | Project name for resource naming | `"aks-bluegreen"` |
| `environment` | Environment (dev/staging/prod) | `"dev"` |
| `location` | Azure region | `"East US"` |
| `dns_zone_name` | DNS zone for the application | `"contoso.com"` |
| `active_cluster` | Active cluster (blue/green) | `"blue"` |
### AKS Configuration
| Variable | Description | Default |
|----------|-------------|---------|
| `kubernetes_version` | Kubernetes version | `"1.27.3"` |
| `aks_node_count` | Initial node count | `3` |
| `aks_node_vm_size` | VM size for nodes | `"Standard_D2s_v3"` |
| `enable_private_cluster` | Enable private clusters | `false` |
### Load Balancer Configuration
| Variable | Description | Default |
|----------|-------------|---------|
| `lb_sku` | Azure Load Balancer SKU | `"Standard"` |
| `health_probe_path` | Health probe endpoint path | `"/healthz"` |
### Application Gateway Configuration
| Variable | Description | Default |
|----------|-------------|---------|
| `appgw_sku` | Application Gateway SKU | `"Standard_v2"` |
| `appgw_tier` | Application Gateway tier | `"Standard_v2"` |
| `appgw_capacity` | Application Gateway capacity | `2` |
### Azure Functions Configuration
| Variable | Description | Default |
|----------|-------------|---------|
| `function_app_sku` | Function App service plan SKU | `"Y1"` |
### VPN Gateway Configuration
| Variable | Description | Default |
|----------|-------------|---------|
| `vpn_gateway_sku` | VPN Gateway SKU | `"VpnGw1"` |
| `vpn_gateway_generation` | VPN Gateway generation | `"Generation1"` |
| `vpn_client_address_space` | VPN client IP range | `["192.168.0.0/24"]` |
| `enable_point_to_site_vpn` | Enable P2S VPN | `true` |
| `vpn_gateway_subnet_address_prefix` | VPN Gateway subnet CIDR | `"10.0.4.0/24"` |
### Networking
| Variable | Description | Default |
|----------|-------------|---------|
| `vnet_address_space` | VNet address space | `["10.0.0.0/16"]` |
| `blue_subnet_address_prefix` | Blue subnet CIDR | `"10.0.1.0/24"` |
| `green_subnet_address_prefix` | Green subnet CIDR | `"10.0.2.0/24"` |
## Monitoring and Observability
### Azure Monitor Integration
- **Container Insights** automatically enabled for both clusters
- **Log Analytics** centralized logging with configurable retention
- **Platform Metrics** for infrastructure monitoring
- **Application Insights** (configure separately for applications)
### Health Checks
- **Application Gateway** health probes to AKS services
- **Front Door** health probes with configurable intervals
- **Kubernetes** native health checks and readiness probes
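
A quick manual probe can mirror what the gateways do. The sketch below assumes the `/healthz` path from `health_probe_path` and treats HTTP 200-399 as healthy, which matches Application Gateway's default probe behavior; `is_healthy_status` and `probe` are hypothetical helpers, and the host name in the usage line is a placeholder.

```bash
# Return success for HTTP status codes in the 200-399 range.
is_healthy_status() {
  case "$1" in
    2[0-9][0-9]|3[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Fetch only the status code from a URL and classify it.
probe() {
  code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1")"
  if is_healthy_status "$code"; then
    echo "healthy ($code)"
  else
    echo "unhealthy ($code)"
    return 1
  fi
}
```

For example, `probe https://aks-green.contoso.com/healthz` checks the green endpoint before a traffic switch.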
### Recommended Monitoring Setup
1. Configure **Azure Monitor Workbooks** for AKS
2. Set up **Azure Alerts** for critical metrics
3. Implement **Application Performance Monitoring** (APM)
4. Create **Custom Dashboards** for operational visibility
## Security Considerations
### Network Security
- **Private AKS clusters** option for enhanced security
- **Network Security Groups** for subnet-level protection
- **Azure CNI** networking with Azure Network Policy
- **Application Gateway WAF** (requires changing `appgw_sku` and `appgw_tier` to `WAF_v2`)
### Identity and Access Management
- **Managed Identity** for AKS cluster identity
- **RBAC** enabled by default on AKS clusters
- **Key Vault** integration for secrets management
- **Microsoft Entra ID** (formerly Azure AD) integration for user authentication
### VPN Security
- **Point-to-Site VPN** with certificate-based authentication
- **Separate VPN Gateways** for blue and green environment isolation
- **OpenVPN and IKEv2** protocols for secure connectivity
- **Certificate Management** via Azure Key Vault integration
- **Client IP Pool Isolation** (192.168.0.0/24) for VPN clients
### Best Practices
- Use **Azure Policy** for governance
- Implement **Pod Security Standards**
- Configure **Network Policies** for micro-segmentation
- Enable **Microsoft Defender for Containers** (formerly Azure Defender) for container security
## Cost Optimization
### Resource Scaling
- **AKS Node Pool Autoscaling** based on demand
- **Application Gateway** autoscaling capabilities
- **Spot node pools** for non-production workloads
- **Reserved Instances** for predictable workloads
### Blue-Green Cost Management
- **Temporary Dual Clusters** only during deployments
- **Automated Cleanup** of inactive cluster resources
- **Shared Services** (ACR, Key Vault, DNS) across clusters
- **Azure Cost Management** monitoring and alerts
## Troubleshooting
### Common Issues
1. **DNS Propagation Delays** - Check TTL settings and DNS propagation
2. **Application Gateway Backend Health** - Verify AKS service endpoints
3. **AKS Node Connectivity** - Check subnet routing and NSG rules
4. **Front Door Origin Health** - Validate Application Gateway accessibility
5. **VPN Gateway Connectivity** - Verify certificate configuration and client setup
6. **VPN Client Authentication** - Check root and client certificate validity
### Debugging Commands
```bash
# Check AKS cluster status
az aks show --resource-group {rg-name} --name {cluster-name}
# Verify Application Gateway backend health
az network application-gateway show-backend-health --resource-group {rg-name} --name {appgw-name}
# Check DNS resolution
nslookup aks.{your-domain}
# Front Door endpoint status (Standard/Premium profiles)
az afd endpoint show --resource-group {rg-name} --profile-name {profile-name} --endpoint-name {endpoint-name}
# VPN Gateway status and Point-to-Site client connection health
az network vnet-gateway show --resource-group {rg-name} --name {vpn-gateway-name}
az network vnet-gateway vpn-client show-health --resource-group {rg-name} --name {vpn-gateway-name}
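# Test VPN connectivity. Azure's subnet gateway address (x.x.x.1) does not answer ICMP,
# so target a node IP taken from `kubectl get nodes -o wide` instead.
ping {blue-node-ip}  # A node in the blue cluster subnet (10.0.1.0/24)
ping {green-node-ip} # A node in the green cluster subnet (10.0.2.0/24)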
```
## Disaster Recovery
### Multi-Region Deployment
- Deploy blue-green clusters across **multiple Azure regions**
- Configure **global DNS** load balancing
- Implement **cross-region** data replication
- Plan for **region failover** scenarios
### Backup Strategy
- **AKS cluster backup** using Velero or similar tools
- **Application data backup** to Azure Storage
- **Infrastructure as Code** for rapid reconstruction
- **Regular DR testing** procedures
## Contributing
To extend or modify this configuration:
1. **Test changes** in a development environment first
2. **Follow naming conventions** established in `locals.tf`
3. **Update documentation** for any new variables or resources
4. **Validate with** `terraform plan` before applying changes
## Support
For issues related to:
- **Terraform Configuration**: Review variable definitions and resource dependencies
- **AKS Cluster Issues**: Check Azure Monitor logs and AKS troubleshooting guides
- **Network Connectivity**: Verify DNS, NSG rules, and routing tables
- **Application Deployment**: Consult Kubernetes and application-specific documentation
## Version History
- **v1.0**: Initial blue-green AKS architecture implementation
  - Support for both public and private cluster scenarios
  - Integrated monitoring and DNS-based traffic switching
  - Complete Infrastructure as Code deployment
---
**Important Notes:**
- This configuration creates production-ready infrastructure but requires application-specific customization
- Review and adjust security settings based on your compliance requirements
- Monitor costs closely during initial deployment and testing phases
- Test disaster recovery procedures regularly in non-production environments