RDS Cost Spike with Multi-AZ (200% increase)
Diagnoses an RDS cost tripling with Multi-AZ recently enabled, breaking down cost components, verifying the root cause, and providing resolution options with expected savings.
Input
What you provide to the skill
Our RDS costs went from $150 to $450 this month. Using db.t3.medium PostgreSQL. We enabled Multi-AZ last week but don’t think that’s related.
Output
What the skill generates for you
Bill Spike Diagnosis: RDS
Initial Assessment
- Previous bill: $150/month
- Current bill: $450/month
- Increase: $300 (200% increase)
- Instance type: db.t3.medium PostgreSQL
- Recent change: Multi-AZ enabled last week (user reports)
RDS Cost Components for db.t3.medium PostgreSQL
| Component | Rate | Single-AZ Expected | Multi-AZ Expected | Notes |
|---|---|---|---|---|
| Instance hours | $0.072-0.11/hr | $52-80/month | $104-160/month | Multi-AZ doubles instance cost |
| Storage (gp2) | $0.115/GB-month | ~$25-50/month | ~$50-100/month | Multi-AZ doubles storage cost |
| Backups | $0.095/GB-month | ~$10-20/month | ~$10-20/month | Only charged above allocated storage |
| IOPS (if provisioned) | $0.10/IOPS-month | $0 (gp2 default) | $0 (gp2 default) | Only for io1/io2 storage |
| Extended Support | Varies by version | $0-50/month | $0-50/month | Only if the PostgreSQL version is past its end of standard support |
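Analysis: Multi-AZ is very likely the primary driver of your 3x increase ($150 -> $450), but it doubles only instance and storage charges (roughly +$75-103/month at the rates above), so one or more secondary factors must account for the rest of the $300 increase.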
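The instance-hour row of the table can be reproduced with a short calculation. This is a minimal sketch that assumes a 730-hour billing month and uses the hourly rate range quoted in the table; the function name `monthly_cost` is illustrative and not part of any AWS tooling.

```shell
#!/usr/bin/env bash
# Convert an hourly instance rate to a monthly charge.
# Assumption: 730 billable hours per month.
# Usage: monthly_cost <rate_per_hour> <instance_count>
#   instance_count is 1 for Single-AZ, 2 for Multi-AZ (primary + standby).
monthly_cost() {
  LC_ALL=C awk -v rate="$1" -v n="$2" 'BEGIN { printf "%.2f\n", rate * 730 * n }'
}

echo "Single-AZ: \$$(monthly_cost 0.072 1) to \$$(monthly_cost 0.11 1) per month"
echo "Multi-AZ:  \$$(monthly_cost 0.072 2) to \$$(monthly_cost 0.11 2) per month"
```

The results land within a dollar or two of the table's rounded figures, which confirms that instance hours alone add well under $100/month.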
Investigation Steps
Step 1: Verify Multi-AZ Was Recently Enabled
# RDS keeps events for 14 days only; --duration is in minutes (20160 = 14 days)
aws rds describe-events \
--source-type db-instance \
--source-identifier your-db-name \
--duration 20160 \
--query "Events[?contains(Message, 'Multi-AZ') || contains(Message, 'storage') || contains(Message, 'instance class')]"
Look for events like:
- “Multi-AZ instance modification complete”
- “Converted from single-AZ to Multi-AZ”
Step 2: Check Current RDS Configuration
aws rds describe-db-instances \
--db-instance-identifier your-db-name \
--query 'DBInstances[0].{
InstanceClass:DBInstanceClass,
MultiAZ:MultiAZ,
Storage:AllocatedStorage,
StorageType:StorageType,
EngineVersion:EngineVersion,
BackupRetentionPeriod:BackupRetentionPeriod
}' \
--output table
Expected output shows MultiAZ: True if enabled.
Step 3: Check PostgreSQL Version for Extended Support Fees
If your PostgreSQL major version is past its RDS end of standard support date (11 and earlier, and 12 since March 2025), you are paying Extended Support fees:
aws rds describe-db-instances \
--db-instance-identifier your-db-name \
--query 'DBInstances[0].EngineVersion'
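The checks above inspect configuration. To see which line items actually grew, the bill itself can be grouped by usage type with Cost Explorer. The sketch below assumes Cost Explorer is enabled on the account and that the caller has `ce:GetCostAndUsage` permission; the function name and the example dates are placeholders.

```shell
#!/usr/bin/env bash
# Break down RDS spend by usage type for a date range.
# Usage: rds_cost_by_usage_type <start YYYY-MM-DD> <end YYYY-MM-DD>
# Note: the End date is exclusive in the Cost Explorer API.
rds_cost_by_usage_type() {
  aws ce get-cost-and-usage \
    --time-period "Start=$1,End=$2" \
    --granularity MONTHLY \
    --metrics UnblendedCost \
    --filter '{"Dimensions":{"Key":"SERVICE","Values":["Amazon Relational Database Service"]}}' \
    --group-by Type=DIMENSION,Key=USAGE_TYPE \
    --query 'ResultsByTime[].Groups[].[Keys[0], Metrics.UnblendedCost.Amount]' \
    --output text
}

# Example (placeholder dates): compare last month with the current month.
# rds_cost_by_usage_type 2024-05-01 2024-06-01
# rds_cost_by_usage_type 2024-06-01 2024-07-01
```

Comparing the two outputs line by line should show whether the growth sits in instance usage, storage, backup, or Extended Support usage types.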
Likely Root Causes (Ranked by Probability)
1. Multi-AZ Enabled (95% probability) - PRIMARY CAUSE
Why this fits:
- You explicitly stated Multi-AZ was enabled last week
- Multi-AZ doubles both instance hours AND storage costs
- db.t3.medium Single-AZ: ~$52-80/month -> Multi-AZ: ~$104-160/month (instance only)
Cost impact: Multi-AZ doubles:
- Instance hours: $52-80/month -> $104-160/month (+$52-80)
- Storage (assuming 200GB): $23/month -> $46/month (+$23)
- Total Multi-AZ impact alone: ~$75-103/month increase
2. Storage Auto-Scaling Triggered (60% probability) - LIKELY CONTRIBUTING
Why this fits:
- Multi-AZ alone explains ~$75-103 increase, but you saw $300 increase
- Storage may have auto-scaled from 100GB to 300-400GB
- With Multi-AZ, storage costs are doubled
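To gauge how much storage growth could contribute, the figures above can be worked through. This sketch assumes the gp2 rate of $0.115/GB-month from the cost table and the 100GB to 300-400GB growth scenario described above; the function names are illustrative.

```shell
#!/usr/bin/env bash
# gp2 storage charge: GB x rate x number of copies (2 for Multi-AZ).
# Usage: storage_cost <gb> <copies>
storage_cost() {
  LC_ALL=C awk -v gb="$1" -v n="$2" -v rate=0.115 \
    'BEGIN { printf "%.2f\n", gb * rate * n }'
}

# Monthly increase when storage grows AND Multi-AZ is enabled.
# Usage: storage_increase <gb_before_single_az> <gb_after_multi_az>
storage_increase() {
  LC_ALL=C awk -v before="$1" -v after="$2" -v rate=0.115 \
    'BEGIN { printf "%.2f\n", after * rate * 2 - before * rate }'
}

echo "100GB Single-AZ: \$$(storage_cost 100 1)/month"
echo "300GB Multi-AZ:  \$$(storage_cost 300 2)/month (+\$$(storage_increase 100 300))"
echo "400GB Multi-AZ:  \$$(storage_cost 400 2)/month (+\$$(storage_increase 100 400))"
```

Added to the $52-80 instance increase, this scenario accounts for roughly $110-160 of the $300 jump, so a further factor would still be needed to explain the full bill.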
3. PostgreSQL Extended Support Fees (40% probability)
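If running a PostgreSQL major version past its end of standard support (11 or earlier, and 12 since March 2025):
- db.t3.medium Extended Support: $0.036/hour ($26/month). With Multi-AZ, this could be ~$52/month.
- Extended Support is billed per vCPU-hour, and the rate varies by Region and by year of Extended Support, so confirm this estimate against the current RDS pricing page.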
Resolution Options
Option A: Keep Multi-AZ, Optimize Other Costs (Recommended for Production)
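- Upgrade PostgreSQL version (if running a version in Extended Support):
# Take snapshot first
aws rds create-db-snapshot \
--db-instance-identifier your-db-name \
--db-snapshot-identifier pre-upgrade-snapshot-$(date +%Y%m%d)
# Wait until the snapshot is available before modifying the instance
aws rds wait db-snapshot-available \
--db-snapshot-identifier pre-upgrade-snapshot-$(date +%Y%m%d)
# Upgrade to a major version still in standard support. Replace 12.19 with a
# target listed by: aws rds describe-db-engine-versions --engine postgres \
#   --engine-version <current-version> --query 'DBEngineVersions[].ValidUpgradeTarget[].EngineVersion'
# A major version change is rejected without --allow-major-version-upgrade.
aws rds modify-db-instance \
--db-instance-identifier your-db-name \
--engine-version 12.19 \
--allow-major-version-upgrade \
--apply-immediately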
Savings: ~$26-52/month
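- Cap storage auto-scaling (the cap must be larger than the currently allocated storage, so raise 250 if storage has already grown past it):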
aws rds modify-db-instance \
--db-instance-identifier your-db-name \
--max-allocated-storage 250
Option B: Disable Multi-AZ (If High Availability Not Required)
aws rds modify-db-instance \
--db-instance-identifier your-db-name \
--no-multi-az \
--apply-immediately
# NOTE: AWS documents this change as not requiring downtime; confirm for your setup
# WARNING: Schedule during off-peak hours
# WARNING: No automatic failover after this
Savings: ~$75-180/month
New expected bill: ~$270-375/month ($450 minus the savings above)
Option C: Purchase Reserved Instance (If Keeping Current Setup)
# Find available 1-year RI offerings
aws rds describe-reserved-db-instances-offerings \
--db-instance-class db.t3.medium \
--product-description postgresql \
--multi-az \
--duration 31536000 \
--offering-type "No Upfront"
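Savings: ~$42-96/month (assuming 40-60% off the $104-160/month On-Demand Multi-AZ instance charge; Reserved Instances do not discount storage, backups, or Extended Support)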
Prevention Measures
Immediate:
# Billing metrics exist only in us-east-1 and require billing alerts to be enabled on the account
aws cloudwatch put-metric-alarm \
--region us-east-1 \
--alarm-name RDS-Monthly-Cost-Alert \
--alarm-description 'Alert when RDS costs exceed $200' \
--metric-name EstimatedCharges \
--namespace AWS/Billing \
--statistic Maximum \
--period 86400 \
--evaluation-periods 1 \
--threshold 200 \
--comparison-operator GreaterThanThreshold \
--dimensions Name=ServiceName,Value=AmazonRDS Name=Currency,Value=USD \
--alarm-actions arn:aws:sns:us-east-1:123456789012:billing-alerts
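The alarm above publishes to an SNS topic that must already exist. The following is a minimal sketch for creating that topic and subscribing an email address. The topic name `billing-alerts` matches the ARN in the alarm; the function name and the email address are placeholders.

```shell
#!/usr/bin/env bash
# Create the SNS topic used by the billing alarm and subscribe an email address.
# Prints the topic ARN to pass to --alarm-actions.
# Usage: create_billing_topic <email>
create_billing_topic() {
  local email="$1" topic_arn
  # create-topic is idempotent: it returns the existing ARN if the topic exists.
  topic_arn=$(aws sns create-topic \
    --name billing-alerts \
    --region us-east-1 \
    --query TopicArn --output text) || return 1
  aws sns subscribe \
    --topic-arn "$topic_arn" \
    --protocol email \
    --notification-endpoint "$email" \
    --region us-east-1 >/dev/null || return 1
  echo "$topic_arn"
}

# Example (placeholder address):
# create_billing_topic you@example.com
```

The subscription stays pending until the recipient confirms it from the email SNS sends, so no notifications arrive before that step.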
What NOT to Do
- Don’t panic and immediately disable Multi-AZ without understanding HA requirements - Multi-AZ provides automatic failover
- Don’t try to reduce allocated storage - AWS does not support shrinking RDS storage
- Don’t delete automated backups - Essential for disaster recovery, minimal cost savings
- Don’t upgrade PostgreSQL in production without testing - Always test in staging first
- Don’t change multiple settings at once - Isolate changes to identify impact
About This Skill
Diagnose unexpected AWS, GCP, or Azure cost increases with structured root cause analysis, investigation steps, and prevention measures.
More Examples
Data Transfer Cost Spike (200% increase)
Diagnoses a 3x bill increase attributed to Data Transfer, providing investigation commands to identify whether the source is internet egress, cross-AZ traffic, or bot activity, with ranked causes and resolution steps.
General 40% Bill Increase with RI Expiration
Provides systematic investigation framework when the user doesn't know which service spiked, with focus on Reserved Instance expiration as the likely cause based on user context.