All examples for Cloud Bill Spike Analyzer

Data Transfer Cost Spike (200% increase)

Diagnoses a 3x bill increase attributed to Data Transfer, providing investigation commands to identify whether the source is internet egress, cross-AZ traffic, or bot activity, with ranked causes and resolution steps.

Input

What you provide to the skill

AWS bill jumped from $800 to $2,400. Biggest increase is Data Transfer. We didn’t launch anything new.

Output

What the skill generates for you

Bill Spike Diagnosis: Data Transfer

Initial Assessment

  • Previous bill: $800/month
  • Current bill: $2,400/month
  • Increase: $1,600 (200% increase)
  • Primary suspect: Data Transfer
  • Context: No new launches or deployments

Data Transfer Cost Breakdown

| Transfer Type | AWS Rate | Common Causes |
|---|---|---|
| Internet egress (out to internet) | $0.09/GB (first 10 TB) | API responses, file downloads, CDN misconfig, backup to external |
| Cross-region transfer | $0.02/GB | Multi-region replication, cross-region backups |
| Cross-AZ transfer | $0.01/GB (each direction) | Load balancers, RDS Multi-AZ, inter-AZ communication |
| CloudFront to origin | $0.02/GB | Origin fetch charges |

Scale of spike: $1,600 increase = ~17.8 TB at internet egress rates ($0.09/GB)

For context: 17.8 TB = serving ~3.5M files at 5MB each, or 178M API responses at 100KB each
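The scale estimate above can be reproduced with shell arithmetic; a quick sketch that converts the $1,600 delta from this bill into data volume at each published rate:

```shell
# Convert a billing delta (dollars) into data volume (TB) at each transfer rate.
DELTA=1600
for rate in 0.09 0.02 0.01; do
  awk -v d="$DELTA" -v r="$rate" \
    'BEGIN { printf "At $%.2f/GB: %.1f TB\n", r, d / r / 1000 }'
done
# At $0.09/GB the delta maps to ~17.8 TB; at cross-AZ rates ($0.01/GB) it
# would take ~160 TB, which is why the rate identified in Step 1 matters.
```

Running the same delta through each rate shows why identifying the transfer type is the first step: the implied volume differs by an order of magnitude.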


Investigation Steps

Step 1: Identify Data Transfer Type in Cost Explorer

Console Navigation:

  1. Go to AWS Cost Explorer -> Reports
  2. Date range: Last 2 months (compare spike month to baseline)
  3. Filters: Service = “EC2 - Other” (data transfer lives here)
  4. Group by: Usage Type
  5. Look for these patterns:
| Usage Type | Meaning | Rate |
|---|---|---|
| DataTransfer-Out-Bytes | Internet egress | $0.09/GB |
| DataTransfer-Regional-Bytes | Cross-AZ within the region (includes load balancer and Multi-AZ traffic) | $0.01/GB each direction |
| <region1>-<region2>-AWS-Out-Bytes (e.g. USE1-USW2) | Cross-region | $0.02/GB |
| DataTransfer-In-Bytes | Inbound from internet | Usually $0.00 |

Step 2: Find the Source EC2 Instance or Service

If Internet Egress is high, run CloudWatch metrics per instance:

# List all running instances
aws ec2 describe-instances \
  --filters "Name=instance-state-name,Values=running" \
  --query 'Reservations[*].Instances[*].[InstanceId,Tags[?Key==`Name`].Value|[0]]' \
  --output table

# Check NetworkOut metric (bytes) for each instance
# Note: `date -d '30 days ago'` is GNU date; on macOS use `date -u -v-30d +%Y-%m-%dT%H:%M:%S`
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name NetworkOut \
  --dimensions Name=InstanceId,Value=i-xxxxx \
  --start-time $(date -u -d '30 days ago' +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 86400 \
  --statistics Sum \
  --output table

If Cross-AZ is high, check for Multi-AZ RDS or Load Balancers:

# Check if RDS Multi-AZ is enabled
aws rds describe-db-instances \
  --query 'DBInstances[*].[DBInstanceIdentifier,MultiAZ,AllocatedStorage]' \
  --output table

Likely Root Causes (Ranked by Probability)

1. Bot/Crawler Hitting API or Static Assets (50% probability)

Why this fits your pattern:

  • Sudden 3x spike with no deployments suggests external traffic source
  • 17.8 TB could be a bot scraping assets, crawling API endpoints, or DDoS attempt

How to verify:

# Check access logs for repeat IPs (ALB logs: client:port is field 4)
zcat alb-logs/*.log.gz | awk '{ split($4, a, ":"); print a[1] }' | sort | uniq -c | sort -rn | head -5
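Request counts alone can hide one client pulling a few very large files, so it is worth summing response bytes per client as well. A sketch, assuming the standard ALB access-log layout (client:port in field 4, sent_bytes in field 12); the two inline records are illustrative samples standing in for `zcat alb-logs/*.log.gz`:

```shell
# Sum bytes sent per client IP and rank the heaviest consumers.
printf '%s\n' \
  'http 2024-05-01T00:00:00Z my-alb 203.0.113.5:41000 10.0.0.1:80 0.001 0.005 0.000 200 200 120 5242880 extra' \
  'http 2024-05-01T00:01:00Z my-alb 198.51.100.7:52000 10.0.0.1:80 0.001 0.005 0.000 200 200 120 1024 extra' |
awk '{ split($4, a, ":"); bytes[a[1]] += $12 }      # strip the port, accumulate sent_bytes
     END { for (ip in bytes) printf "%s %d\n", ip, bytes[ip] }' |
sort -k2 -rn | head -5
```

If one IP accounts for most of the bytes, that is the client to investigate before touching anything else.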

How to fix:

  • Immediate: Block malicious IPs in security group or WAF
  • Short-term: Add rate limiting (AWS WAF rate-based rule)
  • Long-term: Add CloudFront with caching to reduce origin requests
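For the short-term fix, a rate-based rule is a single entry in a WAF v2 web ACL's rules array. A sketch of the rule JSON for `aws wafv2 create-web-acl` (the rule name, metric name, and 2,000-requests-per-5-minutes limit are placeholder choices, not values from this bill):

```json
{
  "Name": "rate-limit-per-ip",
  "Priority": 1,
  "Statement": {
    "RateBasedStatement": {
      "Limit": 2000,
      "AggregateKeyType": "IP"
    }
  },
  "Action": { "Block": {} },
  "VisibilityConfig": {
    "SampledRequestsEnabled": true,
    "CloudWatchMetricsEnabled": true,
    "MetricName": "rate-limit-per-ip"
  }
}
```

The limit is evaluated per source IP over a rolling 5-minute window, so it throttles a single aggressive client without affecting normal users.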

2. Backup Job Pushing Data to External Location (25% probability)

Why this fits your pattern:

  • Scheduled jobs can cause egress spikes if misconfigured
  • 17.8 TB = ~18 database backups at 1TB each

How to verify:

# Check CloudWatch NetworkOut for time-of-day patterns
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name NetworkOut \
  --dimensions Name=InstanceId,Value=i-xxxxx \
  --start-time $(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 3600 \
  --statistics Sum \
  --output table
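With `--period 3600`, a nightly backup job shows up as a recurring spike at the same hour every day. A sketch that groups the hourly sums by hour-of-day, assuming rows of `timestamp bytes` such as the `--output text` form of the command above would produce (the inline sample rows are illustrative):

```shell
# Group hourly NetworkOut sums by UTC hour to expose scheduled jobs.
# Sample rows stand in for real `get-metric-statistics ... --output text` data.
printf '%s\n' \
  '2024-05-01T02:00:00Z 900000000000' \
  '2024-05-01T14:00:00Z 2000000000' \
  '2024-05-02T02:00:00Z 880000000000' |
awk '{ hour = substr($1, 12, 2); total[hour] += $2 }   # chars 12-13 of the ISO timestamp are the hour
     END { for (h in total) printf "%s:00 UTC %.1f GB\n", h, total[h] / 1e9 }' |
sort -k3 -rn
```

A single hour dominating the totals (02:00 in the sample) is strong evidence of a scheduled job rather than organic traffic.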

How to fix:

  • Move backups to S3 (internal transfer is free)
  • Use AWS Backup service instead of third-party backup

3. RDS Multi-AZ Cross-AZ Data Transfer (15% probability)

Why this fits your pattern:

  • Multi-AZ RDS replicates data across availability zones
  • Less likely to cause 3x spike unless recently enabled

How to verify:

aws rds describe-db-instances \
  --query 'DBInstances[*].[DBInstanceIdentifier,MultiAZ,AllocatedStorage]' \
  --output table
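To act on the query above programmatically, `--output text` gives tab-separated rows that are easy to filter. A sketch, using illustrative sample rows and hypothetical instance names in place of real output:

```shell
# Flag Multi-AZ instances (column 2 of the query above is the MultiAZ flag).
# Sample rows stand in for `aws rds describe-db-instances ... --output text`.
printf 'prod-db\tTrue\t500\nstaging-db\tFalse\t100\n' |
awk -F'\t' '$2 == "True" { printf "%s: Multi-AZ enabled (%s GB allocated)\n", $1, $3 }'
# -> prod-db: Multi-AZ enabled (500 GB allocated)
```

If Multi-AZ was recently enabled on a large, write-heavy instance, the replication traffic alone can move the cross-AZ line item noticeably.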

Prevention Measures

Immediate (Set up today):

# Create CloudWatch alarm for estimated charges
# Note: billing metrics are published only in us-east-1 and require
# "Receive Billing Alerts" to be enabled in the account's billing preferences
aws cloudwatch put-metric-alarm \
  --alarm-name DataTransferSpike \
  --alarm-description 'Alert when EC2 charges (including data transfer) exceed $500/month' \
  --metric-name EstimatedCharges \
  --namespace AWS/Billing \
  --statistic Maximum \
  --period 21600 \
  --threshold 500 \
  --comparison-operator GreaterThanThreshold \
  --dimensions Name=Currency,Value=USD Name=ServiceName,Value=AmazonEC2 \
  --evaluation-periods 1 \
  --alarm-actions arn:aws:sns:us-east-1:account-id:billing-alerts

This week:

# Enable AWS Cost Anomaly Detection
aws ce create-anomaly-monitor \
  --anomaly-monitor '{
    "MonitorName": "DataTransferMonitor",
    "MonitorType": "DIMENSIONAL",
    "MonitorDimension": "SERVICE"
  }'
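A monitor alone does not notify anyone; it needs a subscription. A sketch of the payload for `aws ce create-anomaly-subscription` (the monitor ARN and email address are placeholders; recent CLI versions prefer a ThresholdExpression over the older Threshold field):

```json
{
  "SubscriptionName": "DataTransferAlerts",
  "MonitorArnList": ["<monitor ARN returned by create-anomaly-monitor>"],
  "Subscribers": [{ "Type": "EMAIL", "Address": "team@example.com" }],
  "Frequency": "DAILY",
  "Threshold": 100
}
```

The threshold is the minimum anomaly impact in dollars before a notification is sent, so set it well below the size of spike you want to catch early.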

What NOT to Do

  1. Don’t panic and shut down services - the spike’s charges have already accrued, so shutting things down now won’t reverse them
  2. Don’t block all unknown IPs without investigation - May block legitimate users
  3. Don’t change multiple things at once - Change one variable to isolate cause
  4. Don’t disable Multi-AZ without understanding impact - Multi-AZ provides HA
  5. Don’t ignore security implications - Sudden spike could indicate compromised credentials