New Relic Full Stack Cost Optimization
Reduce New Relic costs from $1200/month to $600-650/month through log filtering, APM right-sizing, metrics cardinality fixes, and user seat optimization with migration alternatives.
Input
What you provide to the skill
We’re spending $1200/month on New Relic (APM, logs, infrastructure). 15 hosts, 8-person team. Can we reduce this without losing critical visibility?
Output
What the skill generates for you
New Relic Cost Optimization ($1200/month, 15 hosts, 8-person team)
Current Situation Analysis
- Monthly cost: $1,200/month ($14,400/year)
- Team size: 8 people
- Infrastructure: 15 hosts
- Services: APM, logs, infrastructure monitoring
- Cost per host: ~$80/host/month (well above typical infrastructure monitoring costs)
Based on New Relic’s usage-based pricing model ($0.30/GB for data ingest + user seat costs), your estimated breakdown is likely:
| Component | Est. Monthly Cost | % of Total | Est. Volume/Users |
|---|---|---|---|
| Log ingestion | $400-500 | 33-42% | ~1,500 GB/month at $0.30/GB |
| APM (traces/spans) | $400-500 | 33-42% | 15 hosts with full APM enabled |
| Infrastructure monitoring | $150-200 | 13-17% | 15 hosts + custom metrics |
| User seats | $100-200 | 8-17% | 2-4 full platform users at $99-549/user |
| Total | $1,200 | 100% | - |
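As a quick sanity check on the table, the arithmetic can be sketched in Python. The $0.30/GB rate, the 1,500 GB log volume, and the $1,200 total are the estimates quoted above, not billing data, and the function names are illustrative.

```python
INGEST_RATE_PER_GB = 0.30   # USD per GB, the ingest rate quoted above
MONTHLY_TOTAL = 1200.0      # USD, current monthly bill
HOSTS = 15

def ingest_cost(gb_per_month: float, rate: float = INGEST_RATE_PER_GB) -> float:
    """Monthly ingest cost in USD for a given data volume."""
    return gb_per_month * rate

def share_of_total(cost: float, total: float = MONTHLY_TOTAL) -> float:
    """Cost expressed as a percentage of the total monthly bill."""
    return 100.0 * cost / total

log_cost = ingest_cost(1500)            # estimated log volume from the table
cost_per_host = MONTHLY_TOTAL / HOSTS   # blended cost across all products

print(f"Logs: ${log_cost:.0f}/month ({share_of_total(log_cost):.1f}% of bill)")
print(f"Cost per host: ${cost_per_host:.0f}/month")
```

Replace the constants with figures from your own usage data once you have them; the split in the table is only an estimate.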
Phase 1: Immediate Quick Wins (Week 1, 3-4 hours effort)
Tactic 1: Log Filtering and Sampling (40-60% log cost reduction)
Add drop filters in New Relic to eliminate low-value logs:
# In New Relic: Logs → Data management → Drop filters
# Filter 1: Drop debug/trace logs (save ~30-40% log volume)
WHERE level IN ('DEBUG', 'TRACE')
ACTION: Drop
# Filter 2: Drop health check noise (save ~10-15% log volume)
WHERE request.uri IN ('/health', '/healthz', '/ping', '/metrics', '/ready', '/live')
ACTION: Drop
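# Filter 3: Sample successful requests (keep 5%, save ~20-25% log volume)
# Note: a drop filter removes every record it matches; if percentage sampling is not available in your account, apply the 5% sample in the log forwarder or the application instead
WHERE http.statusCode >= 200 AND http.statusCode < 300 AND duration < 1000
ACTION: Sample at 5%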
# Filter 4: Drop verbose cloud provider SDK logs (save ~5-10% log volume)
WHERE logger.name LIKE '%boto3%' OR logger.name LIKE '%aws-sdk%' OR logger.name LIKE '%azure-sdk%'
ACTION: Drop or Sample at 10%
Expected log reduction: 1,500 GB → 600-700 GB (53-60% reduction)
Estimated savings: $240-270/month
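Before enabling the filters, it can help to dry-run the same rules against a sample of exported log records. The sketch below mirrors the four filters in plain Python. The attribute names (`level`, `request.uri`, `http.statusCode`, `duration`, `logger.name`) are taken from the filters above, while `classify`, `estimated_kept_fraction`, and the sample records are hypothetical. It counts records rather than bytes, so treat the result as a rough guide.

```python
HEALTH_PATHS = {"/health", "/healthz", "/ping", "/metrics", "/ready", "/live"}
NOISY_LOGGERS = ("boto3", "aws-sdk", "azure-sdk")

def classify(record: dict) -> str:
    """Return 'drop', 'sample', or 'keep' for one log record.

    Drop rules are checked before the sampling rule, so a record that
    matches both is dropped (an assumption of this sketch).
    """
    if record.get("level") in ("DEBUG", "TRACE"):
        return "drop"
    if record.get("request.uri") in HEALTH_PATHS:
        return "drop"
    if any(name in record.get("logger.name", "") for name in NOISY_LOGGERS):
        return "drop"
    status = record.get("http.statusCode")
    duration = record.get("duration")
    if status is not None and duration is not None:
        if 200 <= status < 300 and duration < 1000:
            return "sample"
    return "keep"

def estimated_kept_fraction(records: list, sample_rate: float = 0.05) -> float:
    """Expected fraction of records that survive filtering."""
    if not records:
        return 1.0
    weights = {"drop": 0.0, "sample": sample_rate, "keep": 1.0}
    return sum(weights[classify(r)] for r in records) / len(records)

# Hypothetical sample; in practice, load a few thousand real records.
sample_records = [
    {"level": "DEBUG"},
    {"level": "INFO", "request.uri": "/health"},
    {"level": "INFO", "http.statusCode": 200, "duration": 120},
    {"level": "ERROR", "http.statusCode": 500, "duration": 40},
]
print(f"Kept fraction: {estimated_kept_fraction(sample_records):.4f}")
```

If the kept fraction on real data is far from the 40-47% implied by the 600-700 GB target, adjust the filters before relying on the savings estimate.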
Tactic 2: Right-Size APM Coverage (30-50% APM cost reduction)
Audit which hosts actually need full APM:
# In New Relic: APM & Services → Service Map
# Identify and disable APM on:
1. Development environments (save ~$50-80/month)
2. Staging/test environments (save ~$50-80/month)
3. Internal tools/admin services (save ~$30-50/month)
4. Database replicas (monitor primary only, save ~$20-40/month)
# Keep full APM only on:
- Production application servers
- Critical API services
- Customer-facing web servers
Implementation:
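# Disable the APM agent on non-production hosts.
# APM agents run inside the application process, so turn them off in the app's environment:
NEW_RELIC_ENABLED=false # For dev/staging (the exact variable name varies by agent language)
# The commands below stop the separate infrastructure agent, not APM.
# Run them only on hosts where you also want to drop infrastructure monitoring:
sudo systemctl stop newrelic-infra
sudo systemctl disable newrelic-infra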
Expected APM reduction: 15 hosts → 7-9 production hosts
Estimated savings: $150-200/month
Tactic 3: Reduce High-Cardinality Metrics (15-30% metrics cost reduction)
Identify and fix expensive custom metrics:
# In New Relic: Metrics explorer → Sort by cardinality
# Common high-cardinality culprits:
BAD: http.requests{user_id:*, session_id:*, request_id:*}
GOOD: http.requests{endpoint:/api/users, method:GET, status:200}
BAD: cache.operations{key:*} # Millions of unique keys
GOOD: cache.operations{operation:get, cache_name:redis-main}
BAD: background.job{job_id:*}
GOOD: background.job{job_type:email_worker, queue:default}
Code fix example (Python):
import newrelic.agent

# Before: one metric name per user, so millions of unique metric names
newrelic.agent.record_custom_metric(
    f'Custom/User/{user_id}/requests', 1
)
# After: aggregate by user tier instead (a handful of metric names)
newrelic.agent.record_custom_metric(
    f'Custom/UserTier/{user.tier}/requests', 1
)
Expected savings: $50-80/month
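To see why per-user or per-request tags get expensive, note that the worst-case number of unique time series is the product of the distinct values of each tag. The sketch below illustrates that; the tag cardinalities are made-up numbers, and `series_upper_bound` is a hypothetical helper, not part of any New Relic API.

```python
from math import prod

def series_upper_bound(tag_cardinalities: dict) -> int:
    """Worst-case unique series: product of distinct values per tag."""
    return prod(tag_cardinalities.values())

# Hypothetical cardinalities for illustration only.
per_user = series_upper_bound({"user_id": 50_000, "endpoint": 20})
per_tier = series_upper_bound({"user_tier": 3, "endpoint": 20})

print(f"Tagged by user_id:   up to {per_user:,} series")
print(f"Tagged by user_tier: up to {per_tier:,} series")
```

Real cardinality is usually lower than the upper bound because not every combination occurs, but the bound shows which tags to remove first.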
Phase 2: Application-Level Changes (Week 2, 4-6 hours effort)
Tactic 4: Reduce Log Verbosity at Source (20-40% additional log reduction)
Update application logging configuration:
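Environment variables (fastest approach; LOG_LEVEL and LOG_SAMPLE_RATE are application-defined, so your logging setup has to read and apply them):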
# Production
LOG_LEVEL=WARN # Instead of INFO or DEBUG
LOG_SAMPLE_RATE=0.05 # Sample INFO logs at 5%
NEW_RELIC_LOG_LEVEL=info # Reduce agent verbosity
# Staging
LOG_LEVEL=INFO
LOG_SAMPLE_RATE=0.2
# Development
LOG_LEVEL=DEBUG
LOG_SAMPLE_RATE=1.0
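These variables only help if the application honors them. A minimal sketch of how a Python service might do that with the standard `logging` module follows. `SampleBelowWarning`, `configure_logging`, and the `"app"` logger name are hypothetical. In this sketch the logger level is applied first, so with `LOG_LEVEL=WARN` the sample rate has no effect; sampling only matters when `LOG_LEVEL` is `INFO` or lower.

```python
import logging
import os
import random

LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARN": logging.WARNING,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
}

class SampleBelowWarning(logging.Filter):
    """Always pass WARNING and above; pass lower levels with probability `rate`."""

    def __init__(self, rate: float):
        super().__init__()
        self.rate = rate

    def filter(self, record: logging.LogRecord) -> bool:
        if record.levelno >= logging.WARNING:
            return True
        return random.random() < self.rate

def configure_logging(env=None) -> logging.Logger:
    """Build the app logger from LOG_LEVEL and LOG_SAMPLE_RATE."""
    env = os.environ if env is None else env
    level = LEVELS.get(env.get("LOG_LEVEL", "INFO").upper(), logging.INFO)
    rate = float(env.get("LOG_SAMPLE_RATE", "1.0"))

    logger = logging.getLogger("app")
    logger.setLevel(level)
    handler = logging.StreamHandler()
    handler.addFilter(SampleBelowWarning(rate))
    logger.handlers.clear()   # avoid duplicate handlers on reconfiguration
    logger.addHandler(handler)
    return logger
```

Calling `configure_logging()` once at startup is enough; passing a plain dict instead of `os.environ` makes the function easy to test.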
Tactic 5: Optimize User Seat Allocation
Review your user assignments in New Relic:
# In New Relic: Account → User management
# Audit current allocation:
- Full Platform Users: 2-4 users at $99-549/user
- Core Users: Typically $49/user
- Basic Users: Free (view-only)
# Optimization:
- Keep only 1-2 Full Platform Users (senior engineers who configure monitoring)
- Downgrade to Core Users for most developers (can view APM, create basic queries)
- Use Basic Users for PMs, support staff, managers (dashboard viewing only)
Expected savings: $50-200/month (depending on current allocation)
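The seat savings depend entirely on your starting allocation, so it is worth computing them for your own team. The sketch below uses the low end of the Full Platform range ($99) and the Core price ($49) quoted above; the before and after allocations for an 8-person team are hypothetical.

```python
# Per-seat monthly prices in USD, taken from the figures quoted above.
SEAT_PRICES = {"full": 99, "core": 49, "basic": 0}

def seat_cost(allocation: dict, prices: dict = SEAT_PRICES) -> int:
    """Total monthly seat cost for a {tier: user_count} allocation."""
    return sum(prices[tier] * count for tier, count in allocation.items())

# Hypothetical allocations for an 8-person team.
before = {"full": 4, "core": 0, "basic": 4}
after = {"full": 1, "core": 3, "basic": 4}

savings = seat_cost(before) - seat_cost(after)
print(f"Before: ${seat_cost(before)}/month, after: ${seat_cost(after)}/month")
print(f"Savings: ${savings}/month")
```

Note that moving someone from Basic to Core raises the bill, so only downgrade from Full Platform; do not upgrade view-only users.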
Cost Projection After Optimization
| Optimization Phase | Est. Monthly Cost | Savings in Phase | Cumulative Savings (% of current) |
|---|---|---|---|
| Current state | $1,200 | - | - |
| After Phase 1 (Week 1) | $750-800 | $400-450 | 33-38% |
| After Phase 2 (Week 2) | $600-650 | $150-200 | 46-50% |
Optimized annual cost: $7,200-7,800 (down from $14,400)
Total annual savings: $6,600-7,200 (46-50% reduction)
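The annual figures follow directly from the monthly estimates. A short sketch of the calculation, using the $1,200 current cost and the $600-650 target from the table (the function name is illustrative):

```python
def annual_projection(current_monthly: float, optimized_monthly: float) -> dict:
    """Annual cost, annual savings, and percent reduction for a monthly target."""
    monthly_savings = current_monthly - optimized_monthly
    return {
        "annual_cost": optimized_monthly * 12,
        "annual_savings": monthly_savings * 12,
        "reduction_pct": 100.0 * monthly_savings / current_monthly,
    }

best_case = annual_projection(1200, 600)
worst_case = annual_projection(1200, 650)

for label, p in (("Best case", best_case), ("Worst case", worst_case)):
    print(f"{label}: ${p['annual_cost']:,.0f}/year, "
          f"saves ${p['annual_savings']:,.0f} ({p['reduction_pct']:.0f}%)")
```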
Implementation Checklist
Week 1: Quick Wins (3-4 hours)
- Add 4 log drop filters in New Relic Data management
- Audit APM hosts, disable on dev/staging (save $150-200)
- Identify top 5 high-cardinality metrics
- Verify changes via New Relic Usage dashboard
- Expected result: $1,200 → $750-800/month
Week 2: Deeper Changes (4-6 hours)
- Update LOG_LEVEL=WARN in production
- Deploy metric cardinality fixes
- Optimize user seat allocation
- Monitor for 5-7 days
- Expected result: $750-800 → $600-650/month
Week 3: Validation (1 hour)
- Check key dashboards for missing data
- Verify alerts still firing correctly
- Review New Relic bill estimate
- Document changes for team
Rollback Procedures
If you lose critical visibility:
- Re-enable logs:
  - Disable the drop filter in Data management
  - Restore LOG_LEVEL=INFO temporarily
- Re-enable APM:
  - Re-enable the agent on the host: set NEW_RELIC_ENABLED=true, and restart newrelic-infra if it was stopped
  - Add the service back to the APM configuration
- Restore metrics:
  - Revert metric tag changes
  - Redeploy application
What You Gain
- $6,600-7,200/year savings (46-50% cost reduction)
- Maintained visibility for critical production issues
- Faster log queries (less indexed data)
- Cleaner metrics (lower cardinality = better query performance)
What You Lose
- Full dev/staging APM (use production APM for troubleshooting)
- Verbose debug logs (use Live Tail when needed)
- Per-user metrics (aggregate by cohort/feature instead)
Alternative: Migrate to Grafana Cloud
If $600-650/month still feels expensive for an 8-person team:
Grafana Cloud Free Tier includes:
- 10K metrics series (vs. limited free tier)
- 50GB logs/month
- 50GB traces/month
- 3 users
- 14-day retention
For your scale:
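- $0/month on the free tier only if usage fits the limits above; the post-filtering log estimate (600-700 GB/month) and the 8-person team both exceed them, so expect the paid tier unless log volume drops much further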
- Even paid tier: ~$50-100/month (vs. $600-650 optimized New Relic)
Migration effort: 1-2 days
Annual savings: $5,400-7,200 vs. optimized New Relic
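Whether the free tier is enough depends on how your usage compares with the limits listed above. A minimal check might look like this. The limits come from the list above; the usage figures are assumptions (600 GB is the low end of the post-filtering log estimate, 8 is the team size, and the metric and trace numbers are placeholders).

```python
# Free-tier limits as listed above.
FREE_TIER = {"metric_series": 10_000, "logs_gb": 50, "traces_gb": 50, "users": 3}

def exceeded_limits(usage: dict, limits: dict = FREE_TIER) -> list:
    """Names of the limits that the given monthly usage exceeds."""
    return [name for name, limit in limits.items() if usage.get(name, 0) > limit]

# Assumed usage: replace with measured values before deciding.
usage = {"metric_series": 8_000, "logs_gb": 600, "traces_gb": 30, "users": 8}

over = exceeded_limits(usage)
if over:
    print("Free tier exceeded on:", ", ".join(over))
else:
    print("Usage fits within the free tier")
```

Run the check with measured numbers after Phases 1 and 2, since the outcome is sensitive to how far log volume actually falls.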
About This Skill
Reduce monitoring costs (Datadog, New Relic) by 30-70% with optimization strategies, configuration changes, and migration paths to cheaper alternatives.