Customer Revenue Analysis with Segmentation
Analyzing a small customer dataset (10 rows) to understand revenue patterns and data quality. Demonstrates detection of strong age-revenue correlation (r=0.76), identification of missing emails (30%), and Enterprise vs SMB segmentation insights showing 10x revenue difference between segments.
Input
What you provide to the skill
Analyze /tmp/sample_customers.csv - 10 customers with revenue data
Output
What the skill generates for you
DATASET PROFILE REPORT
Generated: 2025-12-02
File: /tmp/sample_customers.csv
═══════════════════════════════════════════════════════════
OVERVIEW
═══════════════════════════════════════════════════════════
• Total Rows: 10 customers
• Total Columns: 8
• File Size: ~1 KB
• Overall Completeness: 96.2% (3 missing values)
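As a quick check on the completeness figure, the calculation can be sketched as below. It assumes completeness is the share of non-missing cells across all rows and columns; the report does not state its formula, so that definition is an assumption.

```python
# Figures taken from the overview above.
rows, columns, missing_values = 10, 8, 3

# Assumed definition: non-missing cells divided by total cells.
total_cells = rows * columns
completeness = 100 * (total_cells - missing_values) / total_cells

print(f"Total cells: {total_cells}")
print(f"Completeness: {completeness:.2f}%")
```

Under that definition the exact value is 96.25%, which the report shows to one decimal place.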
═══════════════════════════════════════════════════════════
DATA STRUCTURE
═══════════════════════════════════════════════════════════
Column Types:
• Numeric: 2 columns
- age, revenue
• Text/Categorical: 5 columns
- customer_id, name, email, region, customer_type
• Date: 1 column
- signup_date
═══════════════════════════════════════════════════════════
NUMERIC COLUMNS - KEY STATISTICS
═══════════════════════════════════════════════════════════
AGE
Range: 28 to 52 years
Mean: 36.7 years
Median: 35.0 years
Std Dev: 7.5 years
Distribution: Moderately right-skewed (most customers under 40)
✓ No outliers detected
✓ All values present (no missing data)
REVENUE
Range: $0.00 - $15,600.00
Mean: $3,440.20
Median: $1,175.25
Std Dev: $4,990.69
Distribution: Highly right-skewed (most customers under $2,000)
⚠ Outliers Detected: 2 values identified as statistical outliers
- Alice Brown (C004): $15,600.00 - High-value Enterprise customer
- Frank Miller (C008): $8,900.50 - High-value Enterprise customer
⚠ Zero Revenue: 1 customer with $0.00 revenue
- Henry Ford (C010): New customer, signed up 2023-03-20, no purchases yet
═══════════════════════════════════════════════════════════
CATEGORICAL COLUMNS - KEY INSIGHTS
═══════════════════════════════════════════════════════════
CUSTOMER_TYPE
- SMB: 6 customers (60%)
- Enterprise: 4 customers (40%)
REGION
Customers are spread fairly evenly across four regions:
- North: 3 customers (30%)
- South: 3 customers (30%)
- East: 2 customers (20%)
- West: 2 customers (20%)
EMAIL
✓ 7 valid email addresses
⚠ 3 missing emails (30%) - Impacts marketing capabilities
═══════════════════════════════════════════════════════════
DATE ANALYSIS
═══════════════════════════════════════════════════════════
SIGNUP_DATE
Date Range: November 5, 2022 - April 2, 2023
Time Span: 148 days (~5 months)
Most recent signups occurred in early 2023, with steady customer acquisition over the period.
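The time span can be reproduced from the two endpoint dates. This is a minimal sketch using only dates quoted in this report; the variable names are illustrative.

```python
from datetime import date

# Endpoints of the signup_date range reported above.
first_signup = date(2022, 11, 5)
last_signup = date(2023, 4, 2)

# Signup date reported for the zero-revenue customer (C010).
c010_signup = date(2023, 3, 20)

span_days = (last_signup - first_signup).days
days_before_last = (last_signup - c010_signup).days

print(f"Time span: {span_days} days")
print(f"C010 signed up {days_before_last} days before the last signup")
```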
═══════════════════════════════════════════════════════════
CORRELATION INSIGHTS
═══════════════════════════════════════════════════════════
Strong Positive Correlation:
✓ age ↔ revenue: r=0.76 (strong positive)
→ Older customers in this sample tend to generate higher revenue
→ This could indicate:
- More established financial capacity
- Greater business needs (especially for Enterprise)
- Higher decision-making authority
This is a key finding for targeting and segmentation strategies.
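With only 10 customers, the uncertainty around r=0.76 is wide. The sketch below estimates an approximate 95% confidence interval with the Fisher z-transformation. That method assumes roughly bivariate-normal data, which the skewed revenue column does not satisfy well, so treat the result as a rough indication only. The function name `pearson_ci` is hypothetical.

```python
import math

def pearson_ci(r, n, z_crit=1.96):
    """Approximate confidence interval for a Pearson r via Fisher's z-transformation."""
    z = math.atanh(r)              # transform r to the z scale
    se = 1 / math.sqrt(n - 3)      # standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

# r and n as reported above.
low, high = pearson_ci(0.76, 10)
print(f"Approximate 95% CI for r: ({low:.2f}, {high:.2f})")
```

The interval runs from about 0.25 to about 0.94, so the sample is consistent with anything from a weak to a very strong relationship.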
═══════════════════════════════════════════════════════════
DATA QUALITY ISSUES
═══════════════════════════════════════════════════════════
Missing Values:
⚠ email: 3 missing (30% of dataset)
- Affected customers: C003 (Bob Wilson), C006 (Diana Prince), C010 (Henry Ford)
- Impact: Cannot send email marketing campaigns to 30% of customer base
- Pattern: No clear geographic or customer type pattern
Potential Data Issues:
⚠ Zero Revenue Customer:
- Henry Ford (C010): $0.00 revenue despite signup
- Recent signup (March 20, 2023) - may be inactive or in trial period
- Action needed: Investigate customer status and engagement
Quality Checks Passed:
✓ No duplicate rows detected
✓ All customer IDs are unique
✓ All names are unique
✓ No inconsistent data formats
✓ Date formats are consistent
═══════════════════════════════════════════════════════════
KEY INSIGHTS
═══════════════════════════════════════════════════════════
1. ENTERPRISE CUSTOMERS DRIVE REVENUE
Critical Finding: Enterprise customers (40% of base) generate 10x higher revenue than SMB customers on average.
- Enterprise average: $7,510
- Enterprise median: $6,050
- SMB average: $727
- SMB median: $780
→ Implication: Focus acquisition and retention efforts on Enterprise segment for maximum revenue impact.
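The segment figures can be cross-checked against the overall revenue mean. This sketch uses only the counts and averages quoted in this report; the segment averages appear rounded to whole dollars, so derived values are approximate.

```python
# Segment sizes and average revenue as reported above.
enterprise_n, enterprise_mean = 4, 7510.0
smb_n, smb_mean = 6, 727.0

# Ratio of average revenue between segments.
ratio = enterprise_mean / smb_mean

# Rebuild total revenue and the overall mean from the segment figures.
total_revenue = enterprise_n * enterprise_mean + smb_n * smb_mean
overall_mean = total_revenue / (enterprise_n + smb_n)
enterprise_share = enterprise_n * enterprise_mean / total_revenue

print(f"Enterprise/SMB average ratio: {ratio:.1f}x")
print(f"Implied overall mean: ${overall_mean:,.2f}")
print(f"Enterprise share of revenue: {enterprise_share:.0%}")
```

The implied overall mean matches the $3,440.20 reported in the numeric statistics, and Enterprise accounts for roughly 87% of total revenue.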
2. AGE = REVENUE PREDICTOR
Strong correlation (r=0.76) between customer age and revenue generation.
- Customers 40+ generate significantly higher revenue
- Younger customers (under 35) cluster in lower revenue ranges
→ Implication: Age could be a valuable targeting criterion for marketing campaigns and lead scoring.
3. REVENUE CONCENTRATION
Revenue is highly concentrated:
- Top 2 customers (20%) generate ~$24,500 (71% of total revenue)
- Bottom 6 customers (60%) generate only ~$4,400 (13% of total revenue)
→ Implication: High customer concentration risk - losing top customers would significantly impact revenue.
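The concentration figure for the top two customers follows from values stated elsewhere in this report. In this sketch, total revenue is derived from the reported mean and row count rather than read from the file.

```python
# Revenue of the two outlier customers (C004 and C008) as reported above.
top_two = [15600.00, 8900.50]

# Total revenue derived from the reported mean ($3,440.20) and 10 rows.
total_revenue = 3440.20 * 10

top_two_revenue = sum(top_two)
top_share = top_two_revenue / total_revenue

print(f"Top 2 revenue: ${top_two_revenue:,.2f}")
print(f"Share of total: {top_share:.1%}")
```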
4. CUSTOMER LIFECYCLE STAGES
Dataset shows customers at different lifecycle stages:
- Active high-value: 2 Enterprise customers generating >$8K
- Moderate contributors: 6 SMB customers with steady revenue
- New/Inactive: 1 customer with zero revenue (potential churn risk)
═══════════════════════════════════════════════════════════
RECOMMENDATIONS
═══════════════════════════════════════════════════════════
IMMEDIATE ACTIONS (Week 1):
1. ✅ Collect Missing Email Addresses
- Contact C003 (Bob Wilson), C006 (Diana Prince), C010 (Henry Ford)
- Implement mandatory email collection at signup
- Without emails, 30% of customers cannot receive communications
2. ✅ Investigate Zero-Revenue Customer
- Follow up with Henry Ford (C010) - signed up but no purchases
- Determine if customer is in trial, inactive, or needs support
- Consider activation campaign or win-back strategy
3. ✅ Review High-Value Customers
- Verify Alice Brown ($15,600) and Frank Miller ($8,900) transactions
- If accurate, analyze what drove their high spend
- Create case studies to replicate success
STRATEGIC RECOMMENDATIONS:
1. Segment All Future Analyses by Customer Type
- Enterprise and SMB behave completely differently (10x revenue gap)
- Build separate strategies, campaigns, and metrics for each segment
- Consider different pricing, features, or support tiers
2. Age-Based Targeting for Acquisition
- Strong age-revenue correlation (r=0.76) suggests targeting older prospects
- Focus marketing on 40+ demographic for higher-value customers
- Test age-based messaging and channel selection
3. Address Revenue Concentration Risk
- Top 2 customers represent 71% of revenue - high risk
- Implement customer success program for high-value accounts
- Diversify customer base to reduce dependency
4. Customer Activation Program
- Monitor new signups for revenue activity within 30 days
- Create automated onboarding and activation campaigns
- Identify and address barriers to first purchase
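The 30-day activation check could be sketched as follows. The function `needs_activation`, its parameters, and the later as-of date are hypothetical; only the C010 signup date, its $0.00 revenue, and the April 2, 2023 date come from this report.

```python
from datetime import date

def needs_activation(signup_date, revenue, as_of, window_days=30):
    """Flag a customer with no revenue once the activation window has passed."""
    days_since_signup = (as_of - signup_date).days
    return revenue == 0 and days_since_signup >= window_days

# C010 (Henry Ford): signed up 2023-03-20 with $0.00 revenue.
c010_signup, c010_revenue = date(2023, 3, 20), 0.0

# As of the last signup date in the dataset, the window is still open.
flag_at_dataset_end = needs_activation(c010_signup, c010_revenue, date(2023, 4, 2))

# As of a hypothetical later date, the customer would be flagged.
flag_later = needs_activation(c010_signup, c010_revenue, date(2023, 5, 1))

print(flag_at_dataset_end, flag_later)
```

At the end of the dataset's date range C010 is only 13 days past signup, so zero revenue alone does not yet indicate a failed activation.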
ANALYSIS RECOMMENDATIONS:
- For Revenue Modeling: Use a log transformation due to extreme right skew (skewness: 2.05); because one customer has $0.00 revenue, use log(1 + x) rather than log(x)
- For Forecasting: Consider Enterprise and SMB separately - different behavior patterns
- For Segmentation: Include age as a key variable given its strong correlation with revenue
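One practical detail for the log transformation: the dataset contains a $0.00 revenue value, and a plain logarithm is undefined at zero. A minimal sketch using log(1 + x), applied to the minimum, median, and maximum revenue reported above (not the full column):

```python
import math

# Minimum, median, and maximum revenue from the report.
revenues = [0.00, 1175.25, 15600.00]

# log1p computes log(1 + x), which is defined at zero.
log_revenues = [math.log1p(x) for x in revenues]

# A plain log fails on the zero-revenue value.
try:
    math.log(revenues[0])
    plain_log_ok = True
except ValueError:
    plain_log_ok = False

for raw, transformed in zip(revenues, log_revenues):
    print(f"${raw:>10,.2f} -> {transformed:.2f}")
print(f"Plain log works on $0.00: {plain_log_ok}")
```

On the transformed scale the gap between the median and the maximum shrinks from roughly 13x to under 1.4x, which is why the transformation helps with skewed revenue.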
═══════════════════════════════════════════════════════════
VISUALIZATIONS
═══════════════════════════════════════════════════════════
Included visualizations show:
- Revenue distribution with median/mean markers (highly right-skewed)
- Revenue by customer type (Enterprise dominance visible)
- Age vs Revenue scatter plot (positive correlation visible)
- Correlation heatmap (strong age-revenue relationship)
- Missing data visualization (30% email gap highlighted)
All visualizations are saved to the /tmp/ directory.
═══════════════════════════════════════════════════════════
DATASET READINESS ASSESSMENT
═══════════════════════════════════════════════════════════
Current State: ✅ READY for analysis with caveats
Strengths:
- Clean data structure with no duplicates
- High completeness (96.2%)
- Clear segmentation variables (customer_type, region)
- Strong predictive relationships identified (age-revenue)
- Consistent date formatting
Limitations:
- Small sample size (n=10) - findings may not generalize
- 30% missing email addresses limits marketing capabilities
- High revenue skew requires careful statistical handling
- One inactive customer needs investigation
Best For:
- Customer segmentation exploration
- Revenue pattern analysis
- Hypothesis generation for larger datasets
- Proof-of-concept modeling
Not Suitable For:
- Statistical significance testing (sample too small)
- Predictive modeling without additional data
- Definitive business conclusions
═══════════════════════════════════════════════════════════
NEXT STEPS
═══════════════════════════════════════════════════════════
The dataset is small but clean and reveals important patterns.
Priority actions:
- Fix email gaps (30% missing) before any email campaigns
- Investigate zero-revenue customer (C010)
- Analyze Enterprise vs SMB separately - they are fundamentally different segments
Analysis opportunities:
- Customer lifetime value modeling (if you have historical data)
- Churn risk analysis (need more behavioral data)
- Revenue forecasting by segment
- Age-based targeting strategies
To expand insights, consider adding:
- Transaction history (frequency, recency, product mix)
- Customer engagement metrics (logins, support tickets)
- Acquisition channel data
- Tenure/lifetime metrics
Summary: This small dataset (10 customers) provides a clear snapshot showing Enterprise customers driving most revenue, a strong age-revenue correlation, and critical data quality issues (30% missing emails, 1 zero-revenue customer). Focus on segment-specific strategies and fix data gaps before scaling analysis.
About This Skill
Instantly profile CSV/Excel datasets to understand structure, data types, distributions, quality issues, and correlations. Generates comprehensive statistics, visualizations, and actionable insights.
More Examples
Sales Pipeline Forecast Readiness Assessment
Quality assessment of an 8-deal sales pipeline before building forecast model. Identifies critical blocker (25% missing expected_close_date values preventing time-based forecasting), strong deal value-pipeline age correlation (r=0.88), and $2.5M outlier deal requiring validation. Provides stage-weighted analysis and forecast-specific recommendations.
Transaction Dataset Pattern and Quality Analysis
Comprehensive profiling of a 100-row transaction dataset revealing regional performance differences, category distribution, and customer demographics. Discovers a North region paradox (lowest volume but highest average transaction value at $3,151), identifies 4 missing customer ages (4%), and finds a symmetric transaction value distribution, indicating a healthy business mix.