All examples for Dataset Explorer

Customer Revenue Analysis with Segmentation

This example analyzes a small customer dataset (10 rows) to understand revenue patterns and data quality. It demonstrates detection of a strong age-revenue correlation (r=0.76), identification of missing emails (30%), and Enterprise vs SMB segmentation insights showing a roughly 10x revenue difference between segments.

Input

What you provide to the skill

Analyze /tmp/sample_customers.csv - 10 customers with revenue data

Output

What the skill generates for you

DATASET PROFILE REPORT

Generated: 2025-12-02
File: /tmp/sample_customers.csv

═══════════════════════════════════════════════════════════

OVERVIEW

═══════════════════════════════════════════════════════════
Total Rows: 10 customers
Total Columns: 8
File Size: ~1 KB
Overall Completeness: 96.2% (3 missing values)
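
The completeness figure can be reproduced from the counts above. The sketch below assumes completeness means non-missing cells divided by total cells (rows × columns); the report does not state its definition. Under that assumption 77 of 80 cells are filled, i.e. 96.25%, which the report displays as 96.2%.

```python
# Counts taken from the OVERVIEW section of the report.
rows, columns, missing = 10, 8, 3

total_cells = rows * columns  # 80 cells in the table
completeness = 100 * (total_cells - missing) / total_cells

print(f"{completeness:.2f}%")  # prints 96.25%
```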

═══════════════════════════════════════════════════════════

DATA STRUCTURE

═══════════════════════════════════════════════════════════

Column Types:

Numeric: 2 columns

  • age, revenue

Text/Categorical: 5 columns

  • customer_id, name, email, region, customer_type

Date: 1 column

  • signup_date

═══════════════════════════════════════════════════════════

NUMERIC COLUMNS - KEY STATISTICS

═══════════════════════════════════════════════════════════

AGE

Range: 28 to 52 years
Mean: 36.7 years
Median: 35.0 years
Std Dev: 7.5 years
Distribution: Moderately right-skewed (most customers under 40)

✓ No outliers detected
✓ All values present (no missing data)

REVENUE

Range: $0.00 - $15,600.00
Mean: $3,440.20
Median: $1,175.25
Std Dev: $4,990.69
Distribution: Highly right-skewed (most customers under $2,000)

Outliers Detected: 2 values identified as statistical outliers

  • Alice Brown (C004): $15,600.00 - High-value Enterprise customer
  • Frank Miller (C008): $8,900.50 - High-value Enterprise customer

Zero Revenue: 1 customer with $0.00 revenue

  • Henry Ford (C010): New customer, signed up 2023-03-20, no purchases yet
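
The report does not say which rule it used to flag outliers. One common choice is the 1.5 × IQR fence, sketched below as an assumption rather than the skill's actual method. The revenue list is illustrative: only the zero and the two largest values are quoted in the report, and the remaining numbers are placeholders, not the contents of the CSV.

```python
import statistics

# Illustrative values only: 0.0, 8900.5 and 15600.0 appear in the report;
# the other entries are placeholders, not rows from the file.
revenue = [0.0, 451.5, 700.0, 860.0, 1100.0, 1250.5, 2340.0, 3199.5, 8900.5, 15600.0]

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

print(iqr_outliers(revenue))  # [8900.5, 15600.0]
```

With these placeholder values the upper fence sits near $6,350, so both Enterprise accounts named above fall outside it.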

═══════════════════════════════════════════════════════════

CATEGORICAL COLUMNS - KEY INSIGHTS

═══════════════════════════════════════════════════════════

CUSTOMER_TYPE

  • SMB: 6 customers (60%)
  • Enterprise: 4 customers (40%)

REGION

Customers are spread fairly evenly across four regions, with slightly more in North and South:

  • North: 3 customers (30%)
  • South: 3 customers (30%)
  • East: 2 customers (20%)
  • West: 2 customers (20%)

EMAIL

✓ 7 valid email addresses
3 missing emails (30%) - Impacts marketing capabilities

═══════════════════════════════════════════════════════════

DATE ANALYSIS

═══════════════════════════════════════════════════════════

SIGNUP_DATE

Date Range: November 5, 2022 - April 2, 2023
Time Span: 148 days (~5 months)

Most recent signups occurred in early 2023, with steady customer acquisition over the period.
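
The 148-day span follows directly from the two endpoint dates. A minimal check using only the standard library:

```python
from datetime import date

# Endpoints quoted in the DATE ANALYSIS section.
first_signup = date(2022, 11, 5)
last_signup = date(2023, 4, 2)

span_days = (last_signup - first_signup).days
print(span_days)  # 148
```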

═══════════════════════════════════════════════════════════

CORRELATION INSIGHTS

═══════════════════════════════════════════════════════════

Strong Positive Correlation:

age ↔ revenue: r=0.76 (strong positive)
Older customers tend to generate higher revenue in this 10-row sample
→ This could indicate:
- More established financial capacity
- Greater business needs (especially for Enterprise)
- Higher decision-making authority

This is a key finding for targeting and segmentation strategies.
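
For readers who want to see how a coefficient like r=0.76 is computed, the sketch below implements the Pearson correlation from its definition. It assumes the report uses Pearson's r, which is the usual default but is not stated. The age and revenue pairs are hypothetical and are not the rows of the CSV, so the printed value will not match the report.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical (age, revenue) pairs for illustration only.
ages = [28, 31, 35, 42, 52]
revenues = [400.0, 900.0, 1200.0, 8900.5, 15600.0]

print(f"r = {pearson_r(ages, revenues):.2f}")
```

With only 10 rows, one or two high-revenue accounts can move r substantially, so the coefficient is best read as a hypothesis to test on more data.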

═══════════════════════════════════════════════════════════

DATA QUALITY ISSUES

═══════════════════════════════════════════════════════════

Missing Values:

email: 3 missing (30% of dataset)

  • Affected customers: C003 (Bob Wilson), C006 (Diana Prince), C010 (Henry Ford)
  • Impact: Cannot send email marketing campaigns to 30% of customer base
  • Pattern: No clear geographic or customer type pattern

Potential Data Issues:

Zero Revenue Customer:

  • Henry Ford (C010): $0.00 revenue despite signup
  • Recent signup (March 20, 2023) - may be inactive or in trial period
  • Action needed: Investigate customer status and engagement

Quality Checks Passed:

✓ No duplicate rows detected
✓ All customer IDs are unique
✓ All names are unique
✓ No inconsistent data formats
✓ Date formats are consistent

═══════════════════════════════════════════════════════════

KEY INSIGHTS

═══════════════════════════════════════════════════════════

1. ENTERPRISE CUSTOMERS DRIVE REVENUE

Critical Finding: Enterprise customers (40% of base) generate 10x higher revenue than SMB customers on average.

  • Enterprise average: $7,510
  • Enterprise median: $6,050
  • SMB average: $727
  • SMB median: $780

Implication: Focus acquisition and retention efforts on Enterprise segment for maximum revenue impact.
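
The 10x figure and its consistency with the overall mean can be checked from the segment averages listed above. This is a small sketch using only the rounded numbers in the report; the dictionary layout and key names are illustrative.

```python
# Segment sizes and average revenue as quoted in the report.
segments = {
    "Enterprise": {"customers": 4, "mean_revenue": 7510.0},
    "SMB": {"customers": 6, "mean_revenue": 727.0},
}

ratio = segments["Enterprise"]["mean_revenue"] / segments["SMB"]["mean_revenue"]

# Recombine the segment averages into the overall mean as a cross-check.
total_revenue = sum(s["customers"] * s["mean_revenue"] for s in segments.values())
total_customers = sum(s["customers"] for s in segments.values())
overall_mean = total_revenue / total_customers

print(f"ratio: {ratio:.1f}x, overall mean: ${overall_mean:,.2f}")
# ratio: 10.3x, overall mean: $3,440.20
```

The recombined mean agrees with the $3,440.20 reported under REVENUE, so the segment figures are internally consistent.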

2. AGE = REVENUE PREDICTOR

Strong correlation (r=0.76) between customer age and revenue generation.

  • Customers aged 40+ tend to generate higher revenue in this sample
  • Younger customers (under 35) cluster in lower revenue ranges

Implication: Age could be a valuable targeting criterion for marketing campaigns and lead scoring.

3. REVENUE CONCENTRATION

Revenue is highly concentrated:

  • Top 2 customers (20%) generate ~$24,500 (71% of total revenue)
  • Bottom 6 customers (60%) generate only ~$4,400 (13% of total revenue)

Implication: High customer concentration risk - losing top customers would significantly impact revenue.
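
The concentration percentages follow from figures already in the report. The sketch below takes total revenue as 10 customers × the $3,440.20 mean, and uses the report's rounded ~$4,400 for the bottom six customers.

```python
total_revenue = 34402.00       # 10 customers x $3,440.20 mean revenue
top_two = 15600.00 + 8900.50   # Alice Brown + Frank Miller
bottom_six = 4400.00           # approximate figure quoted in the report

top_share = 100 * top_two / total_revenue
bottom_share = 100 * bottom_six / total_revenue

print(f"top 2: {top_share:.1f}%, bottom 6: {bottom_share:.1f}%")
# roughly 71% and 13%
```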

4. CUSTOMER LIFECYCLE STAGES

Dataset shows customers at different lifecycle stages:

  • Active high-value: 2 Enterprise customers generating >$8K
  • Moderate contributors: 7 customers (2 Enterprise, 5 SMB) with steady revenue
  • New/Inactive: 1 customer with zero revenue (potential churn risk)

═══════════════════════════════════════════════════════════

RECOMMENDATIONS

═══════════════════════════════════════════════════════════

IMMEDIATE ACTIONS (Week 1):

  1. ✅ Collect Missing Email Addresses

    • Contact C003 (Bob Wilson), C006 (Diana Prince), C010 (Henry Ford)
    • Implement mandatory email collection at signup
    • Without emails, 30% of customers cannot receive communications

  2. ✅ Investigate Zero-Revenue Customer

    • Follow up with Henry Ford (C010) - signed up but no purchases
    • Determine if customer is in trial, inactive, or needs support
    • Consider activation campaign or win-back strategy

  3. ✅ Review High-Value Customers

    • Verify Alice Brown ($15,600) and Frank Miller ($8,900) transactions
    • If accurate, analyze what drove their high spend
    • Create case studies to replicate success

STRATEGIC RECOMMENDATIONS:

  1. Segment All Future Analyses by Customer Type

    • Enterprise and SMB behave completely differently (10x revenue gap)
    • Build separate strategies, campaigns, and metrics for each segment
    • Consider different pricing, features, or support tiers

  2. Age-Based Targeting for Acquisition

    • Strong age-revenue correlation (r=0.76) suggests targeting older prospects
    • Focus marketing on 40+ demographic for higher-value customers
    • Test age-based messaging and channel selection

  3. Address Revenue Concentration Risk

    • Top 2 customers represent 71% of revenue - high risk
    • Implement customer success program for high-value accounts
    • Diversify customer base to reduce dependency

  4. Customer Activation Program

    • Monitor new signups for revenue activity within 30 days
    • Create automated onboarding and activation campaigns
    • Identify and address barriers to first purchase

ANALYSIS RECOMMENDATIONS:

  1. For Revenue Modeling: Use a log transformation due to extreme right skew (skewness: 2.05); because one revenue value is $0.00, use log(1 + x) rather than a plain log

  2. For Forecasting: Consider Enterprise and SMB separately - different behavior patterns

  3. For Segmentation: Include age as a key variable given strong predictive power
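
One practical detail for the revenue-modeling recommendation: the revenue column contains a $0.00 value, and a plain logarithm is undefined at zero. A common workaround is log(1 + x), sketched below with the standard library. The four inputs are figures quoted in the report (minimum, median, and the two outliers), used here purely for illustration.

```python
import math

# Figures quoted in the report: minimum, median, and the two outliers.
revenue = [0.0, 1175.25, 8900.5, 15600.0]

# A plain log fails on the zero-revenue row.
try:
    math.log(revenue[0])
except ValueError as exc:
    print(f"math.log(0.0) failed: {exc}")

# log1p(x) = log(1 + x) is defined at zero and compresses the long right tail.
log_revenue = [math.log1p(v) for v in revenue]

# expm1 inverts the transform when predictions need to be back in dollars.
restored = [math.expm1(v) for v in log_revenue]

for raw, transformed in zip(revenue, log_revenue):
    print(f"{raw:>10,.2f} -> {transformed:.3f}")
```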

═══════════════════════════════════════════════════════════

VISUALIZATIONS

═══════════════════════════════════════════════════════════

Included visualizations show:

  • Revenue distribution with median/mean markers (highly right-skewed)
  • Revenue by customer type (Enterprise dominance visible)
  • Age vs Revenue scatter plot (positive correlation visible)
  • Correlation heatmap (strong age-revenue relationship)
  • Missing data visualization (30% email gap highlighted)

All visualizations saved to /tmp/ directory.

═══════════════════════════════════════════════════════════

DATASET READINESS ASSESSMENT

═══════════════════════════════════════════════════════════

Current State: READY for analysis with caveats

Strengths:

  • Clean data structure with no duplicates
  • High completeness (96.2%)
  • Clear segmentation variables (customer_type, region)
  • Strong predictive relationships identified (age-revenue)
  • Consistent date formatting

Limitations:

  • Small sample size (n=10) - findings may not generalize
  • 30% missing email addresses limits marketing capabilities
  • High revenue skew requires careful statistical handling
  • One zero-revenue customer needs investigation (status not yet confirmed)

Best For:

  • Customer segmentation exploration
  • Revenue pattern analysis
  • Hypothesis generation for larger datasets
  • Proof-of-concept modeling

Not Suitable For:

  • Statistical significance testing (sample too small)
  • Predictive modeling without additional data
  • Definitive business conclusions

═══════════════════════════════════════════════════════════

NEXT STEPS

═══════════════════════════════════════════════════════════

The dataset is small but clean and reveals important patterns.

Priority actions:

  1. Fix email gaps (30% missing) before any email campaigns
  2. Investigate zero-revenue customer (C010)
  3. Analyze Enterprise vs SMB separately - they are fundamentally different segments

Analysis opportunities:

  • Customer lifetime value modeling (if you have historical data)
  • Churn risk analysis (need more behavioral data)
  • Revenue forecasting by segment
  • Age-based targeting strategies

To expand insights, consider adding:

  • Transaction history (frequency, recency, product mix)
  • Customer engagement metrics (logins, support tickets)
  • Acquisition channel data
  • Tenure/lifetime metrics

Summary: This small dataset (10 customers) provides a clear snapshot showing Enterprise customers driving most revenue, a strong age-revenue correlation, and critical data quality issues (30% missing emails, 1 zero-revenue customer). Focus on segment-specific strategies and fix data gaps before scaling analysis.