Sales Transaction Data Analysis
Analyzing a 1,000-row sales dataset with revenue, units sold, profit margin, and discount rate columns. Demonstrates statistical summaries, outlier detection, correlation analysis, and actionable business recommendations.
Input
What you provide to the skill
Generate an EDA report for the sales data at /workspace/webserver/lib/smartvox/skill_hub/skills/distribution-analyzer/sales_data.csv
Output
What the skill generates for you
Your EDA report is ready: /workspace/webserver/lib/smartvox/skill_hub/eda_report_sales_data_20251203_175852.html
Executive Summary
Dataset: 1,000 sales transactions × 4 features
Data Quality: Excellent - no missing values, no duplicates (a few anomalous rows are listed under Data Quality Issues)
Total Revenue: $6,473,793.27
Key Findings
1. Revenue Patterns
- Distribution: Heavily right-skewed (skewness: 3.63) with extreme outliers
- Average Transaction: $6,474 | Median: $4,512 (significant gap indicates outliers)
- Range: $546 to $82,722 (one extreme outlier at $82k - about 12.8× the average)
- Top 10% Transactions: ≥$13,602 (driving significant revenue)
- Outliers: 72 high-value transactions (7.2%) - likely B2B or bulk orders
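The gap between mean and median and the positive skewness above can be reproduced with a few lines of code. This is a minimal sketch, assuming the moment-based (population) skewness formula; the report does not state which estimator it uses, and the revenue values below are made-up toy data.

```python
import statistics

def skewness(values):
    """Population (moment-based) skewness: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical toy revenues (not the report's data): one large order among small ones.
revenue = [500.0, 600.0, 700.0, 800.0, 900.0, 20000.0]

mean_rev = statistics.mean(revenue)
median_rev = statistics.median(revenue)
skew_rev = skewness(revenue)

# A mean well above the median together with positive skewness signals a long right tail.
print(f"mean={mean_rev:.0f} median={median_rev:.0f} skew={skew_rev:.2f}")
```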
2. Units Sold
- Total Units: 28,592 across 1,000 transactions
- Average: 28.6 units | Median: 24 units
- Distribution: Moderately right-skewed (skewness: 1.38)
- Range: 0 to 129 units (2 transactions with 0 units - returns or errors?)
- Outliers: 46 high-volume transactions (4.6%)
3. Profit Margins
- Average Margin: 27.9% | Median: 28.6% (healthy, consistent)
- Range: -2.2% to 48.6%
- Distribution: Slightly left-skewed (skewness: -0.37) - few low-margin outliers
- Low Margin (<15%): 98 transactions (9.8%) - review for profitability
- Negative Margin: 1 transaction (-2.2%) - potential error or promotional loss
4. Discount Strategy
- Average Discount: 15.4% | Median: 3.3% (median much lower - most get small discounts)
- Distribution: Right-skewed (skewness: 1.12) - few get large discounts
- No Discount: Only 2 transactions (0.2%)
- High Discount (>50%): 108 transactions (10.8%) - aggressive promotions?
5. Correlation Analysis
CRITICAL FINDING: No meaningful linear correlations detected between any variables (all |r| < 0.3)
- Revenue ↔ Units Sold: -0.009 (essentially zero - counter-intuitive!)
- Revenue ↔ Discount Rate: -0.040 (weak negative - expected but minimal)
- Profit Margin ↔ Discount Rate: -0.013 (no relationship - surprising)
Interpretation: Because neither units sold nor discount rate correlates with revenue, revenue appears to be driven by price per unit rather than by volume or discounts. Higher discounts show no measurable link to higher unit sales or to lower margins.
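The pairwise check behind this finding can be sketched as follows. The example assumes Pearson's r and the 0.3 cutoff quoted above; the column names `revenue` and `discount_rate` and all values are illustrative, not taken from the dataset. Pearson's r captures only linear association and is sensitive to outliers, so a rank-based measure may be worth checking on heavy-tailed data like this.

```python
import math
from itertools import combinations

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical toy columns; names and values are assumptions for illustration.
columns = {
    "revenue": [100.0, 200.0, 300.0, 400.0, 500.0],
    "units_sold": [2, 5, 3, 1, 4],
    "discount_rate": [0.30, 0.25, 0.20, 0.15, 0.10],
}

THRESHOLD = 0.3  # cutoff used in the report's summary
strong_pairs = []
for a, b in combinations(columns, 2):
    r = pearson_r(columns[a], columns[b])
    if abs(r) >= THRESHOLD:
        strong_pairs.append((a, b))
    print(f"{a} vs {b}: r={r:+.3f}")
```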
Data Quality Issues
- Two Zero-Unit Transactions: Rows with units_sold = 0 but positive revenue (rows 658, 715) - likely returns, cancellations, or data errors
- One Negative Margin Transaction: Row 606 with -2.2% margin - investigate if intentional (loss leader) or error
- Extreme Revenue Outlier: $82,722 (row 739) - validate this transaction
- High Kurtosis in Revenue: 26.93 (heavy tails) - extreme outliers dominate distribution
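Row-level checks like the first two items above are simple filters. A minimal sketch, assuming each transaction is a dictionary; `units_sold` appears in the report, while the `row`, `revenue`, and `profit_margin` keys and all values are hypothetical.

```python
# Hypothetical toy rows; key names other than units_sold are assumptions.
rows = [
    {"row": 1, "revenue": 1200.0, "units_sold": 10, "profit_margin": 0.28},
    {"row": 2, "revenue": 950.0, "units_sold": 0, "profit_margin": 0.31},
    {"row": 3, "revenue": 400.0, "units_sold": 4, "profit_margin": -0.02},
]

# Positive revenue with zero units: possible return, cancellation, or entry error.
zero_unit_rows = [r["row"] for r in rows if r["units_sold"] == 0 and r["revenue"] > 0]

# Negative margin: possible loss leader or data error.
negative_margin_rows = [r["row"] for r in rows if r["profit_margin"] < 0]

print("zero-unit rows:", zero_unit_rows)
print("negative-margin rows:", negative_margin_rows)
```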
Recommendations
1. Data Cleaning
- Investigate and potentially remove/correct the 2 zero-unit transactions
- Validate the $82k revenue outlier (row 739)
- Review the negative margin transaction (row 606)
2. Discount Strategy Reassessment
- Finding: Discounts show NO correlation with units sold (-0.016) or revenue (-0.040)
- Action: Evaluate if 10.8% of transactions receiving >50% discounts are cost-effective
- Recommendation: A/B test reducing discount frequency - may maintain revenue with better margins
3. Focus on High-Value Transactions
- Top 10% of transactions (≥$13,602) drive significant revenue
- Identify characteristics of these 100 transactions to replicate success
- Separate B2B/bulk orders from retail for targeted analysis
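To put a number on how much the top 10% contributes, one option is to rank transactions by revenue and compute the share held by the top slice. The helper name `top_share` and the revenue values are illustrative assumptions; the cutoff here is the smallest value in the top slice, which may differ slightly from an interpolated 90th percentile.

```python
def top_share(values, percent=10):
    """Return (cutoff, share) for the top `percent` of values by size."""
    ranked = sorted(values, reverse=True)
    k = max(1, len(ranked) * percent // 100)  # number of transactions in the top slice
    top = ranked[:k]
    return top[-1], sum(top) / sum(ranked)

# Hypothetical toy revenues, not the report's data.
revenue = [500, 800, 1200, 1500, 2000, 2500, 3000, 4000, 6000, 28500]

cutoff, share = top_share(revenue)
print(f"top 10% starts at ${cutoff:,} and holds {share:.0%} of revenue")
```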
4. Revenue Driver Analysis
- Critical Gap: Revenue doesn’t correlate with units sold
- Implication: Price per unit varies dramatically across transactions
- Next Step: Calculate revenue_per_unit = revenue / units_sold to identify premium vs. discount products
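A small sketch of the per-unit calculation. Since the dataset contains zero-unit transactions, the division needs a guard; returning None for those rows is one possible convention, not something the report specifies. The transaction values are made up.

```python
def revenue_per_unit(revenue, units_sold):
    """Return revenue / units_sold, or None when units_sold is 0."""
    if units_sold == 0:
        return None  # undefined for zero-unit rows; handle these separately
    return revenue / units_sold

# Hypothetical (revenue, units_sold) pairs.
transactions = [(1200.0, 10), (950.0, 0), (9000.0, 3)]

per_unit = [revenue_per_unit(rev, units) for rev, units in transactions]
print(per_unit)
```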
5. Modeling Considerations
- Log-transform revenue before regression modeling (extreme skewness: 3.63)
- Segment analysis by transaction size (retail vs. bulk) will be more meaningful than aggregate models
- Outlier treatment: Consider capping revenue at 99th percentile or using robust regression methods
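The two revenue treatments mentioned above can be sketched as follows. The example assumes a nearest-rank percentile definition, which is one of several common conventions; the helper name and the toy data are hypothetical, with the largest value chosen to mirror the reported maximum.

```python
import math

def nearest_rank_percentile(values, percent):
    """Nearest-rank percentile for an integer `percent` between 1 and 100."""
    ordered = sorted(values)
    rank = -(-percent * len(ordered) // 100)  # ceiling division
    return ordered[max(rank, 1) - 1]

# Hypothetical toy data: 99 ordinary transactions plus one extreme value.
revenue = [float(v) for v in range(100, 10000, 100)] + [82722.0]

# Option 1: log-transform (valid here because every revenue is positive).
log_revenue = [math.log(v) for v in revenue]

# Option 2: cap (winsorize) at the 99th percentile.
cap = nearest_rank_percentile(revenue, 99)
capped = [min(v, cap) for v in revenue]

print(f"cap={cap:.0f}, max before={max(revenue):.0f}, max after={max(capped):.0f}")
```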
6. Profitability Investigation
- 98 transactions (9.8%) have margins <15% - calculate total profit contribution
- If these low-margin transactions are high-volume, they may still be valuable
- If low-margin AND low-volume, consider raising prices or eliminating these products
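The profit contribution of low-margin transactions could be computed along these lines. This sketch assumes the margin column is profit expressed as a fraction of revenue, so profit = revenue × margin; that definition and the sample values are assumptions, not confirmed by the report.

```python
LOW_MARGIN = 0.15  # threshold used in the report

# Hypothetical (revenue, profit_margin) pairs.
transactions = [
    (1000.0, 0.30),
    (2000.0, 0.10),
    (500.0, 0.40),
    (3000.0, 0.10),
]

# Profit per transaction, assuming margin is profit as a fraction of revenue.
total_profit = sum(rev * margin for rev, margin in transactions)
low_profit = sum(rev * margin for rev, margin in transactions if margin < LOW_MARGIN)
low_share = low_profit / total_profit

print(f"low-margin transactions contribute {low_share:.0%} of total profit")
```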
Technical Notes
- File Location: /workspace/webserver/lib/smartvox/skill_hub/eda_report_sales_data_20251203_175852.html
- Analysis Method: Full statistical EDA with visualizations embedded
- Outlier Detection: IQR method (values below Q1 - 1.5× IQR or above Q3 + 1.5× IQR)
- Report Format: Self-contained HTML with base64-encoded images (works offline)
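The IQR rule named in the notes above can be sketched as below. Quartile conventions vary between tools; this example assumes the "inclusive" method of Python's `statistics.quantiles`, which may not match the skill's implementation exactly. The unit counts are toy data.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return (outliers, (lower, upper)) using the k × IQR fences."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return outliers, (lower, upper)

# Hypothetical toy unit counts with one high-volume transaction.
units = [10, 12, 14, 16, 18, 20, 22, 24, 129]

outliers, bounds = iqr_outliers(units)
print(f"fences={bounds}, outliers={outliers}")
```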
About This Skill
Generate comprehensive exploratory data analysis HTML reports with visualizations, statistics, and automated insights from CSV/Excel datasets.
More Examples
Global Economic Data Pattern Analysis
Analyzing a 13,979-row economic dataset spanning 64 years across 262 countries. Demonstrates handling large datasets with categorical variables, extreme skewness detection, and identification of data characteristics that inform analysis strategy.
IoT Sensor Readings Analysis
Analyzing 6,000 sensor readings with temperature, humidity, vibration, and error count measurements. Demonstrates custom output filename, detection of non-normal distributions, and equipment monitoring recommendations.