Demo Mode - Retail Purchase Dataset
Demonstrates demo mode generating synthetic retail/e-commerce data with realistic quality issues (negative ages, missing values). Shows full profiling capabilities without requiring any external file.
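The skill's actual generator is not shown on this page. As a rough illustration of how demo data with these kinds of injected issues could be produced, here is a minimal sketch using only the Python standard library. The function name `make_demo_orders`, the distribution parameters, and the last two category names are assumptions chosen for illustration, not details taken from the skill.

```python
import random

# "Home" and "Books" are hypothetical; the report only names the top three categories.
CATEGORIES = ["Electronics", "Clothing", "Sports", "Home", "Books"]

def make_demo_orders(n=1000, seed=42):
    """Return n synthetic orders (n >= 26) as dicts, with injected quality issues."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        rows.append({
            "order_id": 1001 + i,
            "customer_age": max(18, round(rng.gauss(38, 11))),
            # Log-normal amounts give a right-skewed distribution.
            "total_amount": round(rng.lognormvariate(4.5, 0.8), 2),
            "quantity": rng.randint(1, 10),
            "category": rng.choice(CATEGORIES),
            "satisfaction": rng.choices([1, 2, 3, 4, 5], weights=[5, 10, 20, 33, 32])[0],
            "delivery_days": max(1, round(rng.gauss(5, 2))),
            "returned": rng.random() < 0.08,
        })
    # Pick 26 distinct rows: 3 negative ages, 15 missing ages, 8 missing amounts.
    bad = rng.sample(range(n), 26)
    for i in bad[:3]:
        rows[i]["customer_age"] = -rng.randint(1, 5)
    for i in bad[3:18]:
        rows[i]["customer_age"] = None
    for i in bad[18:26]:
        rows[i]["total_amount"] = None
    return rows

orders = make_demo_orders()
```

Fixing the seed keeps the demo reproducible, and sampling the affected rows without replacement guarantees the issue counts are exact rather than approximate.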
Input
What you provide to the skill
Profile a sample retail dataset with customer purchases - show me what the profiler can do
Output
What the skill generates for you
======================================================================
DATASET PROFILE REPORT: Demo Retail Purchase Dataset
DATASET OVERVIEW
Rows: 1,000 | Columns: 8
Missing: 23 cells (0.29%)
Memory usage: 110.5 KB
Data types: 6 numeric, 1 categorical, 1 boolean
Dataset: Synthetic retail purchase orders
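The missing-cell percentage above is 23 missing cells out of 1,000 × 8 = 8,000 cells, which is 0.2875% and rounds to 0.29%. A minimal sketch of how such overview counts might be computed, assuming the data is held as a list of dicts with `None` marking a missing cell (the `overview` helper is hypothetical, not the skill's own code):

```python
def overview(rows):
    """Summarize row count, column count, and missing cells for a list of dicts."""
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    missing = sum(1 for row in rows for value in row.values() if value is None)
    total_cells = n_rows * n_cols
    pct = 100 * missing / total_cells if total_cells else 0.0
    return {"rows": n_rows, "columns": n_cols, "missing": missing, "missing_pct": pct}

# Tiny example: one missing cell out of four.
sample = [
    {"customer_age": 34, "total_amount": 59.90},
    {"customer_age": None, "total_amount": 120.00},
]
summary = overview(sample)
```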
VARIABLE PROFILES
customer_age (Numeric)
Mean: 37.9 | Median: 38.0 | Std: 11.4
Range: -5 to 84
Missing: 15 (1.5%)
⚠️ DATA QUALITY: 3 negative ages (likely errors)
Action: Fix impossible values before analysis
total_amount (Numeric)
Mean: $130.45 | Median: $94.67 | Std: $119.31
Range: $8.57 - $1158.04
Distribution: Right-skewed (skew=2.91)
Outliers (IQR method): 70 orders (>$320 or <-$104) = 7.1% of the 992 non-missing values
Missing: 8 (0.8%)
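The IQR method flags values that fall more than a multiple of the interquartile range outside the quartiles. The report does not state the multiplier or the quantile interpolation it uses, so the sketch below assumes the conventional 1.5 and the standard library's "inclusive" quantile method; the helper name `iqr_outliers` is hypothetical.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using the IQR rule."""
    # Drop missing values before computing quartiles.
    data = sorted(v for v in values if v is not None)
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in data if v < lower or v > upper]
    return lower, upper, outliers

lower, upper, outliers = iqr_outliers([1, 2, 3, 4, 5, 6, 7, 8, 100, None])
```

Under the same 1.5 assumption, the reported fences of about -104 and 320 would correspond to an IQR near 106 (Q1 near 55, Q3 near 161). Because order amounts cannot be negative, the lower fence can never trigger, so all 70 flagged orders would be high-value ones.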
quantity (Numeric)
Mean: 4.0 | Median: 4.0 | Range: 1-10
Distribution: Most orders have 3-4 items
category (Categorical)
Unique values: 5
Top 3 categories:
- Electronics: 229 (22.9%)
- Clothing: 211 (21.1%)
- Sports: 191 (19.1%)
Distribution: Relatively balanced
satisfaction (Numeric - Ordinal Scale)
Mean: 3.77 | Median: 4.0 | Range: 1-5
Distribution: Left-skewed (most customers satisfied)
Rating breakdown:
- 1 star: 52 (5.2%)
- 2 stars: 101 (10.1%)
- 3 stars: 196 (19.6%)
- 4 stars: 332 (33.2%)
- 5 stars: 319 (31.9%)
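The mean rating can be cross-checked from the breakdown alone. The short calculation below uses only the counts listed above:

```python
# Rating counts copied from the breakdown above.
ratings = {1: 52, 2: 101, 3: 196, 4: 332, 5: 319}

total = sum(ratings.values())
mean = sum(stars * count for stars, count in ratings.items()) / total
shares = {stars: 100 * count / total for stars, count in ratings.items()}
```

The counts sum to 1,000 and give a weighted mean of 3.765. That value sits on a rounding boundary, which may explain why the report shows 3.77 here and 3.76 in the summary statistics table.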
delivery_days (Numeric)
Mean: 5.0 days | Median: 5.0 | Range: 1-13
returned (Boolean)
Return rate: 79 orders (7.9%)
Non-returned: 921 orders (92.1%)
CORRELATION ANALYSIS
MISSING DATA ANALYSIS
Variables with missing data:
- customer_age: 15 (1.5%) - Low impact
- total_amount: 8 (0.8%) - Low impact
Pattern assessment: Random (MCAR) - missing values appear independent
Imputation recommendation:
- customer_age: Median imputation or remove rows (1.5% missing)
- total_amount: Median or drop rows (0.8% missing)
DATA QUALITY ALERTS
High Priority:
- Fix 3 impossible customer ages (negative values)
→ Set to NaN or impute with median age
Medium Priority:
- Review 70 high-value orders (>$320)
→ Verify these are legitimate large purchases, not data errors
- Consider imputation strategy for 23 missing values
Low Priority:
- Return rate (7.9%) is within normal e-commerce range
RECOMMENDATIONS
Data Cleaning:
- Fix customer_age impossible values:
→ Replace negative ages with NaN
→ Impute with median (38) or remove 18 affected rows (3 negative + 15 missing, 1.8%)
- Handle missing total_amount values:
→ Median imputation or drop 8 rows (0.8%)
- Consider capping extreme outliers if modeling
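The age-cleaning steps above (treat negative ages as missing, then impute with the median) can be sketched as follows. This is a minimal illustration assuming ages are held in a plain list with `None` for missing values; `clean_ages` is a hypothetical helper, not part of the skill.

```python
import statistics

def clean_ages(ages):
    """Treat negative ages as missing, then fill gaps with the median."""
    # Step 1: impossible (negative) values become missing.
    valid_or_none = [a if a is not None and a >= 0 else None for a in ages]
    # Step 2: compute the median from the remaining observed values only.
    observed = [a for a in valid_or_none if a is not None]
    median_age = statistics.median(observed)  # raises StatisticsError if nothing is observed
    # Step 3: fill every missing slot with that median.
    return [median_age if a is None else a for a in valid_or_none]

cleaned = clean_ages([25, -5, None, 40, 31])
```

Computing the median after the negative values are removed matters: otherwise the impossible values would pull the imputed value down.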
Analysis Strategy:
- Segment analysis by category:
→ Compare satisfaction and return rates across product types
→ Identify high-value categories
- Investigate delivery impact:
→ Correlation between delivery_days and satisfaction
→ Set delivery time targets to optimize satisfaction
- Customer segmentation by age:
→ Analyze purchase patterns across age groups
→ Target marketing based on age preferences
- Return analysis:
→ Identify categories with highest return rates
→ Correlate returns with satisfaction scores
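Two of the suggested analyses, the per-category comparison and the delivery/satisfaction correlation, can be sketched with the standard library alone. The helper names `summarize_by_category` and `pearson` and the four sample rows are hypothetical; the column names follow the report.

```python
from collections import defaultdict
from math import sqrt

def summarize_by_category(rows):
    """Return order count, mean satisfaction, and return rate per category."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["category"]].append(row)
    return {
        cat: {
            "orders": len(items),
            "mean_satisfaction": sum(r["satisfaction"] for r in items) / len(items),
            "return_rate": sum(1 for r in items if r["returned"]) / len(items),
        }
        for cat, items in groups.items()
    }

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(sxx * syy)

rows = [
    {"category": "Electronics", "satisfaction": 5, "returned": False, "delivery_days": 2},
    {"category": "Electronics", "satisfaction": 3, "returned": True, "delivery_days": 6},
    {"category": "Clothing", "satisfaction": 4, "returned": False, "delivery_days": 4},
    {"category": "Clothing", "satisfaction": 2, "returned": False, "delivery_days": 8},
]
by_category = summarize_by_category(rows)
r = pearson([row["delivery_days"] for row in rows],
            [row["satisfaction"] for row in rows])
```

Since satisfaction is an ordinal 1-5 scale, a rank-based measure such as Spearman correlation may be a better fit than Pearson on real data; Pearson is used here only to keep the sketch short.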
SUMMARY STATISTICS TABLE
       order_id  customer_age  ...  satisfaction  delivery_days
count   1000.00        985.00  ...       1000.00        1000.00
mean    1500.50         37.74  ...          3.76           5.00
std      288.82         11.61  ...          1.15           2.05
min     1001.00         -5.00  ...          1.00           1.00
25%     1250.75         30.00  ...          3.00           4.00
50%     1500.50         38.00  ...          4.00           5.00
75%     1750.25         45.00  ...          5.00           6.00
max     2000.00         84.00  ...          5.00          13.00
[8 rows x 6 columns]
======================================================================
Quality Score: 96/100 | Dataset Status: Excellent
About This Skill
Generate comprehensive data profile reports including statistics, correlations, missing data analysis, and quality insights. Works with files, inline data, or generates demo data.
More Examples
Inline CSV - Business Data with Quality Issues
Profiles inline CSV data pasted directly in the prompt. Demonstrates detection of data quality issues including missing values (NA strings), negative employee counts, and type inconsistencies in a small business dataset.
Focused Analysis - Outliers and Correlations
Demonstrates requesting a profile with specific analytical focus. Generates demo data and provides detailed outlier detection (IQR and Z-score methods) and full correlation matrix with interpretation of relationships.