Focused Analysis - Outliers and Correlations
Demonstrates requesting a profile with a specific analytical focus. The skill generates demo data, then provides detailed outlier detection (IQR and Z-score methods) and a full correlation matrix with an interpretation of the relationships between variables.
Input
What you provide to the skill
Profile a sample dataset - I’m particularly interested in finding outliers and understanding correlations between variables
Output
What the skill generates for you
================================================================================
DATASET PROFILE REPORT: Demo E-Commerce Dataset
DATASET OVERVIEW
Rows: 1,000 | Columns: 8 | Missing: 23 cells (0.29%)
Data types: 6 numeric, 1 categorical, 1 boolean
Memory usage: 0.11 MB
Source: synthetic e-commerce order data (generated in demo mode)
VARIABLE PROFILES
order_id (Numeric)
Mean: 1500.50 | Median: 1500.50 | Std: 288.82
Range: 1001.00 to 2000.00
Q1: 1250.75 | Q3: 1750.25
customer_age (Numeric)
Mean: 37.74 | Median: 38.00 | Std: 11.61
Range: -5.00 to 84.00
Q1: 30.00 | Q3: 45.00
Missing: 15 (1.5%)
OUTLIERS (IQR method): 10 values (1.0%)
Lower bound: 7.50 | Upper bound: 67.50
OUTLIERS (Z-score |z|>3): 5 extreme values
⚠️ DATA QUALITY ALERT: 3 negative ages (impossible values)
Action: Fix impossible values before analysis
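The IQR bounds in these profiles follow Tukey's rule: lower = Q1 - 1.5 × IQR and upper = Q3 + 1.5 × IQR. For customer_age, IQR = 45 - 30 = 15, so the bounds are 30 - 22.5 = 7.50 and 45 + 22.5 = 67.50. Below is a minimal standard-library sketch. It assumes the skill uses linear-interpolation quartiles (`statistics.quantiles` with `method="inclusive"`), which reproduces the order_id quartiles of 1250.75 and 1750.25 reported above. The helper names are hypothetical and not part of the skill.

```python
import statistics

def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) bounds: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def iqr_outliers(values, k=1.5):
    """Return the values that fall outside the Tukey fences."""
    # "inclusive" interpolates linearly between order statistics,
    # the same convention as pandas' default quantile().
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    lower, upper = tukey_fences(q1, q3, k)
    return [v for v in values if v < lower or v > upper]

print(tukey_fences(30.0, 45.0))  # customer_age -> (7.5, 67.5)
print(statistics.quantiles(range(1001, 2001), n=4, method="inclusive"))
# order_id -> [1250.75, 1500.5, 1750.25]
```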
total_amount (Numeric)
Mean: 130.45 | Median: 94.67 | Std: 119.31
Range: 8.57 to 1158.04
Q1: 55.33 | Q3: 161.27
Missing: 8 (0.8%)
OUTLIERS (IQR method): 70 values (7.1%)
Lower bound: -103.58 | Upper bound: 320.18
OUTLIERS (Z-score |z|>3): 20 extreme values
Distribution: right-skewed (skewness: 2.91)
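A skewness of 2.91, together with a mean (130.45) well above the median (94.67), points to a long right tail of large orders. The tail also explains why the IQR lower bound is negative and every IQR outlier in total_amount is a high-value order. The report does not state which skewness estimator it uses. The sketch below assumes the adjusted Fisher-Pearson coefficient G1, which is what pandas' `Series.skew()` returns.

```python
import math
import statistics

def adjusted_skewness(values):
    """Adjusted Fisher-Pearson skewness G1 (needs n >= 3 and non-constant data)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # 2nd central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # 3rd central moment
    g1 = m3 / m2 ** 1.5                            # biased (population) skewness
    return math.sqrt(n * (n - 1)) / (n - 2) * g1   # small-sample adjustment

# One very large value drags the tail to the right, giving a positive G1.
print(adjusted_skewness([10, 12, 11, 13, 95]) > 0)  # -> True
```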
quantity (Numeric)
Mean: 3.96 | Median: 4.00 | Std: 1.66
Range: 1.00 to 10.00
Q1: 3.00 | Q3: 5.00
OUTLIERS (IQR method): 6 values (0.6%)
Lower bound: 0.00 | Upper bound: 8.00
OUTLIERS (Z-score |z|>3): 6 extreme values
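For quantity the two methods agree. With a mean of 3.96 and a standard deviation of 1.66, |z| > 3 means a value above 3.96 + 3 × 1.66 = 8.94. The lower cut-off, -1.02, lies below the minimum. The IQR upper bound is 8.00, so, assuming quantity holds whole item counts, both rules flag exactly the orders of 9 or 10 items.

The sketch below implements the z-score rule. It assumes the sample standard deviation (n - 1 denominator), which is the convention that reproduces the order_id Std of 288.82; the population formula would give about 288.68. The helper name is hypothetical.

```python
import statistics

def z_score_outliers(values, threshold=3.0):
    """Return values whose |z| exceeds threshold, using the sample std (n - 1)."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation
    return [v for v in values if abs(v - mean) / std > threshold]

# Upper cut-off implied by the quantity profile: mean + 3 * std
print(round(3.96 + 3 * 1.66, 2))  # -> 8.94
```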
satisfaction (Numeric)
Mean: 3.77 | Median: 4.00 | Std: 1.15
Range: 1.00 to 5.00
Q1: 3.00 | Q3: 5.00
delivery_days (Numeric)
Mean: 5.00 | Median: 5.00 | Std: 2.05
Range: 1.00 to 13.00
Q1: 4.00 | Q3: 6.00
OUTLIERS (IQR method): 28 values (2.8%)
Lower bound: 1.00 | Upper bound: 9.00
OUTLIERS (Z-score |z|>3): 4 extreme values
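In delivery_days, the IQR method flags 28 values but the z-score method flags only 4. The gap follows from the narrow interquartile range. Working from the reported statistics:

```latex
\begin{aligned}
\text{IQR} &= Q_3 - Q_1 = 6 - 4 = 2 \\
\text{upper bound} &= Q_3 + 1.5\,\text{IQR} = 6 + 3 = 9 \\
|z| > 3 \;\text{(upper side)} \iff x &> \bar{x} + 3s = 5.00 + 3(2.05) = 11.15
\end{aligned}
```

Assuming delivery_days holds whole days, the IQR rule flags every order that took 10 to 13 days, while the z-score rule flags only 12 and 13 days. Neither rule flags anything on the low side, because the IQR lower bound (1.00) equals the minimum.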
category (Categorical)
Unique values: 5
Top categories:
- Electronics: 229 (22.9%)
- Clothing: 211 (21.1%)
- Sports: 191 (19.1%)
- Books: 190 (19.0%)
- Home: 179 (17.9%)
returned (Boolean)
False: 921 (92.1%)
True: 79 (7.9%)
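The category and returned profiles are plain frequency tables. A minimal sketch using `collections.Counter` follows; the `frequency_table` helper is hypothetical.

```python
from collections import Counter

def frequency_table(values):
    """Return (value, count, percent) tuples, most frequent first."""
    counts = Counter(values)
    total = sum(counts.values())
    # most_common() sorts by count; ties keep first-seen order.
    return [(value, n, round(100 * n / total, 1)) for value, n in counts.most_common()]

for value, n, pct in frequency_table(["Books", "Home", "Books", "Sports"]):
    print(f"- {value}: {n} ({pct}%)")
```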
CORRELATION ANALYSIS
No strong correlations found (all |r| < 0.3)
Full Correlation Matrix:
              order_id  customer_age  total_amount  quantity  satisfaction  delivery_days
order_id          1.00          0.02         -0.03      0.01         -0.04          -0.00
customer_age      0.02          1.00         -0.01     -0.03          0.03           0.03
total_amount     -0.03         -0.01          1.00      0.00         -0.05          -0.03
quantity          0.01         -0.03          0.00      1.00          0.00          -0.04
satisfaction     -0.04          0.03         -0.05      0.00          1.00           0.02
delivery_days    -0.00          0.03         -0.03     -0.04          0.02           1.00
MISSING DATA ANALYSIS
Variable      Missing Count  Missing %
customer_age             15       1.5%
total_amount              8       0.8%
Pattern Assessment:
• Low overall missingness (<5%); the pattern is treated as random, although low missingness alone does not establish this
• Recommendation: Median/mode imputation or listwise deletion
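The missing-data figures are simple counts. For customer_age, 15 missing values out of 1,000 rows gives 1.5%. Across the whole table, 23 missing cells out of 1,000 × 8 = 8,000 gives 0.29%. The sketch below again uses `None` as the missing marker, with a hypothetical `missing_summary` helper and toy columns.

```python
def missing_summary(columns):
    """Return per-column (count, percent) missing and the overall missing percent."""
    per_column = {}
    total_missing = 0
    total_cells = 0
    for name, values in columns.items():
        count = sum(v is None for v in values)
        per_column[name] = (count, 100 * count / len(values))
        total_missing += count
        total_cells += len(values)
    return per_column, 100 * total_missing / total_cells

ages = [30] * 985 + [None] * 15
amounts = [50.0] * 992 + [None] * 8
per_column, overall = missing_summary({"customer_age": ages, "total_amount": amounts})
print(per_column)  # {'customer_age': (15, 1.5), 'total_amount': (8, 0.8)}
```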
DATA QUALITY ALERTS
HIGH PRIORITY:
- Fix 3 negative customer ages (impossible values)
MEDIUM PRIORITY:
- Review 20 extreme outliers in total_amount (|z| > 3)
- Review 6 extreme outliers in quantity (|z| > 3)
- Review 70 unusually high-value orders (total_amount above the IQR upper bound of 320.18)
RECOMMENDATIONS
DATA CLEANING:
- Fix impossible customer_age values (set them to NaN, then impute with the median)
- Impute missing values using median for numeric, mode for categorical
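The two cleaning steps work best in sequence. First mark impossible values as missing, so they do not distort the median, then impute every gap. The sketch below assumes "impossible" means negative, which is the only rule the report applies. The function name is hypothetical.

```python
import statistics

def clean_ages(ages):
    """Treat negative ages as missing, then fill all gaps with the median of valid ages."""
    # Step 1: impossible values (age < 0) become missing, like NaN in pandas.
    marked = [None if a is not None and a < 0 else a for a in ages]
    # Step 2: the median is computed from the remaining valid ages only.
    median_age = statistics.median(a for a in marked if a is not None)
    return [median_age if a is None else a for a in marked]

print(clean_ages([25, -5, 40, None, 38]))  # -> [25, 38, 40, 38, 38]
```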
ANALYSIS STRATEGY:
- Segment analysis by category to understand purchasing patterns
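- Investigate why quantity and total_amount are uncorrelated (r = 0.00); in real order data, larger quantities would typically be expected to cost more
- Analyze the relationship between delivery_days and satisfaction (r = 0.02 in this sample)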
- Study return patterns by category and customer demographics
- Examine the high-value outlier orders for common patterns (no low-side outliers were flagged in total_amount)
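For the category-level analyses, a natural first cut is the return rate within each category. The standard-library sketch below uses a hypothetical function and toy inputs. With pandas, the same figure (as a fraction rather than a percentage) would come from `df.groupby("category")["returned"].mean()`.

```python
from collections import defaultdict

def return_rate_by_category(categories, returned):
    """Share of returned orders (in %) for each category."""
    orders = defaultdict(int)
    returns = defaultdict(int)
    for category, was_returned in zip(categories, returned):
        orders[category] += 1
        returns[category] += bool(was_returned)  # True counts as 1
    return {c: 100 * returns[c] / orders[c] for c in orders}

rates = return_rate_by_category(
    ["Books", "Home", "Books", "Home", "Home"],
    [False, True, True, False, False],
)
print(rates)
```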
Quality Score: 87/100
Processing Time: <1 second
About This Skill
Generates comprehensive data profile reports, including statistics, correlations, missing-data analysis, and quality insights. Works with files or inline data, or generates demo data.
More Examples
Demo Mode - Retail Purchase Dataset
Demonstrates demo mode generating synthetic retail/e-commerce data with realistic quality issues (negative ages, missing values). Shows full profiling capabilities without requiring any external file.
Inline CSV - Business Data with Quality Issues
Profiles inline CSV data pasted directly in the prompt. Demonstrates detection of data quality issues including missing values (NA strings), negative employee counts, and type inconsistencies in a small business dataset.