Inline CSV - Business Data with Quality Issues
Profiles inline CSV data pasted directly in the prompt. Demonstrates detection of data quality issues including missing values (NA strings), negative employee counts, and type inconsistencies in a small business dataset.
Input
What you provide to the skill
Profile this data:
id,revenue,employees,rating,sector
1,250000,45,4.2,Tech
2,NA,12,3.8,Retail
3,890000,120,4.7,Tech
4,45000,-5,2.9,Food
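The source does not show how the skill reads pasted data. As a minimal sketch, assuming only the Python standard library, the inline CSV above could be parsed as follows. `MISSING_TOKENS` and `parse_inline_csv` are hypothetical names introduced for illustration, and the token list is an assumption.

```python
import csv
import io

# The inline CSV from the prompt above, copied verbatim.
RAW = """id,revenue,employees,rating,sector
1,250000,45,4.2,Tech
2,NA,12,3.8,Retail
3,890000,120,4.7,Tech
4,45000,-5,2.9,Food
"""

# Strings treated as missing (an assumption, not the skill's actual list).
MISSING_TOKENS = {"", "NA", "N/A", "NaN", "null"}


def parse_inline_csv(text):
    """Return one dict per data row, mapping missing tokens to None."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for record in reader:
        rows.append({
            key: (None if value.strip() in MISSING_TOKENS else value.strip())
            for key, value in record.items()
        })
    return rows


rows = parse_inline_csv(RAW)
print(rows[1]["revenue"])  # None: the 'NA' cell is recognised as missing
```

Values stay as strings at this stage; converting them to numbers is a separate step, which keeps the missing-value decision explicit.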
Output
What the skill generates for you
Dataset Profile Report: Inline Business Data
Dataset Overview
- Rows: 4 | Columns: 5 | Missing: 1 cells (5.0%)
- Data types: 4 numeric, 1 categorical
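The overview figures can be checked by hand: 4 rows times 5 columns gives 20 cells, and 1 missing cell out of 20 is 5.0%. The sketch below reproduces that arithmetic, assuming the data has already been parsed into dicts with `None` for the missing cell; `overview` is a hypothetical helper, not the skill's code.

```python
# Rows as they might look after parsing; None stands for the 'NA' cell.
rows = [
    {"id": 1, "revenue": 250000.0, "employees": 45, "rating": 4.2, "sector": "Tech"},
    {"id": 2, "revenue": None, "employees": 12, "rating": 3.8, "sector": "Retail"},
    {"id": 3, "revenue": 890000.0, "employees": 120, "rating": 4.7, "sector": "Tech"},
    {"id": 4, "revenue": 45000.0, "employees": -5, "rating": 2.9, "sector": "Food"},
]


def overview(rows):
    """Summarise shape, missing cells, and a rough numeric/categorical split."""
    columns = list(rows[0]) if rows else []
    total_cells = len(rows) * len(columns)
    missing = sum(1 for row in rows for value in row.values() if value is None)
    # A column counts as numeric if every non-missing value is an int or float.
    numeric = [
        col for col in columns
        if all(isinstance(row[col], (int, float)) for row in rows if row[col] is not None)
    ]
    return {
        "rows": len(rows),
        "columns": len(columns),
        "missing": missing,
        "missing_pct": 100 * missing / total_cells if total_cells else 0.0,
        "numeric": len(numeric),
        "categorical": len(columns) - len(numeric),
    }


summary = overview(rows)
print(f"Rows: {summary['rows']} | Columns: {summary['columns']} | "
      f"Missing: {summary['missing']} cells ({summary['missing_pct']:.1f}%)")
```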
Variable Profiles
id (int64)
- Range: 1-4
- Complete: No missing values
revenue (float64)
- Mean: $395,000 | Median: $250,000
- Range: $45,000 - $890,000
- ⚠️ DATA QUALITY: 1 missing value (stored as ‘NA’ string)
- Action: Convert to numeric type and handle missing value
employees (int64)
- Mean: 43.0 | Median: 28.5
- Range: -5 to 120
- ⚠️ DATA QUALITY: 1 negative value (impossible - likely error)
- Action: Fix impossible value before analysis
rating (float64)
- Mean: 3.90 | Median: 4.00
- Range: 2.9 - 4.7
- Distribution: Appears to be 1-5 scale with fractional values
sector (object)
- Unique values: 3 (Tech, Retail, Food)
- Mode: Tech (2 occurrences, 50.0%)
- Distribution:
  - Tech: 2 (50.0%)
  - Retail: 1 (25.0%)
  - Food: 1 (25.0%)
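The per-variable figures above can be reproduced with a few lines of standard-library Python. This is only a sketch under the assumption that missing values are skipped when computing statistics; `profile_numeric` and `profile_categorical` are hypothetical helper names.

```python
from collections import Counter
from statistics import mean, median

# Column-wise view of the example data; None marks the missing revenue.
columns = {
    "revenue": [250000.0, None, 890000.0, 45000.0],
    "employees": [45, 12, 120, -5],
    "rating": [4.2, 3.8, 4.7, 2.9],
    "sector": ["Tech", "Retail", "Tech", "Food"],
}


def profile_numeric(values):
    """Basic statistics over non-missing values, plus simple quality counts."""
    present = [v for v in values if v is not None]
    return {
        "mean": mean(present),
        "median": median(present),
        "min": min(present),
        "max": max(present),
        "missing": len(values) - len(present),
        "negative": sum(1 for v in present if v < 0),
    }


def profile_categorical(values):
    """Map each label to (count, percentage of non-missing values)."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    return {label: (n, 100 * n / len(present)) for label, n in counts.most_common()}


revenue = profile_numeric(columns["revenue"])
employees = profile_numeric(columns["employees"])
rating = profile_numeric(columns["rating"])
sector = profile_categorical(columns["sector"])

print(revenue["mean"], revenue["median"])
print(employees["median"], employees["negative"])
print(sector)
```

The `negative` count is what surfaces the impossible `-5` in `employees`; whether a negative value is actually invalid depends on the column, so a real profiler would need per-column rules.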
Correlation Analysis
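Numeric Variable Correlations (complete cases only):
| Variable 1 | Variable 2 | r | Interpretation |
|---|---|---|---|
| revenue | employees | 0.98 | Very strong positive |
| revenue | rating | 0.86 | Strong positive |
| employees | rating | 0.93 | Very strong positive |
Note: Correlations are computed on the 3 complete cases (rows 1, 3, and 4). Only row 2 (missing revenue) is excluded; the invalid employees value in row 4 (-5) is still included, and dropping it as well would leave just 2 rows. With so few observations, treat these coefficients as indicative only.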
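The coefficients in the table can be reproduced with a plain Pearson correlation over the rows that have no missing numeric value. The sketch below assumes listwise deletion (dropping any row with a missing value), which is what matches the tabulated numbers; `pearson` is a hypothetical helper written out by hand to avoid depending on any particular library version.

```python
from math import sqrt

# Numeric columns of the example; None marks the missing revenue in row 2.
data = {
    "revenue": [250000.0, None, 890000.0, 45000.0],
    "employees": [45, 12, 120, -5],
    "rating": [4.2, 3.8, 4.7, 2.9],
}

# Listwise deletion: keep only row positions where every column has a value.
complete = [
    i for i in range(len(data["revenue"]))
    if all(values[i] is not None for values in data.values())
]


def pearson(xs, ys):
    """Pearson correlation coefficient; None if it is undefined."""
    n = len(xs)
    if n < 2:
        return None
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def corr(a, b):
    """Correlation between two named columns on the complete cases."""
    return pearson([data[a][i] for i in complete], [data[b][i] for i in complete])


print(round(corr("revenue", "employees"), 2))  # 0.98
print(round(corr("revenue", "rating"), 2))     # 0.86
print(round(corr("employees", "rating"), 2))   # 0.93
```

These values are obtained with the `-5` employees value still in place. If that row were also removed, only two rows would remain and every coefficient would trivially be +1 or -1.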
Missing Data Analysis
- revenue: 1 missing (25%) - stored as ‘NA’ string
- Pattern: Single missing value in row 2
- Recommendation: Impute with median ($250,000) or investigate source
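A per-column missing-data summary like the one above might be produced as follows. This is a sketch, assuming missing cells are already represented as `None`; `missing_report` is a hypothetical name, and row numbers are 1-based to match the report.

```python
# Column-wise view of the example data; None marks the 'NA' revenue cell.
columns = {
    "id": [1, 2, 3, 4],
    "revenue": [250000.0, None, 890000.0, 45000.0],
    "employees": [45, 12, 120, -5],
    "rating": [4.2, 3.8, 4.7, 2.9],
    "sector": ["Tech", "Retail", "Tech", "Food"],
}


def missing_report(columns):
    """For each column with gaps, report count, percentage, and row numbers."""
    report = {}
    for name, values in columns.items():
        positions = [i for i, v in enumerate(values, start=1) if v is None]
        if positions:
            report[name] = {
                "count": len(positions),
                "pct": 100 * len(positions) / len(values),
                "rows": positions,
            }
    return report


report = missing_report(columns)
print(report)
```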
Data Quality Alerts
High Priority:
- Fix negative employees value in row 4 (-5 is impossible)
- Convert revenue to numeric type (currently stored as string with ‘NA’)
Medium Priority:
- Handle missing revenue value in row 2 (25% missing)
- Verify if rating scale is 1-5 as assumed
Recommendations
Data Cleaning:
- Convert revenue from string to numeric type
- Fix employees value in row 4 (set to NaN or investigate correct value)
- Impute missing revenue with median ($250,000) or domain knowledge
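The cleaning steps listed above could look roughly like this. It is a sketch of one option only: it blanks the negative `employees` value rather than guessing a correction, and it uses median imputation for `revenue`, one of the choices the report suggests. `clean` is a hypothetical helper.

```python
from statistics import median

rows = [
    {"id": 1, "revenue": 250000.0, "employees": 45, "rating": 4.2, "sector": "Tech"},
    {"id": 2, "revenue": None, "employees": 12, "rating": 3.8, "sector": "Retail"},
    {"id": 3, "revenue": 890000.0, "employees": 120, "rating": 4.7, "sector": "Tech"},
    {"id": 4, "revenue": 45000.0, "employees": -5, "rating": 2.9, "sector": "Food"},
]


def clean(rows):
    """Return cleaned copies of the rows; the input list is left untouched."""
    cleaned = [dict(row) for row in rows]

    # Impossible values: a head count cannot be negative, so mark it missing.
    for row in cleaned:
        if row["employees"] is not None and row["employees"] < 0:
            row["employees"] = None

    # Median imputation for revenue, using only the observed values.
    observed = [row["revenue"] for row in cleaned if row["revenue"] is not None]
    fill = median(observed)
    for row in cleaned:
        if row["revenue"] is None:
            row["revenue"] = fill

    return cleaned


cleaned = clean(rows)
print(cleaned[1]["revenue"], cleaned[3]["employees"])
```

With only three observed revenues, a median fill is a weak estimate; investigating the source, as the report also recommends, is likely the safer route when that is possible.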
Analysis Strategy:
- After cleaning, compare revenue by sector (Tech appears higher)
- Investigate relationship between company size (employees) and rating
- Consider if the negative employees value indicates a data entry error (should be 5?)
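The sector comparison suggested above amounts to a grouped mean. A minimal sketch, assuming missing revenue is skipped rather than imputed; `mean_by_group` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean

rows = [
    {"sector": "Tech", "revenue": 250000.0},
    {"sector": "Retail", "revenue": None},
    {"sector": "Tech", "revenue": 890000.0},
    {"sector": "Food", "revenue": 45000.0},
]


def mean_by_group(rows, group_key, value_key):
    """Mean of value_key per group; None for groups with no observed values."""
    groups = defaultdict(list)
    for row in rows:
        bucket = groups[row[group_key]]  # registers the group even if empty
        if row[value_key] is not None:
            bucket.append(row[value_key])
    return {group: (mean(values) if values else None) for group, values in groups.items()}


by_sector = mean_by_group(rows, "sector", "revenue")
print(by_sector)
```

Tech averages higher than Food here, but each group has at most two observations, so the comparison is suggestive rather than conclusive.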
Quality score: 50/100 | Issues: 2 critical errors, 1 missing value in 4-row dataset
About This Skill
Generate comprehensive data profile reports including statistics, correlations, missing data analysis, and quality insights. Works with files, inline data, or generates demo data.
More Examples
Demo Mode - Retail Purchase Dataset
Demonstrates demo mode generating synthetic retail/e-commerce data with realistic quality issues (negative ages, missing values). Shows full profiling capabilities without requiring any external file.
Focused Analysis - Outliers and Correlations
Demonstrates requesting a profile with specific analytical focus. Generates demo data and provides detailed outlier detection (IQR and Z-score methods) and full correlation matrix with interpretation of relationships.