All examples for Automated Dataset Profiler

Inline CSV - Business Data with Quality Issues

Profiles inline CSV data pasted directly in the prompt. Demonstrates detection of data quality issues including missing values (NA strings), negative employee counts, and type inconsistencies in a small business dataset.

Input

What you provide to the skill

Profile this data:
id,revenue,employees,rating,sector
1,250000,45,4.2,Tech
2,NA,12,3.8,Retail
3,890000,120,4.7,Tech
4,45000,-5,2.9,Food

Output

What the skill generates for you

Dataset Profile Report: Inline Business Data

Dataset Overview

  • Rows: 4 | Columns: 5 | Missing: 1 cell (5.0%)
  • Data types: 4 numeric, 1 categorical
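The overview counts can be reproduced with a few lines of standard-library Python. This is only a sketch of the arithmetic, not the skill's actual implementation; the `MISSING_TOKENS` set is an assumption about which strings count as missing.

```python
import csv
import io

# The inline CSV from the prompt.
RAW = """id,revenue,employees,rating,sector
1,250000,45,4.2,Tech
2,NA,12,3.8,Retail
3,890000,120,4.7,Tech
4,45000,-5,2.9,Food
"""

# Assumption: these tokens mark a missing cell.
MISSING_TOKENS = {"", "NA", "N/A", "NaN", "null"}

rows = list(csv.DictReader(io.StringIO(RAW)))
n_rows = len(rows)
n_cols = len(rows[0])

# Count every cell whose text is one of the missing-value tokens.
missing = sum(
    1 for row in rows for value in row.values() if value.strip() in MISSING_TOKENS
)
missing_pct = 100 * missing / (n_rows * n_cols)

print(f"Rows: {n_rows} | Columns: {n_cols} | Missing: {missing} ({missing_pct:.1f}%)")
```

The percentage is taken over all 20 cells (4 rows × 5 columns), which is why one missing cell is reported as 5.0% here.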

Variable Profiles

id (int64)

  • Range: 1-4
  • Complete: No missing values

revenue (float64)

  • Mean: $395,000 | Median: $250,000
  • Range: $45,000 - $890,000
  • ⚠️ DATA QUALITY: 1 missing value (the string ‘NA’ in the source data)
  • Action: Make sure ‘NA’ is parsed as missing so the column stays numeric, then handle the missing value

employees (int64)

  • Mean: 43.0 | Median: 28.5
  • Range: -5 to 120
  • ⚠️ DATA QUALITY: 1 negative value (impossible - likely error)
  • Action: Fix impossible value before analysis
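As a rough illustration of how the employees figures and the alert could be derived, the sketch below uses Python's `statistics` module. The rule that a headcount cannot be negative is an assumed validity check, not something the skill is documented to hard-code.

```python
from statistics import mean, median

# The employees column from the input, in row order.
employees = [45, 12, 120, -5]

emp_mean = mean(employees)
emp_median = median(employees)

# Assumed validity rule: a headcount cannot be negative.
# Row numbers are 1-based to match the report.
invalid_rows = [i for i, v in enumerate(employees, start=1) if v < 0]

print(f"Mean: {emp_mean:.1f} | Median: {emp_median:.1f}")
print(f"Rows with negative employees: {invalid_rows}")
```

Note that the mean and median above still include the -5 value, so both will shift once row 4 is corrected.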

rating (float64)

  • Mean: 3.90 | Median: 4.00
  • Range: 2.9 - 4.7
  • Distribution: Appears to be 1-5 scale with fractional values

sector (object)

  • Unique values: 3 (Tech, Retail, Food)
  • Mode: Tech (2 occurrences, 50.0%)
  • Distribution:
    • Tech: 2 (50.0%)
    • Retail: 1 (25.0%)
    • Food: 1 (25.0%)
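The categorical summary is a plain frequency count. A minimal sketch with `collections.Counter`, assuming the sector values are used exactly as they appear in the input:

```python
from collections import Counter

# The sector column from the input, in row order.
sectors = ["Tech", "Retail", "Tech", "Food"]

counts = Counter(sectors)
mode, mode_count = counts.most_common(1)[0]

# Share of rows per category, in percent.
shares = {name: 100 * n / len(sectors) for name, n in counts.items()}

print(f"Unique values: {len(counts)}")
print(f"Mode: {mode} ({mode_count} occurrences, {shares[mode]:.1f}%)")
for name, n in counts.most_common():
    print(f"  {name}: {n} ({shares[name]:.1f}%)")
```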

Correlation Analysis

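Numeric Variable Correlations (complete cases only):

Variable 1 | Variable 2 | r | Interpretation
revenue | employees | 0.98 | Very strong positive
revenue | rating | 0.86 | Strong positive
employees | rating | 0.93 | Very strong positive

Note: Correlations computed on the 3 rows with complete data (row 2 is excluded for missing revenue). The invalid employees value in row 4 (-5) is still included, so recompute once it is corrected. With only 3 cases, these coefficients are indicative at best.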
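The coefficients can be checked by hand. The sketch below computes Pearson's r with a small helper (`pearson` is a name introduced here for illustration) after listwise deletion, i.e. dropping any row that has a missing value. Dropping only row 2 and keeping the other three rows as they are reproduces the values in the table.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# (revenue, employees, rating) per row; None marks the missing revenue.
records = [
    (250000, 45, 4.2),
    (None, 12, 3.8),
    (890000, 120, 4.7),
    (45000, -5, 2.9),
]

# Listwise deletion: keep only rows without any missing value.
complete = [r for r in records if None not in r]
revenue, employees, rating = zip(*complete)

r_rev_emp = pearson(revenue, employees)
r_rev_rat = pearson(revenue, rating)
r_emp_rat = pearson(employees, rating)

print(f"cases={len(complete)}  revenue/employees={r_rev_emp:.2f}  "
      f"revenue/rating={r_rev_rat:.2f}  employees/rating={r_emp_rat:.2f}")
```

If row 4 were dropped as well, only two cases would remain, and any correlation between two points is exactly +1 or -1, so nothing useful could be reported.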

Missing Data Analysis

  • revenue: 1 missing (25%) - stored as ‘NA’ string
  • Pattern: Single missing value in row 2
  • Recommendation: Impute with median ($250,000) or investigate source

Data Quality Alerts

High Priority:

  1. Fix negative employees value in row 4 (-5 is impossible)
  2. Make sure revenue is parsed as numeric (the raw column contains the string ‘NA’)

Medium Priority:

  1. Handle the missing revenue value in row 2 (25% of revenue values)
  2. Verify that the rating scale is 1-5 as assumed

Recommendations

Data Cleaning:

  1. Parse revenue as numeric, treating the ‘NA’ string as a missing value
  2. Fix employees value in row 4 (set to NaN or investigate correct value)
  3. Impute missing revenue with median ($250,000) or domain knowledge
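The three cleaning steps above might look like this in plain Python. It is a sketch under stated assumptions: `None` stands in for NaN, and median imputation is used only because the report suggests it, not because it is necessarily the right choice for real data.

```python
from statistics import median

# Raw column values as they appear in the inline CSV.
raw_revenue = ["250000", "NA", "890000", "45000"]
employees = [45, 12, 120, -5]

# Step 1: parse revenue as numeric, mapping the 'NA' string to None.
revenue = [None if s == "NA" else float(s) for s in raw_revenue]

# Step 2: treat the impossible negative headcount as missing
# (None here plays the role of NaN) until the true value is known.
employees_clean = [v if v >= 0 else None for v in employees]

# Step 3: impute missing revenue with the median of the observed values.
observed = [v for v in revenue if v is not None]
fill_value = median(observed)
revenue_clean = [fill_value if v is None else v for v in revenue]

print(revenue_clean)
print(employees_clean)
```

With a single missing value in four rows, imputation changes a quarter of the column, so checking the source for the real figure is usually preferable when that is possible.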

Analysis Strategy:

  1. After cleaning, compare revenue by sector (Tech appears higher)
  2. Investigate the relationship between company size (employees) and rating
  3. Check whether the negative employees value is a data entry error (for example, a sign error for 5)

Quality score: 50/100 | Issues: 2 critical errors, 1 missing value in 4-row dataset