Automated Dataset Profiler

Pro v1.0.0

Generate comprehensive data profile reports including statistics, correlations, missing data analysis, and quality insights. Works with files and inline data, and can generate demo data.

What You Get

Get a fast, actionable understanding of any dataset through statistical profiles, correlation analysis, data quality alerts, and cleaning recommendations, saving hours of manual exploration.

The Problem

Data analysts and business users spend significant time manually exploring each new dataset to understand its structure, quality issues, and patterns before analysis. They need to know which data types are present, what is missing, which variables correlate, and what needs cleaning, but doing this by hand for every dataset is tedious and error-prone.

The Solution

This skill profiles datasets from three input sources: CSV/Excel files, inline CSV data pasted into the prompt, or synthetic demo data generated on request. Its reports include:

  • Dataset overview (rows, columns, types, memory usage)
  • Statistical profile of every variable (mean, median, quartiles, distribution)
  • Missing data analysis with patterns and imputation recommendations
  • Correlation matrix with interpretation of strong relationships
  • Outlier detection using the IQR and Z-score methods
  • Data quality alerts (type mismatches, impossible values, duplicates)
  • Actionable recommendations

All processing runs in memory in Python (pandas, numpy, scipy); no data is stored.
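Of the checks listed above, outlier detection is the easiest to pin down in code. The sketch below is one plausible reading of "IQR and Z-score methods", assuming the common defaults of Tukey's fences at 1.5 * IQR and a |z| > 3 cutoff; the function names and thresholds are illustrative assumptions, not taken from the skill itself. It uses pandas only (scipy.stats.zscore, with its default ddof=0, would give the same z-values).

```python
import pandas as pd


def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """True where a value lies outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)


def zscore_outliers(s: pd.Series, threshold: float = 3.0) -> pd.Series:
    """True where |z| > threshold, using the population standard deviation."""
    z = (s - s.mean()) / s.std(ddof=0)
    return z.abs() > threshold


def outlier_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count outliers per numeric column under both methods (NaNs dropped first)."""
    rows = []
    for col in df.select_dtypes(include="number").columns:
        s = df[col].dropna()
        rows.append({
            "column": col,
            "iqr_outliers": int(iqr_outliers(s).sum()),
            "zscore_outliers": int(zscore_outliers(s).sum()),
        })
    # Explicit columns keep set_index working even with no numeric columns.
    return pd.DataFrame(
        rows, columns=["column", "iqr_outliers", "zscore_outliers"]
    ).set_index("column")


prices = pd.Series([10, 12, 11, 13, 12, 11, 250])
print(iqr_outliers(prices).tolist())  # [False, False, False, False, False, False, True]
print(zscore_outliers(prices).any())  # False
```

The Z-score check misses the 250 in this example because, with a population standard deviation, no value in a sample of n points can have |z| larger than sqrt(n - 1), which is about 2.45 for seven values. Reporting both methods side by side guards against relying on either one alone.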

How It Works

  1. Determine the input source: file path, inline CSV data, or demo mode (generates a synthetic e-commerce dataset with realistic quality issues)
  2. Load or generate the dataset with pandas, handling encoding issues and sampling datasets larger than 100K rows
  3. Generate a dataset overview with row/column counts, data type breakdown, memory usage, and missing-cell statistics
  4. Profile each variable with descriptive statistics, distribution analysis, and outlier detection using the IQR and Z-score methods
  5. Analyze missing-data patterns to distinguish random from systematic missingness and recommend imputation strategies
  6. Calculate the correlation matrix for numeric variables and interpret strong correlations (|r| > 0.7)
  7. Compile the findings into a structured report with prioritized alerts and actionable recommendations
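Steps 3, 5, and 6 can be sketched in pandas as follows. This is a minimal illustration under stated assumptions, not the skill's actual code: the function names, the 50% drop threshold, and the median/mode imputation rule are hypothetical choices. "Systematic missingness" is approximated here by correlating per-column missing-value indicators, a rough heuristic rather than a formal test such as Little's MCAR test. Correlations use pandas' default Pearson method with the |r| > 0.7 cutoff from step 6.

```python
import pandas as pd


def dataset_overview(df: pd.DataFrame) -> dict:
    """Step 3: shape, dtype breakdown, memory, missing cells, duplicate rows."""
    missing = int(df.isna().sum().sum())
    return {
        "rows": len(df),
        "columns": df.shape[1],
        "dtypes": df.dtypes.astype(str).value_counts().to_dict(),
        "memory_bytes": int(df.memory_usage(deep=True).sum()),
        "missing_cells": missing,
        "missing_pct": 100 * missing / df.size if df.size else 0.0,
        "duplicate_rows": int(df.duplicated().sum()),
    }


def strong_pairs(corr: pd.DataFrame, threshold: float = 0.7) -> list:
    """(col_a, col_b, r) for every pair above the diagonal with |r| > threshold."""
    cols = list(corr.columns)
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            r = corr.iloc[i, j]
            if abs(r) > threshold:  # NaN compares False, so undefined r is skipped
                pairs.append((cols[i], cols[j], round(float(r), 3)))
    return pairs


def missing_report(df: pd.DataFrame, drop_threshold: float = 0.5) -> pd.DataFrame:
    """Step 5: missing share per column plus a rule-of-thumb imputation hint."""
    share = df.isna().mean()

    def suggest(col):
        if share[col] == 0:
            return "none needed"
        if share[col] > drop_threshold:
            return "consider dropping column"
        return "median" if pd.api.types.is_numeric_dtype(df[col]) else "mode"

    return pd.DataFrame({"missing_share": share,
                         "suggestion": [suggest(c) for c in df.columns]})


def co_missing_pairs(df: pd.DataFrame, threshold: float = 0.7) -> list:
    """Step 5 heuristic: columns whose missingness indicators are strongly
    correlated tend to go missing together, hinting at a systematic cause."""
    ind = df.isna().astype(int)
    ind = ind.loc[:, ind.std() > 0]  # drop never-missing / always-missing columns
    return strong_pairs(ind.corr(), threshold)


def strong_correlations(df: pd.DataFrame, threshold: float = 0.7) -> list:
    """Step 6: Pearson correlations between numeric columns with |r| > threshold."""
    return strong_pairs(df.select_dtypes(include="number").corr(), threshold)
```

Returning plain dicts, DataFrames, and tuples keeps each step independent, so step 7 can rank and format them into the final report however it likes.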

What You'll Need

  • One of: a CSV/Excel file path, inline CSV data in the prompt, or a request for demo/sample data
  • A Python environment with the pandas, numpy, and scipy libraries
  • Data in tabular form (rows and columns)
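The input handling in steps 1 and 2 might look like the following sketch. Everything here is an assumption for illustration: the function names, the encoding fallback order (UTF-8 with optional BOM, then Latin-1), the max_rows parameter, and the demo table's columns and injected issues (missing ages, an impossible age, a negative quantity, duplicate rows) are hypothetical stand-ins for whatever the skill actually generates. Reading Excel files through pandas also requires openpyxl (.xlsx) or xlrd (.xls).

```python
import io

import numpy as np
import pandas as pd


def make_demo_data(n: int = 500, seed: int = 0) -> pd.DataFrame:
    """Hypothetical synthetic e-commerce orders with injected quality issues."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "order_id": np.arange(1, n + 1),
        "customer_age": rng.integers(18, 80, size=n).astype(float),
        "quantity": rng.integers(1, 10, size=n),
        "unit_price": rng.uniform(5, 200, size=n).round(2),
        "country": rng.choice(["US", "DE", "FR", "UK"], size=n),
    })
    # About 5% missing ages, one impossible age, and one negative quantity.
    df.loc[rng.choice(n, size=n // 20, replace=False), "customer_age"] = np.nan
    df.loc[0, "customer_age"] = 200
    df.loc[1, "quantity"] = -3
    df["order_value"] = (df["quantity"] * df["unit_price"]).round(2)
    # Append copies of the first five rows as exact duplicates.
    return pd.concat([df, df.iloc[:5]], ignore_index=True)


def read_csv_any_encoding(path, encodings=("utf-8-sig", "latin-1")) -> pd.DataFrame:
    """Try each encoding in turn; Latin-1 decodes any byte sequence, so it goes last."""
    for enc in encodings[:-1]:
        try:
            return pd.read_csv(path, encoding=enc)
        except UnicodeDecodeError:
            continue
    return pd.read_csv(path, encoding=encodings[-1])


def load_dataset(source=None, *, inline_csv=None, demo=False,
                 max_rows=100_000, seed=0) -> pd.DataFrame:
    """Step 1: pick the input source. Step 2: load it, sampling above max_rows."""
    if demo:
        df = make_demo_data(seed=seed)
    elif inline_csv is not None:
        df = pd.read_csv(io.StringIO(inline_csv))
    elif source is None:
        raise ValueError("pass a file path, inline_csv, or demo=True")
    elif str(source).lower().endswith((".xlsx", ".xls")):
        df = pd.read_excel(source)  # needs openpyxl (.xlsx) or xlrd (.xls)
    else:
        df = read_csv_any_encoding(source)
    if len(df) > max_rows:
        df = df.sample(n=max_rows, random_state=seed)
    return df
```

For example, load_dataset(inline_csv="a,b\n1,2\n3,4\n") returns a 2 x 2 DataFrame, and load_dataset(demo=True) returns 505 rows, five of which are exact duplicates. Fixing the random seed keeps both the demo data and the sample reproducible between runs.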