All examples for EDA Report Generator

IoT Sensor Readings Analysis

Analyzing 6,000 sensor readings with temperature, humidity, vibration, and error count measurements. Demonstrates custom output filename, detection of non-normal distributions, and equipment monitoring recommendations.

Input

What you provide to the skill

Create EDA report from /workspace/webserver/lib/smartvox/skill_hub/skills/distribution-analyzer/sensor_readings.csv, save as sensor_analysis.html

Output

What the skill generates for you

EDA Report: Sensor Readings Analysis

Report Generated Successfully

File Location: /workspace/webserver/lib/smartvox/skill_hub/sensor_analysis.html

The comprehensive HTML report has been generated with embedded visualizations, statistical summaries, correlation heatmaps, and automated insights.


Step 4: Key Findings Summary

1. Dataset Overview

Dimensions & Quality:

  • Size: 6,000 sensor readings × 4 variables
  • Memory: 187.63 KB (lightweight, efficient storage)
  • Data Quality: Excellent - zero missing values (100% complete)
  • Duplicates: None detected
  • Data Types: All numeric (float64) - ready for quantitative analysis

Columns:

  • temperature - Sensor temperature readings
  • humidity - Environmental humidity measurements
  • vibration - Mechanical vibration levels
  • error_count - Equipment error frequency
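
The overview figures above (row count, missing values, duplicates) could be reproduced with a check along these lines. This is a minimal sketch using only the Python standard library, not the skill's actual implementation; the function name `summarize_quality` and the inline sample rows are hypothetical, and treating empty strings and "NaN" as missing is an assumption.

```python
import csv
import io


def summarize_quality(csv_text):
    """Count rows, columns, missing cells, and exact duplicate rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    columns = list(reader.fieldnames or [])

    # Assumption: an absent field, an empty string, or "NaN" counts as missing.
    missing = sum(
        1
        for row in rows
        for col in columns
        if row[col] is None or row[col].strip() in ("", "NaN")
    )

    # A row is a duplicate if an identical row appeared earlier.
    seen = set()
    duplicates = 0
    for row in rows:
        key = tuple(row[col] for col in columns)
        if key in seen:
            duplicates += 1
        seen.add(key)

    return {
        "rows": len(rows),
        "columns": len(columns),
        "missing_cells": missing,
        "duplicate_rows": duplicates,
    }


# Hypothetical sample with one missing humidity value and one repeated row.
sample = """temperature,humidity,vibration,error_count
72.1,44.0,0.03,0
71.8,,0.02,1
72.1,44.0,0.03,0
"""
summary = summarize_quality(sample)
print(summary)
```

For the dataset described here, the same check would be expected to report zero missing cells and zero duplicate rows.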

2. Key Patterns & Insights

Temperature Characteristics

  • Mean: 72.0°F (range: 62.3°F to 83.8°F)
  • Distribution: Nearly perfect normal distribution (skewness: 0.01)
  • Stability: Low variance (σ = 3.0°F) indicates stable operating conditions
  • Outliers: 52 readings (0.87%) outside normal range - minimal concern

Humidity Characteristics

  • Mean: 44.9% (range: 13.6% to 73.2%)
  • Distribution: Symmetric and well-behaved
  • Variability: Moderate spread (σ = 8.0%) suggests environmental fluctuations
  • Outliers: 54 readings (0.90%) at extreme humidity levels

Vibration Characteristics

  • Mean: 0.041 units (median: 0.029)
  • Distribution: Right-skewed (skewness: 2.04) - most readings are low, with occasional spikes
  • Heavy-tailed: Kurtosis = 6.20 indicates more extreme values than normal
  • Outliers: 305 readings (5.08%) exceed normal vibration threshold of 0.12 units
  • Critical: High vibration readings may indicate mechanical issues
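
To make the skewness and kurtosis figures concrete, here is a rough sketch of the moment-based definitions. It uses the simple population formulas, so values may differ slightly from a library's bias-corrected estimates, and the source does not state whether the reported kurtosis of 6.20 is excess kurtosis. The sample values are made up for illustration.

```python
def skewness_and_excess_kurtosis(values):
    """Population (biased) skewness and excess kurtosis from central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    return skew, excess_kurtosis


# Symmetric data: skewness is zero.
symmetric_skew, _ = skewness_and_excess_kurtosis([1, 2, 3, 4, 5])

# Mostly low readings with one spike: clearly right-skewed.
spiky = [0.01, 0.02, 0.02, 0.03, 0.03, 0.04, 0.05, 0.30]
spiky_skew, spiky_kurt = skewness_and_excess_kurtosis(spiky)

print(symmetric_skew, spiky_skew, spiky_kurt)
```

A positive skewness paired with a mean above the median, as with vibration (0.041 vs. 0.029), is the typical signature of occasional high spikes.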

Error Count Analysis

  • Error Rate: 14.23% of readings show at least one error
  • Distribution: Highly right-skewed (skewness: 3.13) - most readings are error-free
  • Pattern: 854 records (14.23%) flagged as outliers by IQR method (every reading with at least one error, since more than 75% of values are 0)
  • Severity: Nonzero error counts range from 1 to 10, with 85.77% of readings error-free
  • Breakdown:
    • 0 errors: 5,146 records (85.77%)
    • 1-3 errors: 523 records (8.72%)
    • 4-10 errors: 331 records (5.52%)
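
The IQR outlier count matches the number of nonzero readings exactly, which is what happens when more than three quarters of the values are zero: both quartiles are 0, so the fences collapse to [0, 0]. The sketch below illustrates this, assuming the conventional 1.5 × IQR rule (the source does not state the multiplier). The function name and the sample counts are hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the lower fence, upper fence, and values outside the fences."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    flagged = [v for v in values if v < lower or v > upper]
    return lower, upper, flagged


# Hypothetical sample: 86% zeros, mirroring the zero-heavy error_count column.
counts = [0] * 86 + [1] * 9 + [5] * 5
lower, upper, flagged = iqr_outliers(counts)

print(lower, upper, len(flagged))  # fences are both 0, so all 14 nonzero values are flagged
```

For a zero-heavy count variable, the IQR flag therefore says little more than "an error occurred"; the breakdown by error count is the more informative view.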

Correlation Analysis

  • Finding: No strong correlations detected (all |r| < 0.5)
  • Independence: No strong linear relationships between variables - no multicollinearity concerns
  • Implication: Errors show no strong linear association with temperature, humidity, or vibration
  • Insight: Errors may be random, nonlinearly related to these variables, or caused by external factors not captured in this dataset
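
A note of caution on reading the correlation result: a correlation coefficient such as Pearson's r measures linear association only, so a low |r| does not by itself rule out other kinds of dependence. The sketch below assumes Pearson's r (the source does not name the coefficient used) and uses made-up values to show a perfectly dependent pair with r = 0.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


x = [-2, -1, 0, 1, 2]
linear = [2 * v + 1 for v in x]   # exact linear relationship
quadratic = [v * v for v in x]    # exact, but nonlinear, relationship

r_linear = pearson_r(x, linear)
r_quadratic = pearson_r(x, quadratic)
print(r_linear, r_quadratic)
```

Here `r_quadratic` is zero even though `quadratic` is fully determined by `x`, so scatter plots or a rank-based measure are worth a look before concluding that errors are unrelated to the measured variables.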

3. Data Quality Issues

Strengths

  • Perfect completeness (0% missing data)
  • No duplicate records
  • Consistent data types
  • Appropriate value ranges for all sensors

Concerns

1. Vibration Spikes (Priority: HIGH)

  • 305 outliers (5.08%) with elevated vibration levels
  • Right-skewed distribution suggests occasional mechanical stress
  • Values exceed 0.12 units threshold (normal upper bound)
  • May indicate equipment wear or environmental interference

2. Error Count Distribution (Priority: MEDIUM)

  • 14.23% error rate may be acceptable depending on industry standards
  • Some readings show up to 10 errors (extreme outliers)
  • No correlation with other variables complicates root cause analysis
  • Need to investigate: Are errors random or sensor-specific?

3. Temperature/Humidity Outliers (Priority: LOW)

  • ~1% outliers each - minimal impact
  • May represent calibration drift or environmental extremes
  • Within reasonable bounds for most applications

4. Missing Context

  • No timestamps (can’t analyze temporal patterns)
  • No sensor IDs (can’t identify problematic units)
  • No location data (can’t detect spatial patterns)
  • No metadata about normal operating ranges

4. Recommendations

Immediate Actions

  1. Investigate Vibration Outliers

    • Review the 305 high-vibration readings (>0.12 units)
    • Check for mechanical issues: loose mounting, bearing wear, imbalance
    • Consider adding vibration threshold alerts to monitoring system
    • May need log transformation for modeling due to right-skew
  2. Analyze Error Patterns

    • Add timestamp and sensor ID columns to identify if errors are:
      • Random across all sensors (acceptable)
      • Concentrated in specific units (requires maintenance)
      • Temporal (time-of-day, day-of-week patterns)
    • Investigate the 331 readings with 4+ errors (top 5.5%)
  3. Address Distribution Skewness

    • Vibration: Apply log transformation: log(vibration + 0.001) for modeling
    • Error Count: Consider Poisson or negative binomial models rather than linear regression
    • Both variables violate normality assumptions for parametric tests
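
The suggested transformation, log(vibration + 0.001), can be sketched as follows. The sample values are made up, and the skewness helper uses the simple population formula; the point is only that the transform compresses the right tail.

```python
import math


def skewness(values):
    """Population (biased) skewness from central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def log_transform(values, offset=0.001):
    """Apply log(v + offset); the offset keeps zero readings finite."""
    return [math.log(v + offset) for v in values]


# Hypothetical vibration readings: mostly low, with one spike.
raw = [0.01, 0.02, 0.02, 0.03, 0.03, 0.04, 0.05, 0.30]
transformed = log_transform(raw)

raw_skew = skewness(raw)
log_skew = skewness(transformed)
print(raw_skew, log_skew)
```

On this sample the skewness drops after the transform but does not vanish, so it is worth re-checking the distribution afterwards rather than assuming normality.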

Data Collection Improvements

  1. Enhance Dataset for Future Analysis

    • Add sensor_id column to track individual unit performance
    • Add timestamp for time-series analysis and trend detection
    • Add location or zone for spatial pattern analysis
    • Include sensor_age or maintenance_date to assess wear patterns
    • Record operating_mode or load_level for context
  2. Set Monitoring Thresholds

    • Temperature: Alert if outside [64°F, 80°F] (based on outlier bounds)
    • Humidity: Alert if outside [23%, 67%]
    • Vibration: Alert if >0.12 units (roughly the 95th percentile)
    • Error count: Alert if ≥4 errors per reading (top 5.5%)
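
As an illustration, the thresholds above could be applied to a single reading like this. It is a minimal sketch: the function name `check_reading` and the dictionary layout are hypothetical, while the limits and column names are the ones listed in this report.

```python
# Limits taken from the monitoring thresholds listed in the report.
TEMPERATURE_RANGE_F = (64.0, 80.0)
HUMIDITY_RANGE_PCT = (23.0, 67.0)
VIBRATION_MAX = 0.12
ERROR_COUNT_ALERT = 4


def check_reading(reading):
    """Return the names of the variables that breach their threshold."""
    alerts = []
    low, high = TEMPERATURE_RANGE_F
    if not low <= reading["temperature"] <= high:
        alerts.append("temperature")
    low, high = HUMIDITY_RANGE_PCT
    if not low <= reading["humidity"] <= high:
        alerts.append("humidity")
    if reading["vibration"] > VIBRATION_MAX:
        alerts.append("vibration")
    if reading["error_count"] >= ERROR_COUNT_ALERT:
        alerts.append("error_count")
    return alerts


normal = {"temperature": 72.0, "humidity": 45.0, "vibration": 0.03, "error_count": 0}
faulty = {"temperature": 83.8, "humidity": 45.0, "vibration": 0.20, "error_count": 4}

print(check_reading(normal))  # []
print(check_reading(faulty))  # ['temperature', 'vibration', 'error_count']
```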

Modeling Considerations

  1. If Building Predictive Models:

    • Variables are only weakly correlated (all |r| < 0.5) - little redundancy among features
    • Use tree-based models (Random Forest, XGBoost) for skewed distributions
    • Consider anomaly detection (Isolation Forest) for outlier identification
    • Error prediction: Use classification (error/no error) rather than regression
    • Feature engineering: Create high_vibration flag, error_rate_rolling_avg
  2. Statistical Testing:

    • Be cautious with parametric tests (t-tests, ANOVA) on vibration and error_count due to non-normality
    • Use non-parametric alternatives: Mann-Whitney U, Kruskal-Wallis
    • For error count analysis: Chi-square test for independence
    • Bootstrap methods for confidence intervals on skewed distributions
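
The bootstrap suggestion can be sketched with a percentile interval for the median, which avoids any normality assumption. This is one common variant rather than the only option; the function name, the resample count, and the sample data are all assumptions made for illustration.

```python
import random
import statistics


def bootstrap_median_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the median."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    # Resample with replacement and record the median of each resample.
    medians = sorted(
        statistics.median(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lower_index = int((alpha / 2) * n_resamples)
    upper_index = int((1 - alpha / 2) * n_resamples) - 1
    return medians[lower_index], medians[upper_index]


# Hypothetical right-skewed vibration sample.
vibration = [0.01, 0.02, 0.02, 0.03, 0.03, 0.03, 0.04, 0.05, 0.09, 0.30]
ci_low, ci_high = bootstrap_median_ci(vibration)
print(ci_low, ci_high)
```

With 6,000 readings the interval would be much tighter than on this ten-value sample; the resampling logic stays the same.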

Summary

Your sensor monitoring system generates high-quality data with excellent completeness. The 6,000 readings reveal stable temperature/humidity conditions but concerning vibration spikes (5% of readings) and a 14% error rate. The lack of strong linear correlation between variables suggests errors may be driven by unmeasured factors. Priority actions: investigate high vibration readings, add sensor IDs and timestamps for deeper analysis, and consider log transformations for skewed variables before modeling.

Report file: /workspace/webserver/lib/smartvox/skill_hub/sensor_analysis.html

The HTML report contains all visualizations (correlation heatmap, distribution plots with KDE overlays, missing data charts) and detailed statistical tables for deeper exploration.