All examples for Data Quality Checker

Employee Directory Referential Integrity Check

Validates an employee directory for referential integrity issues: circular manager references (an employee listed as their own manager), orphaned manager IDs that point to non-existent employees, and duplicate records. This is especially important during HR system migrations, where such errors would otherwise be copied into the new system.
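The three checks described above can be sketched with Python's standard library alone. This is a minimal illustration, not the skill's own implementation. It assumes comma-separated input with a header row and compares IDs as exact strings, so `01` and `1` would count as different employees. `check_directory` is a hypothetical helper name.

```python
import csv
import io
from collections import Counter

# Sample directory copied from the example input below.
SAMPLE = """emp_id,name,manager_id,dept
1,Alice,1,Sales
2,Bob,5,
3,Carol,1,Marketing
3,Carol,1,Marketing
"""


def check_directory(text):
    """Return referential-integrity findings keyed by check name (hypothetical helper)."""
    rows = list(csv.DictReader(io.StringIO(text)))
    known_ids = {row["emp_id"] for row in rows}

    # Self-reference: an employee listed as their own manager.
    self_managed = sorted({row["emp_id"] for row in rows
                           if row["manager_id"] == row["emp_id"]})

    # Orphaned reference: a non-empty manager_id that matches no emp_id.
    orphaned = sorted({row["manager_id"] for row in rows
                       if row["manager_id"] and row["manager_id"] not in known_ids})

    # Duplicate IDs: the same emp_id appearing on more than one row.
    id_counts = Counter(row["emp_id"] for row in rows)
    duplicate_ids = sorted(k for k, n in id_counts.items() if n > 1)

    # Exact duplicates: extra copies of a row identical in every column.
    row_counts = Counter(tuple(row.values()) for row in rows)
    extra_copies = sum(n - 1 for n in row_counts.values())

    return {
        "self_managed": self_managed,
        "orphaned_manager_ids": orphaned,
        "duplicate_ids": duplicate_ids,
        "extra_duplicate_rows": extra_copies,
    }


print(check_directory(SAMPLE))
```

On the sample data this flags emp_id 1 as self-managed, manager_id 5 as orphaned, emp_id 3 as duplicated, and one extra identical row, which lines up with the four critical issues in the report.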

Input

What you provide to the skill

Quality check: emp_id,name,manager_id,dept
1,Alice,1,Sales
2,Bob,5,
3,Carol,1,Marketing
3,Carol,1,Marketing
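In this input, Alice (emp_id 1) is recorded as her own manager, which is the shortest possible management cycle, and the report below flags exactly that row. A self-reference test on its own would miss longer loops, such as 1 reporting to 2 while 2 reports to 1. As a hedged sketch (not necessarily what the skill does), a general check can walk each employee's reporting chain until it ends, leaves the directory, or revisits an employee. `find_manager_cycles` and its `manager_of` dictionary are hypothetical names.

```python
def find_manager_cycles(manager_of):
    """Return each distinct management cycle as a sorted tuple of emp_ids.

    manager_of is a hypothetical mapping of emp_id -> manager_id, with None
    for an employee who has no manager (the top of the hierarchy).
    """
    cycles = set()
    for start in manager_of:
        path = []     # employees visited on this walk, in order
        seen = set()  # same employees, for fast membership tests
        current = start
        # Follow the chain upward until it ends, leaves the directory
        # (an orphaned manager_id), or revisits someone on this walk.
        while current is not None and current in manager_of and current not in seen:
            seen.add(current)
            path.append(current)
            current = manager_of[current]
        if current in seen:
            # Everything from the first visit of `current` onward is the loop.
            loop = path[path.index(current):]
            cycles.add(tuple(sorted(loop)))
    return sorted(cycles)


# Manager map from the example input (the duplicate Carol row collapsed).
print(find_manager_cycles({"1": "1", "2": "5", "3": "1"}))  # prints: [('1',)]
# A two-person loop that a self-reference check would not catch.
print(find_manager_cycles({"1": "2", "2": "1", "3": "2"}))  # prints: [('1', '2')]
```

The walk restarts from every employee, which is quadratic in the worst case but easy to follow; for large directories, remembering which employees are already known to reach the top of the hierarchy would avoid repeated work.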

Output

What the skill generates for you

DATA QUALITY HEALTH CHECK REPORT
============================================================
Records: 4
Columns: 4
Analysis Date: 2025-12-03

OVERALL QUALITY SCORE: 72/100 - GOOD

DIMENSION SCORES:
  Completeness: 88/100
  Validity: 100/100
  Consistency: 50/100
  Uniqueness: 0/100
  Accuracy: 100/100

TOP 5 CRITICAL ISSUES (by impact)
------------------------------------------------------------
1. CRITICAL - Circular manager references
   Impact: CRITICAL | Affected: 1 row (25.0%)
   Details: Row(s) with emp_id=[1]: employees listed as their own managers
   Fix: Correct manager relationships - employees cannot manage themselves

2. CRITICAL - Orphaned manager references
   Impact: CRITICAL | Affected: 1 row (25.0%)
   Details: Row(s) [1]: manager_id [5] not found in employee list
   Fix: Verify manager IDs exist in employee records

3. CRITICAL - Duplicate records found
   Impact: CRITICAL | Affected: 1 row (25.0%)
   Details: Row(s) [2, 3]: completely identical records
   Fix: df.drop_duplicates(keep="first")

4. CRITICAL - Duplicate employee IDs
   Impact: CRITICAL | Affected: 1 row (25.0%)
   Details: emp_id [3] appears multiple times
   Fix: df.drop_duplicates(subset=["emp_id"], keep="first")

5. HIGH - Missing values in dept
   Impact: HIGH | Affected: 1 row (25.0%)
   Details: 1 missing value (25.0%)
   Fix: Collect missing data or use an imputation strategy

ACTIONABLE RECOMMENDATIONS
------------------------------------------------------------
IMMEDIATE ACTIONS (Critical Priority):
1. Circular manager references
   Action: Correct manager relationships - employees cannot manage themselves
2. Orphaned manager references
   Action: Verify manager IDs exist in employee records
3. Duplicate records found
   Action: df.drop_duplicates(keep="first")
4. Duplicate employee IDs
   Action: df.drop_duplicates(subset=["emp_id"], keep="first")

ESTIMATED IMPROVEMENT IMPACT
------------------------------------------------------------
Current Score: 72/100
Projected Score: 100/100 (after fixes)

STATUS: CONDITIONAL PASS (fix critical issues first)
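The duplicate fixes in the report are pandas calls. A minimal standard-library sketch of the same keep-first behavior, using hypothetical `drop_duplicate_ids` and `missing_by_column` helpers, might look like this. Deduplicating on `emp_id` also removes the fully identical Carol row, so for this data the two duplicate fixes collapse into one step. Keep-first discards later rows even when they conflict with the first one, so conflicting duplicates deserve a manual look before they are dropped. The two manager issues cannot be fixed mechanically: the correct managers for Alice and Bob have to come from the source HR records.

```python
import csv
import io

# Sample directory copied from the example input above.
SAMPLE = """emp_id,name,manager_id,dept
1,Alice,1,Sales
2,Bob,5,
3,Carol,1,Marketing
3,Carol,1,Marketing
"""


def drop_duplicate_ids(rows, key="emp_id"):
    """Keep the first row seen for each key value (keep="first" semantics)."""
    seen = set()
    kept = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            kept.append(row)
    return kept


def missing_by_column(rows):
    """Count empty or missing cells in each column."""
    if not rows:
        return {}
    return {col: sum(1 for row in rows if not (row[col] or "").strip())
            for col in rows[0]}


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
deduped = drop_duplicate_ids(rows)
print(len(rows), "->", len(deduped))  # prints: 4 -> 3
print(missing_by_column(deduped))  # prints: {'emp_id': 0, 'name': 0, 'manager_id': 0, 'dept': 1}
```

After deduplication, `dept` is the only column with a gap (Bob's record), which corresponds to the HIGH-priority missing-value issue in the report.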