Employee Directory Referential Integrity Check
Validates an employee directory for referential integrity issues: circular manager references (an employee listed as their own manager), orphaned manager IDs that point to non-existent employees, and duplicate records. These checks are critical before an HR system migration.
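The three checks described above can be sketched in plain Python. This is a minimal illustration, not the skill's actual implementation: the function name `check_directory` and the shape of the returned dictionary are assumptions, and the sample is the four-row input shown on this page.

```python
import csv
import io
from collections import Counter

# The sample input from this page.
SAMPLE = """emp_id,name,manager_id,dept
1,Alice,1,Sales
2,Bob,5,
3,Carol,1,Marketing
3,Carol,1,Marketing
"""


def check_directory(csv_text):
    """Hypothetical helper: return the referential integrity findings."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    known_ids = {row["emp_id"] for row in rows}

    # Self-reference: manager_id equals the employee's own emp_id.
    self_managed = sorted(
        {r["emp_id"] for r in rows
         if r["manager_id"] and r["manager_id"] == r["emp_id"]}
    )

    # Orphan: a non-empty manager_id that matches no emp_id in the file.
    orphaned = sorted(
        {r["manager_id"] for r in rows
         if r["manager_id"] and r["manager_id"] not in known_ids}
    )

    # Duplicate IDs: any emp_id that appears on more than one row.
    id_counts = Counter(r["emp_id"] for r in rows)
    duplicate_ids = sorted(i for i, n in id_counts.items() if n > 1)

    # Exact duplicates: rows identical in every column (extra copies only).
    row_counts = Counter(tuple(r.values()) for r in rows)
    duplicate_rows = sum(n - 1 for n in row_counts.values())

    return {
        "self_managed": self_managed,
        "orphaned_manager_ids": orphaned,
        "duplicate_emp_ids": duplicate_ids,
        "duplicate_rows": duplicate_rows,
    }


result = check_directory(SAMPLE)
print(result)
```

On the sample this flags emp_id 1 as self-managed, manager_id 5 as orphaned, emp_id 3 as a duplicated ID, and one extra identical row. Only the self-management case is covered here; longer cycles (A manages B, B manages A) would need a graph traversal.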
Input
What you provide to the skill
Quality check:
emp_id,name,manager_id,dept
1,Alice,1,Sales
2,Bob,5,
3,Carol,1,Marketing
3,Carol,1,Marketing
Output
What the skill generates for you
DATA QUALITY HEALTH CHECK REPORT
============================================================
Records: 4
Columns: 4
Analysis Date: 2025-12-03
OVERALL QUALITY SCORE: 72/100 - GOOD
DIMENSION SCORES:
Completeness: 88/100
Validity: 100/100
Consistency: 50/100
Uniqueness: 0/100
Accuracy: 100/100
TOP 5 CRITICAL ISSUES (by impact)
------------------------------------------------------------
1. CRITICAL - Circular manager references
Impact: CRITICAL | Affected: 1 rows (25.0%)
Details: Row(s) with emp_id=[1]: employees listed as their own managers
Fix: Correct manager relationships - employees cannot manage themselves
2. CRITICAL - Orphaned manager references
Impact: CRITICAL | Affected: 1 rows (25.0%)
Details: Row(s) [1]: manager_id [5] not found in employee list
Fix: Verify manager IDs exist in employee records
3. CRITICAL - Duplicate records found
Impact: CRITICAL | Affected: 1 rows (25.0%)
Details: Row(s) [2, 3]: completely identical records
Fix: df.drop_duplicates(keep="first")
4. CRITICAL - Duplicate employee IDs
Impact: CRITICAL | Affected: 1 rows (25.0%)
Details: emp_id [3] appears multiple times
Fix: df.drop_duplicates(subset=["emp_id"], keep="first")
5. HIGH - Missing values in dept
Impact: HIGH | Affected: 1 rows (25.0%)
Details: 1 missing values (25.0%)
Fix: Collect missing data or use imputation strategy
ACTIONABLE RECOMMENDATIONS
------------------------------------------------------------
IMMEDIATE ACTIONS (Critical Priority):
1. Circular manager references
Action: Correct manager relationships - employees cannot manage themselves
2. Orphaned manager references
Action: Verify manager IDs exist in employee records
3. Duplicate records found
Action: df.drop_duplicates(keep="first")
4. Duplicate employee IDs
Action: df.drop_duplicates(subset=["emp_id"], keep="first")
ESTIMATED IMPROVEMENT IMPACT
------------------------------------------------------------
Current Score: 72/100
Projected Score: 100/100 (after fixes)
STATUS: CONDITIONAL PASS (fix critical issues first)
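The two `drop_duplicates` fixes quoted in the report can be applied mechanically, while the manager-reference and missing-department issues need a human decision. The sketch below assumes pandas is installed; the helper names `apply_safe_fixes` and `rows_needing_review` are hypothetical and are not part of the skill.

```python
import io

import pandas as pd

# The sample input from this page.
SAMPLE = """emp_id,name,manager_id,dept
1,Alice,1,Sales
2,Bob,5,
3,Carol,1,Marketing
3,Carol,1,Marketing
"""


def apply_safe_fixes(df):
    """Hypothetical helper: apply the two drop_duplicates fixes in order."""
    # Remove rows that are identical in every column.
    deduped = df.drop_duplicates(keep="first")
    # Then keep only the first row for each emp_id.
    deduped = deduped.drop_duplicates(subset=["emp_id"], keep="first")
    return deduped.reset_index(drop=True)


def rows_needing_review(df):
    """Hypothetical helper: list emp_ids that still need a manual fix."""
    has_manager = df["manager_id"].notna()
    self_managed = df[has_manager & (df["manager_id"] == df["emp_id"])]
    orphaned = df[has_manager & ~df["manager_id"].isin(df["emp_id"])]
    missing_dept = df[df["dept"].isna()]
    return {
        "self_managed": self_managed["emp_id"].tolist(),
        "orphaned": orphaned["emp_id"].tolist(),
        "missing_dept": missing_dept["emp_id"].tolist(),
    }


# Read every column as text so IDs compare as strings.
df = pd.read_csv(io.StringIO(SAMPLE), dtype=str)
clean = apply_safe_fixes(df)
review = rows_needing_review(clean)
print(len(clean), review)
```

In this sample the second call removes nothing, because the only repeated emp_id belonged to an exact duplicate row. When two rows share an emp_id but differ in other columns, `keep="first"` silently discards the later record, so it may be worth inspecting those rows before dropping them.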
About This Skill
Automated data quality assessment across 5 dimensions (completeness, validity, consistency, uniqueness, and accuracy) with actionable fix recommendations.
More Examples
Customer Database Quality Check
Validates a customer database CSV with common issues: missing email addresses, duplicate customer IDs, invalid email formats, and impossible dates. Demonstrates completeness, validity, and uniqueness dimension scoring with specific row-level issue identification.
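As a rough illustration of the row-level checks named in that description, the sketch below flags missing emails, malformed emails, repeated customer IDs, and dates that do not exist on the calendar. The column names (`customer_id`, `email`, `signup_date`), the deliberately loose email pattern, and the sample rows are all assumptions made up for this example.

```python
import re
from collections import Counter
from datetime import date

# Loose pattern: something@something.something, no spaces or extra "@".
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def check_customers(rows):
    """Hypothetical helper: return (row_index, issue) pairs."""
    id_counts = Counter(row["customer_id"] for row in rows)
    issues = []
    for i, row in enumerate(rows):
        email = (row.get("email") or "").strip()
        if not email:
            issues.append((i, "missing email"))
        elif not EMAIL_PATTERN.match(email):
            issues.append((i, "invalid email format"))

        if id_counts[row["customer_id"]] > 1:
            issues.append((i, "duplicate customer_id"))

        # fromisoformat rejects calendar-impossible dates such as Feb 30.
        try:
            date.fromisoformat(row.get("signup_date") or "")
        except ValueError:
            issues.append((i, "impossible date"))
    return issues


# Made-up rows for illustration only.
customers = [
    {"customer_id": "C1", "email": "ann@example.com", "signup_date": "2024-01-15"},
    {"customer_id": "C2", "email": "", "signup_date": "2024-02-30"},
    {"customer_id": "C2", "email": "bob-at-example.com", "signup_date": "2024-03-01"},
]
issues = check_customers(customers)
for index, issue in issues:
    print(index, issue)
```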
Sales Transaction Calculation Validation
Analyzes a sales orders dataset to detect calculation mismatches where total doesn't equal quantity times price. Demonstrates the consistency dimension by catching arithmetic errors and missing values in transactional data.
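The consistency check described here, total equals quantity times price, can be sketched as follows. The field names (`quantity`, `price`, `total`), the one-cent tolerance, and the sample orders are assumptions for illustration; `Decimal` is used so that currency arithmetic is not distorted by binary floating point.

```python
from decimal import Decimal, InvalidOperation


def find_total_mismatches(orders, tolerance=Decimal("0.01")):
    """Hypothetical helper: return (row_index, issue) pairs."""
    issues = []
    for i, order in enumerate(orders):
        try:
            quantity = Decimal(order["quantity"])
            price = Decimal(order["price"])
            total = Decimal(order["total"])
        except (KeyError, InvalidOperation, TypeError):
            # Empty, absent, or non-numeric fields cannot be verified.
            issues.append((i, "missing or non-numeric value"))
            continue

        expected = quantity * price
        if abs(expected - total) > tolerance:
            issues.append((i, f"expected {expected}, found {total}"))
    return issues


# Made-up orders for illustration only.
orders = [
    {"quantity": "2", "price": "9.99", "total": "19.98"},
    {"quantity": "3", "price": "5.00", "total": "14.00"},
    {"quantity": "1", "price": "", "total": "4.50"},
]
mismatches = find_total_mismatches(orders)
for index, issue in mismatches:
    print(index, issue)
```

Rows with missing values are reported separately from arithmetic mismatches, since a blank price is a completeness problem and not evidence of a wrong total.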