HR Data Quality Pipeline
I built a synthetic 5,000-record HR dataset and seeded it with realistic defects, then profiled it in SQL and cleaned it with a documented 12-rule Python pipeline (duplicates, date formats, controlled vocabularies, orphaned references, impossible dates). It includes a before/after view and an interactive workforce dashboard.