Open to full-time Data Engineer & Data Scientist roles

Hello, I'm Michael Whitfield Jr

Data Engineer & Data Scientist

Building data pipelines and ML systems that turn messy data into clear decisions.

About Me


I got into data the hard way — 3rd place at the 2024 CCAC research competition for a model that quantified geographic fan bias in March Madness brackets. The math was fun. What hooked me was the engineering: cleaning 76k brackets, building features from raw geography, shipping a Streamlit app someone could actually use.

Since then I've completed two internships at Grainger: first building ETL pipelines on Airflow that cut processing time 10%, then shipping predictive models that drove inventory decisions across 11 distribution centers. Pipelines that don't break and models that move a number: that's the seam I want to keep working in.

Currently looking for the next place to do that work. If you've got a hard data problem and want it solved cleanly, let's connect.

Currently building Python tooling for sprint automation in ClickUp · studying system design fundamentals · brushing up SQL window functions and DSA patterns.
🥉 CCAC 2024 3rd place, research competition
76k+ brackets analyzed, 67% accuracy
11 distribution centers modeled

Skills & Stack

Languages

Python SQL JavaScript

Data & ML

pandas scikit-learn NumPy Jupyter Streamlit

Engineering

Airflow Spark AWS Docker REST APIs

Viz & Reporting

matplotlib seaborn Quarto Vite

Experience

Data Science Intern

Grainger · Aug 2024 – Nov 2024

Built predictive regression models and dashboards to optimize inventory transfer logic across 11 distribution centers.

Data Engineering Intern

Grainger · Jun 2024 – Aug 2024

Engineered end-to-end ETL pipelines on Apache Airflow + AWS S3, cutting data processing time by 10% and improving downstream accessibility.

Life Cycle Engineering Intern

Raytheon · May 2023 – Aug 2023

Modernized technical manuals and ran financial risk analysis on engineering lifecycles, contributing to meaningful program-level savings.

Featured Projects

ML 🥉 3rd place · CCAC 2024

March Madness Fan Predictor

RandomForest model predicting NCAA bracket choices by quantifying geographic fan bias. 67% accuracy on 76k+ brackets using Haversine distance features and KenPom analytics.

Python scikit-learn Streamlit pandas
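
For readers curious about the Haversine distance features mentioned in the March Madness project, here is a minimal sketch of the standard great-circle formula. It is an illustration only, not the project's actual feature code: the function name `haversine_km`, the radius constant, and the example coordinates are assumptions made for this snippet.

```python
import math

# Mean Earth radius in kilometers (a common approximation; assumption for this sketch).
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 so floating-point rounding cannot push asin out of its domain.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


# Hypothetical usage: distance from a fan's location to a school's campus,
# using approximate coordinates chosen only for illustration.
fan_to_campus_km = haversine_km(41.8781, -87.6298, 40.4237, -86.9212)
print(round(fan_to_campus_km, 1))
```

A distance like this can then be used as one numeric feature per fan and team pair, alongside team-strength ratings, when fitting a tree-based model.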
⚡
Automation

ClickUp Sprint Importer

Python CLI that turns a JSON sprint spec into a fully structured ClickUp board (Space → Folder → List → Tasks → Subtasks) in one run. Tests included.

Python requests pytest REST API
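
The Space → Folder → List → Tasks → Subtasks ordering can be sketched as a walk over a nested spec that emits parents before their children. This is a rough illustration, not the importer's real code: the spec keys (`space`, `folders`, `lists`, `tasks`, `subtasks`) and the function `plan_creation_steps` are hypothetical, and no ClickUp API calls are made.

```python
import json


def plan_creation_steps(spec):
    """Flatten a nested sprint spec into ordered (kind, name, parent_name) steps.

    The spec layout used here is hypothetical and exists only for this example.
    Parents are always emitted before their children.
    """
    steps = [("space", spec["space"], None)]
    for folder in spec.get("folders", []):
        steps.append(("folder", folder["name"], spec["space"]))
        for lst in folder.get("lists", []):
            steps.append(("list", lst["name"], folder["name"]))
            for task in lst.get("tasks", []):
                steps.append(("task", task["name"], lst["name"]))
                for subtask_name in task.get("subtasks", []):
                    steps.append(("subtask", subtask_name, task["name"]))
    return steps


# Made-up spec for illustration only.
example = json.loads("""
{
  "space": "Engineering",
  "folders": [
    {"name": "Sprint 1", "lists": [
      {"name": "Backlog", "tasks": [
        {"name": "Set up CI", "subtasks": ["Add lint job", "Add test job"]}
      ]}
    ]}
  ]
}
""")

for step in plan_creation_steps(example):
    print(step)
```

Producing a flat plan first keeps the traversal easy to unit test; a real importer would then map each step to an HTTP request and carry the created parent's ID forward.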
📊
Data Analysis

Shopping Behavior Analysis

EDA on retail shopper data: a pandas/seaborn pipeline producing customer-segment insights and pairplot visualizations.

Python pandas seaborn Jupyter
Reporting

Latitude: Analysis & Reporting

Reproducible Jupyter + Quarto workflow that turns raw data into a publishable HTML report. Clean separation between exploration and deliverable.

Python Jupyter Quarto Poetry

Education & Awards

Purdue University

2021 – 2025

B.S. in Economics, Minor in Computer Science.

Coursework: Data Structures, Algorithms, Machine Learning, Database Systems, Software Engineering.

🥉 3rd Place · CCAC 2024

NCAA & CCAC Research Competition

Won 3rd place for the March Madness Fan Predictor project, a RandomForest model on 76k+ brackets.

Purdue Undergraduate Research Symposium

2024

Third place finish for undergraduate research presentation.