Python SQL DuckDB PyMC XGBoost Tableau Mar – Apr 2026

Seattle Mariners
Front Office Analytics

Three front-office questions — payroll efficiency, development pipeline ROI, and free-agent targeting — answered through a Bayesian and machine learning lens. A hierarchical PyMC model quantifies true player talent with full uncertainty, a decade of draft data in DuckDB surfaces where the pipeline is producing value, and an XGBoost projection model ranks 2026 FA targets by surplus WAR over market cost.


10
Seasons of Data (2015–2025)
3
Analytical Pillars
~2,400
Player-Seasons Modeled

01 / The Problem

Three questions every front office faces every offseason.

Payroll-constrained teams can't afford bad contracts. The Mariners operate in a mid-market window where every dollar of AAV must justify itself in WAR — yet traditional point-estimate stats obscure how confident you should actually be in a player's true talent level, especially for bench players and early-season samples.

Beyond current contracts, two forward-looking questions shape roster construction: is the development pipeline producing MLB-caliber talent at the positions the team needs, and which free agents represent genuine surplus value relative to the market?

This project builds three interconnected analytical tools to answer all three, each feeding into the next.


02 / The Data

Four sources, one DuckDB warehouse.

FanGraphs (via pybaseball)

Player-season batting and pitching stats including WAR, PA, and contract data. 2015–2025. Ingested via curl_cffi to bypass Cloudflare; idempotent upserts into DuckDB.
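
The idempotent upsert described above can be sketched as follows. This is a minimal illustration, not the project's loader: the table name batting_seasons, its columns, and the sample rows are hypothetical, and Python's built-in sqlite3 module stands in for DuckDB so the sketch runs without extra installs. DuckDB documents a comparable INSERT ... ON CONFLICT clause, but the statements the project actually runs are not shown in this write-up.

```python
import sqlite3

# Hypothetical schema: one row per (player_id, season).
con = sqlite3.connect(":memory:")
con.execute(
    """
    CREATE TABLE batting_seasons (
        player_id INTEGER NOT NULL,
        season    INTEGER NOT NULL,
        pa        INTEGER,
        war       REAL,
        PRIMARY KEY (player_id, season)
    )
    """
)

# Insert a row, or overwrite the stats if the key already exists.
UPSERT = """
    INSERT INTO batting_seasons (player_id, season, pa, war)
    VALUES (?, ?, ?, ?)
    ON CONFLICT (player_id, season)
    DO UPDATE SET pa = excluded.pa, war = excluded.war
"""


def load(rows):
    """Insert new rows and overwrite existing ones; safe to re-run."""
    with con:  # commit on success, roll back on error
        con.executemany(UPSERT, rows)


# Made-up rows for illustration only.
first_pull = [(1, 2024, 600, 4.0), (2, 2024, 90, 0.2)]
load(first_pull)
load(first_pull)              # re-running the same pull adds no duplicates
load([(2, 2024, 120, 0.5)])   # a later pull revises player 2

row_count = con.execute("SELECT COUNT(*) FROM batting_seasons").fetchone()[0]
revised = con.execute(
    "SELECT pa, war FROM batting_seasons WHERE player_id = 2 AND season = 2024"
).fetchone()
```

Keying the conflict target on (player_id, season) is what makes a re-run harmless: a repeated pull overwrites a row instead of appending a second copy.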

Baseball Savant (Statcast)

Exit velocity, barrel%, and xwOBA per player-season from 2015 onward. Contact quality features used as inputs to the FA projection model.

Baseball Reference (Draft)

Amateur draft picks for classes 2013–2025, rounds 1–20, matched to mlbam_id for career WAR joins. Pick-slot baseline computed across all 30 teams.

Spotrac (Contracts)

Current Mariners contract AAV scraped via BeautifulSoup. Joined to player-seasons by normalized name. Full reload on each run — Spotrac is the authoritative source.
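
Joining on a normalized name usually means reducing both sides to the same canonical key first. The helper below is one possible sketch: the function name normalize_name, the suffix list, and the specific cleaning rules are assumptions, not the project's actual normalizer.

```python
import re
import unicodedata

# Generational suffixes to drop; this list is an assumption.
SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}


def normalize_name(name: str) -> str:
    """Return a lowercase, accent-free, punctuation-free join key."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Delete periods and apostrophes; turn any other punctuation into spaces.
    cleaned = re.sub(r"[.'’]", "", ascii_only.lower())
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", cleaned)
    tokens = [t for t in cleaned.split() if t not in SUFFIXES]
    return " ".join(tokens)


# Spellings that differ only by accents or punctuation map to the same key.
keys_match = normalize_name("Julio Rodríguez") == normalize_name("Julio Rodriguez")
```

Name keys can still collide when two players share a name, so an ID-based join is safer wherever both sources expose one.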


03 / Methodology

01

Hierarchical Bayesian WAR Model

Each player's latent true-talent WAR is drawn from a position-group distribution (C, 1B, 2B, 3B, SS, OF, UT), which is itself drawn from a league-wide hyperprior. Playing time is encoded as observation uncertainty — a player with 90 PA has a wide likelihood; one with 650 PA has a narrow one. This partial pooling prevents overreacting to small samples while still updating meaningfully on full-season evidence. Sampled with NUTS at target_accept=0.98 (4 chains × 1,000 draws, 0 divergences). Posterior summarized as mean ± 94% HDI per player.
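
The shrinkage behaviour described above can be illustrated with a closed-form normal-normal update. This is a deliberate simplification and not the PyMC model: it treats the position-group mean and both standard deviations as known instead of inferring them, it treats the observation as WAR prorated to 600 PA, and the 1/sqrt(PA) noise scaling and every default value are assumptions chosen only for illustration.

```python
import math


def posterior_talent(observed_war, pa, group_mean, group_sd=1.5, sd_at_600_pa=1.0):
    """Normal-normal posterior (mean, sd) for one player's true-talent WAR.

    All default values are illustrative assumptions, not fitted estimates.
    """
    # Fewer plate appearances -> noisier observation (assumed 1/sqrt(PA) scaling).
    obs_sd = sd_at_600_pa * math.sqrt(600.0 / pa)
    prior_precision = 1.0 / group_sd**2
    obs_precision = 1.0 / obs_sd**2
    # Precisions add; the posterior mean is a precision-weighted average.
    post_var = 1.0 / (prior_precision + obs_precision)
    post_mean = post_var * (prior_precision * group_mean + obs_precision * observed_war)
    return post_mean, math.sqrt(post_var)


# Same observed value and group mean, different playing time (made-up inputs).
small_mean, small_sd = posterior_talent(3.0, pa=90, group_mean=1.0)
full_mean, full_sd = posterior_talent(3.0, pa=650, group_mean=1.0)
print(f"90 PA:  {small_mean:.2f} +/- {small_sd:.2f}")
print(f"650 PA: {full_mean:.2f} +/- {full_sd:.2f}")
```

With identical observed values, the 90 PA estimate is pulled further toward the group mean and keeps a wider posterior than the 650 PA estimate, which is the partial-pooling effect the hierarchical model produces with inferred rather than fixed hyperparameters.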

02

Draft Cohort ROI — Pure SQL in DuckDB

Draft classes 2013–2025 are analyzed entirely with DuckDB window functions and CTEs. Pick-slot WAR baselines are computed as the league-average career WAR at each overall pick number across all 30 teams; any Mariners pick above that curve represents developmental surplus. MLB reach rate, time-to-debut, and a development curve (cumulative WAR per draftee by years since draft) are exported as CSVs for Tableau. Positional pipeline gaps identified here feed directly into Pillar 3 FA targeting.
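
The pick-slot baseline can be sketched as a CTE plus a window function. The example below is an illustration under stated assumptions: the draft_picks table, its columns, the placeholder team codes, and every row are made up, and Python's built-in sqlite3 module stands in for DuckDB so the query runs without extra installs. The query sticks to standard window-function syntax, but it is not the project's actual SQL.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE draft_picks (team TEXT, draft_year INTEGER, "
    "overall_pick INTEGER, player TEXT, career_war REAL)"
)
# Made-up rows for illustration only; not real draft results.
con.executemany(
    "INSERT INTO draft_picks VALUES (?, ?, ?, ?, ?)",
    [
        ("SEA", 2013, 90, "Player A", 6.0),
        ("AAA", 2014, 90, "Player B", 0.0),
        ("BBB", 2015, 90, "Player C", 0.0),
        ("SEA", 2014, 12, "Player D", 1.0),
        ("AAA", 2013, 12, "Player E", 5.0),
    ],
)

QUERY = """
WITH slot_baseline AS (
    SELECT
        team, draft_year, overall_pick, player, career_war,
        -- League-wide average career WAR at this overall pick number.
        AVG(career_war) OVER (PARTITION BY overall_pick) AS slot_avg_war
    FROM draft_picks
)
SELECT player, overall_pick, career_war - slot_avg_war AS surplus_war
FROM slot_baseline
WHERE team = 'SEA'
ORDER BY surplus_war DESC
"""
surplus = con.execute(QUERY).fetchall()
for player, pick, surplus_war in surplus:
    print(f"{player} (pick {pick}): {surplus_war:+.1f} WAR vs slot average")
```

The window is computed before the team filter, so each Mariners pick is compared against every team's outcomes at the same slot, including its own.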

03

XGBoost FA Projection Model

Gradient-boosted regressor trained on player-seasons 2017–2023, predicting next-season WAR from trailing 3-year WAR, age, plate appearances, wOBA, and Statcast contact quality metrics (xwOBA, exit velocity, barrel%). Validation on 2024 → 2025. Point predictions from a standard XGBoost regressor; 80% prediction intervals from separate quantile models at p10 and p90, giving an honest uncertainty range per player. FA universe filtered to positional gaps from Pillar 2, then ranked by projected WAR surplus over the implied market rate.
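
Quantile models such as the p10 and p90 models above are typically fit by minimizing pinball loss, and the resulting interval can be checked by its empirical coverage on held-out seasons. The pure-Python sketch below shows both calculations; the function names and all numbers are hypothetical, and it does not reproduce the project's XGBoost configuration.

```python
def pinball_loss(y_true, y_pred, q):
    """Mean pinball (quantile) loss at quantile level q in (0, 1)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        # Under-prediction is weighted by q, over-prediction by (1 - q).
        total += q * diff if diff >= 0 else (q - 1.0) * diff
    return total / len(y_true)


def interval_coverage(y_true, lower, upper):
    """Share of actual values that fall inside [lower, upper]."""
    hits = sum(lo <= y <= hi for y, lo, hi in zip(y_true, lower, upper))
    return hits / len(y_true)


# Made-up next-season WAR values and interval bounds, for illustration only.
actual = [2.0, 0.5, 4.0, 1.0]
p10 = [1.0, 0.0, 1.5, 1.2]
p90 = [3.0, 1.0, 3.5, 2.5]

coverage = interval_coverage(actual, p10, p90)
loss_p10 = pinball_loss(actual, p10, q=0.1)
loss_p90 = pinball_loss(actual, p90, q=0.9)
```

On a large enough validation set, a well-calibrated p10 to p90 band should contain roughly 80% of actual outcomes; coverage far from that suggests the interval is too narrow or too wide.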


04 / Key Findings

The data surfaced clear signals.

Cal Raleigh WAR surplus over pick-90 expectation: +19.7 WAR
Draft classes above league MLB reach rate (2013–2020): 5 of 8 classes
Josh Naylor, full 94% HDI below league-average WAR: highest-confidence contract flag
Elly De La Cruz projected WAR (top SS pipeline gap target): 5.9 WAR · $19.6M value
Dominic Canzone implied $/WAR vs. league-average contract: $0.54M/WAR (vs. ~$9–11M market)

05 / Live Dashboard

Explore the full analysis interactively.

All three pillars are published to Tableau Public. The dashboard includes the Bayesian payroll efficiency scatter, draft cohort reach rates, surplus picks, and the FA target shortlist.


Tools & Technologies

Python 3 DuckDB PyMC ArviZ XGBoost pybaseball Pandas Matplotlib Tableau Public Jupyter SQL BeautifulSoup
