Applied ML models to global ocean monitoring data to identify pH levels and species counts as leading indicators of marine heatwave events — then synthesized findings into decision-ready insights for marine conservation stakeholders.
01 / The Problem
Marine heatwaves — prolonged periods of anomalously high sea surface temperatures — are increasing in frequency and severity. They bleach coral reefs, disrupt fisheries, and destabilize entire ecosystems. Yet most monitoring programs report conditions after stress events occur rather than flagging leading indicators early enough to act.
The goal: determine whether existing ocean chemistry and biodiversity measurements — pH levels, dissolved oxygen, species count changes — could reliably predict heatwave onset before temperatures spike, giving conservation teams a useful early-warning window.
02 / The Dataset
Environmental Indicators
Sea surface temperature, pH level, dissolved oxygen concentration, salinity, chlorophyll-a density, ocean current velocity, atmospheric pressure, solar irradiance.
Biodiversity Metrics
Species count, diversity index, migration patterns — tracked across 12+ global monitoring sites spanning the Pacific, Atlantic, and Indian Oceans over multi-year observation windows.
Temporal alignment across sites with different sampling frequencies required careful resampling and interpolation. Missing data patterns were non-random — sensors go offline more often during storm events, which are correlated with the stress conditions we were trying to predict.
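The alignment step might look something like the following. This is a minimal sketch, not the project's actual pipeline: it assumes each site's readings sit in a pandas DataFrame with a DatetimeIndex, and the function name `align_site`, the weekly target frequency, and the two-period gap limit are all illustrative choices. It records which periods were empty before filling, because non-random missingness can carry signal of its own.

```python
import pandas as pd


def align_site(df: pd.DataFrame, freq: str = "W", max_gap: int = 2) -> pd.DataFrame:
    """Resample one site's sensor readings onto a common time grid.

    `df` needs a DatetimeIndex and numeric sensor columns.
    `freq` and `max_gap` are illustrative defaults, not values from the study.
    """
    # Average all readings that fall inside each period (weekly by default).
    resampled = df.resample(freq).mean()

    # Flag periods that had no observations at all, before any filling,
    # so downstream models can see where sensors were offline.
    missing = resampled.isna().add_suffix("_missing").astype(int)

    # Time-weighted linear interpolation that fills at most `max_gap`
    # consecutive empty periods and never extrapolates past the data.
    filled = resampled.interpolate(method="time", limit=max_gap, limit_area="inside")

    return filled.join(missing)
```

Keeping the `_missing` flags alongside the filled values is one way to avoid silently smoothing over storm-related outages.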
03 / Methodology
Visualized temperature anomalies, pH trends, and species count fluctuations across all monitoring sites using Matplotlib and Seaborn. Identified strong site-level variation — Pacific equatorial buoys showed pH decline rates nearly double those of Atlantic sites — and surfaced clear seasonal patterns that needed to be controlled for in modeling.
Created lag features (pH and species count readings 2–8 weeks prior to target events), rolling averages, and rate-of-change derivatives. A Stress Index — combining pH deviation, dissolved oxygen drop, and species count decline into a single composite score — proved more predictive than any individual sensor reading. Correlation analysis removed 14 redundant features before modeling.
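The feature steps described above can be sketched roughly as follows. Everything here is an assumption for illustration: the column names (`ph`, `dissolved_oxygen`, `species_count`), the one-row-per-week layout, the equal-weight index formula with its `baseline` and `max_drop` scaling, and the 0.9 correlation cutoff. The source does not specify how the Stress Index was weighted or which cutoff was used.

```python
import numpy as np
import pandas as pd

SIGNALS = ("ph", "species_count")  # hypothetical column names
STRESS_COLS = ("ph", "dissolved_oxygen", "species_count")


def add_temporal_features(df, lags=(2, 4, 8), window=4):
    """Add lag, rolling-mean and rate-of-change columns.

    Assumes one row per week, sorted by time, for a single site.
    """
    out = df.copy()
    for col in SIGNALS:
        for k in lags:
            out[f"{col}_lag{k}w"] = df[col].shift(k)  # reading k weeks earlier
        out[f"{col}_roll{window}w"] = df[col].rolling(window).mean()
        out[f"{col}_roc"] = df[col].diff()  # week-over-week change
    return out


def stress_index(df, baseline, max_drop):
    """Composite score in [0, 1]; ASSUMED equal-weight formula.

    Each component is the drop below `baseline[col]`, scaled by a
    hypothetical worst-case drop `max_drop[col]` and clipped to [0, 1].
    """
    parts = [
        ((baseline[col] - df[col]) / max_drop[col]).clip(lower=0, upper=1)
        for col in STRESS_COLS
    ]
    return pd.concat(parts, axis=1).mean(axis=1)


def drop_correlated(X, threshold=0.9):
    """Drop the later column of any pair whose |correlation| exceeds threshold."""
    corr = X.corr().abs()
    # Keep only the upper triangle so each pair is inspected once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [c for c in upper.columns if (upper[c] > threshold).any()]
    return X.drop(columns=to_drop)
```

Lag and rolling features should be computed per site so that one site's history never leaks into another's.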
Trained and compared a Logistic Regression baseline, Random Forest, and Gradient Boosting classifier on labeled heatwave onset events. Used time-series cross-validation to prevent data leakage across temporal folds. Evaluated on recall and lead-time accuracy — a model that correctly flags a heatwave 3 weeks out is far more valuable than one that detects it 3 days out.
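A comparison like this could be set up with scikit-learn's `TimeSeriesSplit`, which always trains on earlier rows and tests on later ones. The sketch below is illustrative only: the function name `compare_models`, the default hyperparameters, and the five-fold setting are assumptions, and it scores recall alone rather than the lead-time metric described above. Rows of `X` and `y` are assumed to be in chronological order.

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_models(X, y, n_splits=5, seed=0):
    """Return mean recall per model under forward-chaining cross-validation.

    X and y must be sorted by time; hyperparameters are library defaults,
    not the settings used in the study.
    """
    models = {
        # Scaling lives inside the pipeline so it is fit on each training fold only.
        "logistic_regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "random_forest": RandomForestClassifier(random_state=seed),
        "gradient_boosting": GradientBoostingClassifier(random_state=seed),
    }
    # Each split trains on an expanding window of past rows and tests on the
    # block that follows, so no future information reaches the model.
    cv = TimeSeriesSplit(n_splits=n_splits)
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="recall").mean()
        for name, model in models.items()
    }
```

With rare events, early folds can contain only one class; in that case fewer splits or a longer initial training window may be needed.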
Translated model outputs into plain-language findings structured for two audiences: technical conservation scientists (feature importance rankings, confidence intervals) and policy stakeholders (risk threshold maps, recommended monitoring trigger points). Findings were visualized as geographic heatmaps and time-series trend panels.
04 / Key Findings
05 / Results & Impact
The Gradient Boosting model achieved the best balance of precision and recall, with an average lead time of 19 days before confirmed heatwave onset. That window is long enough for conservation teams to reposition monitoring equipment, alert local fisheries, and initiate protective measures for sensitive reef zones.
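One plausible way to compute a per-event lead time is shown below. The definition is an assumption, since the source does not state one: lead time is taken as the number of days between the earliest alert inside a lookback window and the confirmed onset. The function name and the 60-day window are hypothetical.

```python
from datetime import date
from typing import Iterable, Optional


def lead_time_days(
    alert_dates: Iterable[date], onset: date, max_lookback_days: int = 60
) -> Optional[int]:
    """Days between the earliest qualifying alert and the onset.

    Only alerts strictly before the onset and within `max_lookback_days`
    count (an assumed rule). Returns None when the event was not flagged.
    """
    candidates = [
        a for a in alert_dates if 0 < (onset - a).days <= max_lookback_days
    ]
    if not candidates:
        return None
    return (onset - min(candidates)).days
```

Averaging this value over all detected events would give a summary figure comparable to the one reported here, while events returning `None` count against recall.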
Average lead time: 19 days before heatwave onset
Model recall: 84% of heatwave events detected
Features reduced: 40 → 18 after feature selection
Policy Recommendations
Prioritize pH and species count sensors in Pacific equatorial zones — these showed the strongest predictive signal and the fastest deterioration rates.
Adopt the Composite Stress Index as a standardized alert threshold: values above 0.7 sustained for 5+ days should trigger conservation review protocols.
Increase sampling frequency at the 4 highest-risk sites identified by the model from weekly to daily.
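The alert rule in the second recommendation (index above 0.7 sustained for 5+ days) could be checked along these lines. This is a sketch under the assumption of one index value per day with no gaps; the function name `sustained_alert` is hypothetical.

```python
from typing import List, Sequence


def sustained_alert(
    values: Sequence[float], threshold: float = 0.7, min_days: int = 5
) -> List[int]:
    """Return the day indices on which a run above `threshold` reaches `min_days`.

    Assumes one value per day. A value at or below the threshold (or NaN,
    which compares as not greater) resets the run.
    """
    alerts = []
    run = 0
    for i, v in enumerate(values):
        run = run + 1 if v > threshold else 0
        # Fire once per run, on the day the run first hits the required length.
        if run == min_days:
            alerts.append(i)
    return alerts
```

Firing once per run, rather than on every day above the threshold, keeps a long stress episode from generating repeated review triggers.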
Tools & Technologies
© 2026 Owen Duffy