Project  ·  Applied ML on wearable data

Training-load readiness

Predicting next-day RPE with calibrated uncertainty intervals.

How confidently can a model predict tomorrow's perceived exertion from a week of sleep, heart-rate variability, and training-load data — and, more importantly, does it know when it can't? I built an end-to-end pipeline on a 12-athlete, 120-day longitudinal dataset: ridge regression with split-conformal prediction intervals, evaluated on athletes the model never saw in training. The point accuracy is usable. The intervals reveal exactly the inter-athlete variability that motivates the doctoral research I want to do.

RPE MAE on held-out athletes (1–10 scale)
90% prediction-interval half-width
Empirical coverage of 90% PI (target 90%)
Athletes  /  days each

The problem

In elite combat sport, the coach's daily decision is "push or recover" — and they're making it from a coach's eye plus a wall of wearable data they don't have time to read. A useful model has to do two things at once: predict tomorrow's load tolerance accurately enough to act on, and communicate its own uncertainty so the coach knows when to override it. The point estimate alone is the easy part. The interval is what makes the model trustworthy in a setting where every wrong call costs training time.

Data & methodology

Data. A synthetic 12-athlete × 120-day longitudinal dataset generated with a sport-science-plausible causal structure: daily sleep, training load, HRV, soreness, and next-day RPE, with athlete-specific baselines and realistic load-sleep-HRV-fatigue dependencies. The structure mirrors the PMData and MMASH open wearable datasets, so the same pipeline can be applied to either as a drop-in replacement. Using synthetic data here keeps the demo reproducible and side-steps the data-access gating I'd resolve in a doctoral project.

Features. Per day: today's sleep, load, HRV, soreness; rolling 3-day and 7-day means of load, sleep, HRV; acute-to-chronic workload ratio (ACWR); a 7-day monotony index (Foster); and days-since-rest.
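A minimal sketch of how the load-derived features might be computed with the standard library only. The 7- and 28-day ACWR windows, the use of population SD in the monotony index, and the "zero load means rest day" rule are assumptions made for illustration; the function names are hypothetical and not taken from build_dataset.py.

```python
from statistics import mean, pstdev

def rolling_mean(values, window):
    """Trailing mean over up to `window` most recent values, including today."""
    return [mean(values[max(0, i - window + 1): i + 1]) for i in range(len(values))]

def acwr(load, acute=7, chronic=28):
    """Acute-to-chronic workload ratio: trailing acute mean / trailing chronic mean.
    Window lengths are assumed defaults, not values from the source."""
    a = rolling_mean(load, acute)
    c = rolling_mean(load, chronic)
    return [x / y if y > 0 else 0.0 for x, y in zip(a, c)]

def monotony(load, window=7):
    """Foster's monotony: mean daily load / SD of daily load over the window.
    Returns 0.0 when the SD is zero (assumed convention to avoid dividing by zero)."""
    out = []
    for i in range(len(load)):
        w = load[max(0, i - window + 1): i + 1]
        sd = pstdev(w) if len(w) > 1 else 0.0
        out.append(mean(w) / sd if sd > 0 else 0.0)
    return out

def days_since_rest(load, rest_threshold=0.0):
    """Days elapsed since the last day whose load was <= rest_threshold."""
    out, count = [], 0
    for x in load:
        count = 0 if x <= rest_threshold else count + 1
        out.append(count)
    return out
```

For example, days_since_rest([4, 0, 6, 6, 2]) counts up from the rest day on day two. Early days use whatever history is available, so the first few values of each rolling feature are computed on short windows.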

Model. Ridge regression (closed-form, λ = 1.5) on standardised features. Linear is the right starting point: it's interpretable, its low capacity suits a cohort this small, and any gain from a richer model (gradient boosting, an LSTM) would matter less here than getting the uncertainty right.
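As a rough illustration of closed-form ridge without external dependencies, the sketch below solves (ZᵀZ + λI) w = Zᵀ(y − ȳ) on z-scored features by Gaussian elimination. Leaving the intercept unpenalised and the helper names (standardise, solve, fit_ridge, predict) are assumptions; the source only states that the fit is closed-form with λ = 1.5.

```python
from statistics import mean, pstdev

def standardise(X):
    """Z-score each column; also return the means and SDs so new rows can be scaled."""
    cols = list(zip(*X))
    mu = [mean(c) for c in cols]
    sd = [pstdev(c) or 1.0 for c in cols]  # constant columns get SD 1 to avoid /0
    Z = [[(v - m) / s for v, m, s in zip(row, mu, sd)] for row in X]
    return Z, mu, sd

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.
    Assumes A is non-singular, which holds for the ridge system when lam > 0."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_ridge(X, y, lam=1.5):
    """Closed-form ridge on standardised features with an unpenalised intercept."""
    Z, mu, sd = standardise(X)
    y_bar = mean(y)
    p = len(Z[0])
    gram = [[sum(row[i] * row[j] for row in Z) + (lam if i == j else 0.0)
             for j in range(p)] for i in range(p)]
    rhs = [sum(row[i] * (yi - y_bar) for row, yi in zip(Z, y)) for i in range(p)]
    return {"w": solve(gram, rhs), "b": y_bar, "mu": mu, "sd": sd}

def predict(model, X):
    """Apply the stored scaling, then the linear model."""
    return [model["b"] + sum(w * (v - m) / s for w, v, m, s in
                             zip(model["w"], row, model["mu"], model["sd"]))
            for row in X]
```

Because the features are centred, the intercept is simply the training mean of y, and the coefficients in model["w"] are directly comparable across features.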

Uncertainty. Split-conformal prediction (Lei & Wasserman, 2014). Athletes are split 7 train / 3 calibrate / 2 test. Half-widths for the 80% and 90% intervals come from the empirical quantiles of |residual| on the calibration athletes, with the standard finite-sample correction: the quantile level is inflated from 1 − α to (1 − α)(n + 1)/n, where n is the number of calibration residuals. This is the cleanest finite-sample, distribution-free way to put a number on "how confident is the model", and crucially it works on top of any point estimator.
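The calibration step can be sketched in a few lines. This version takes the ⌈(n + 1)(1 − α)⌉-th smallest absolute residual, which is one common way to write the finite-sample correction; the function names are hypothetical, and the small tolerance inside the ceiling is an added guard against floating-point noise.

```python
import math

def conformal_half_width(abs_residuals, alpha):
    """Split-conformal half-width from calibration |residuals|.
    Returns the ceil((n+1)(1-alpha))-th smallest score, or infinity when the
    calibration set is too small to support the requested level."""
    scores = sorted(abs_residuals)
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha) - 1e-9)  # tolerance guards float round-off
    if k > n:
        return math.inf
    return scores[k - 1]

def interval(prediction, half_width):
    """Symmetric prediction interval around a point prediction."""
    return prediction - half_width, prediction + half_width
```

With 19 calibration residuals and α = 0.1, the half-width is the 18th smallest residual; with only 5 residuals the same level is unattainable and the function returns an infinite width rather than an over-confident one.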

Evaluation. Test athletes were not seen in training or calibration. Reported metrics are on those held-out athletes only.
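A sketch of what athlete-level evaluation might look like: whole athletes are assigned to one split, and the two headline metrics are computed on the held-out rows only. Assigning athletes in sorted-ID order is an assumption made to keep the example deterministic (a seeded shuffle would be the more usual choice), and the function names are hypothetical.

```python
def split_by_athlete(athlete_ids, n_train=7, n_cal=3):
    """Assign whole athletes to train / calibrate / test so no athlete spans two sets.
    Athletes are taken in sorted-ID order here purely for determinism."""
    ids = sorted(set(athlete_ids))
    train = ids[:n_train]
    cal = ids[n_train:n_train + n_cal]
    test = ids[n_train + n_cal:]
    return train, cal, test

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def empirical_coverage(y_true, y_pred, half_width):
    """Fraction of targets that fall inside prediction +/- half_width."""
    hits = sum(abs(t - p) <= half_width for t, p in zip(y_true, y_pred))
    return hits / len(y_true)
```

Comparing empirical_coverage on the test athletes against the nominal level is the check that exposes any gap between calibration and deployment.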

Predicting an unseen athlete

Below: the model's per-day RPE prediction for a test athlete the model never saw, overlaid on the athlete's actual next-day RPE. Shaded bands are the 80% and 90% split-conformal prediction intervals. Toggle the legend to focus on any layer.

Calibration: predicted vs actual

Each point is one day, one held-out athlete. The diagonal is perfect prediction; the dashed band is the model's 90% interval half-width. Points outside the band are coverage failures — and there are a few too many of them, which is the finding the next section is about.

What the model learned

Standardised ridge coefficients, fitted on the training athletes. Directions match sport-science priors: more sleep over recent days and higher HRV predict lower next-day RPE; more days since rest and higher recent load predict higher RPE.

The interesting finding

The 90% interval covers less than the nominal 90% of test points. Split conformal guarantees nominal coverage under exchangeability, and exchangeability breaks the moment you generalise across athletes, because each athlete has their own baseline, training history, and noise profile. In this split, the residuals computed on the calibration athletes understate the test athletes' residual scale.

This is the load-bearing observation. Elite-sport cohorts are tiny, individuals are heterogeneous, and the standard ML assumptions — IID samples, exchangeable calibration — fail in characteristic ways. Closing this gap is the methodological core of what I want to study:

  • Athlete-stratified conformal — calibrate per-cohort, or per-cluster, so the residual scale matches the deployment distribution.
  • Hierarchical models — let each athlete have their own random intercept, so cross-athlete variance doesn't leak into the noise budget.
  • Transfer learning from sub-elite data — pre-train on larger cohorts, fine-tune per-athlete with the handful of sessions an elite competitor will ever sit through.
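To make the first of these directions concrete, here is a minimal sketch of group-wise calibration: one split-conformal half-width per group instead of a single pooled one. How athletes would be assigned to groups (cohort, cluster, or otherwise) is left open and is an assumption of the example, as is the function name.

```python
import math
from collections import defaultdict

def group_half_widths(abs_residuals, groups, alpha=0.1):
    """One split-conformal half-width per group label.
    `groups` gives a (hypothetical) cluster label for each calibration residual."""
    by_group = defaultdict(list)
    for r, g in zip(abs_residuals, groups):
        by_group[g].append(r)
    widths = {}
    for g, scores in by_group.items():
        scores.sort()
        n = len(scores)
        k = math.ceil((n + 1) * (1 - alpha) - 1e-9)  # tolerance guards float round-off
        widths[g] = scores[k - 1] if k <= n else math.inf
    return widths
```

The trade-off is visible in the code: each group needs enough calibration residuals of its own, otherwise its half-width falls back to infinity, which is exactly the small-cohort tension described above.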

Limitations & honest caveats

  • Synthetic data. The dataset is generated to plausible sport-science correlations, not measured. The pipeline is data-shape-compatible with PMData and MMASH; what I haven't demonstrated here is the same numbers on real recordings.
  • Linear model. A gradient-boosted regressor would probably cut MAE by 10–20%. I deliberately kept it linear: with 7 training athletes, the limiting factor is data, not function class.
  • RPE as ground truth. RPE is the standard practical readiness metric but is itself self-reported and noisy. Hard injury or missed-session labels would be a stronger downstream target.
  • One-step horizon. The model predicts tomorrow's RPE. The more useful question for planning is multi-day, with the uncertainty band widening as the horizon grows.

Why I built this

This is the smallest end-to-end project I could write that touches all three of the research directions I'm proposing for a PhD: a multimodal predictor with explicit uncertainty (direction 1), an honest confrontation with small-sample cross-athlete generalisation (direction 2), and a result you can actually show a strength-and-conditioning coach (direction 3). It is a proof-of-concept, not a paper — but the pipeline is the one I'd extend.

Source

Single Python file, standard library only, no external dependencies: notebooks/readiness/build_dataset.py in the repo. Generates the synthetic athletes, engineers features, fits ridge by closed form, calibrates conformal intervals, and exports the JSON this page is rendered from.