Pipeline overview (planned)
ThermoFooty is a staged pipeline. As of v0.1.0-dev0 the repo is a scaffold — only Phase 1 (this document, the SQLite schema, the package skeleton) is implemented. Subsequent phases land per the lab-internal project dev plan.
Stage 1 — Match data ingestion
football-data.co.uk for Big-5 tier 1 + 2 + EFL League One match results (score, cards per side, attendance, referee)
Per-source ingester writes into the matches, cards and (where available) fouls tables of the SQLite database
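As a rough illustration of the write path described above, the sketch below creates the three tables and ingests one row. Only the table names (matches, cards, fouls) come from this overview; every column name, the ingest_match helper and the sample row are hypothetical and are not the shipped Phase 1 schema.

```python
import sqlite3

# Hypothetical schema: table names follow the overview, columns are illustrative only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS matches (
    match_id   TEXT PRIMARY KEY,
    match_date TEXT NOT NULL,
    home_team  TEXT NOT NULL,
    away_team  TEXT NOT NULL,
    home_goals INTEGER,
    away_goals INTEGER,
    attendance INTEGER,
    referee    TEXT
);
CREATE TABLE IF NOT EXISTS cards (
    match_id TEXT NOT NULL REFERENCES matches(match_id),
    side     TEXT NOT NULL CHECK (side IN ('home', 'away')),
    yellow   INTEGER NOT NULL,
    red      INTEGER NOT NULL,
    PRIMARY KEY (match_id, side)
);
CREATE TABLE IF NOT EXISTS fouls (
    match_id TEXT NOT NULL REFERENCES matches(match_id),
    side     TEXT NOT NULL CHECK (side IN ('home', 'away')),
    fouls    INTEGER NOT NULL,
    PRIMARY KEY (match_id, side)
);
"""


def ingest_match(conn, row):
    """Write one source row into matches, cards and (when present) fouls."""
    with conn:  # one transaction per match: commit on success, roll back on error
        conn.execute(
            "INSERT OR REPLACE INTO matches VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
            (row["match_id"], row["date"], row["home"], row["away"],
             row["home_goals"], row["away_goals"],
             row.get("attendance"), row.get("referee")),
        )
        for side in ("home", "away"):
            conn.execute(
                "INSERT OR REPLACE INTO cards VALUES (?, ?, ?, ?)",
                (row["match_id"], side, row[f"{side}_yellow"], row[f"{side}_red"]),
            )
            fouls = row.get(f"{side}_fouls")
            if fouls is not None:  # fouls are only reported by some sources
                conn.execute(
                    "INSERT OR REPLACE INTO fouls VALUES (?, ?, ?)",
                    (row["match_id"], side, fouls),
                )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Toy row, not real match data; it has no *_fouls keys on purpose.
sample = {
    "match_id": "toy-001", "date": "2024-08-17",
    "home": "Team A", "away": "Team B",
    "home_goals": 2, "away_goals": 1,
    "attendance": 30000, "referee": "Ref X",
    "home_yellow": 1, "home_red": 0,
    "away_yellow": 3, "away_red": 1,
}
ingest_match(conn, sample)
print(conn.execute("SELECT side, yellow, red FROM cards ORDER BY side").fetchall())
```

Using INSERT OR REPLACE keyed on the match makes a repeated ingestion pass idempotent, which is one plausible way to support re-running a source without duplicating rows.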
Stage 2 — Lineup + per-player card ingestion (fbref)
fbref.com via worldfootballR subprocess for:
Per-match lineup tables (every player who started or was substituted in — both carded and uncarded matches)
Per-card minute-of-issue + card reason
Joins onto the lineups and cards tables
Critical: uncarded matches must be present in the lineups table. The per-player dose-response analyses (H_break_player, H_mobility_*) need the uncarded denominator and cannot be fit on card-event records alone
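The denominator requirement can be seen in a toy example. The sketch below assumes hypothetical column names (match_id, player_id, minute, reason) and toy rows; it shows that a LEFT JOIN from lineups onto cards keeps appearances in uncarded matches, so a per-player card rate has a usable denominator.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lineups (match_id TEXT, player_id TEXT);
CREATE TABLE cards   (match_id TEXT, player_id TEXT, minute INTEGER, reason TEXT);
""")

# Toy data: two players appear in two matches; match m2 has no cards at all.
conn.executemany("INSERT INTO lineups VALUES (?, ?)", [
    ("m1", "p1"), ("m1", "p2"),
    ("m2", "p1"), ("m2", "p2"),
])
conn.execute("INSERT INTO cards VALUES ('m1', 'p1', 57, 'foul')")

# LEFT JOIN keeps every lineup row, carded or not.
rows = conn.execute("""
    SELECT l.player_id,
           COUNT(DISTINCT l.match_id) AS appearances,
           COUNT(c.match_id)          AS cards
    FROM lineups AS l
    LEFT JOIN cards AS c
      ON c.match_id = l.match_id AND c.player_id = l.player_id
    GROUP BY l.player_id
    ORDER BY l.player_id
""").fetchall()

for player_id, appearances, n_cards in rows:
    print(player_id, appearances, n_cards, n_cards / appearances)
```

Starting from the cards table alone, player p2 would not appear at all and p1 would look like a player who is carded in every recorded match.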
Stage 3 — Crowd-violence arrests ingestion
UK Home Office football-related arrests bulletins (PDF parse) for English football tiers 1–3, 1984–2026
Bundespolizei ZIS-Jahresberichte for German football tiers 1–3, 2003–2026
Pooled in the arrests table with a country fixed effect for the H2 / H4b analyses
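To illustrate what a country fixed effect does when two national series are pooled, here is a minimal sketch of the within (demeaning) estimator for a single slope. The model family for H2 / H4b is not specified in this overview, so the linear form, the within_slope helper and the toy numbers are all assumptions made for illustration.

```python
from collections import defaultdict


def within_slope(records):
    """OLS slope of y on x after subtracting each country's own means.

    records: iterable of (country, x, y) tuples. Demeaning within country
    is equivalent to including one intercept (fixed effect) per country.
    """
    by_country = defaultdict(list)
    for country, x, y in records:
        by_country[country].append((x, y))

    sxy = sxx = 0.0
    for obs in by_country.values():
        mean_x = sum(x for x, _ in obs) / len(obs)
        mean_y = sum(y for _, y in obs) / len(obs)
        sxy += sum((x - mean_x) * (y - mean_y) for x, y in obs)
        sxx += sum((x - mean_x) ** 2 for x, _ in obs)
    return sxy / sxx


# Toy data: same slope inside each country, very different baseline levels.
toy = [
    ("ENG", 0.0, 10.0), ("ENG", 1.0, 12.0), ("ENG", 2.0, 14.0),
    ("GER", 5.0, 0.0), ("GER", 6.0, 2.0), ("GER", 7.0, 4.0),
]
slope = within_slope(toy)
print(slope)
```

Without the per-country demeaning, the difference in reporting levels between the two sources would be mixed into the slope estimate.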
Stage 4 — Stadium-day weather backfill
Four-tier cascade vendored from ThermoStrife v0.1.1:
Tier 1 — METAR via meteostat 2.x
Tier 2 — HadCET (British Isles only)
Tier 3 — ECMWF ERA5 reanalysis (1981+)
Tier 4 — NOAA 20CRv3 reanalysis (1806–1980, only used for pre-1981 tournament matches)
Per-(stadium, date) results land in the weather table with tmax_obs_c, tmax_anomaly_c, baseline_mean_c, baseline_std_c, baseline_n_days, source_tier
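The cascade logic can be sketched as an ordered list of fetchers where the first tier that returns a value wins. The sketch below is not the vendored ThermoStrife code: the resolve_tmax and weather_row helpers and the stub fetchers are hypothetical, and it assumes the anomaly is the observed maximum minus the baseline mean, which this overview does not state explicitly.

```python
from statistics import mean, stdev


def resolve_tmax(stadium, date, tiers):
    """Try each tier in order; return (tmax_c, tier_number) for the first hit."""
    for tier_no, fetch in enumerate(tiers, start=1):
        value = fetch(stadium, date)
        if value is not None:
            return value, tier_no
    return None, None  # no tier could supply the stadium-day


def weather_row(tmax_obs_c, source_tier, baseline_tmax_c):
    """Build one weather-table row from an observation and its baseline days."""
    baseline_mean = mean(baseline_tmax_c)
    return {
        "tmax_obs_c": tmax_obs_c,
        # Assumption: anomaly = observation minus baseline mean.
        "tmax_anomaly_c": tmax_obs_c - baseline_mean,
        "baseline_mean_c": baseline_mean,
        "baseline_std_c": stdev(baseline_tmax_c),
        "baseline_n_days": len(baseline_tmax_c),
        "source_tier": source_tier,
    }


# Stub fetchers standing in for the four tiers; real ones would call the data sources.
tiers = [
    lambda stadium, date: None,   # Tier 1: no station record for this day
    lambda stadium, date: None,   # Tier 2: outside the British Isles
    lambda stadium, date: 24.0,   # Tier 3: reanalysis value available
    lambda stadium, date: 23.0,   # Tier 4: never reached in this example
]
tmax, tier = resolve_tmax("toy-stadium", "1995-07-01", tiers)
row = weather_row(tmax, tier, baseline_tmax_c=[18.0, 20.0, 22.0])
print(row)
```

Recording source_tier alongside the value keeps it possible to run sensitivity checks that exclude the lower-priority tiers.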
Stage 5 — Analysis-panel materialisation
LEFT JOIN of matches × lineups × cards × weather × ... produces analysis_panel.parquet under $THERMOFOOTY_DATA_ROOT/derived/
One materialisation per ingestion pass; checksum captured in data_provenance
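One way the checksum capture could work is sketched below. The hash algorithm (SHA-256), the data_provenance column names and the sha256_of / record_provenance helpers are assumptions; a temporary directory stands in for $THERMOFOOTY_DATA_ROOT and the file holds placeholder bytes rather than a real Parquet panel.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Stream the file in chunks so large panels need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_provenance(conn, artifact, checksum):
    """Append one row per materialisation (hypothetical column layout)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS data_provenance ("
        "artifact TEXT, sha256 TEXT, created_utc TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    conn.execute(
        "INSERT INTO data_provenance (artifact, sha256) VALUES (?, ?)",
        (artifact, checksum),
    )
    conn.commit()


# A temporary directory stands in for $THERMOFOOTY_DATA_ROOT in this sketch.
data_root = Path(tempfile.mkdtemp())
derived = data_root / "derived"
derived.mkdir(parents=True, exist_ok=True)

panel_path = derived / "analysis_panel.parquet"
panel_path.write_bytes(b"placeholder bytes, not real parquet")

conn = sqlite3.connect(":memory:")
checksum = sha256_of(panel_path)
record_provenance(conn, panel_path.name, checksum)
print(conn.execute("SELECT artifact, sha256 FROM data_provenance").fetchall())
```

Storing the checksum next to the artifact name lets a later fit confirm that it is reading the panel produced by a specific ingestion pass.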
Stage 6 — Hypothesis fits
H1 (primary) via thermofooty.inference.run_h1() → rerandomstats.case_crossover_conditional_logit
League auxiliary battery (H2 / H3 / H4 / H4b / H5 / H0_spec / H_league_het)
Dose-response battery (H_break_pop / H_break_player / H_mobility_transfer / H_mobility_dual)
Tournament battery (H6 / H6b / H7 / H7c / H8 / H_omnibus)
All BH-FDR + Bonferroni correction routed through rerandomstats.benjamini_hochberg
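For readers unfamiliar with the two corrections, the sketch below computes Benjamini-Hochberg and Bonferroni adjusted p-values in plain Python. It is a standalone illustration of the standard procedures, not the rerandomstats implementation, whose signature is not documented here; the function names bh_adjust and bonferroni_adjust and the p-values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity with a running minimum.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def bonferroni_adjust(pvalues):
    """Bonferroni adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


# Toy p-values for a battery of four hypotheses.
toy_p = [0.01, 0.04, 0.03, 0.005]
bh = bh_adjust(toy_p)
bonf = bonferroni_adjust(toy_p)
print(bh)
print(bonf)
```

Bonferroni controls the family-wise error rate and is the stricter of the two; Benjamini-Hochberg controls the false discovery rate, so its adjusted values are never larger than the Bonferroni ones.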
Stage 7 — SOTA viz
Raincloud + null density + forest plot + superposed epoch + warming-stripes timeline. Triple-output SVG + PNG + CSV per the lab convention.