Analysis & Plotting

Data Ingestion

W.I.N.G.S. — Ingest GPU simulation results

Reads all CSV files from the GPU results directory, parses the experimental conditions from the filenames, and produces a single combined CSV ready for plot_results.py.

Filename format (matches both old CPU and new GPU outputs):

cytoplasmic_incompatibility_<True|False>_male_killing_<True|False>_ increased_exploration_rate_<True|False>_increased_eggs_<True|False>_<replicate>.csv

CSV columns expected: Population Size, Infection Rate

(Day is added automatically as row index + 1)

Output columns (what plot_results.py expects):

Day, Population Size, Infection Rate, Cytoplasmic Incompatibility, Male Killing, Increased Exploration Rate, Increased Eggs, Replicate ID

wings.analysis.ingest.parse_conditions_from_filename(filename)[source]

Extract experimental conditions and replicate ID from filename.

Returns

tuple : (ci: bool, mk: bool, er: bool, eg: bool, rep_id: int)

Parameters:

filename (str)

wings.analysis.ingest.normalise_columns(df)[source]

Ensure the DataFrame has ‘Population Size’ and ‘Infection Rate’ columns regardless of what the CSV header actually says. Handles the GPU output (already correct) and any minor variations.

Return type:

DataFrame

Parameters:

df (DataFrame)

wings.analysis.ingest.main(data_dir, output_file)[source]

CLI entry point for data ingestion.

Parses --input-dir and --output arguments, calls ingest_directory(), and writes the combined CSV.

Parameters:
  • data_dir (str)

  • output_file (str)

W.I.N.G.S. — Δp sweep data ingestion.

Reads individual ABM CSV outputs from the delta-p sweep and combines them into a single CSV for plotting.

Filename convention:

{PHENOTYPE}_frac{NNN}_rep{R}.csv e.g. CI_frac050_rep3.csv → phenotype=CI, fraction=0.50, rep=3

Each per-run CSV has columns: Day, Population Size, Infection Rate, …

Output CSV adds: Phenotype, Infected Fraction, Replicate ID

Usage:

python ingest_delta_p.py –input-dir /projects/…/abm_delta_p –output data/combined_delta_p.csv

wings.analysis.ingest_delta_p.parse_filename(fname)[source]

Extract phenotype, initial fraction, and replicate ID from filename.

Expected format: {PHENOTYPE}_frac{NNN}_rep{R}.csv (e.g. CI_frac050_rep3.csv → phenotype=CI, fraction=0.50, rep=3).

Parameters:

fname (str) – Filename (basename only).

Returns:

(phenotype, fraction, replicate) or None.

Return type:

tuple or None

wings.analysis.ingest_delta_p.main()[source]

CLI entry point for Δp sweep data ingestion.

Scans the input directory for per-run CSVs, adds a Day column from row index (raw CSVs lack a time column), and combines into a single dataset.

Publication Figures

W.I.N.G.S. — Publication figure generator.

Produces parallel figure sets for the Agent-Based Model (ABM) and the Wright-Fisher fixed-size Model (WFM), organised by biologically meaningful combination subsets.

Colour scheme: Paul Tol’s qualitative palette (colourblind-safe). Line plots differentiate combos via colour + dash pattern + line width. All figures exported as both PNG (300 dpi) and SVG (text as text objects).

Figure catalogue

For EACH model (ABM / WFM):

1–2. infection_over_time_{subset}.{png,svg} 3–4. final_infection_{subset}.{png,svg} 5–6. time_to_fixation_{subset}.{png,svg} 7. heatmap_infection.{png,svg} 8. heatmap_fixation_pct.{png,svg} 9. heatmap_fixation_time.{png,svg} ABM-only: 10–11. population_over_time_{subset}.{png,svg} 12. heatmap_population.{png,svg}

Usage

python plot_wings.py –model abm –input data/combined_abm.csv python plot_wings.py –model wfm –input data/combined_wfm.csv

wings.analysis.plot_wings.get_style(label)[source]

Return (colour, dash, linewidth) for a combo label.

wings.analysis.plot_wings.parse_exclude(exclude_str)[source]

Parse a comma-separated exclusion string into a set of mechanic names.

Parameters:

exclude_str (str or None) – Comma-separated mechanic abbreviations, e.g. "MK" or "MK,IE". Case-insensitive.

Returns:

Uppercase mechanic abbreviations to exclude.

Return type:

set[str]

Raises:

SystemExit – If an unrecognised abbreviation is provided.

wings.analysis.plot_wings.filter_subset_by_exclusion(subset, excluded)[source]

Remove combos from a subset that use any excluded mechanic.

Parameters:
  • subset (list[tuple]) – List of (ci, mk, er, ie) boolean tuples.

  • excluded (set[str]) – Mechanic abbreviations to exclude (e.g. {"MK"}).

Returns:

Filtered subset with excluded combos removed.

Return type:

list[tuple]

wings.analysis.plot_wings.filter_heatmap_configs(row_configs, col_configs, excluded)[source]

Remove heatmap rows/columns that contain an excluded mechanic.

Parameters:
  • row_configs (list[tuple]) – Row definitions (ci, mk, label).

  • col_configs (list[tuple]) – Column definitions (er, ie, label).

  • excluded (set[str]) – Mechanic abbreviations to exclude.

Returns:

Filtered (row_configs, col_configs).

Return type:

tuple[list, list]

wings.analysis.plot_wings.combo_label(ci, mk, er, ie)[source]

Build a human-readable label from boolean effect flags.

Parameters:
  • ci (bool) – Cytoplasmic incompatibility active.

  • mk (bool) – Male killing active.

  • er (bool) – Increased exploration rate active.

  • ie (bool) – Increased eggs active.

Returns:

Label like "CI+ER" or "None" (no effects).

Return type:

str

wings.analysis.plot_wings.save_fig(fig, path_stem)[source]

Save a matplotlib figure as PNG (300 dpi) and SVG (editable text).

Parameters:
  • fig (matplotlib.figure.Figure) – The figure to save.

  • path_stem (str) – Output path without extension (e.g. "figures/infection_rate"). Produces .png and .svg.

wings.analysis.plot_wings.load_data(path)[source]

Load a combined simulation CSV produced by ingest.ingest_directory().

Normalises boolean effect columns and adds a Combo label column.

Parameters:

path (str) – Path to the combined CSV.

Returns:

Loaded data with boolean columns and combo labels.

Return type:

pandas.DataFrame

wings.analysis.plot_wings.filter_combos(df, subset)[source]

Filter a DataFrame to only the specified effect combinations.

Parameters:
Returns:

Filtered copy containing only matching rows.

Return type:

pandas.DataFrame

wings.analysis.plot_wings.get_ordered_labels(subset)[source]

Return combo labels in the order defined by the subset.

Parameters:

subset (list[tuple]) – List of (ci, mk, er, ie) tuples.

Returns:

Combo label strings in subset order.

Return type:

list[str]

wings.analysis.plot_wings.compute_timeseries_stats(df, time_col='Day')[source]

Compute per-timepoint summary statistics across replicates.

For each combination × time point, calculates median, IQR (25th/75th percentiles), and 5th/95th percentiles for both infection rate and population size.

Parameters:
  • df (pandas.DataFrame) – Simulation data.

  • time_col (str) – Name of the time column. Defaults to "Day".

Returns:

Summary statistics with columns inf_median, inf_q25, inf_q75, etc.

Return type:

pandas.DataFrame

wings.analysis.plot_wings.compute_final_values(df, time_col='Day')[source]

Extract the final time-point values for each replicate.

Parameters:
  • df (pandas.DataFrame) – Simulation data.

  • time_col (str) – Name of the time column. Defaults to "Day".

Returns:

One row per combo × replicate, containing the last recorded infection rate and population size.

Return type:

pandas.DataFrame

wings.analysis.plot_wings.compute_time_to_fixation(df, time_col='Day', threshold=0.99)[source]

Find the first time point where infection rate ≥ threshold.

Parameters:
  • df (pandas.DataFrame) – Simulation data.

  • time_col (str) – Time column name. Defaults to "Day".

  • threshold (float) – Fixation threshold. Defaults to 0.99.

Returns:

Columns Combo, Replicate ID, t_fix (NaN if fixation never reached).

Return type:

pandas.DataFrame

wings.analysis.plot_wings.plot_timeseries(df, subset, subset_name, metric, ylabel, title, path_stem, time_col='Day', is_abm=True, skip_before=None)[source]

Line + ribbon plot of a metric over time for each combination.

Draws median line with IQR ribbon and 5th–95th percentile whisker band. ABM plots use symlog x-axis; WFM uses linear.

Parameters:
  • df (pandas.DataFrame) – Simulation data.

  • subset (list[tuple]) – Effect combinations to include.

  • subset_name (str) – Human-readable subset name for title.

  • metric (str) – Column prefix ("inf" or "pop").

  • ylabel (str) – Y-axis label.

  • title (str) – Figure title.

  • path_stem (str) – Output path without extension.

  • time_col (str) – Time column name. Defaults to "Day".

  • is_abm (bool) – Use ABM-specific formatting. Defaults to True.

  • skip_before (int, optional) – Skip time points before this day.

wings.analysis.plot_wings.plot_final_infection(df, subset, subset_name, title, path_stem, time_col='Day')[source]

Strip plot of final infection rate with fixation-% annotation.

Each replicate is a jittered dot; a diamond shows the median; a vertical bar shows the IQR. Fixation percentage is annotated above each combination.

Parameters:
  • df (pandas.DataFrame) – Simulation data.

  • subset (list[tuple]) – Effect combinations to include.

  • subset_name (str) – Subset label for title.

  • title (str) – Figure title.

  • path_stem (str) – Output path without extension.

  • time_col (str) – Time column. Defaults to "Day".

wings.analysis.plot_wings.plot_final_strip(df, subset, subset_name, metric_col, ylabel, title, path_stem, time_col='Day', csv_col_name=None, fixation_annot=False)[source]

Strip + median diamond + IQR bar for any final-timepoint metric. Exports companion CSV with columns: mechanic, {csv_col_name}.

wings.analysis.plot_wings.plot_time_to_fixation(df, subset, subset_name, title, path_stem, time_col='Day', is_abm=True)[source]

Violin + strip plot of time-to-fixation across replicates.

Combinations that never reach fixation are listed as italic annotations.

Parameters:
  • df (pandas.DataFrame) – Simulation data.

  • subset (list[tuple]) – Effect combinations.

  • subset_name (str) – Subset label.

  • title (str) – Figure title.

  • path_stem (str) – Output path without extension.

  • time_col (str) – Time column. Defaults to "Day".

  • is_abm (bool) – ABM formatting. Defaults to True.

wings.analysis.plot_wings.plot_heatmap(df, metric_func, cmap, cbar_label, title, path_stem, time_col='Day', fmt='.2f', vmin=None, vmax=None, csv_raw_func=None, csv_value_col='value', excluded=None)[source]

Heatmap of effect combinations, with optional mechanic exclusion.

Rows represent the CI × MK axis; columns represent the ER × IE axis. When mechanics are excluded via excluded, the corresponding rows or columns are removed and the heatmap shrinks accordingly.

Parameters:
  • df (pandas.DataFrame) – Simulation data.

  • metric_func (callable) – Function (df_subset, time_col) float computing the cell value.

  • cmap (str) – Matplotlib colourmap name.

  • cbar_label (str) – Colourbar label.

  • title (str) – Figure title.

  • path_stem (str) – Output path without extension.

  • time_col (str) – Time column. Defaults to "Day".

  • fmt (str) – Number format string. Defaults to ".2f".

  • vmin (float, optional) – Colourmap minimum.

  • vmax (float, optional) – Colourmap maximum.

  • csv_raw_func (callable, optional) – Per-replicate extraction function for CSV export.

  • excluded (set[str], optional) – Mechanic abbreviations to exclude.

wings.analysis.plot_wings.metric_median_final_infection(df_sub, time_col)[source]
wings.analysis.plot_wings.metric_fixation_pct(df_sub, time_col)[source]
wings.analysis.plot_wings.metric_median_time_to_fixation(df_sub, time_col)[source]
wings.analysis.plot_wings.raw_final_infection(df_sub, time_col)[source]

One row per replicate: final infection rate.

wings.analysis.plot_wings.raw_fixation_binary(df_sub, time_col)[source]

One row per replicate: 1 if reached fixation, 0 otherwise.

wings.analysis.plot_wings.raw_time_to_fixation(df_sub, time_col)[source]

One row per replicate: time to fixation (NaN if never).

wings.analysis.plot_wings.raw_final_population(df_sub, time_col)[source]

One row per replicate: final population size.

wings.analysis.plot_wings.metric_median_final_population(df_sub, time_col)[source]
wings.analysis.plot_wings.metric_median_final_infected(df_sub, time_col)[source]
wings.analysis.plot_wings.raw_final_infected_count(df_sub, time_col)[source]

One row per replicate: final number of infected beetles.

wings.analysis.plot_wings.generate_figures(df, model, outdir, time_col='Day', excluded=None)[source]

Generate the complete set of publication figures for one model.

Produces time series, strip plots, violin plots, and heatmaps for both combo subsets (Individual Effects, ER-Centric), plus ABM-only population size figures.

Parameters:
  • df (pandas.DataFrame) – Combined simulation data.

  • model (str) – "abm" or "wfm".

  • outdir (str) – Output directory for figures.

  • time_col (str) – Time column. Defaults to "Day".

  • excluded (set[str], optional) – Mechanic abbreviations to exclude (e.g. {"MK"}). Combos using excluded mechanics are removed from subsets, and heatmap rows/columns are pruned.

wings.analysis.plot_wings.main()[source]

CLI entry point for the publication figure generator.

Parses --model, --input, --outdir arguments, loads data, and generates all figures.

Δp Analysis

W.I.N.G.S. — Δp vs p figure (complementary strategies).

Extracts per-interval infection frequency change (Δp) from ABM time series and overlays analytical models.

Key design choices

  • First-passage extraction: for each replicate, only uses the FIRST time the trajectory passes through each p-bin (ascending), avoiding contamination from equilibrium wobble or stalled runs.

  • Asymmetric CI model: A·p^α·(1-p)^γ allows the peak to shift right, capturing spatial/growth effects the mean-field Turelli model misses.

  • Diminishing-returns ER model: s_0·(1-p)^β captures mate-finding saturation.

Outputs (PNG 300dpi + SVG with editable text):

delta_p_vs_p.{png,svg} — 3-panel (CI, ER, CI+ER) delta_p_overlay.{png,svg} — single-panel with crossing point

Usage:

python plot_delta_p.py –input data/combined_delta_p.csv python plot_delta_p.py –input data/combined_delta_p.csv –dt 14 –ascending-only

wings.analysis.plot_delta_p.detect_columns(df)[source]

Auto-detect the time and infection rate column names.

Searches for common variants (Day, Days, Time, etc.) case-insensitively.

Parameters:

df (pandas.DataFrame) – Input data.

Returns:

(time_column_name, infection_column_name).

Return type:

tuple[str, str]

Raises:

SystemExit – If either column cannot be found.

wings.analysis.plot_delta_p.model_ci_asymmetric(p, A, alpha, gamma)[source]

Asymmetric CI selection coefficient: A · p^α · (1-p)^γ.

Generalises the Turelli (1994) model p(1-p) · s_h / (1 - p·s_h) by allowing different exponents on the left (α) and right (γ) sides. When α > γ the peak shifts right of 0.5, capturing spatial clustering and population growth effects.

Parameters:
  • p (numpy.ndarray) – Infection frequency array.

  • A (float) – Amplitude parameter.

  • alpha (float) – Left exponent (rise rate at low p).

  • gamma (float) – Right exponent (decay rate at high p).

Returns:

Expected Δp per generation.

Return type:

numpy.ndarray

wings.analysis.plot_delta_p.model_er(p, s_0, beta)[source]

ER selection coefficient with diminishing returns: s_0 · (1-p)^β.

The mate-finding advantage of increased exploration decays as infected females become common and the mating landscape saturates.

Parameters:
  • p (numpy.ndarray) – Infection frequency array.

  • s_0 (float) – Base advantage at p → 0.

  • beta (float) – Decay exponent (higher = faster saturation).

Returns:

Expected Δp per generation from ER.

Return type:

numpy.ndarray

wings.analysis.plot_delta_p.model_ci_er(p, A, alpha, gamma, s_0, beta)[source]

Combined CI + ER selection (additive).

Parameters:
  • p (numpy.ndarray) – Infection frequency.

  • A (float) – CI amplitude.

  • alpha (float) – CI left exponent.

  • gamma (float) – CI right exponent.

  • s_0 (float) – ER base advantage.

  • beta (float) – ER decay exponent.

Returns:

Combined expected Δp.

Return type:

numpy.ndarray

wings.analysis.plot_delta_p.extract_delta_p(df, time_col, inf_col, dt=7, ascending_only=False)[source]

Extract (p, Δp) pairs from replicate time series.

For each replicate, samples the infection rate at intervals of dt days and computes the change. Skips the pre-hatch period (day < 23) and absorbing boundaries (p < 0.01 or p > 0.99).

Parameters:
  • df (pandas.DataFrame) – Combined simulation data.

  • time_col (str) – Name of the time column.

  • inf_col (str) – Name of the infection rate column.

  • dt (int) – Sampling interval in days. Defaults to 7.

  • ascending_only (bool) – If True, only use intervals where Δp > 0 and only the first passage through each 5%-bin. Defaults to False.

Returns:

Columns Phenotype, p, delta_p, initial_fraction, and optionally pop_size.

Return type:

pandas.DataFrame

wings.analysis.plot_delta_p.bin_delta_p(dp_df, n_bins=20)[source]

Bin (p, Δp) data by infection frequency and compute statistics.

Parameters:
  • dp_df (pandas.DataFrame) – Raw Δp data from extraction.

  • n_bins (int) – Number of equal-width bins over [0, 1]. Defaults to 20.

Returns:

Per-bin summary with Phenotype, p_mid, dp_median, dp_q25, dp_q75, dp_mean, n.

Return type:

pandas.DataFrame

wings.analysis.plot_delta_p.fit_analytical(binned_df)[source]

Fit the asymmetric CI and diminishing-returns ER models.

Uses scipy.optimize.curve_fit on the binned median Δp data. Falls back to default parameters if fitting fails or data is insufficient.

Parameters:

binned_df (pandas.DataFrame) – Binned Δp data.

Returns:

Fitted parameters {A, alpha, gamma, s_0, beta}.

Return type:

dict

wings.analysis.plot_delta_p.save_fig(fig, path_stem)[source]

Save a figure as PNG (300 dpi) and SVG (editable text).

Parameters:
wings.analysis.plot_delta_p.plot_delta_p_figure(binned_df, params, outdir, dt)[source]

Three-panel figure: CI, ER, CI+ER with ABM data and analytical overlay.

Each panel shows binned ABM median + IQR ribbon and a dashed analytical curve. CI panels mark the unstable equilibrium (Δp = 0 crossing).

Parameters:
  • binned_df (pandas.DataFrame) – Binned Δp data.

  • params (dict) – Fitted analytical parameters.

  • outdir (str) – Output directory.

  • dt (int) – Time interval used for Δp.

wings.analysis.plot_delta_p.plot_overlay_figure(binned_df, params, outdir, dt)[source]

Single-panel overlay of all three phenotypes with crossing point.

Marks the frequency where CI overtakes ER as the dominant driver of spread (“ER → CI handoff”) and shades the ER-dominant and CI-dominant frequency regions.

Parameters:
  • binned_df (pandas.DataFrame) – Binned Δp data.

  • params (dict) – Fitted analytical parameters.

  • outdir (str) – Output directory.

  • dt (int) – Time interval used for Δp.

wings.analysis.plot_delta_p.main()[source]

CLI entry point for the Δp figure generator.

Supports four modes via --mode:
  • all: Full trajectory averaging (default).

  • ascending: First-passage, Δp > 0 only.

  • initial: One Δp per replicate (Turelli-correct).

  • compare: Run all three and produce comparison figure.

Statistical Tests

wings.analysis.stats.read_data(file_path)[source]

Read a simulation CSV into a pandas DataFrame.

Parameters:

file_path (str) – Path to the CSV file.

Returns:

Raw simulation data.

Return type:

pandas.DataFrame

wings.analysis.stats.bootstrap_ci(data, n_bootstrap=1000, ci=0.95)[source]

Compute a bootstrap confidence interval for the median.

Parameters:
  • data (array-like) – Observed values.

  • n_bootstrap (int) – Number of bootstrap resamples. Defaults to 10000.

  • ci (float) – Confidence level (0–1). Defaults to 0.95.

Returns:

Lower and upper bounds of the confidence interval.

Return type:

tuple[float, float]

wings.analysis.stats.calculate_statistics(data)[source]

Calculate required statistics for population size and infection rate.

wings.analysis.stats.get_wolbachia_effects()[source]
wings.analysis.stats.combination_to_string(effects, combination)[source]
wings.analysis.stats.main(file_path, output_file)[source]