API Reference

Fisher’s Exact Test

┌──────────────────────────────────────────────────────────────────────┐ │ fisher_exact.py « Fisher’s Exact Test » │ │ │ │ Wrapper around scipy.stats.fisher_exact for 2×2 contingency │ │ tables. Returns the p-value for the null hypothesis that the │ │ odds ratio of the underlying population equals one. │ │ │ │ Author : Bart R.H. Geurten │ │ Licence: MIT │ └──────────────────────────────────────────────────────────────────────┘

class rerandomstats.fisher_exact.FisherExactTest(data_a, data_b, alternative='two-sided')[source]

Bases: object

Perform Fisher’s exact test on a 2×2 contingency table.

The two input tuples each contain two counts (option_A, option_B). For example (alive, dead) for treatment and control groups.

Parameters:

data_a (Tuple[int, int]) – Counts for group A as (count_optionA, count_optionB).
data_b (Tuple[int, int]) – Counts for group B as (count_optionA, count_optionB).
alternative (Literal['two-sided', 'less', 'greater']) – Sidedness of the test — 'two-sided', 'less', or 'greater'.

p_value: The p-value after main() has been called.

Example

>>> test = FisherExactTest((8, 2), (1, 5))
>>> p = test.main()
>>> p < 0.05
True

main()[source]

Run Fisher’s exact test and return the p-value.

Constructs the 2×2 contingency table from data_a and data_b, delegates to scipy.stats.fisher_exact(), and stores / returns the resulting p-value.

Returns:: The p-value of Fisher’s exact test.
Return type:: float

Fisher’s Resampling Test

┌──────────────────────────────────────────────────────────────────────┐ │ fisher_resampling.py « Fisher’s Resampling Test » │ │ │ │ Implements Sir Ronald Fisher’s re-randomisation test for │ │ comparing two independent samples. The observed test statistic │ │ (mean, median, or sum difference) is ranked against a null │ │ distribution built by combinatorial or random reshuffling. │ │ │ │ Author : Bart R.H. Geurten │ │ Licence: MIT │ └──────────────────────────────────────────────────────────────────────┘

class rerandomstats.fisher_resampling.FisherResamplingTest(data_a, data_b, func, combination_n=10000)[source]

Bases: object

Two-sample resampling test in the tradition of R.A. Fisher.

The observed difference between two groups (by mean, median, or sum) is ranked against a null distribution constructed by reshuffling group labels.

Parameters:

data_a (Sequence[float]) – First sample.
data_b (Sequence[float]) – Second sample.
func (Literal['meanDiff', 'medianDiff', 'sumDiff']) – Test-statistic identifier — 'meanDiff', 'medianDiff', or 'sumDiff'.
combination_n (Union[int, str]) – 'all' for exhaustive permutation or an integer for random resampling draws.

p_value: Two-sided p-value after main() has been called.

original_test_result: Observed test statistic.

shuffled_results: Sorted null distribution of the statistic.

Example

>>> test = FisherResamplingTest([1, 2, 3], [7, 8, 9], 'meanDiff', 10000)
>>> p = test.main()
>>> p < 0.05
True

get_shuffled_indices()[source]

Build the combinatorial index sets via GetNofK.

Return type:: None

main()[source]

Execute the full resampling pipeline and return the p-value.

Steps:

Generate shuffled index sets.
Compute the observed test statistic.
Build the null distribution via bootstrap resampling.
Rank the observed statistic and derive a two-sided p-value.

Returns:: Two-sided p-value.
Return type:: float

bootstrap_resampling()[source]

Compute the test statistic for every reshuffled split.

Returns:: List of resampled test-statistic values.
Return type:: List[float]

calculate_test(data_a, data_b)[source]

Dispatch to the chosen test-statistic function.

Parameters:

data_a (Sequence[float]) – First sample.
data_b (Sequence[float]) – Second sample.

Returns:

Scalar test statistic.

Raises:

ValueError – If func is not recognised.

Return type:

float

Hypothesis Tests

┌──────────────────────────────────────────────────────────────────────┐ │ hypothesis_tests.py « Classical Hypothesis Test Wrapper » │ │ │ │ Thin dispatcher around scipy.stats for two-sample hypothesis │ │ tests. Supported tests: Mann-Whitney U, Kruskal-Wallis, │ │ Chi-square, Kolmogorov-Smirnov, Mood’s Median, Wilcoxon │ │ Rank-Sum, and the independent t-test. │ │ │ │ Author : Bart R.H. Geurten │ │ Licence: MIT │ └──────────────────────────────────────────────────────────────────────┘

class rerandomstats.hypothesis_tests.HypothesisTests(data_a, data_b, func, alternative='two-sided')[source]

Bases: object

Unified interface for two-sample hypothesis tests.

This class wraps several scipy.stats tests behind a single func string selector. It is consumed primarily by MultiGroupTest when the test family is 'hypo'.

Supported func values:

Parameters:

data_a (Sequence[float]) – First sample.
data_b (Sequence[float]) – Second sample.
func (str) – Name of the test to perform.
alternative (Literal['two-sided', 'less', 'greater']) – 'two-sided', 'less', or 'greater'.

Example

>>> ht = HypothesisTests([1, 2, 3], [5, 6, 7], 'MannWhitneyU')
>>> p = ht.main()

main()[source]

Run the selected hypothesis test and return its p-value.

Returns:: The p-value produced by the chosen test.
Raises:: ValueError – If func is not a recognised test name.
Return type:: float

Binomial Statistics

┌──────────────────────────────────────────────────────────────────────┐ │ binomial_stats.py « Binomial Proportion Tests » │ │ │ │ Single-sample binomial test with Wilson confidence intervals, │ │ and a two-sample proportions test (z-test or chi-square) for │ │ comparing success rates across groups. │ │ │ │ Author : Bart R.H. Geurten │ │ Licence: MIT │ └──────────────────────────────────────────────────────────────────────┘

class rerandomstats.binomial_stats.BinomialStats(heads, total_flips, alpha=0.05, alternative='two-sided')[source]

Bases: object

Single-sample binomial test with confidence intervals.

Wraps scipy.stats.binomtest() and statsmodels.stats.proportion.proportion_confint() (Wilson method) for quick proportion analysis.

Parameters:

heads (int) – Number of successes observed.
total_flips (int) – Total number of trials.
alpha (float) – Significance level (used for the CI).
alternative (Literal['two-sided', 'greater', 'less']) – 'two-sided', 'greater', or 'less'.

Example

>>> bs = BinomialStats(heads=80, total_flips=100)
>>> result = bs.binomial_test(base_rate=0.5)
>>> result.pvalue < 0.05
True

binomial_test(base_rate=0.5)[source]

Perform a binomial test against base_rate.

Parameters:: base_rate (float) – Expected success probability under H₀.
Returns:: scipy.stats.BinomTestResult object (access .pvalue for the p-value).

exact_ci()[source]

Compute a Wilson confidence interval for the proportion.

Returns:: Dictionary with keys 'Proportion', 'Lower CI', and 'Upper CI' (all as percentages).
Return type:: Dict[str, float]

class rerandomstats.binomial_stats.MultipleBinomialTests(data_a, data_b, func, alternative='two-sided')[source]

Bases: object

Two-sample proportions test (z-test or chi-square).

Compares the success proportions in two groups. Each group is specified as a tuple (successes, failures).

Parameters:

data_a (Tuple[int, int]) – (successes, failures) for group A.
data_b (Tuple[int, int]) – (successes, failures) for group B.
func (Literal['ztest', 'chi2']) – 'ztest' or 'chi2'.
alternative (Literal['two-sided', 'smaller', 'larger']) – 'two-sided', 'smaller', or 'larger'.

Example

>>> mbt = MultipleBinomialTests((30, 70), (50, 50), 'ztest')
>>> p = mbt.main()

main()[source]

Run the proportions comparison and return the p-value.

Returns:: The p-value. Returns 1.0 if the result is NaN (identical samples or missing data).
Raises:: ValueError – If func is unrecognised.
Return type:: float

Multi-Group Testing

┌──────────────────────────────────────────────────────────────────────┐ │ multi_group_test.py « Pairwise Multi-Group Comparisons » │ │ │ │ Runs all pairwise statistical tests across >2 groups, then │ │ applies multiple-testing correction (default: Benjamini-Hochberg │ │ FDR). Supports Fisher resampling, Fisher exact, classical │ │ hypothesis tests, and binomial proportion tests. │ │ │ │ Author : Bart R.H. Geurten │ │ Licence: MIT │ └──────────────────────────────────────────────────────────────────────┘

class rerandomstats.multi_group_test.MultiGroupTest(data, group, test, combination_n=10000, correction_type='fdr_bh', combination_set=())[source]

Bases: object

Pairwise multi-group testing with FDR correction.

Given parallel lists of data values and group labels the class performs all (or a user-specified subset of) pairwise comparisons, then adjusts p-values for multiplicity.

The test is selected via a 'family:name' string:

Family	Available names
`Fisher`	`medianDiff`, `meanDiff`, `sumDiff`, `exact`
`Binomial`	`ztest`, `chi2`
`hypo`	`MannWhitneyU`, `KruskalWallis`, `ChiSquare`, `Kolmogorov`, `MoodMedian`, `WilcoxonRankSum`, `IndependentT`

Parameters:

data (Sequence[float]) – Flat list of observed values.
group (Sequence[str]) – Corresponding group labels (same length as data).
test (str) – 'family:name' string.
combination_n (Union[int, str]) – Passed to FisherResamplingTest when the Fisher family is selected.
correction_type (str) – Any method accepted by statsmodels.stats.multipletests().
combination_set (Sequence[Tuple[str, str]]) – Optional list of (groupA, groupB) tuples restricting which pairs are tested.

df: pandas.DataFrame with results (available after main()).

Example

>>> import numpy as np
>>> data = list(np.random.randn(30))
>>> groups = ['A']*10 + ['B']*10 + ['C']*10
>>> mgt = MultiGroupTest(data, groups, 'hypo:MannWhitneyU', 1000)
>>> result = mgt.main()
>>> isinstance(result, pd.DataFrame)
True

rearrange_data()[source]

Group flat data / group lists into per-group sublists.

Return type:: None

main()[source]

Run all pairwise tests and return a summary DataFrame.

Returns:: DataFrame with columns groupA, groupA_n, groupB, groupB_n, p value, p value corrected, h, and sig. level.
Return type:: DataFrame

static get_significance_level(p_value)[source]

Convert a p-value to the conventional star notation.

Parameters:: p_value (float) – Corrected p-value.
Returns:: 'n.s.', '*', '**', or '***'.
Return type:: str

Case-Crossover Estimators (v0.2.0)

Time-stratified case-crossover conditional logit with stratified- permutation backup and Burke-2015 σ-rescaled effect translator.

Time-stratified case-crossover estimators with permutation backup.

A case-crossover design (Maclure 1991, Am J Epidemiol 133:144–153) estimates the within-stratum association between a transient exposure and a binary outcome by matching each case observation against a set of control observations drawn from the same subject / stratum at other times. The stratum intercepts are integrated out in the conditional likelihood, so all time-invariant subject characteristics are absorbed by construction.

The events-list interchange convention used throughout this module mirrors the ThermoStrife pipeline (Geurten 2026, https://doi.org/10.5281/zenodo.20371612). Each event is a dict with the following keys:

event_id (str) — stratum identifier.
lat (float), lon (float) — only used by the daylight covariate and the within-event-contrast fetcher; pass dummy values if these are not needed.
when (datetime.date) — event-day timestamp.
tmax_event_c (float) — case-row exposure value.
baseline (pandas.DataFrame) — one row per control day, index = date, with a column named tmax carrying the control-day exposure value. The column is named tmax for historical reasons in the ThermoStrife data model; the value may be any continuous exposure variable in the caller’s units.

The case-crossover machinery is exposure-agnostic: tmax_c is literally just the column name. The functions in this module make no biological assumption about what that column represents.

rerandomstats.case_crossover.daylight_hours(lat_deg, when)[source]

Closed-form solar daylight hours for when at latitude lat_deg.

Uses the standard astronomical approximation (sunrise / sunset by solar declination, ignoring atmospheric refraction and astronomical twilight). Accurate to ~10 min outside the polar circles, which is well inside the noise of typical daily exposure inference and avoids a hard dependency on astral or pysolar.

Parameters:

lat_deg (float) – Latitude in decimal degrees (positive north).
when (date) – Calendar date.

Returns:

Daylight in hours; clamped to 0.0 at polar night and 24.0 at polar day.

Return type:

float

rerandomstats.case_crossover.build_case_crossover_frame(events)[source]

Stack per-event case + control rows into one long DataFrame.

Events with no baseline rows are silently skipped. Events with a None tmax_event_c produce a case row whose tmax_c is NaN; downstream estimators will drop those.

Parameters:: events (list[dict]) – List of event dicts conforming to the module-level interchange convention (see module docstring).
Returns:: Long DataFrame with columns [event_id, day, is_case, tmax_c, daylight_h], one row per (event, day) pair. Cases get is_case = 1, controls get is_case = 0.
Return type:: DataFrame

rerandomstats.case_crossover.case_crossover_conditional_logit(frame, *, covariates=None)[source]

Conditional logistic regression on matched case-control sets.

Fits logit P(case | event_id) = β · tmax_c + γ · covariates via statsmodels.discrete.conditional_models.ConditionalLogit. The stratum intercepts are integrated out by the conditional likelihood, so all time-invariant within-stratum characteristics are absorbed by construction. exp(β) is the odds ratio of case-vs-control per +1 unit of the exposure variable.

Parameters:

frame (DataFrame) – Output of build_case_crossover_frame() (or a DataFrame with the same column conventions).
covariates (list[str] | None) – Extra columns to include alongside tmax_c. Defaults to ["daylight_h"].

Returns:

Dict with the headline estimates and metadata. Carries a {"skipped": True, "reason": str} marker if the frame is empty or no stratum retains a case row after NA filtering.

Return type:

dict

rerandomstats.case_crossover.stratified_permutation(frame, n_perm=10000, rng=None)[source]

Within-event label-shuffle test on the mean case-vs-control gap.

Under the null, the case-day exposure is exchangeable with the baseline-day exposures within the same stratum. For each event, the test shuffles the case label across (case + controls), computes (case-exposure) minus (mean of control-exposures), and averages over events. The empirical permutation distribution of this statistic gives a distribution-free backup for case_crossover_conditional_logit().

Parameters:

frame (DataFrame) – Output of build_case_crossover_frame().
n_perm (int) – Number of permutation draws. Default 10 000.
rng (Generator | None) – Optional NumPy Generator. Default None → fresh non-deterministic RNG. Pass an explicit seeded generator for reproducibility.

Returns:

Dict carrying observed_diff_C, pvalue_one_sided, pvalue_two_sided, n_events, and n_perm. Returns a {"skipped": True} marker on empty frames or when no event has exactly one case row.

Return type:

dict

rerandomstats.case_crossover.stratify_case_crossover(events, key_fn, *, min_events=5, covariates=None)[source]

Split events into subgroups and re-fit the case-crossover per subgroup.

Parameters:

events (list[dict]) – List of event dicts (interchange convention per module docstring).
key_fn (Callable[[dict], str | None]) – Callable mapping an event to a stratum label. Events for which key_fn returns None are filtered out rather than bucketed under a "None" label.
min_events (int) – Strata with fewer events than this threshold are not fit; the result dict carries a {"skipped": True, "reason": ...} marker for them so the caller can still surface them in reports.
covariates (list[str] | None) – Forwarded to case_crossover_conditional_logit(). Defaults to [] rather than ["daylight_h"] because the daylight covariate often becomes collinear within small strata.

Returns:

Dict mapping stratum label to that stratum’s case_crossover_conditional_logit() result, with an extra n_events key on every entry.

Return type:

dict

rerandomstats.case_crossover.event_anomaly_profile(events, fetch_fn, offsets=(-7, -2, -1, 0, 1, 2, 7))[source]

Fetch per-event exposure at each offset and return as a long frame.

For each event, looks up the exposure on day when + offset via the caller-supplied fetch_fn, subtracts the event’s baseline-window mean to get an anomaly, and stacks the rows. Offsets that the fetcher returns None for are silently dropped — downstream consumers (e.g. superposed-epoch plots) handle the resulting ragged n per offset.

Parameters:

events (list[dict]) – List of event dicts. Each must additionally provide provenance and optionally station_id, which are forwarded to fetch_fn.
fetch_fn (Callable[[...], float | None]) – Callable with signature fetch_fn(provenance, lat, lon, day, *, station_id) returning either a float (the exposure value) or None (no data for that day). This is the seam through which the caller’s own data backend is plugged in.
offsets (tuple[int, ...]) – Day offsets relative to the event day. Default (-7, -2, -1, 0, 1, 2, 7) mirrors the superposed-epoch composite used in Geurten (2026) ThermoStrife v0.1.1.

Returns:

Long DataFrame with columns event_id, offset_days, anomaly_C. Empty DataFrame if no event resolves any offset.

Return type:

DataFrame

rerandomstats.case_crossover.h3_within_event_contrast(events, fetch_fn, *, window_offsets=(0, -1), surround_offsets=(-7, 7), n_boot=10000, rng=None)[source]

Per-event paired contrast: mean anomaly in window vs surround offsets.

For each event, fetches exposure on day t + offset for every offset in window_offsets ∪ surround_offsets via fetch_fn, converts to an anomaly by subtracting the existing baseline mean, then computes:

diff = mean(anomaly @ window_offsets) − mean(anomaly @ surround_offsets)

Across-event distribution of diff is tested with a one-sided Wilcoxon signed-rank against H₁: median > 0. A bootstrap 95 % CI on the mean is also returned.

Parameters:

events (list[dict]) – Event dicts (interchange convention; must provide provenance and optionally station_id for the fetcher).
fetch_fn (Callable[[...], float | None]) – Same signature as in event_anomaly_profile().
window_offsets (tuple[int, ...]) – Offsets that should sit AT or NEAR the event day under the hypothesis. Default (0, -1) (event day + day-before).
surround_offsets (tuple[int, ...]) – Offsets that should sit AWAY from the event day. Default (-7, 7).
n_boot (int) – Bootstrap draws for the mean CI.
rng (Generator | None) – Optional np.random.Generator.

Returns:

Dict with mean_diff_C, median_diff_C, ci95_low_C, ci95_high_C, wilcoxon_statistic, pvalue_one_sided, n_events_used, and n_events_skipped. Returns {"skipped": True, ...} if fewer than 5 events resolve all required offsets, or if every diff is exactly zero (Wilcoxon undefined).

Return type:

dict

rerandomstats.case_crossover.hsiang_sigma_rescaled(events, n_boot=10000, rng=None)[source]

Per-event z-score (event − baseline_mean) / baseline_std, averaged.

Following Burke, Hsiang & Miguel (2015, Ann Rev Econ 7:577–617), expressing each event’s exposure anomaly as a multiple of its own baseline-window standard deviation makes effects comparable across stations, climates, and study designs. Their headline meta-analytic number for interpersonal violence is +2.4 % per 1 σ contemporaneous warming.

Parameters:

events (list[dict]) – Event dicts (interchange convention). Events without a usable baseline (< 2 rows or zero standard deviation) are silently dropped.
n_boot (int) – Bootstrap draws for the CI.
rng (Generator | None) – Optional np.random.Generator.

Returns:

Dict with mean_z, median_z, ci95_low_z, ci95_high_z, fraction_positive, and n_events. Returns {"skipped": True} if no event has a usable baseline.

Return type:

dict

Model Comparison (v0.2.0)

Wald two-sample-β test, nested-model likelihood-ratio test, and the shared-algorithmic-source p-value correction helpers (correct_pvalues / correct_pvalues_array / benjamini_hochberg) routing through statsmodels.stats.multitest.multipletests.

Comparing fitted coefficients and correcting heterogeneous test batteries.

The four functions in this module operate on already-fitted models or already-computed p-values — they do NOT take raw data. For raw-data workflows that combine data ingestion, pairwise testing, and correction into one call, use rerandomstats.MultiGroupTest.

This separation matters when the test battery is heterogeneous (e.g. a case-crossover conditional logit, a Poisson IRR, a Wald two-sample-β contrast, and a likelihood-ratio test, all in the same auxiliary battery) and the corresponding p-values cannot all be produced by a single pairwise-comparisons pipeline. In that case the caller runs each test individually, collects the p-values into a dict, and passes the dict to correct_pvalues() (for a single method) or benjamini_hochberg() (for the BH + Bonferroni dual-method report) for correction.

Algorithmic-source guarantee. Both correct_pvalues() and benjamini_hochberg() delegate the underlying correction arithmetic to statsmodels.stats.multipletests; the same call is used internally by rerandomstats.MultiGroupTest. There is no parallel implementation of BH / Bonferroni / Holm / Sidak / FDR-BY in this package — every code path that adjusts p-values converges on statsmodels’ canonical implementation.

rerandomstats.model_comparison.wald_two_sample_beta(beta_a, se_a, beta_b, se_b, *, alternative='two-sided', name_a='a', name_b='b')[source]

Wald z-test on the difference between two estimated coefficients.

Tests H0: β_a = β_b (or directional variants) using

z = (β_a − β_b) / sqrt(SE(β_a)² + SE(β_b)²)

This is the right test when you have two independently-fitted estimates of related coefficients — e.g. a case-crossover OR fitted on a hot-host subsample vs the same estimator fitted on a cool-host subsample, or a Poisson IRR for an aggression-set outcome vs the same estimator on a non-aggression outcome. The formula assumes the two estimates are independent (no overlap in the data they were fit on); for nested models on the same data, use likelihood_ratio_test() instead.

Parameters:

beta_a (float) – Estimated coefficient from subsample / model A.
se_a (float) – Standard error of beta_a.
beta_b (float) – Estimated coefficient from subsample / model B.
se_b (float) – Standard error of beta_b.
alternative (Literal['two-sided', 'greater', 'less']) – "two-sided" (default) tests β_a ≠ β_b. "greater" tests β_a > β_b. "less" tests β_a < β_b.
name_a (str) – Optional label for the A side, surfaced in the output dict. Defaults to "a".
name_b (str) – Optional label for the B side. Defaults to "b".

Returns:

Dict with the input estimates, the difference and its SE, the Wald z statistic, the requested-tail p-value, and a 95 % confidence interval on the difference.

Raises:

ValueError – if alternative is not one of the three accepted strings.

Return type:

dict

rerandomstats.model_comparison.likelihood_ratio_test(model_full, model_reduced)[source]

Likelihood-ratio test of a full vs reduced nested fit.

Computes:

LR = 2 · (log L_full − log L_reduced)
p  = 1 − F_chi²(LR; df_full − df_reduced)

Both arguments may be any objects exposing .llf (log-likelihood) and .df_model (number of estimated parameters) attributes — the convention used by statsmodels result objects, which covers ConditionalLogit, Poisson, Logit, OLS, and most others.

Use this for nested model comparisons on the same data: e.g. testing whether adding an interaction term to a case-crossover model significantly improves fit, or whether a tournament-edition × anomaly interaction is needed beyond the main-effect model. For comparing two estimates of the SAME coefficient fitted on DIFFERENT data, use wald_two_sample_beta() instead.

Parameters:

model_full (Any) – Fitted result object for the larger (nesting) model.
model_reduced (Any) – Fitted result object for the smaller (nested) model.

Returns:

Dict with the LR statistic, the degrees-of-freedom difference, the chi² p-value, and both log-likelihoods.

Raises:

ValueError – if model_full does not have strictly more parameters than model_reduced.

Return type:

dict

rerandomstats.model_comparison.correct_pvalues(pvalues, *, alpha=0.05, method='fdr_bh')[source]

Adjust a heterogeneous battery of p-values via one correction method.

Thin wrapper around statsmodels.stats.multitest.multipletests. This is the shared algorithmic core for every multi-comparison correction in reRandomStats: both benjamini_hochberg() (the dual-method convenience helper) and the rerandomstats.MultiGroupTest pipeline ultimately route through the same statsmodels call. No parallel implementation of the underlying math exists in this package.

Use this directly when you want a single correction method (e.g. just BH FDR or just Holm) and need a tidy per-test result dict rather than DataFrame columns.

Parameters:

pvalues (dict[str, float]) – Dict mapping test name → raw p-value. Keys are carried verbatim through to the output results dict. Insertion order is preserved.
alpha (float) – Family-wise significance level. Default 0.05.
method (str) – Any method accepted by statsmodels.stats.multitest.multipletests: e.g. "fdr_bh" (Benjamini–Hochberg step-up, the default), "fdr_by" (Benjamini–Yekutieli), "bonferroni", "holm", "hommel", "sidak", "holm-sidak", "simes-hochberg", "hommel", "fdr_tsbh", "fdr_tsbky".

Returns:

Dict with top-level keys family_size, alpha, method, n_rejected, and results. results is a per-test dict carrying raw_p, rank (1 = smallest raw p-value), adjusted_p, and reject (bool). Empty input returns {"family_size": 0, "alpha": alpha, "method": method, "n_rejected": 0, "results": {}}.

Return type:

dict

rerandomstats.model_comparison.correct_pvalues_array(pvalues, *, alpha=0.05, method='fdr_bh')[source]

Adjust a NumPy array of p-values, preserving NaN entries.

Array-in / array-out wrapper around the same shared statsmodels.stats.multitest.multipletests call that correct_pvalues() uses. Designed as a one-line drop-in replacement for project-specific BH/Bonferroni helpers that take and return arrays — e.g. digimuh.stats_core.benjamini_hochberg — so refactors converging on reRandomStats are a single import change.

NaN entries in the input are preserved at the same positions in the output and are excluded from the correction (the family size used by statsmodels is the count of finite p-values, matching the DigiMuh convention).

Parameters:

pvalues (ndarray) – 1-D array of raw p-values. May contain NaN.
alpha (float) – Family-wise significance level. Default 0.05.
method (str) – Any method accepted by statsmodels.stats.multitest.multipletests. Default "fdr_bh".

Returns:

Array of adjusted p-values, same shape and dtype as input, with NaN entries preserved. Empty input returns an empty array.

Return type:

ndarray

rerandomstats.model_comparison.benjamini_hochberg(pvalues, alpha=0.05)[source]

Dual-method BH + Bonferroni report for a heterogeneous battery.

Convenience wrapper around correct_pvalues(): runs the correction twice (method="fdr_bh" and method="bonferroni") and merges the per-test results so reports can show BH q-values AND Bonferroni-adjusted p-values side-by-side. The BH FDR is typically the primary (less conservative for correlated batteries) with Bonferroni as a conservative reference.

Both corrections are computed via the same statsmodels.stats.multitest.multipletests call (one per method), so the underlying math has exactly one implementation.

Parameters:

pvalues (dict[str, float]) – Dict mapping test name → raw p-value.
alpha (float) – Target false-discovery rate. Default 0.05.

Returns:

Dict with per-test raw_p, rank, bh_adjusted_p, bh_threshold (the per-rank BH cut (rank/k)·alpha), bh_reject, bonferroni_adjusted_p, and bonferroni_reject. Top-level keys also expose family_size, alpha, bonferroni_threshold (the per-test alpha/k cut), bh_cutoff_rank (largest rank that passes BH), and n_bh_rejected / n_bonferroni_rejected counts. Empty input returns {"family_size": 0, "alpha": alpha, "results": {}}.

Return type:

dict

Dose-Response and Breakpoint Analysis (v0.2.0)

Broken-stick segmented regression with profile-RSS 95 % CI on the breakpoint, the Davies (1987 / 2002) and Muggeo (2016) Pseudo-Score breakpoint-existence tests, the 4-parameter Hill / logistic fit with Sebaugh–McCray lower-bend point, and a per-subject iterator that applies any of the four to a panel of subjects.

Breakpoint and sigmoid dose-response fitting for two-variable data.

All five functions in this module take x / y arrays (or a DataFrame for per_subject_segmented()) and return either a dict (the four primary fitters) or a DataFrame (the per-subject iterator). No I/O, no side effects, no project-specific assumptions.

Skip-marker convention. When the input is too small to fit (n < 30 for the primary fitters, n < min_n for the per-subject iterator), the result dict carries converged=False and n=<actual sample size>, plus the primary parameter (e.g. breakpoint or ec50) as np.nan. This is the DigiMuh-derived convention; callers should test converged before accessing fit-specific fields.

References.

Davies RB (1987) Biometrika 74:33-43.
Davies RB (2002) Biometrika 89:484-489.
Goutelle S, Maurin M, Rougier F, Barbaut X, Bourguignon L, Ducher M, Maire P (2008) Fundamental & Clinical Pharmacology 22:633-648.
Hill AV (1910) Journal of Physiology 40:iv-vii.
Muggeo VMR (2003) Statistics in Medicine 22:3055-3071.
Muggeo VMR (2016) Journal of Statistical Computation and Simulation 86:3059-3067.
Sebaugh JL, McCray PD (2003) Pharmaceutical Statistics 2:167-174.

rerandomstats.dose_response.broken_stick_fit(x, y, x_range=None, n_grid=200)[source]

Fit a two-segment piecewise-linear model with one breakpoint.

Model: y = α_below + β_below·x for x ≤ ψ; y = α_above + β_above·x for x > ψ. Biological constraint enforced: slope_above > slope_below and slope_above > 0 (the “flat-ish below, steeper above” pattern typical of threshold physiological responses). Fits that violate this constraint are returned with converged=False and a rejected_reason string.

The breakpoint is located by grid search over n_grid candidates in x_range, refined by scipy.optimize.minimize_scalar on the residual sum of squares. A profile-RSS 95 % confidence interval on the breakpoint is computed from the grid-evaluated RSS profile using the F-distribution threshold for a one-parameter constraint:

CI = {ψ : RSS(ψ) ≤ RSS_min · (1 + F_{1, df_resid; 0.95} / df_resid)}

Threshold crossings are linearly interpolated between adjacent grid points; CIs that hit the search bounds are flagged with breakpoint_ci_truncated=True.

Parameters:

x (ndarray) – Predictor array. NaN entries dropped.
y (ndarray) – Response array. Must align with x.
x_range (tuple[float, float] | None) – Search bounds for the breakpoint. Default (percentile_10, percentile_90) of x.
n_grid (int) – Grid resolution for the initial breakpoint search. Default 200.

Returns:

Dict with the breakpoint, both segment slopes and intercepts, r_squared, n, n_below, n_above, converged, breakpoint_ci_lo, breakpoint_ci_hi, breakpoint_se (derived from half-CI-width / 1.96), breakpoint_ci_truncated, and rejected_reason (None if the fit converged).

Return type:

dict

References

Muggeo VMR (2003) Estimating regression models with unknown break-points. Statistics in Medicine 22:3055-3071.

rerandomstats.dose_response.davies_test(x, y, k=10, x_range=None)[source]

Davies (1987 / 2002) test for the existence of a breakpoint.

Tests H0 (the relationship is a single straight line, no breakpoint) vs H1 (there is an unknown breakpoint where the slope changes). The procedure evaluates k candidate breakpoints, computes a naive t-statistic for the slope-difference at each, takes the maximum, and corrects the p-value for the search using Davies’ upper bound:

p ≤ Pr(|T| > M) + V · φ(M)

where V is the number of sign changes in the t-statistic sequence and φ is the standard-normal (or t) density.

Parameters:

x (ndarray) – Predictor array. NaN entries dropped.
y (ndarray) – Response array. Must align with x.
k (int) – Number of evaluation points. Default 10 (matches the R segmented package default).
x_range (tuple[float, float] | None) – Search range for candidate breakpoints. Default (percentile_5, percentile_95) of x.

Returns:

Dict with pvalue, best_at (candidate with strongest signal), max_statistic, n_sign_changes, and n.

Return type:

dict

References

Davies RB (1987) Biometrika 74:33-43. Davies RB (2002) Biometrika 89:484-489.

rerandomstats.dose_response.pscore_test(x, y, k=10, x_range=None)[source]

Pseudo-Score test for breakpoint existence (Muggeo 2016).

Typically more powerful than the Davies test for detecting a single changepoint. Averages the segmented variable over k candidate breakpoints:

φ̄(x) = (1/k) · Σ_j max(x - ψ_j, 0)

and tests whether γ is significant in the augmented linear model y = α + β·x + γ·φ̄. The key insight: under H0 (no breakpoint) γ = 0; under H1 γ picks up the slope change regardless of the true breakpoint location, sidestepping the nuisance-parameter problem that Davies’ procedure handles via upper-bound correction.

Parameters:

x (ndarray) – Predictor array. NaN entries dropped.
y (ndarray) – Response array. Must align with x.
k (int) – Number of evaluation points. Default 10.
x_range (tuple[float, float] | None) – Search range for candidate breakpoints. Default (percentile_5, percentile_95) of x.

Returns:

Dict with pvalue, t_statistic, best_at (candidate ψ with the strongest INDIVIDUAL signal, for reporting only — not the test statistic itself), df, and n.

Return type:

dict

References

Muggeo VMR (2016) Testing with a nuisance parameter present only under the alternative: a score-based approach with application to segmented modelling. J Stat Comput Simul 86:3059-3067.

rerandomstats.dose_response.hill_fit(x, y, x_range=None)[source]

Fit a four-parameter logistic (Hill) dose-response model.

Model:

y = y_min + (y_max - y_min) / (1 + (EC50 / x)^n)

EC50 is the predictor value at half-maximal response. The Hill coefficient n (hill_n in the return dict) captures steepness: large n = sharp threshold-like switch, small n = gradual transition.

In addition to EC50, the lower bend point following Sebaugh & McCray (2003) is returned: this is the x-value where the second derivative of the Hill curve equals zero on the lower side, marking the transition from the baseline plateau into the rising phase:

x_bend_lower = EC50 · ((n - 1) / (n + 1))^(1/n)    for n > 1

For n ≤ 1 (no inflection in the lower portion), falls back to EC10 as a conventional onset marker. The lower bend is directly comparable to a broken-stick breakpoint — both mark the onset of the rising regime — and is the recommended single-number “where does the response start” summary for sigmoid dose-response data.

Parameters:

x (ndarray) – Predictor array. NaN entries dropped.
y (ndarray) – Response array. Must align with x.
x_range (tuple[float, float] | None) – Plausible range for the EC50 initial guess and bounds. Default (percentile_10, percentile_90) of x.

Returns:

Dict with ec50, hill_n, y_min, y_max, lower_bend (np.nan if outside the data range), r_squared, aic, n, converged, and bend_plausible.

Return type:

dict

References

Goutelle S et al. (2008) The Hill equation: a review of its capabilities in pharmacological modelling. Fundamental & Clinical Pharmacology 22:633-648. Hill AV (1910) The possible effects of the aggregation of the molecules of haemoglobin on its dissociation curves. Journal of Physiology 40:iv-vii. Sebaugh JL, McCray PD (2003) Defining the linear portion of a sigmoid-shaped curve: bend points. Pharmaceutical Statistics 2:167-174.

rerandomstats.dose_response.per_subject_segmented(df, subject_col, x_col, y_col, *, model=<function broken_stick_fit>, min_n=50, model_kwargs=None)[source]

Apply a dose-response fitter to each subject’s data.

For every unique value (or tuple, for composite keys) in df[subject_col], fit model(x, y, **model_kwargs) where x and y are the subject’s rows of df[x_col] and df[y_col]. Subjects with fewer than min_n observations are returned with the model’s standard skip marker (converged=False plus the primary parameter as np.nan).

Pickle-safe: the model callable must be a module-level function so this iterator can be wrapped in a concurrent.futures.ProcessPoolExecutor for parallel fitting over large panels. All four primary fitters in this module (broken_stick_fit, davies_test, pscore_test, hill_fit) qualify.

Parameters:

df (DataFrame) – Long-format DataFrame with one row per observation.
subject_col (str | list[str]) – Column name (or list of column names for a composite key) identifying subjects.
x_col (str) – Column holding the predictor.
y_col (str) – Column holding the response.
model (Callable[[...], dict]) – Fitter callable taking (x, y, **kwargs) and returning a result dict. Defaults to broken_stick_fit().
min_n (int) – Minimum observations per subject. Subjects below this threshold get a skip-marker row. Default 50.
model_kwargs (dict[str, Any] | None) – Extra keyword arguments forwarded to model.

Returns:

the subject key column(s) plus all keys returned by model. Subjects with n < min_n carry converged=False and n=<actual>.

Return type:

DataFrame with one row per subject. Columns

Resampling Index Generator

┌──────────────────────────────────────────────────────────────────────┐ │ resample_n_of_k.py « Combinatorial Index Sampler » │ │ │ │ Generates index tuples for Fisher-style resampling. Supports │ │ exhaustive enumeration of all C(N, k) combinations as well as │ │ random sampling with or without uniqueness constraints. │ │ │ │ Author : Bart R.H. Geurten │ │ Licence: MIT │ └──────────────────────────────────────────────────────────────────────┘

class rerandomstats.resample_n_of_k.GetNofK(data_set_a, data_set_b, combination_n, mode='resampling', max_len_possible_4_perms=10, resampling_n=100000)[source]

Bases: object

Generate index-pair tuples for resampling two data sets.

Given two data sets A and B the class combines them, then produces pairs of index tuples (indices_A, indices_B) where indices_A has the same length as the shorter set and indices_B is its complement.

Parameters:

data_set_a (Sequence) – First data set.
data_set_b (Sequence) – Second data set.
combination_n (Union[int, str]) – 'all' for exhaustive enumeration or an integer specifying the number of random draws.
mode (Literal['combinations', 'resampling', 'resample_unique']) – 'combinations' for exhaustive, 'resampling' for random (non-unique), 'resample_unique' for random with uniqueness guarantee.
max_len_possible_4_perms (int) – If the shorter set exceeds this length, exhaustive mode falls back to resampling.
resampling_n (int) – Default number of random draws when combination_n triggers resampling mode.

combined_data: Concatenation of both data sets.

data_indices: List of (indices_A, indices_B) tuples produced by main().

combination_n: Actual number of combinations generated.

get_all_combinations()[source]

Return all C(N, k) unique index tuples.

Returns:: Exhaustive list of index tuples of length short_len drawn from range(combined_len).
Return type:: List[Tuple[int, …]]

get_unique_random_combinations()[source]

Return resampling_n unique random index tuples.

If the target count cannot be reached after 10 × resampling_n attempts the method returns whatever unique tuples it managed to collect.

Returns:: List of unique random index tuples.
Return type:: List[Tuple[int, …]]

get_random_combinations()[source]

Return resampling_n random index tuples (may contain duplicates).

Returns:: List of random index tuples.
Return type:: List[Tuple[int, …]]

complement_indices()[source]

Build (indices_A, complement) pairs from combinations.

Populates data_indices so that each entry contains the sampled indices and their complement within the combined data.

Return type:: None

main()[source]

Generate combinations and their complements.

Dispatches to the appropriate combination generator based on mode, then builds complement index pairs.

Return type:: None

get_shuffled_set(combi_i)[source]

Retrieve the combi_i-th shuffled split of the data.

Parameters:: combi_i (int) – Index into data_indices.
Returns:: A tuple (shuffled_A, shuffled_B) of data subsets.
Return type:: Tuple[List, List]

Data I/O

┌──────────────────────────────────────────────────────────────────────┐ │ data_io.py « CSV Data Ingestion Utilities » │ │ │ │ Reads wide-format CSV files (including German-locale semicolon- │ │ delimited variants), converts them to NumPy matrices, and │ │ reshapes wide tables into long (value, id) format suitable for │ │ statistical analysis. │ │ │ │ Author : Bart R.H. Geurten │ │ Licence: MIT │ └──────────────────────────────────────────────────────────────────────┘

class rerandomstats.data_io.DataIO(german_csv=False)[source]

Bases: object

CSV reader and wide→long table converter.

Handles both standard comma-separated and German-locale semicolon-separated CSV files.

Parameters:: german_csv (bool) – If True, treat semicolons as delimiters (common in German Excel exports).

raw_data: Raw rows as read from the CSV.

Example

>>> dio = DataIO()
>>> ids, vals = dio.wide_table_csv_to_long_table('data.csv')

read_csv(file_path)[source]

Read a CSV file into raw_data.

Parameters:: file_path (str | Path) – Path to the CSV file.
Return type:: None

make_square_np_matrix(values)[source]

Build a rectangular NumPy float matrix from row data.

Empty cells are filled with np.nan.

Parameters:: values (List) – List of rows (each row a list of strings or a semicolon-separated string when german_csv is set).
Returns:: 2-D numpy.ndarray of shape (nrows, ncols).
Return type:: ndarray

split_csv_headers()[source]

Return the column headers from the first CSV row.

Returns:: List of header strings.
Return type:: List[str]

wide_table_to_value_id_list(values, col_header)[source]

Convert a wide matrix to parallel id / value lists.

Non-NaN cells are unpacked column-wise, tagging each value with its column header.

Parameters:

values (ndarray) – 2-D NumPy array.
col_header (Sequence[str]) – Column names (length must match values.shape[1]).

Returns:

Tuple (id_list, value_list).

Return type:

Tuple[List[str], List[float]]

wide_table_csv_to_long_table(file_path)[source]

Read a wide-format CSV and return long-format id/value lists.

Convenience wrapper that chains read_csv(), make_square_np_matrix(), split_csv_headers(), and wide_table_to_value_id_list().

Parameters:: file_path (str | Path) – Path to the wide-format CSV.
Returns:: Tuple (id_list, value_list).
Return type:: Tuple[List[str], List[float]]

static get_subset_of_data(id_list, value_list, id_subset)[source]

Filter parallel id/value lists to a subset of ids.

Parameters:

id_list (Sequence[str]) – Full list of identifiers.
value_list (Sequence[float]) – Full list of values.
id_subset (Sequence[str]) – Identifiers to keep.

Returns:

Filtered (id_list, value_list) tuple.

Return type:

Tuple[List[str], List[float]]

Example

>>> DataIO.get_subset_of_data(
...     ['a', 'b', 'c'], [1, 2, 3], ['a', 'c']
... )
(['a', 'c'], [1, 3])

Pretty Table Output

┌──────────────────────────────────────────────────────────────────────┐ │ pretty_table.py « Console-Friendly Table Formatter » │ │ │ │ Renders a pandas DataFrame as a PrettyTable for quick terminal │ │ inspection or plain-text file export. │ │ │ │ Author : Bart R.H. Geurten │ │ Licence: MIT │ └──────────────────────────────────────────────────────────────────────┘

rerandomstats.pretty_table.write_pretty_table(df, file_path='', write=False, show=True)[source]

Render a DataFrame as an ASCII table.

Parameters:

df (DataFrame) – The DataFrame to render.
file_path (str | Path) – Destination file when write is True.
write (bool) – If True, write the table to file_path.
show (bool) – If True, print the table to stdout.

Returns:

The PrettyTable object.

Return type:

PrettyTable