API Reference
Fisher’s Exact Test
┌──────────────────────────────────────────────────────────────────────┐ │ fisher_exact.py « Fisher’s Exact Test » │ │ │ │ Wrapper around scipy.stats.fisher_exact for 2×2 contingency │ │ tables. Returns the p-value for the null hypothesis that the │ │ odds ratio of the underlying population equals one. │ │ │ │ Author : Bart R.H. Geurten │ │ Licence: MIT │ └──────────────────────────────────────────────────────────────────────┘
- class rerandomstats.fisher_exact.FisherExactTest(data_a, data_b, alternative='two-sided')[source]
Bases:
objectPerform Fisher’s exact test on a 2×2 contingency table.
The two input tuples each contain two counts (option_A, option_B). For example
(alive, dead)for treatment and control groups.- Parameters:
Example
>>> test = FisherExactTest((8, 2), (1, 5)) >>> p = test.main() >>> p < 0.05 True
- main()[source]
Run Fisher’s exact test and return the p-value.
Constructs the 2×2 contingency table from data_a and data_b, delegates to
scipy.stats.fisher_exact(), and stores / returns the resulting p-value.- Returns:
The p-value of Fisher’s exact test.
- Return type:
Fisher’s Resampling Test
┌──────────────────────────────────────────────────────────────────────┐ │ fisher_resampling.py « Fisher’s Resampling Test » │ │ │ │ Implements Sir Ronald Fisher’s re-randomisation test for │ │ comparing two independent samples. The observed test statistic │ │ (mean, median, or sum difference) is ranked against a null │ │ distribution built by combinatorial or random reshuffling. │ │ │ │ Author : Bart R.H. Geurten │ │ Licence: MIT │ └──────────────────────────────────────────────────────────────────────┘
- class rerandomstats.fisher_resampling.FisherResamplingTest(data_a, data_b, func, combination_n=10000)[source]
Bases:
objectTwo-sample resampling test in the tradition of R.A. Fisher.
The observed difference between two groups (by mean, median, or sum) is ranked against a null distribution constructed by reshuffling group labels.
- Parameters:
data_a (Sequence[float]) – First sample.
data_b (Sequence[float]) – Second sample.
func (Literal['meanDiff', 'medianDiff', 'sumDiff']) – Test-statistic identifier —
'meanDiff','medianDiff', or'sumDiff'.combination_n (Union[int, str]) –
'all'for exhaustive permutation or an integer for random resampling draws.
- original_test_result
Observed test statistic.
- shuffled_results
Sorted null distribution of the statistic.
Example
>>> test = FisherResamplingTest([1, 2, 3], [7, 8, 9], 'meanDiff', 10000) >>> p = test.main() >>> p < 0.05 True
- main()[source]
Execute the full resampling pipeline and return the p-value.
- Steps:
Generate shuffled index sets.
Compute the observed test statistic.
Build the null distribution via bootstrap resampling.
Rank the observed statistic and derive a two-sided p-value.
- Returns:
Two-sided p-value.
- Return type:
Hypothesis Tests
┌──────────────────────────────────────────────────────────────────────┐ │ hypothesis_tests.py « Classical Hypothesis Test Wrapper » │ │ │ │ Thin dispatcher around scipy.stats for two-sample hypothesis │ │ tests. Supported tests: Mann-Whitney U, Kruskal-Wallis, │ │ Chi-square, Kolmogorov-Smirnov, Mood’s Median, Wilcoxon │ │ Rank-Sum, and the independent t-test. │ │ │ │ Author : Bart R.H. Geurten │ │ Licence: MIT │ └──────────────────────────────────────────────────────────────────────┘
- class rerandomstats.hypothesis_tests.HypothesisTests(data_a, data_b, func, alternative='two-sided')[source]
Bases:
objectUnified interface for two-sample hypothesis tests.
This class wraps several
scipy.statstests behind a singlefuncstring selector. It is consumed primarily byMultiGroupTestwhen the test family is'hypo'.Supported func values:
- Parameters:
Example
>>> ht = HypothesisTests([1, 2, 3], [5, 6, 7], 'MannWhitneyU') >>> p = ht.main()
- main()[source]
Run the selected hypothesis test and return its p-value.
- Returns:
The p-value produced by the chosen test.
- Raises:
ValueError – If
funcis not a recognised test name.- Return type:
Binomial Statistics
┌──────────────────────────────────────────────────────────────────────┐ │ binomial_stats.py « Binomial Proportion Tests » │ │ │ │ Single-sample binomial test with Wilson confidence intervals, │ │ and a two-sample proportions test (z-test or chi-square) for │ │ comparing success rates across groups. │ │ │ │ Author : Bart R.H. Geurten │ │ Licence: MIT │ └──────────────────────────────────────────────────────────────────────┘
- class rerandomstats.binomial_stats.BinomialStats(heads, total_flips, alpha=0.05, alternative='two-sided')[source]
Bases:
objectSingle-sample binomial test with confidence intervals.
Wraps
scipy.stats.binomtest()andstatsmodels.stats.proportion.proportion_confint()(Wilson method) for quick proportion analysis.- Parameters:
Example
>>> bs = BinomialStats(heads=80, total_flips=100) >>> result = bs.binomial_test(base_rate=0.5) >>> result.pvalue < 0.05 True
- class rerandomstats.binomial_stats.MultipleBinomialTests(data_a, data_b, func, alternative='two-sided')[source]
Bases:
objectTwo-sample proportions test (z-test or chi-square).
Compares the success proportions in two groups. Each group is specified as a tuple
(successes, failures).- Parameters:
Example
>>> mbt = MultipleBinomialTests((30, 70), (50, 50), 'ztest') >>> p = mbt.main()
- main()[source]
Run the proportions comparison and return the p-value.
- Returns:
The p-value. Returns
1.0if the result is NaN (identical samples or missing data).- Raises:
ValueError – If
funcis unrecognised.- Return type:
Multi-Group Testing
┌──────────────────────────────────────────────────────────────────────┐ │ multi_group_test.py « Pairwise Multi-Group Comparisons » │ │ │ │ Runs all pairwise statistical tests across >2 groups, then │ │ applies multiple-testing correction (default: Benjamini-Hochberg │ │ FDR). Supports Fisher resampling, Fisher exact, classical │ │ hypothesis tests, and binomial proportion tests. │ │ │ │ Author : Bart R.H. Geurten │ │ Licence: MIT │ └──────────────────────────────────────────────────────────────────────┘
- class rerandomstats.multi_group_test.MultiGroupTest(data, group, test, combination_n=10000, correction_type='fdr_bh', combination_set=())[source]
Bases:
objectPairwise multi-group testing with FDR correction.
Given parallel lists of data values and group labels the class performs all (or a user-specified subset of) pairwise comparisons, then adjusts p-values for multiplicity.
The test is selected via a
'family:name'string:Family
Available names
FishermedianDiff,meanDiff,sumDiff,exactBinomialztest,chi2hypoMannWhitneyU,KruskalWallis,ChiSquare,Kolmogorov,MoodMedian,WilcoxonRankSum,IndependentT- Parameters:
data (Sequence[float]) – Flat list of observed values.
group (Sequence[str]) – Corresponding group labels (same length as data).
test (str) –
'family:name'string.combination_n (Union[int, str]) – Passed to
FisherResamplingTestwhen the Fisher family is selected.correction_type (str) – Any method accepted by
statsmodels.stats.multipletests().combination_set (Sequence[Tuple[str, str]]) – Optional list of
(groupA, groupB)tuples restricting which pairs are tested.
- df
pandas.DataFramewith results (available aftermain()).
Example
>>> import numpy as np >>> data = list(np.random.randn(30)) >>> groups = ['A']*10 + ['B']*10 + ['C']*10 >>> mgt = MultiGroupTest(data, groups, 'hypo:MannWhitneyU', 1000) >>> result = mgt.main() >>> isinstance(result, pd.DataFrame) True
Case-Crossover Estimators (v0.2.0)
Time-stratified case-crossover conditional logit with stratified- permutation backup and Burke-2015 σ-rescaled effect translator.
Time-stratified case-crossover estimators with permutation backup.
A case-crossover design (Maclure 1991, Am J Epidemiol 133:144–153) estimates the within-stratum association between a transient exposure and a binary outcome by matching each case observation against a set of control observations drawn from the same subject / stratum at other times. The stratum intercepts are integrated out in the conditional likelihood, so all time-invariant subject characteristics are absorbed by construction.
The events-list interchange convention used throughout this module mirrors the ThermoStrife pipeline (Geurten 2026, https://doi.org/10.5281/zenodo.20371612). Each event is a dict with the following keys:
event_id(str) — stratum identifier.lat(float),lon(float) — only used by the daylight covariate and the within-event-contrast fetcher; pass dummy values if these are not needed.when(datetime.date) — event-day timestamp.tmax_event_c(float) — case-row exposure value.baseline(pandas.DataFrame) — one row per control day, index =date, with a column namedtmaxcarrying the control-day exposure value. The column is namedtmaxfor historical reasons in the ThermoStrife data model; the value may be any continuous exposure variable in the caller’s units.
The case-crossover machinery is exposure-agnostic: tmax_c is
literally just the column name. The functions in this module make
no biological assumption about what that column represents.
- rerandomstats.case_crossover.daylight_hours(lat_deg, when)[source]
Closed-form solar daylight hours for
whenat latitudelat_deg.Uses the standard astronomical approximation (sunrise / sunset by solar declination, ignoring atmospheric refraction and astronomical twilight). Accurate to ~10 min outside the polar circles, which is well inside the noise of typical daily exposure inference and avoids a hard dependency on
astralorpysolar.
- rerandomstats.case_crossover.build_case_crossover_frame(events)[source]
Stack per-event case + control rows into one long DataFrame.
Events with no baseline rows are silently skipped. Events with a
Nonetmax_event_cproduce a case row whosetmax_cisNaN; downstream estimators will drop those.- Parameters:
events (list[dict]) – List of event dicts conforming to the module-level interchange convention (see module docstring).
- Returns:
Long DataFrame with columns
[event_id, day, is_case, tmax_c, daylight_h], one row per (event, day) pair. Cases getis_case = 1, controls getis_case = 0.- Return type:
- rerandomstats.case_crossover.case_crossover_conditional_logit(frame, *, covariates=None)[source]
Conditional logistic regression on matched case-control sets.
Fits
logit P(case | event_id) = β · tmax_c + γ · covariatesviastatsmodels.discrete.conditional_models.ConditionalLogit. The stratum intercepts are integrated out by the conditional likelihood, so all time-invariant within-stratum characteristics are absorbed by construction.exp(β)is the odds ratio of case-vs-control per +1 unit of the exposure variable.- Parameters:
frame (DataFrame) – Output of
build_case_crossover_frame()(or a DataFrame with the same column conventions).covariates (list[str] | None) – Extra columns to include alongside
tmax_c. Defaults to["daylight_h"].
- Returns:
Dict with the headline estimates and metadata. Carries a
{"skipped": True, "reason": str}marker if the frame is empty or no stratum retains a case row after NA filtering.- Return type:
- rerandomstats.case_crossover.stratified_permutation(frame, n_perm=10000, rng=None)[source]
Within-event label-shuffle test on the mean case-vs-control gap.
Under the null, the case-day exposure is exchangeable with the baseline-day exposures within the same stratum. For each event, the test shuffles the case label across (case + controls), computes (case-exposure) minus (mean of control-exposures), and averages over events. The empirical permutation distribution of this statistic gives a distribution-free backup for
case_crossover_conditional_logit().- Parameters:
frame (DataFrame) – Output of
build_case_crossover_frame().n_perm (int) – Number of permutation draws. Default 10 000.
rng (Generator | None) – Optional NumPy
Generator. DefaultNone→ fresh non-deterministic RNG. Pass an explicit seeded generator for reproducibility.
- Returns:
Dict carrying
observed_diff_C,pvalue_one_sided,pvalue_two_sided,n_events, andn_perm. Returns a{"skipped": True}marker on empty frames or when no event has exactly one case row.- Return type:
- rerandomstats.case_crossover.stratify_case_crossover(events, key_fn, *, min_events=5, covariates=None)[source]
Split events into subgroups and re-fit the case-crossover per subgroup.
- Parameters:
events (list[dict]) – List of event dicts (interchange convention per module docstring).
key_fn (Callable[[dict], str | None]) – Callable mapping an event to a stratum label. Events for which
key_fnreturnsNoneare filtered out rather than bucketed under a"None"label.min_events (int) – Strata with fewer events than this threshold are not fit; the result dict carries a
{"skipped": True, "reason": ...}marker for them so the caller can still surface them in reports.covariates (list[str] | None) – Forwarded to
case_crossover_conditional_logit(). Defaults to[]rather than["daylight_h"]because the daylight covariate often becomes collinear within small strata.
- Returns:
Dict mapping stratum label to that stratum’s
case_crossover_conditional_logit()result, with an extran_eventskey on every entry.- Return type:
- rerandomstats.case_crossover.event_anomaly_profile(events, fetch_fn, offsets=(-7, -2, -1, 0, 1, 2, 7))[source]
Fetch per-event exposure at each
offsetand return as a long frame.For each event, looks up the exposure on day
when + offsetvia the caller-suppliedfetch_fn, subtracts the event’s baseline-window mean to get an anomaly, and stacks the rows. Offsets that the fetcher returnsNonefor are silently dropped — downstream consumers (e.g. superposed-epoch plots) handle the resulting raggednper offset.- Parameters:
events (list[dict]) – List of event dicts. Each must additionally provide
provenanceand optionallystation_id, which are forwarded tofetch_fn.fetch_fn (Callable[[...], float | None]) – Callable with signature
fetch_fn(provenance, lat, lon, day, *, station_id)returning either a float (the exposure value) orNone(no data for that day). This is the seam through which the caller’s own data backend is plugged in.offsets (tuple[int, ...]) – Day offsets relative to the event day. Default
(-7, -2, -1, 0, 1, 2, 7)mirrors the superposed-epoch composite used in Geurten (2026) ThermoStrife v0.1.1.
- Returns:
Long DataFrame with columns
event_id,offset_days,anomaly_C. Empty DataFrame if no event resolves any offset.- Return type:
- rerandomstats.case_crossover.h3_within_event_contrast(events, fetch_fn, *, window_offsets=(0, -1), surround_offsets=(-7, 7), n_boot=10000, rng=None)[source]
Per-event paired contrast: mean anomaly in window vs surround offsets.
For each event, fetches exposure on day
t + offsetfor every offset inwindow_offsets ∪ surround_offsetsviafetch_fn, converts to an anomaly by subtracting the existing baseline mean, then computes:diff = mean(anomaly @ window_offsets) − mean(anomaly @ surround_offsets)
Across-event distribution of
diffis tested with a one-sided Wilcoxon signed-rank against H₁: median > 0. A bootstrap 95 % CI on the mean is also returned.- Parameters:
events (list[dict]) – Event dicts (interchange convention; must provide
provenanceand optionallystation_idfor the fetcher).fetch_fn (Callable[[...], float | None]) – Same signature as in
event_anomaly_profile().window_offsets (tuple[int, ...]) – Offsets that should sit AT or NEAR the event day under the hypothesis. Default
(0, -1)(event day + day-before).surround_offsets (tuple[int, ...]) – Offsets that should sit AWAY from the event day. Default
(-7, 7).n_boot (int) – Bootstrap draws for the mean CI.
rng (Generator | None) – Optional
np.random.Generator.
- Returns:
Dict with
mean_diff_C,median_diff_C,ci95_low_C,ci95_high_C,wilcoxon_statistic,pvalue_one_sided,n_events_used, andn_events_skipped. Returns{"skipped": True, ...}if fewer than 5 events resolve all required offsets, or if every diff is exactly zero (Wilcoxon undefined).- Return type:
- rerandomstats.case_crossover.hsiang_sigma_rescaled(events, n_boot=10000, rng=None)[source]
Per-event z-score (event − baseline_mean) / baseline_std, averaged.
Following Burke, Hsiang & Miguel (2015, Ann Rev Econ 7:577–617), expressing each event’s exposure anomaly as a multiple of its own baseline-window standard deviation makes effects comparable across stations, climates, and study designs. Their headline meta-analytic number for interpersonal violence is +2.4 % per 1 σ contemporaneous warming.
- Parameters:
- Returns:
Dict with
mean_z,median_z,ci95_low_z,ci95_high_z,fraction_positive, andn_events. Returns{"skipped": True}if no event has a usable baseline.- Return type:
Model Comparison (v0.2.0)
Wald two-sample-β test, nested-model likelihood-ratio test, and the
shared-algorithmic-source p-value correction helpers
(correct_pvalues / correct_pvalues_array /
benjamini_hochberg) routing through statsmodels.stats.multitest.multipletests.
Comparing fitted coefficients and correcting heterogeneous test batteries.
The four functions in this module operate on already-fitted models or
already-computed p-values — they do NOT take raw data. For raw-data
workflows that combine data ingestion, pairwise testing, and
correction into one call, use rerandomstats.MultiGroupTest.
This separation matters when the test battery is heterogeneous (e.g.
a case-crossover conditional logit, a Poisson IRR, a Wald two-sample-β
contrast, and a likelihood-ratio test, all in the same auxiliary
battery) and the corresponding p-values cannot all be produced by a
single pairwise-comparisons pipeline. In that case the caller runs
each test individually, collects the p-values into a dict, and passes
the dict to correct_pvalues() (for a single method) or
benjamini_hochberg() (for the BH + Bonferroni dual-method
report) for correction.
Algorithmic-source guarantee. Both correct_pvalues() and
benjamini_hochberg() delegate the underlying correction
arithmetic to statsmodels.stats.multipletests; the same call is
used internally by rerandomstats.MultiGroupTest. There is
no parallel implementation of BH / Bonferroni / Holm / Sidak / FDR-BY
in this package — every code path that adjusts p-values converges
on statsmodels’ canonical implementation.
- rerandomstats.model_comparison.wald_two_sample_beta(beta_a, se_a, beta_b, se_b, *, alternative='two-sided', name_a='a', name_b='b')[source]
Wald z-test on the difference between two estimated coefficients.
Tests
H0: β_a = β_b(or directional variants) usingz = (β_a − β_b) / sqrt(SE(β_a)² + SE(β_b)²)
This is the right test when you have two independently-fitted estimates of related coefficients — e.g. a case-crossover OR fitted on a hot-host subsample vs the same estimator fitted on a cool-host subsample, or a Poisson IRR for an aggression-set outcome vs the same estimator on a non-aggression outcome. The formula assumes the two estimates are independent (no overlap in the data they were fit on); for nested models on the same data, use
likelihood_ratio_test()instead.- Parameters:
beta_a (float) – Estimated coefficient from subsample / model A.
se_a (float) – Standard error of
beta_a.beta_b (float) – Estimated coefficient from subsample / model B.
se_b (float) – Standard error of
beta_b.alternative (Literal['two-sided', 'greater', 'less']) –
"two-sided"(default) testsβ_a ≠ β_b."greater"testsβ_a > β_b."less"testsβ_a < β_b.name_a (str) – Optional label for the A side, surfaced in the output dict. Defaults to
"a".name_b (str) – Optional label for the B side. Defaults to
"b".
- Returns:
Dict with the input estimates, the difference and its SE, the Wald z statistic, the requested-tail p-value, and a 95 % confidence interval on the difference.
- Raises:
ValueError – if
alternativeis not one of the three accepted strings.- Return type:
- rerandomstats.model_comparison.likelihood_ratio_test(model_full, model_reduced)[source]
Likelihood-ratio test of a full vs reduced nested fit.
Computes:
LR = 2 · (log L_full − log L_reduced) p = 1 − F_chi²(LR; df_full − df_reduced)
Both arguments may be any objects exposing
.llf(log-likelihood) and.df_model(number of estimated parameters) attributes — the convention used bystatsmodelsresult objects, which coversConditionalLogit,Poisson,Logit,OLS, and most others.Use this for nested model comparisons on the same data: e.g. testing whether adding an interaction term to a case-crossover model significantly improves fit, or whether a tournament-edition × anomaly interaction is needed beyond the main-effect model. For comparing two estimates of the SAME coefficient fitted on DIFFERENT data, use
wald_two_sample_beta()instead.- Parameters:
- Returns:
Dict with the LR statistic, the degrees-of-freedom difference, the chi² p-value, and both log-likelihoods.
- Raises:
ValueError – if
model_fulldoes not have strictly more parameters thanmodel_reduced.- Return type:
- rerandomstats.model_comparison.correct_pvalues(pvalues, *, alpha=0.05, method='fdr_bh')[source]
Adjust a heterogeneous battery of p-values via one correction method.
Thin wrapper around
statsmodels.stats.multitest.multipletests. This is the shared algorithmic core for every multi-comparison correction in reRandomStats: bothbenjamini_hochberg()(the dual-method convenience helper) and thererandomstats.MultiGroupTestpipeline ultimately route through the same statsmodels call. No parallel implementation of the underlying math exists in this package.Use this directly when you want a single correction method (e.g. just BH FDR or just Holm) and need a tidy per-test result dict rather than DataFrame columns.
- Parameters:
pvalues (dict[str, float]) – Dict mapping test name → raw p-value. Keys are carried verbatim through to the output
resultsdict. Insertion order is preserved.alpha (float) – Family-wise significance level. Default 0.05.
method (str) – Any method accepted by
statsmodels.stats.multitest.multipletests: e.g."fdr_bh"(Benjamini–Hochberg step-up, the default),"fdr_by"(Benjamini–Yekutieli),"bonferroni","holm","hommel","sidak","holm-sidak","simes-hochberg","hommel","fdr_tsbh","fdr_tsbky".
- Returns:
Dict with top-level keys
family_size,alpha,method,n_rejected, andresults.resultsis a per-test dict carryingraw_p,rank(1 = smallest raw p-value),adjusted_p, andreject(bool). Empty input returns{"family_size": 0, "alpha": alpha, "method": method, "n_rejected": 0, "results": {}}.- Return type:
- rerandomstats.model_comparison.correct_pvalues_array(pvalues, *, alpha=0.05, method='fdr_bh')[source]
Adjust a NumPy array of p-values, preserving NaN entries.
Array-in / array-out wrapper around the same shared
statsmodels.stats.multitest.multipletestscall thatcorrect_pvalues()uses. Designed as a one-line drop-in replacement for project-specific BH/Bonferroni helpers that take and return arrays — e.g.digimuh.stats_core.benjamini_hochberg— so refactors converging on reRandomStats are a single import change.NaN entries in the input are preserved at the same positions in the output and are excluded from the correction (the family size used by statsmodels is the count of finite p-values, matching the DigiMuh convention).
- Parameters:
- Returns:
Array of adjusted p-values, same shape and dtype as input, with NaN entries preserved. Empty input returns an empty array.
- Return type:
- rerandomstats.model_comparison.benjamini_hochberg(pvalues, alpha=0.05)[source]
Dual-method BH + Bonferroni report for a heterogeneous battery.
Convenience wrapper around
correct_pvalues(): runs the correction twice (method="fdr_bh"andmethod="bonferroni") and merges the per-test results so reports can show BH q-values AND Bonferroni-adjusted p-values side-by-side. The BH FDR is typically the primary (less conservative for correlated batteries) with Bonferroni as a conservative reference.Both corrections are computed via the same
statsmodels.stats.multitest.multipletestscall (one per method), so the underlying math has exactly one implementation.- Parameters:
- Returns:
Dict with per-test
raw_p,rank,bh_adjusted_p,bh_threshold(the per-rank BH cut(rank/k)·alpha),bh_reject,bonferroni_adjusted_p, andbonferroni_reject. Top-level keys also exposefamily_size,alpha,bonferroni_threshold(the per-test alpha/k cut),bh_cutoff_rank(largest rank that passes BH), andn_bh_rejected/n_bonferroni_rejectedcounts. Empty input returns{"family_size": 0, "alpha": alpha, "results": {}}.- Return type:
Dose-Response and Breakpoint Analysis (v0.2.0)
Broken-stick segmented regression with profile-RSS 95 % CI on the breakpoint, the Davies (1987 / 2002) and Muggeo (2016) Pseudo-Score breakpoint-existence tests, the 4-parameter Hill / logistic fit with Sebaugh–McCray lower-bend point, and a per-subject iterator that applies any of the four to a panel of subjects.
Breakpoint and sigmoid dose-response fitting for two-variable data.
All five functions in this module take x / y arrays (or a
DataFrame for per_subject_segmented()) and return either a
dict (the four primary fitters) or a DataFrame (the per-subject
iterator). No I/O, no side effects, no project-specific assumptions.
Skip-marker convention. When the input is too small to fit
(n < 30 for the primary fitters, n < min_n for the per-subject
iterator), the result dict carries converged=False and n=<actual
sample size>, plus the primary parameter (e.g. breakpoint or
ec50) as np.nan. This is the DigiMuh-derived convention;
callers should test converged before accessing fit-specific fields.
References.
Davies RB (1987) Biometrika 74:33-43.
Davies RB (2002) Biometrika 89:484-489.
Goutelle S, Maurin M, Rougier F, Barbaut X, Bourguignon L, Ducher M, Maire P (2008) Fundamental & Clinical Pharmacology 22:633-648.
Hill AV (1910) Journal of Physiology 40:iv-vii.
Muggeo VMR (2003) Statistics in Medicine 22:3055-3071.
Muggeo VMR (2016) Journal of Statistical Computation and Simulation 86:3059-3067.
Sebaugh JL, McCray PD (2003) Pharmaceutical Statistics 2:167-174.
- rerandomstats.dose_response.broken_stick_fit(x, y, x_range=None, n_grid=200)[source]
Fit a two-segment piecewise-linear model with one breakpoint.
Model:
y = α_below + β_below·xforx ≤ ψ;y = α_above + β_above·xforx > ψ. Biological constraint enforced:slope_above > slope_belowandslope_above > 0(the “flat-ish below, steeper above” pattern typical of threshold physiological responses). Fits that violate this constraint are returned withconverged=Falseand arejected_reasonstring.The breakpoint is located by grid search over
n_gridcandidates inx_range, refined byscipy.optimize.minimize_scalaron the residual sum of squares. A profile-RSS 95 % confidence interval on the breakpoint is computed from the grid-evaluated RSS profile using the F-distribution threshold for a one-parameter constraint:CI = {ψ : RSS(ψ) ≤ RSS_min · (1 + F_{1, df_resid; 0.95} / df_resid)}Threshold crossings are linearly interpolated between adjacent grid points; CIs that hit the search bounds are flagged with
breakpoint_ci_truncated=True.- Parameters:
- Returns:
Dict with the breakpoint, both segment slopes and intercepts,
r_squared,n,n_below,n_above,converged,breakpoint_ci_lo,breakpoint_ci_hi,breakpoint_se(derived from half-CI-width / 1.96),breakpoint_ci_truncated, andrejected_reason(Noneif the fit converged).- Return type:
References
Muggeo VMR (2003) Estimating regression models with unknown break-points. Statistics in Medicine 22:3055-3071.
- rerandomstats.dose_response.davies_test(x, y, k=10, x_range=None)[source]
Davies (1987 / 2002) test for the existence of a breakpoint.
Tests
H0(the relationship is a single straight line, no breakpoint) vsH1(there is an unknown breakpoint where the slope changes). The procedure evaluateskcandidate breakpoints, computes a naive t-statistic for the slope-difference at each, takes the maximum, and corrects the p-value for the search using Davies’ upper bound:p ≤ Pr(|T| > M) + V · φ(M)
where
Vis the number of sign changes in the t-statistic sequence andφis the standard-normal (or t) density.- Parameters:
x (ndarray) – Predictor array. NaN entries dropped.
y (ndarray) – Response array. Must align with
x.k (int) – Number of evaluation points. Default 10 (matches the R
segmentedpackage default).x_range (tuple[float, float] | None) – Search range for candidate breakpoints. Default
(percentile_5, percentile_95)ofx.
- Returns:
Dict with
pvalue,best_at(candidate with strongest signal),max_statistic,n_sign_changes, andn.- Return type:
References
Davies RB (1987) Biometrika 74:33-43. Davies RB (2002) Biometrika 89:484-489.
- rerandomstats.dose_response.pscore_test(x, y, k=10, x_range=None)[source]
Pseudo-Score test for breakpoint existence (Muggeo 2016).
Typically more powerful than the Davies test for detecting a single changepoint. Averages the segmented variable over
kcandidate breakpoints:φ̄(x) = (1/k) · Σ_j max(x - ψ_j, 0)
and tests whether
γis significant in the augmented linear modely = α + β·x + γ·φ̄. The key insight: under H0 (no breakpoint)γ = 0; under H1γpicks up the slope change regardless of the true breakpoint location, sidestepping the nuisance-parameter problem that Davies’ procedure handles via upper-bound correction.- Parameters:
- Returns:
Dict with
pvalue,t_statistic,best_at(candidate ψ with the strongest INDIVIDUAL signal, for reporting only — not the test statistic itself),df, andn.- Return type:
References
Muggeo VMR (2016) Testing with a nuisance parameter present only under the alternative: a score-based approach with application to segmented modelling. J Stat Comput Simul 86:3059-3067.
- rerandomstats.dose_response.hill_fit(x, y, x_range=None)[source]
Fit a four-parameter logistic (Hill) dose-response model.
Model:
y = y_min + (y_max - y_min) / (1 + (EC50 / x)^n)
EC50 is the predictor value at half-maximal response. The Hill coefficient
n(hill_nin the return dict) captures steepness: largen= sharp threshold-like switch, smalln= gradual transition.In addition to EC50, the lower bend point following Sebaugh & McCray (2003) is returned: this is the x-value where the second derivative of the Hill curve equals zero on the lower side, marking the transition from the baseline plateau into the rising phase:
x_bend_lower = EC50 · ((n - 1) / (n + 1))^(1/n) for n > 1
For
n ≤ 1(no inflection in the lower portion), falls back to EC10 as a conventional onset marker. The lower bend is directly comparable to a broken-stick breakpoint — both mark the onset of the rising regime — and is the recommended single-number “where does the response start” summary for sigmoid dose-response data.- Parameters:
- Returns:
Dict with
ec50,hill_n,y_min,y_max,lower_bend(np.nanif outside the data range),r_squared,aic,n,converged, andbend_plausible.- Return type:
References
Goutelle S et al. (2008) The Hill equation: a review of its capabilities in pharmacological modelling. Fundamental & Clinical Pharmacology 22:633-648. Hill AV (1910) The possible effects of the aggregation of the molecules of haemoglobin on its dissociation curves. Journal of Physiology 40:iv-vii. Sebaugh JL, McCray PD (2003) Defining the linear portion of a sigmoid-shaped curve: bend points. Pharmaceutical Statistics 2:167-174.
- rerandomstats.dose_response.per_subject_segmented(df, subject_col, x_col, y_col, *, model=<function broken_stick_fit>, min_n=50, model_kwargs=None)[source]
Apply a dose-response fitter to each subject’s data.
For every unique value (or tuple, for composite keys) in
df[subject_col], fitmodel(x, y, **model_kwargs)wherexandyare the subject’s rows ofdf[x_col]anddf[y_col]. Subjects with fewer thanmin_nobservations are returned with the model’s standard skip marker (converged=Falseplus the primary parameter asnp.nan).Pickle-safe: the
modelcallable must be a module-level function so this iterator can be wrapped in aconcurrent.futures.ProcessPoolExecutorfor parallel fitting over large panels. All four primary fitters in this module (broken_stick_fit,davies_test,pscore_test,hill_fit) qualify.- Parameters:
df (DataFrame) – Long-format DataFrame with one row per observation.
subject_col (str | list[str]) – Column name (or list of column names for a composite key) identifying subjects.
x_col (str) – Column holding the predictor.
y_col (str) – Column holding the response.
model (Callable[[...], dict]) – Fitter callable taking
(x, y, **kwargs)and returning a result dict. Defaults tobroken_stick_fit().min_n (int) – Minimum observations per subject. Subjects below this threshold get a skip-marker row. Default 50.
model_kwargs (dict[str, Any] | None) – Extra keyword arguments forwarded to
model.
- Returns:
the subject key column(s) plus all keys returned by
model. Subjects withn < min_ncarryconverged=Falseandn=<actual>.- Return type:
DataFrame with one row per subject. Columns
Resampling Index Generator
┌──────────────────────────────────────────────────────────────────────┐ │ resample_n_of_k.py « Combinatorial Index Sampler » │ │ │ │ Generates index tuples for Fisher-style resampling. Supports │ │ exhaustive enumeration of all C(N, k) combinations as well as │ │ random sampling with or without uniqueness constraints. │ │ │ │ Author : Bart R.H. Geurten │ │ Licence: MIT │ └──────────────────────────────────────────────────────────────────────┘
- class rerandomstats.resample_n_of_k.GetNofK(data_set_a, data_set_b, combination_n, mode='resampling', max_len_possible_4_perms=10, resampling_n=100000)[source]
Bases:
objectGenerate index-pair tuples for resampling two data sets.
Given two data sets A and B the class combines them, then produces pairs of index tuples
(indices_A, indices_B)whereindices_Ahas the same length as the shorter set andindices_Bis its complement.- Parameters:
data_set_a (Sequence) – First data set.
data_set_b (Sequence) – Second data set.
combination_n (Union[int, str]) –
'all'for exhaustive enumeration or an integer specifying the number of random draws.mode (Literal['combinations', 'resampling', 'resample_unique']) –
'combinations'for exhaustive,'resampling'for random (non-unique),'resample_unique'for random with uniqueness guarantee.max_len_possible_4_perms (int) – If the shorter set exceeds this length, exhaustive mode falls back to resampling.
resampling_n (int) – Default number of random draws when combination_n triggers resampling mode.
- combined_data
Concatenation of both data sets.
- combination_n
Actual number of combinations generated.
- get_unique_random_combinations()[source]
Return resampling_n unique random index tuples.
If the target count cannot be reached after
10 × resampling_nattempts the method returns whatever unique tuples it managed to collect.
- get_random_combinations()[source]
Return resampling_n random index tuples (may contain duplicates).
- complement_indices()[source]
Build
(indices_A, complement)pairs from combinations.Populates
data_indicesso that each entry contains the sampled indices and their complement within the combined data.- Return type:
None
Data I/O
┌──────────────────────────────────────────────────────────────────────┐ │ data_io.py « CSV Data Ingestion Utilities » │ │ │ │ Reads wide-format CSV files (including German-locale semicolon- │ │ delimited variants), converts them to NumPy matrices, and │ │ reshapes wide tables into long (value, id) format suitable for │ │ statistical analysis. │ │ │ │ Author : Bart R.H. Geurten │ │ Licence: MIT │ └──────────────────────────────────────────────────────────────────────┘
- class rerandomstats.data_io.DataIO(german_csv=False)[source]
Bases:
objectCSV reader and wide→long table converter.
Handles both standard comma-separated and German-locale semicolon-separated CSV files.
- Parameters:
german_csv (bool) – If
True, treat semicolons as delimiters (common in German Excel exports).
- raw_data
Raw rows as read from the CSV.
Example
>>> dio = DataIO() >>> ids, vals = dio.wide_table_csv_to_long_table('data.csv')
- make_square_np_matrix(values)[source]
Build a rectangular NumPy float matrix from row data.
Empty cells are filled with
np.nan.- Parameters:
values (List) – List of rows (each row a list of strings or a semicolon-separated string when german_csv is set).
- Returns:
2-D
numpy.ndarrayof shape(nrows, ncols).- Return type:
- wide_table_to_value_id_list(values, col_header)[source]
Convert a wide matrix to parallel id / value lists.
Non-NaN cells are unpacked column-wise, tagging each value with its column header.
- wide_table_csv_to_long_table(file_path)[source]
Read a wide-format CSV and return long-format id/value lists.
Convenience wrapper that chains
read_csv(),make_square_np_matrix(),split_csv_headers(), andwide_table_to_value_id_list().
Pretty Table Output
┌──────────────────────────────────────────────────────────────────────┐ │ pretty_table.py « Console-Friendly Table Formatter » │ │ │ │ Renders a pandas DataFrame as a PrettyTable for quick terminal │ │ inspection or plain-text file export. │ │ │ │ Author : Bart R.H. Geurten │ │ Licence: MIT │ └──────────────────────────────────────────────────────────────────────┘