API Reference

Core Infrastructure

DigiMuh: a toolkit for ingesting, storing, and querying dairy-cow environmental and physiological sensor data.

The package consolidates heterogeneous CSV exports from on-farm monitoring systems into a single normalised SQLite database.

Hierarchical configuration for DigiMuh analysis pipelines.

Priority (highest wins):

  1. CLI arguments (always override everything)

  2. .env in the project directory (quick per-project overrides)

  3. ~/.config/digimuh/config.yaml (machine-specific, never in repo)

  4. Built-in defaults
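The priority chain above amounts to a layered lookup in which the first layer that defines a key wins. The sketch below illustrates the idea only: `resolve_settings` and the four example dictionaries are hypothetical and not part of `digimuh.config`, and how `load_config` actually reads and merges its sources is not shown in this reference.

```python
from collections import ChainMap

# Hypothetical layers, ordered from highest to lowest priority.
cli_args = {"database": "/other/cow.db"}                 # parsed CLI arguments
dotenv = {"output": "results/local"}                     # .env in the project directory
user_yaml = {"database": "/data/cow.db", "n_jobs": 8}    # ~/.config/digimuh/config.yaml
defaults = {"database": None, "output": "results", "n_jobs": 20}


def resolve_settings(*layers):
    """Merge layers so that earlier (higher-priority) layers win.

    A value of None in a layer is treated as "not set in this layer".
    """
    cleaned = [{k: v for k, v in layer.items() if v is not None} for layer in layers]
    # ChainMap looks keys up in the first mapping that contains them.
    return dict(ChainMap(*cleaned))


resolved = resolve_settings(cli_args, dotenv, user_yaml, defaults)
```

Here `database` comes from the CLI layer, `output` from `.env`, and `n_jobs` from the user YAML, because each of those is the highest-priority layer that sets the key.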

Usage in an entry point:

from digimuh.config import load_config

cfg = load_config()
# cfg.database   → Path to cow.db
# cfg.output     → Path to results directory
# cfg.tierauswahl → Path to Tierauswahl.xlsx
# cfg.n_jobs     → Number of parallel workers

CLI arguments still work and override everything:

digimuh-extract --db /other/cow.db --out /tmp/test

Set up a new machine:

digimuh-config

This creates ~/.config/digimuh/config.yaml interactively.

class digimuh.config.DigiMuhConfig(database=None, output=<factory>, tierauswahl=None, n_jobs=20, smaxtec_drink_correction=False, _sources=<factory>)

Resolved configuration for a DigiMuh pipeline run.

Parameters:
  • database (Path | None)

  • output (Path)

  • tierauswahl (Path | None)

  • n_jobs (int)

  • smaxtec_drink_correction (bool)

  • _sources (dict)

database: Path | None = None
output: Path
tierauswahl: Path | None = None
n_jobs: int = 20
smaxtec_drink_correction: bool = False

digimuh.config.load_config(cli_args=None, project_root=None)

Load configuration with full priority chain.

Parameters:
  • cli_args (Namespace | None) – Parsed argparse namespace (from entry point).

  • project_root (Path | None) – Project root for .env lookup. Defaults to cwd.

Returns:

Resolved DigiMuhConfig.

Return type:

DigiMuhConfig

digimuh.config.print_config(cfg)

Log the resolved configuration with sources.

Parameters:

cfg (DigiMuhConfig)

Return type:

None

digimuh.config.setup_interactive()

Interactive setup for a new machine. Creates config.yaml.

Return type:

None

digimuh.config.main()

Entry point for digimuh-config.

Return type:

None

Pretty-printed console output for DigiMuh analysis pipelines.

Uses rich for coloured panels, tables, progress bars, and tree views. Falls back to plain logging if rich is not installed.

Usage:

from digimuh.console import console, section, result_table, progress

console.print("[bold blue]Starting analysis …[/]")
with progress("Fitting animals") as pb:
    task = pb.add_task("Broken-stick", total=220)
    for animal in animals:
        fit(animal)
        pb.advance(task)
section("Results", "Breakpoint analysis complete")
result_table("Convergence", headers, rows)

digimuh.console.setup_logging(level=20)

Configure logging with rich handler if available.

Parameters:

level (int)

Return type:

None

digimuh.console.section(title, subtitle='')

Print a section header.

Parameters:
  • title

  • subtitle

Return type:

None

digimuh.console.reset_steps()

Reset the step counter (for testing).

Return type:

None

digimuh.console.result_table(title, headers, rows, highlight_col=None)

Print a formatted results table.

Parameters:
  • title

  • headers

  • rows

  • highlight_col

Return type:

None

digimuh.console.stars_styled(stars)

Return rich-styled significance stars.

Parameters:

stars (str)

Return type:

str

digimuh.console.kv(key, value, indent=2)

Print a key-value pair.

Parameters:
  • key

  • value

  • indent

Return type:

None

digimuh.console.kv_pair(key, val1, val2, sep=' / ')

Print a key with two values (e.g. converged / total).

Parameters:
  • key

  • val1

  • val2

  • sep

Return type:

None

digimuh.console.progress(description='Processing')

Context manager for a rich progress bar.

Parameters:

description (str)

digimuh.console.banner(title, version='0.1.0')

Print the DigiMuh startup banner.

Parameters:
  • title

  • version

Return type:

None

digimuh.console.done(message='All analyses complete.')

Print completion message.

Parameters:

message (str)

Return type:

None

Shared database connection, view initialisation, and plotting defaults used across all DigiMuh analysis modules.

Usage:

from digimuh.analysis_utils import connect_db, setup_plotting
con = connect_db(Path("cow.db"))
setup_plotting()

digimuh.analysis_utils.connect_db(db_path, create_views=True)

Open the cow database and optionally create analysis views.

Parameters:
  • db_path (Path) – Path to the SQLite database file.

  • create_views (bool) – If True, execute create_views.sql to ensure all analysis views exist.

Returns:

An open sqlite3.Connection with Row factory enabled.

Return type:

Connection
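A minimal sketch of what such a helper might look like, assuming the views are created by running a SQL script on the open connection. `open_with_views`, `VIEWS_SQL`, and the `readings` table are hypothetical stand-ins; the actual contents of create_views.sql are not reproduced here.

```python
import os
import sqlite3
import tempfile

# Hypothetical stand-in for the contents of create_views.sql.
VIEWS_SQL = """
CREATE VIEW IF NOT EXISTS v_example AS
SELECT animal_id, COUNT(*) AS n_readings FROM readings GROUP BY animal_id;
"""


def open_with_views(db_path, create_views=True):
    """Open a SQLite database with name-based row access; optionally create views."""
    con = sqlite3.connect(db_path)
    con.row_factory = sqlite3.Row  # rows can then be indexed by column name
    if create_views:
        con.executescript(VIEWS_SQL)
    return con


# Build a small throwaway database so the helper has something to open.
db_file = os.path.join(tempfile.mkdtemp(), "example.db")
seed = sqlite3.connect(db_file)
seed.execute("CREATE TABLE readings (animal_id INTEGER, value REAL)")
seed.executemany("INSERT INTO readings VALUES (?, ?)", [(1, 38.9), (1, 39.1), (2, 39.4)])
seed.commit()
seed.close()

con = open_with_views(db_file)
row = con.execute("SELECT animal_id, n_readings FROM v_example WHERE animal_id = 1").fetchone()
```

With the Row factory set, `row["n_readings"]` works in addition to positional access.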

digimuh.analysis_utils.query_df(con, sql, params=())

Execute a query and return a pandas DataFrame.

Parameters:
  • con (sqlite3.Connection) – Active database connection.

  • sql (str) – SQL query string.

  • params (tuple) – Bind parameters for the query.

Returns:

A pandas.DataFrame with the query results.

Return type:

pd.DataFrame

digimuh.analysis_utils.setup_plotting()

Configure matplotlib for publication-quality figures.

Sets SVG text as editable (svg.fonttype = "none"), uses a clean style, and configures reasonable defaults for axis labels and tick sizes.

Return type:

None

digimuh.analysis_utils.save_fig(fig, name, out_dir)

Save a figure as SVG, PNG, and close it.

Parameters:
  • fig (Figure) – The matplotlib figure to save.

  • name (str) – Base filename (without extension).

  • out_dir (Path) – Output directory.

Return type:

None

Data Ingestion

Quick validation of the ingested cow database.

Checks row counts, null rates, value ranges, referential integrity, and temporal coverage. Run immediately after ingestion to catch problems before launching analysis.

Usage:

python -m digimuh.validate_db --db cow.db

digimuh.validate_db.check_table_counts(con)

Verify all expected tables exist and report row counts.

Returns:

List of warning/error messages (empty = all good).

Parameters:

con (Connection)

Return type:

list[str]

digimuh.validate_db.check_null_rates(con)

Report null rates for key columns.

Returns:

List of warning messages for suspiciously high null rates.

Parameters:

con (Connection)

Return type:

list[str]
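A null-rate check of this kind can be sketched with plain SQL aggregates. The helper names, the `rumen` table, and the 50 % threshold below are illustrative assumptions; which columns and thresholds `check_null_rates` actually uses is not documented here.

```python
import sqlite3


def null_rate(con, table, column):
    """Return the fraction of rows in `table` where `column` is NULL (0.0 if empty)."""
    # Identifiers cannot be bound as SQL parameters, so they are quoted inline;
    # only pass trusted table and column names.
    total, nulls = con.execute(
        f'SELECT COUNT(*), SUM("{column}" IS NULL) FROM "{table}"'
    ).fetchone()
    return (nulls or 0) / total if total else 0.0


def warn_high_null_rates(con, columns, max_rate=0.5):
    """Return one warning string per (table, column) whose null rate exceeds max_rate."""
    warnings = []
    for table, column in columns:
        rate = null_rate(con, table, column)
        if rate > max_rate:
            warnings.append(f"{table}.{column}: {rate:.0%} NULL")
    return warnings


# Hypothetical table with a mostly empty pH column.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE rumen (animal_id INTEGER, ph REAL, temp REAL)")
con.executemany(
    "INSERT INTO rumen VALUES (?, ?, ?)",
    [(1, None, 39.0), (1, None, 39.2), (2, None, 38.8), (2, 6.3, None)],
)
messages = warn_high_null_rates(con, [("rumen", "ph"), ("rumen", "temp")])
```

An empty list means no column exceeded the threshold, which matches the "empty = all good" convention used by the checks in this module.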

digimuh.validate_db.check_value_ranges(con)

Check that numeric values fall within plausible ranges.

Returns:

List of warnings for out-of-range values.

Parameters:

con (Connection)

Return type:

list[str]

digimuh.validate_db.check_temporal_coverage(con)

Report the date range of each timestamped table.

Returns:

List of warnings for unexpected gaps or ranges.

Parameters:

con (Connection)

Return type:

list[str]

digimuh.validate_db.check_referential_integrity(con)

Check foreign key relationships between fact and dimension tables.

Returns:

List of warnings for orphaned references.

Parameters:

con (Connection)

Return type:

list[str]
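Orphaned references are commonly found with a LEFT JOIN that keeps fact rows lacking a dimension match. The sketch below assumes hypothetical `milkings` (fact) and `animals` (dimension) tables; the real schema and the exact query used by `check_referential_integrity` are not shown in this reference.

```python
import sqlite3


def count_orphans(con, fact_table, fact_key, dim_table, dim_key):
    """Count fact rows whose key has no matching row in the dimension table."""
    # Identifiers are interpolated, so only pass trusted names.
    sql = (
        f'SELECT COUNT(*) FROM "{fact_table}" AS f '
        f'LEFT JOIN "{dim_table}" AS d ON f."{fact_key}" = d."{dim_key}" '
        f'WHERE f."{fact_key}" IS NOT NULL AND d."{dim_key}" IS NULL'
    )
    return con.execute(sql).fetchone()[0]


# Hypothetical schema: animal 3 has a milking record but no entry in `animals`.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE animals (animal_id INTEGER PRIMARY KEY)")
con.execute("CREATE TABLE milkings (animal_id INTEGER, yield_kg REAL)")
con.executemany("INSERT INTO animals VALUES (?)", [(1,), (2,)])
con.executemany("INSERT INTO milkings VALUES (?, ?)", [(1, 31.5), (2, 28.0), (3, 30.2)])
orphans = count_orphans(con, "milkings", "animal_id", "animals", "animal_id")
```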

digimuh.validate_db.main()

Run all validation checks and print a summary.

Return type:

None

Analysis 00 — Broken-Stick Regression

Analysis 01 — Ketosis

Subclinical ketosis risk scoring from multi-sensor fusion.

Uses the v_analysis_ketosis view which joins daily milking data (HerdePlus MLP test-day results), smaXtec rumen metrics, water intake, and disease ground truth.

The analysis:

  1. Extracts days with MLP test-day data (FPR available).

  2. Computes a composite ketosis risk score from:

     - Fat-to-protein ratio (FPR > 1.4 = energy deficit)
     - Rumination index (lower = reduced feed intake)
     - Milk yield deviation from rolling cow mean
     - Rumen pH (low pH + high FPR = metabolic confusion)

  3. Validates against disease records (ground truth).

  4. Trains a Random Forest classifier and reports feature importance and cross-validated performance.
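The FPR component of step 2 can be illustrated in a few lines. This is a sketch only: the record layout and the helper names are hypothetical, and the weighting of the other score components is not described in this reference, so they are left out.

```python
FPR_THRESHOLD = 1.4  # threshold quoted in the module description


def fat_protein_ratio(fat_pct, protein_pct):
    """Return milk fat % divided by milk protein %, or None if it cannot be computed."""
    if fat_pct is None or not protein_pct:
        return None
    return fat_pct / protein_pct


def flag_energy_deficit(test_days):
    """Return copies of the test-day records with `fpr` and `fpr_high` added."""
    flagged = []
    for record in test_days:
        fpr = fat_protein_ratio(record["fat_pct"], record["protein_pct"])
        is_high = fpr is not None and fpr > FPR_THRESHOLD
        flagged.append({**record, "fpr": fpr, "fpr_high": is_high})
    return flagged


# Hypothetical MLP test-day records; the field names are illustrative only.
test_days = [
    {"animal_id": 1, "fat_pct": 4.0, "protein_pct": 3.4},
    {"animal_id": 2, "fat_pct": 5.1, "protein_pct": 3.2},
]
flagged = flag_energy_deficit(test_days)
```

Animal 2 has an FPR of about 1.59 and is flagged; animal 1, at about 1.18, is not.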

Usage:

python -m digimuh.analysis_01_ketosis --db cow.db --out results/ketosis

References

Oetzel (2013). FPR thresholds for subclinical ketosis.

Kaufman et al. (2016), J Dairy Sci 99:5604–18. Rumination time association with subclinical ketosis.

digimuh.analysis_01_ketosis.load_ketosis_data(con)

Load ketosis analysis view and add derived features.

Parameters:

con – Active database connection with views created.

Returns:

DataFrame with one row per animal per MLP test day, enriched with rolling-mean deviations and risk scores.

Return type:

DataFrame

digimuh.analysis_01_ketosis.train_ketosis_classifier(df, out_dir)

Train a Random Forest to classify ketosis risk.

Uses FPR > 1.4 as the positive label (subclinical ketosis indicator) and evaluates against disease records where available.

Parameters:
  • df

  • out_dir

Returns:

Dict with performance metrics and feature importances.

Return type:

dict

digimuh.analysis_01_ketosis.plot_ketosis_overview(df, out_dir)

Generate overview plots for ketosis analysis.

Parameters:
  • df

  • out_dir

Return type:

None

digimuh.analysis_01_ketosis.main()

Entry point for ketosis analysis.

Return type:

None

Analysis 03 — Heat Stress

Per-animal heat stress response modelling.

Uses the v_analysis_heat_stress view which combines daily smaXtec rumen data, DWD weather, gouna respiration, and milking production.

The analysis:

  1. Builds per-animal Z-scored rumen temperature following the approach of the NZ smaXtec study (JDS Communications, 2024): each cow’s temperature distribution is scaled to a common mean and SD before thresholding.

  2. Fits per-animal thermoregulatory dose-response curves: rumen_temp_z = f(THI) using sigmoidal regression. The inflection point and slope characterise each animal’s heat tolerance.

  3. Computes a daily heat load index fusing rumen temp, respiration rate, activity suppression, and water intake.

  4. Quantifies the production impact: milk yield loss per unit of heat load.
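Step 1 (per-animal Z-scoring) can be sketched as follows. The tuple layout and the function name are hypothetical, and the sketch uses the sample standard deviation; which estimator the package uses is not stated here.

```python
from collections import defaultdict
from statistics import mean, stdev


def zscore_per_animal(records):
    """Return (animal_id, day, z) tuples, standardising temperature within each animal.

    Animals with fewer than two readings or zero variance get z = None.
    """
    by_animal = defaultdict(list)
    for animal_id, day, temp in records:
        by_animal[animal_id].append((day, temp))

    result = []
    for animal_id, rows in by_animal.items():
        temps = [t for _, t in rows]
        mu = mean(temps)
        sd = stdev(temps) if len(temps) > 1 else 0.0
        for day, temp in rows:
            z = (temp - mu) / sd if sd > 0 else None
            result.append((animal_id, day, z))
    return result


# Hypothetical daily rumen temperatures in degrees Celsius.
records = [
    (1, "2024-07-01", 39.0), (1, "2024-07-02", 39.2), (1, "2024-07-03", 39.4),
    (2, "2024-07-01", 38.5), (2, "2024-07-02", 38.5), (2, "2024-07-03", 39.1),
]
z_scores = zscore_per_animal(records)
```

After scaling, every animal's values have mean 0 and SD 1, so a single threshold on z means the same relative deviation for a cow that runs warm as for one that runs cool.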

Usage:

python -m digimuh.analysis_03_heat_stress --db cow.db --out results/heat

References

Identifying and predicting heat stress events for grazing dairy cows using rumen temperature boluses. JDS Comm. 2024.

digimuh.analysis_03_heat_stress.load_heat_data(con)

Load heat stress view and add per-animal Z-scored temperature.

Parameters:

con – Database connection with views.

Returns:

DataFrame with per-animal Z-scored rumen temperature and a composite heat load index.

Return type:

DataFrame

digimuh.analysis_03_heat_stress.sigmoid(x, L, k, x0, b)

Four-parameter sigmoid: L / (1 + exp(-k*(x - x0))) + b.

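Parameters:
  • x – Input values.

  • L – Height of the curve (difference between the upper and lower asymptotes).

  • k – Slope (steepness).

  • x0 – Inflection point.

  • b – Vertical offset.

Return type: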

ndarray
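For reference, a scalar version of the same formula using only the standard library. `sigmoid_scalar` is a hypothetical name; the documented function returns an ndarray and is presumably vectorised, which this sketch is not.

```python
import math


def sigmoid_scalar(x, L, k, x0, b):
    """Four-parameter sigmoid: L / (1 + exp(-k * (x - x0))) + b, for a single x."""
    return L / (1.0 + math.exp(-k * (x - x0))) + b


# Illustrative parameter values (not fitted to any data).
L, k, x0, b = 2.0, 0.3, 70.0, -1.0

# At the inflection point the curve sits halfway between its asymptotes: L / 2 + b.
value_at_x0 = sigmoid_scalar(x0, L, k, x0, b)

# The slope is steepest at x0, where it equals L * k / 4 (central difference check).
h = 1e-4
slope_at_x0 = (sigmoid_scalar(x0 + h, L, k, x0, b) - sigmoid_scalar(x0 - h, L, k, x0, b)) / (2 * h)
```

The two checks make the roles of the parameters concrete: `b` shifts the curve vertically, `L` sets its height, `x0` locates the midpoint, and `k` controls how sharp the transition is.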

digimuh.analysis_03_heat_stress.fit_dose_response(df, min_days=30)

Fit per-animal sigmoid dose-response: rumen_temp_z = f(THI).

The inflection point (x0) is the THI at which the fitted curve rises most steeply; it is used as the cow’s personal heat tolerance threshold.

Parameters:
  • df

  • min_days

Returns:

DataFrame with one row per animal, columns animal_id, thi_threshold (x0), slope (k), n_days.

Return type:

DataFrame

digimuh.analysis_03_heat_stress.compute_production_impact(df)

Estimate milk yield loss attributable to heat load.

Bins the heat_load_index into quartiles and computes mean milk yield per bin.

Parameters:

df (DataFrame) – DataFrame from load_heat_data().

Returns:

Summary DataFrame with heat load bins and mean production.

Return type:

DataFrame
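The quartile binning can be sketched without pandas. `mean_yield_by_quartile` and the sample values are hypothetical, and the cut points come from `statistics.quantiles` with its default method, which may differ slightly from whatever binning the package itself applies.

```python
from bisect import bisect_right
from statistics import mean, quantiles


def mean_yield_by_quartile(heat_load, milk_yield):
    """Bin heat_load into quartiles and return the mean milk yield per bin (Q1..Q4)."""
    cuts = quantiles(heat_load, n=4)  # three cut points separating four bins
    bins = {q: [] for q in range(1, 5)}
    for load, milk in zip(heat_load, milk_yield):
        # bisect_right returns 0..3 depending on how many cut points lie at or below load.
        bins[bisect_right(cuts, load) + 1].append(milk)
    return {f"Q{q}": mean(values) for q, values in bins.items() if values}


# Hypothetical daily heat load index values and milk yields (kg).
heat_load = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
milk_yield = [34, 33, 33, 32, 31, 30, 28, 27]
impact = mean_yield_by_quartile(heat_load, milk_yield)
```

In this made-up data the mean yield falls from Q1 to Q4, which is the pattern the production-impact summary is designed to expose.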

digimuh.analysis_03_heat_stress.plot_heat_overview(df, dose, out_dir)

Generate heat stress analysis plots.

Parameters:
  • df (DataFrame) – Full analysis DataFrame.

  • dose (DataFrame) – Per-animal dose-response fit results.

  • out_dir (Path) – Output directory.

Return type:

None

digimuh.analysis_03_heat_stress.main()

Entry point for heat stress analysis.

Return type:

None

Analysis 06 — Digestive

Rumen mechanical–chemical–production coupling analysis.

Uses v_analysis_digestive which joins daily smaXtec motility/pH with HerdePlus MLP milk composition.

The key insight: reticulorumen contraction patterns (motility) drive mixing, mixing drives fermentation rate, fermentation determines the volatile fatty acid profile, and VFA ratios directly shape milk fat and protein. This pipeline has a multi-day lag.

The analysis:

  1. Computes time-lagged cross-correlations between daily motility metrics and the next available MLP test-day values.

  2. Builds a digestive efficiency score from the motility–pH coupling: animals where motility and pH co-vary tightly have well-functioning rumens.

  3. Tests whether digestive efficiency predicts milk composition at the next MLP test day.

Usage:

python -m digimuh.analysis_06_digestive --db cow.db --out results/digestive

digimuh.analysis_06_digestive.load_digestive_data(con)

Load digestive analysis view.

Parameters:

con – Database connection with views.

Returns:

DataFrame with daily motility/pH and sparse MLP composition.

Return type:

DataFrame

digimuh.analysis_06_digestive.compute_lagged_correlations(df, predictor_cols, target_cols, max_lag_days=14)

Compute cross-correlations between daily rumen metrics and MLP test-day values at various time lags.

For each animal, MLP test days are identified (non-null target), and the mean of each predictor over the preceding N days is correlated with the target value.

Parameters:
  • df (DataFrame) – Full digestive DataFrame, sorted by animal_id + day.

  • predictor_cols (list[str]) – Daily rumen metrics to use as predictors.

  • target_cols (list[str]) – MLP test-day columns (sparse).

  • max_lag_days (int) – Maximum look-back window in days.

Returns:

lag × predictor × target → correlation.

Return type:

DataFrame
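The look-back scheme described above can be sketched for one animal, one predictor, and one target. The function names and the list-based data layout are hypothetical, and the synthetic data is built so that the correlation is exactly 1, purely as a sanity check.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient, or None if it is undefined."""
    if len(xs) < 3:
        return None
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / (sxx * syy) ** 0.5


def lagged_correlation(daily, target, lag_days):
    """Correlate the mean of `daily` over the `lag_days` days before each test day
    with that test day's value in `target` (None where no test was taken).

    Both lists are indexed by consecutive day for a single animal.
    """
    window_means, targets = [], []
    for day, value in enumerate(target):
        if value is None or day < lag_days:
            continue
        window = [v for v in daily[day - lag_days:day] if v is not None]
        if window:
            window_means.append(mean(window))
            targets.append(value)
    return pearson(window_means, targets)


# Synthetic series: the target on each test day is twice the mean of the 3 preceding days.
daily = [float(d) for d in range(20)]
target = [None] * 20
for test_day in (5, 10, 15, 19):
    target[test_day] = 2.0 * mean(daily[test_day - 3:test_day])

r_lag3 = lagged_correlation(daily, target, lag_days=3)
```

Repeating the call for each lag from 1 to `max_lag_days` would give one row of the lag × predictor × target table per lag.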

digimuh.analysis_06_digestive.compute_digestive_efficiency(df, window=7)

Compute a per-animal rolling digestive efficiency score.

Efficiency is defined as the strength of coupling between motility (contraction interval) and rumen pH over a rolling window. In a well-functioning rumen, shorter contraction intervals (faster mixing) correspond to lower pH (more active fermentation), so contraction frequency and pH are negatively correlated (equivalently, contraction interval and pH move in the same direction).

Parameters:
  • df (DataFrame) – Full digestive DataFrame.

  • window (int) – Rolling window size in days.

Returns:

DataFrame with animal_id, day, digest_eff, and digest_eff_rank (percentile within herd-day).

Return type:

DataFrame

digimuh.analysis_06_digestive.plot_digestive_results(lagged, efficiency, out_dir)

Generate digestive analysis plots.

Parameters:
  • lagged (DataFrame) – Lagged correlation results.

  • efficiency (DataFrame) – Digestive efficiency scores.

  • out_dir (Path) – Output directory.

Return type:

None

digimuh.analysis_06_digestive.main()

Entry point for digestive efficiency analysis.

Return type:

None

Analysis 11 — Circadian

Circadian rhythm analysis as a general-purpose welfare biomarker.

Uses v_analysis_circadian which provides hourly aggregates of rumen temperature, activity, and rumination per animal per day, joined with disease ground truth.

Biological rationale: healthy ruminants exhibit strong ~24h rhythms in core body temperature (nadir early morning, peak late afternoon), activity (bimodal: dawn and dusk feeding bouts), and rumination (complementary to activity — peaks at rest). Circadian amplitude collapse or phase shift is a well-established early marker of sickness in human chronobiology but barely explored in cattle.

The analysis:

  1. For each animal-day, fits a single-harmonic Fourier model (24h period) to the hourly profile of each signal.

  2. Extracts amplitude (strength of rhythm) and acrophase (time of peak) as daily biomarkers.

  3. Computes a Circadian Disruption Index (CDI) = deviation from the animal’s own healthy-period baseline.

  4. Tests whether CDI elevation precedes clinical disease onset.

Usage:

python -m digimuh.analysis_11_circadian --db cow.db --out results/circadian

digimuh.analysis_11_circadian.load_circadian_data(con)

Load hourly circadian data.

Parameters:

con – Database connection with views.

Returns:

DataFrame with hourly temp/activity/rumination per animal-day.

Return type:

DataFrame

digimuh.analysis_11_circadian.fit_circadian_harmonic(hours, values)

Fit a single 24h-period harmonic to hourly data.

Model: y(t) = A * cos(2π/24 * t - φ) + M

Uses the closed-form DFT approach (no iterative fitting): compute the first Fourier coefficient at frequency 1/24h.

Parameters:
  • hours (ndarray) – Array of hour-of-day values (0–23).

  • values (ndarray) – Corresponding signal values.

Returns:

Dict with amplitude, acrophase_h (hour of peak, 0–24), mesor (24h mean), n_hours (data points), and relative_amplitude (amplitude / mesor).

Return type:

dict
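The closed-form approach can be sketched as projecting the mean-centred signal onto a 24 h cosine and sine. `fit_harmonic_24h` is a hypothetical re-implementation using only the standard library; it assumes the samples cover the day evenly, and how the package handles missing hours is not documented here.

```python
import math


def fit_harmonic_24h(hours, values):
    """Estimate mesor, amplitude, and acrophase of y(t) = A * cos(2*pi/24 * t - phi) + M.

    Assumes evenly spaced samples over a full day (e.g. hours 0..23); with gaps the
    first Fourier coefficient only approximates a least-squares fit.
    """
    n = len(values)
    if n < 3:
        return None
    omega = 2.0 * math.pi / 24.0
    mesor = sum(values) / n
    # Cosine and sine components of the first harmonic.
    a = 2.0 / n * sum((v - mesor) * math.cos(omega * h) for h, v in zip(hours, values))
    b = 2.0 / n * sum((v - mesor) * math.sin(omega * h) for h, v in zip(hours, values))
    amplitude = math.hypot(a, b)
    acrophase_h = (math.atan2(b, a) / omega) % 24.0  # hour of the fitted peak
    return {
        "amplitude": amplitude,
        "acrophase_h": acrophase_h,
        "mesor": mesor,
        "n_hours": n,
        "relative_amplitude": amplitude / mesor if mesor else float("nan"),
    }


# Synthetic rumen temperature: mesor 39.0, amplitude 0.3, peak at 17:00.
hours = list(range(24))
values = [39.0 + 0.3 * math.cos(2 * math.pi / 24 * (h - 17)) for h in hours]
fit = fit_harmonic_24h(hours, values)
```

Because no iterative optimiser is involved, the fit is cheap enough to run once per signal per animal-day.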

digimuh.analysis_11_circadian.extract_circadian_features(df)

Extract circadian features for each animal-day.

For each of temperature, activity, and rumination, computes the 24h Fourier amplitude, acrophase, and relative amplitude.

Parameters:

df (DataFrame) – Hourly circadian DataFrame.

Returns:

DataFrame with one row per animal-day and columns for each signal’s circadian parameters.

Return type:

DataFrame

digimuh.analysis_11_circadian.compute_disruption_index(features, baseline_days=30)

Compute Circadian Disruption Index (CDI) per animal-day.

CDI measures how far each day’s circadian parameters deviate from the animal’s own baseline (the first baseline_days of healthy-period data). It is a Mahalanobis-like distance across the amplitude, phase, and mesor of all three signals.

Parameters:
  • features

  • baseline_days

Returns:

DataFrame with animal_id, day, cdi.

Return type:

DataFrame
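One simple reading of "Mahalanobis-like" is a distance in which each feature is scaled by its baseline standard deviation, that is, a Mahalanobis distance with a diagonal covariance matrix. The sketch below makes that assumption explicitly; the actual formula, the treatment of acrophase as a circular quantity, and the selection of healthy-period days are not specified in this reference, and `disruption_index` is a hypothetical name.

```python
from statistics import mean, stdev


def disruption_index(feature_rows, baseline_days=30):
    """Return one CDI value per day for a single animal.

    feature_rows: equal-length feature vectors in day order. The first
    `baseline_days` rows define each feature's baseline mean and SD; CDI is the
    root-mean-square of the per-feature z-scores (diagonal-covariance assumption).
    """
    baseline = feature_rows[:baseline_days]
    if len(baseline) < 2:
        return [None] * len(feature_rows)
    columns = list(zip(*baseline))
    mus = [mean(col) for col in columns]
    sds = [stdev(col) for col in columns]
    cdi = []
    for row in feature_rows:
        # Features with zero baseline variance carry no scale and are skipped.
        zs = [(x - mu) / sd for x, mu, sd in zip(row, mus, sds) if sd > 0]
        cdi.append((sum(z * z for z in zs) / len(zs)) ** 0.5 if zs else None)
    return cdi


# Hypothetical (amplitude, acrophase_h) features: four baseline days, then a disrupted day.
rows = [(0.30, 17.0), (0.32, 16.5), (0.28, 17.5), (0.30, 17.0), (0.10, 12.0)]
cdi = disruption_index(rows, baseline_days=4)
```

The last day, with a collapsed amplitude and an advanced peak, scores far above any baseline day.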

digimuh.analysis_11_circadian.plot_circadian_results(features, cdi, out_dir)

Generate circadian analysis plots.

Parameters:
  • features (DataFrame) – Circadian feature DataFrame.

  • cdi (DataFrame) – Circadian Disruption Index DataFrame.

  • out_dir (Path) – Output directory.

Return type:

None

digimuh.analysis_11_circadian.main()

Entry point for circadian analysis.

Return type:

None

Analysis 12 — Motility Entropy

Reticulorumen contraction entropy as a novel welfare biomarker.

Uses v_analysis_motility which extracts raw motility time series (contraction interval and pulse width) from smaXtec derived data.

Biological rationale: in a healthy rumen, reticulorumen contractions are quasi-periodic with modest beat-to-beat variability (analogous to healthy heart rate variability). Pathological states — acidosis, inflammation, impaction — disrupt this regularity. Very low entropy (rigid, uncoupled contractions) and very high entropy (chaotic, disorganised contractions) both indicate dysfunction.

This is directly analogous to heart rate variability (HRV) analysis in cardiology, applied to the rumen motor complex. As far as we know, this approach has not been published.

The analysis:

  1. Computes sample entropy and permutation entropy of the contraction interval (mot_period) series in sliding windows.

  2. Derives daily summary statistics: mean entropy, entropy SD, and entropy trend (slope).

  3. Correlates entropy features with concurrent rumen pH, rumination index, and disease status.

  4. Tests whether entropy changes precede clinical diagnosis.

Usage:

python -m digimuh.analysis_12_motility_entropy --db cow.db --out results/entropy

digimuh.analysis_12_motility_entropy.sample_entropy(x, m=2, r=None)

Compute sample entropy of a time series.

Sample entropy (SampEn) quantifies the regularity of a signal. Lower values indicate more self-similarity (regularity); higher values indicate more complexity/randomness.

Parameters:
  • x (ndarray) – 1-D time series (must have len > m+1).

  • m (int) – Embedding dimension (template length). Default 2, standard for physiological signals.

  • r (float | None) – Tolerance radius. Default is 0.2 * std(x), the standard choice from Richman & Moorman (2000).

Returns:

Sample entropy value. Returns np.nan if the series is too short or constant.

Return type:

float

References

Richman JS, Moorman JR. Am J Physiol Heart Circ Physiol. 2000;278:H2039–49.
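For orientation, a plain-Python sketch of sample entropy. `sample_entropy_sketch` is a hypothetical re-implementation, not the package's code; it counts template pairs whose Chebyshev distance is at most r (some implementations use a strict inequality) and uses the population standard deviation for the default tolerance.

```python
import math
import random
from statistics import pstdev


def sample_entropy_sketch(x, m=2, r=None):
    """Sample entropy -ln(A / B); returns NaN for short, constant, or unmatched series."""
    n = len(x)
    if n <= m + 1:
        return float("nan")
    if r is None:
        r = 0.2 * pstdev(x)
    if r == 0:
        return float("nan")  # constant series

    def count_matches(length):
        # Use the same number of templates (n - m) for both lengths.
        templates = [x[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):  # i < j excludes self-matches
                if max(abs(a - b) for a, b in zip(templates[i], templates[j])) <= r:
                    count += 1
        return count

    b = count_matches(m)      # matching pairs of length m
    a = count_matches(m + 1)  # matching pairs of length m + 1
    if a == 0 or b == 0:
        return float("nan")
    return -math.log(a / b)


regular = [1.0, 2.0] * 50                        # perfectly periodic
rng = random.Random(0)
noisy = [rng.gauss(0.0, 1.0) for _ in range(300)]  # white noise

se_regular = sample_entropy_sketch(regular)
se_noisy = sample_entropy_sketch(noisy)
```

The periodic series has entropy 0 because every length-2 match extends to a length-3 match, while the noise series scores clearly higher. The double loop is quadratic in the series length, which is acceptable for windows of a few hundred readings.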

digimuh.analysis_12_motility_entropy.permutation_entropy(x, order=3, delay=1, normalize=True)

Compute permutation entropy of a time series.

Permutation entropy (PE) captures the complexity of a signal based on the ordinal patterns of consecutive values. It is robust to noise and monotonic transformations.

Parameters:
  • x (ndarray) – Input time series.

  • order (int) – Embedding order (permutation length). Default 3.

  • delay (int) – Embedding delay. Default 1.

  • normalize (bool) – If True, divide by log(order!) so that the result lies in [0, 1].

Returns:

Permutation entropy value.

Return type:

float

References

Bandt C, Pompe B. Phys Rev Lett. 2002;88:174102.
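A plain-Python sketch of permutation entropy following the Bandt and Pompe definition. `permutation_entropy_sketch` is a hypothetical re-implementation; ties within a window are broken by position (stable sort), which is one common convention and may not match the package.

```python
import math
from collections import Counter


def permutation_entropy_sketch(x, order=3, delay=1, normalize=True):
    """Shannon entropy of the ordinal-pattern distribution of x."""
    if order < 2:
        raise ValueError("order must be at least 2")
    n_patterns = len(x) - (order - 1) * delay
    if n_patterns < 1:
        return float("nan")
    counts = Counter()
    for i in range(n_patterns):
        window = [x[i + j * delay] for j in range(order)]
        # Ordinal pattern: the indices that would sort the window.
        pattern = tuple(sorted(range(order), key=window.__getitem__))
        counts[pattern] += 1
    entropy = -sum((c / n_patterns) * math.log(c / n_patterns) for c in counts.values())
    if normalize:
        entropy /= math.log(math.factorial(order))
    return entropy


pe_monotonic = permutation_entropy_sketch(list(range(20)))   # a single pattern
pe_alternating = permutation_entropy_sketch([1, 2] * 10)     # two equally likely patterns
```

A monotonic series produces a single ordinal pattern and therefore entropy 0. The alternating series produces two equally frequent patterns out of the six possible for order 3, giving ln 2 / ln 6 (about 0.387) after normalisation.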

digimuh.analysis_12_motility_entropy.compute_daily_entropy(con, window_size=50)

Compute daily motility entropy features per animal.

For each animal-day, collects the mot_period readings, computes sample entropy and permutation entropy, and derives summary statistics.

Parameters:
  • con – Database connection with views.

  • window_size (int) – Minimum number of motility readings per day to compute entropy (default: 50).

Returns:

DataFrame with daily entropy features per animal.

Return type:

DataFrame

Test whether entropy changes before disease onset.

For each disease event, computes the mean entropy in the lookback_days before diagnosis and compares to the animal’s healthy-period baseline.

Parameters:
  • entropy_df (DataFrame) – Daily entropy features.

  • lookback_days (int) – Days before diagnosis to examine.

Returns:

DataFrame with pre-disease vs. baseline entropy comparison.

Return type:

DataFrame

digimuh.analysis_12_motility_entropy.plot_entropy_results(entropy_df, trends, out_dir)

Generate entropy analysis plots.

Parameters:
  • entropy_df (DataFrame) – Daily entropy features.

  • trends (DataFrame) – Pre-disease trend results.

  • out_dir (Path) – Output directory.

Return type:

None

digimuh.analysis_12_motility_entropy.main()

Entry point for motility entropy analysis.

Return type:

None