API Reference¶
Core Infrastructure¶
DigiMuh: a toolkit for ingesting, storing, and querying dairy-cow environmental and physiological sensor data.
The package consolidates heterogeneous CSV exports from on-farm monitoring systems into a single normalised SQLite database.
Hierarchical configuration for DigiMuh analysis pipelines.
Priority (highest wins):
CLI arguments (always override everything)
.env in the project directory (quick per-project overrides)
~/.config/digimuh/config.yaml (machine-specific, never in repo)
Built-in defaults
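The priority chain above can be pictured as a layered lookup in which the first layer that defines a key wins. The following is a minimal sketch, not DigiMuh's actual implementation: the layer dictionaries, their values, and the helper `source_of` are hypothetical and exist only to illustrate the ordering.

```python
from collections import ChainMap

# Hypothetical layers, ordered from highest to lowest priority.
cli_args = {"database": "/other/cow.db"}                           # CLI arguments
dotenv = {"n_jobs": 4}                                             # .env in the project directory
user_yaml = {"database": "~/data/cow.db", "output": "~/results"}   # ~/.config/digimuh/config.yaml
defaults = {"database": None, "output": "results", "n_jobs": 20}   # built-in defaults

# ChainMap returns the value from the first mapping that contains the key.
resolved = ChainMap(cli_args, dotenv, user_yaml, defaults)


def source_of(key):
    """Return the name of the layer that supplied `key` (illustrative helper)."""
    names = ["cli", ".env", "config.yaml", "default"]
    for name, layer in zip(names, resolved.maps):
        if key in layer:
            return name
    raise KeyError(key)
```

Here `resolved["database"]` comes from the CLI layer even though the YAML layer also defines it. A real loader would additionally need to skip unset CLI options (for example values left at `None` by the argument parser) so that they do not mask lower layers.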
Usage in an entry point:
from digimuh.config import load_config
cfg = load_config()
# cfg.database → Path to cow.db
# cfg.output → Path to results directory
# cfg.tierauswahl → Path to Tierauswahl.xlsx
# cfg.n_jobs → Number of parallel workers
CLI arguments still work and override everything:
digimuh-extract --db /other/cow.db --out /tmp/test
Set up a new machine:
digimuh-config
This creates ~/.config/digimuh/config.yaml interactively.
- class digimuh.config.DigiMuhConfig(database=None, output=<factory>, tierauswahl=None, n_jobs=20, smaxtec_drink_correction=False, _sources=<factory>)[source]¶
Resolved configuration for a DigiMuh pipeline run.
- Parameters:
- digimuh.config.load_config(cli_args=None, project_root=None)[source]¶
Load configuration with full priority chain.
- Parameters:
- Returns:
Resolved DigiMuhConfig.
- Return type:
- digimuh.config.print_config(cfg)[source]¶
Log the resolved configuration with sources.
- Parameters:
cfg (DigiMuhConfig)
- Return type:
None
- digimuh.config.setup_interactive()[source]¶
Interactive setup for a new machine. Creates config.yaml.
- Return type:
None
Pretty-printed console output for DigiMuh analysis pipelines.
Uses rich for coloured panels, tables, progress bars, and tree
views. Falls back to plain logging if rich is not installed.
Usage:
from digimuh.console import console, section, result_table, progress
console.print("[bold blue]Starting analysis …[/]")
with progress("Fitting animals") as pb:
task = pb.add_task("Broken-stick", total=220)
for animal in animals:
fit(animal)
pb.advance(task)
section("Results", "Breakpoint analysis complete")
result_table("Convergence", headers, rows)
- digimuh.console.setup_logging(level=20)[source]¶
Configure logging with rich handler if available.
- Parameters:
level (int)
- Return type:
None
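The fallback behaviour described for this module (use rich when it is importable, plain logging otherwise) is commonly written as an optional import. This is a sketch under that assumption; `setup_logging_sketch` is a hypothetical name and the plain-text format string is made up for illustration.

```python
import logging


def setup_logging_sketch(level=logging.INFO):
    """Attach a rich handler when rich is importable, else a plain stream handler."""
    try:
        from rich.logging import RichHandler  # optional dependency
        handler = RichHandler()
    except ImportError:
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(levelname)s %(name)s: %(message)s")
        )
    # force=True replaces any handlers that were configured earlier.
    logging.basicConfig(level=level, handlers=[handler], force=True)
    return handler
```

The default `level=20` in the documented signature corresponds to `logging.INFO`.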
- digimuh.console.result_table(title, headers, rows, highlight_col=None)[source]¶
Print a formatted results table.
- digimuh.console.kv_pair(key, val1, val2, sep=' / ')[source]¶
Print a key with two values (e.g. converged / total).
- digimuh.console.progress(description='Processing')[source]¶
Context manager for a rich progress bar.
- Parameters:
description (str)
- digimuh.console.done(message='All analyses complete.')[source]¶
Print completion message.
- Parameters:
message (str)
- Return type:
None
Shared database connection, view initialisation, and plotting defaults used across all DigiMuh analysis modules.
Usage:
from digimuh.analysis_utils import connect_db, setup_plotting
con = connect_db(Path("cow.db"))
setup_plotting()
- digimuh.analysis_utils.connect_db(db_path, create_views=True)[source]¶
Open the cow database and optionally create analysis views.
- Parameters:
- Returns:
An open sqlite3.Connection with Row factory enabled.
- Return type:
- digimuh.analysis_utils.query_df(con, sql, params=())[source]¶
Execute a query and return a pandas DataFrame.
- Parameters:
con (sqlite3.Connection) – Active database connection.
sql (str) – SQL query string.
params (tuple) – Bind parameters for the query.
- Returns:
A pandas.DataFrame with the query results.
- Return type:
pd.DataFrame
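To make the two helpers above concrete, here is a standard-library sketch of a connection with the Row factory enabled and a parameterised query. The `readings` table and the helper `query_rows` are hypothetical and not part of the DigiMuh schema or API; the sketch returns plain dicts rather than a DataFrame to stay dependency-free.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.row_factory = sqlite3.Row  # rows become addressable by column name

# Hypothetical table, for illustration only.
con.execute("CREATE TABLE readings (animal_id INTEGER, temp REAL)")
con.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [(1, 39.1), (1, 39.4), (2, 38.9)],
)


def query_rows(con, sql, params=()):
    """Run a parameterised query and return the rows as plain dicts."""
    return [dict(row) for row in con.execute(sql, params)]


rows = query_rows(
    con,
    "SELECT animal_id, temp FROM readings WHERE animal_id = ? ORDER BY temp",
    (1,),
)
```

With pandas installed, `pandas.read_sql_query(sql, con, params=params)` would return the same result as a DataFrame, which is presumably close to what `query_df` does.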
Data Ingestion¶
Quick validation of the ingested cow database.
Checks row counts, null rates, value ranges, referential integrity, and temporal coverage. Run immediately after ingestion to catch problems before launching analysis.
Usage:
python -m digimuh.validate_db --db cow.db
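Each check in this module returns a list of messages, with an empty list meaning success. The sketch below shows that pattern for a null-rate check and an orphaned-reference check. The function names, the `max_rate` threshold, and the two tables are hypothetical; they are not the DigiMuh schema or the module's real logic.

```python
import sqlite3


def null_rate_warnings(con, table, column, max_rate=0.5):
    """Return warning strings; an empty list means the check passed.

    Table and column names are interpolated into the SQL, so they must
    come from trusted code, never from user input.
    """
    total, nulls = con.execute(
        f"SELECT COUNT(*), SUM({column} IS NULL) FROM {table}"
    ).fetchone()
    if total == 0:
        return [f"{table}: table is empty"]
    rate = nulls / total
    if rate > max_rate:
        return [f"{table}.{column}: {rate:.0%} NULL (limit {max_rate:.0%})"]
    return []


def orphan_warnings(con, fact, dim, key):
    """Warn about fact rows whose key has no match in the dimension table."""
    (n,) = con.execute(
        f"SELECT COUNT(*) FROM {fact} f "
        f"LEFT JOIN {dim} d ON f.{key} = d.{key} "
        f"WHERE d.{key} IS NULL"
    ).fetchone()
    return [f"{fact}.{key}: {n} orphaned rows"] if n else []


# Tiny in-memory database to exercise both checks.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE animals (animal_id INTEGER PRIMARY KEY);
    CREATE TABLE readings (animal_id INTEGER, ph REAL);
    INSERT INTO animals VALUES (1), (2);
    INSERT INTO readings VALUES (1, 6.2), (2, NULL), (3, NULL), (3, NULL);
""")
```

Returning messages instead of raising lets a caller run every check and report all problems in one pass.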
- digimuh.validate_db.check_table_counts(con)[source]¶
Verify all expected tables exist and report row counts.
- Returns:
List of warning/error messages (empty = all good).
- Parameters:
con (Connection)
- Return type:
- digimuh.validate_db.check_null_rates(con)[source]¶
Report null rates for key columns.
- Returns:
List of warning messages for suspiciously high null rates.
- Parameters:
con (Connection)
- Return type:
- digimuh.validate_db.check_value_ranges(con)[source]¶
Check that numeric values fall within plausible ranges.
- Returns:
List of warnings for out-of-range values.
- Parameters:
con (Connection)
- Return type:
- digimuh.validate_db.check_temporal_coverage(con)[source]¶
Report the date range of each timestamped table.
- Returns:
List of warnings for unexpected gaps or ranges.
- Parameters:
con (Connection)
- Return type:
- digimuh.validate_db.check_referential_integrity(con)[source]¶
Check foreign key relationships between fact and dimension tables.
- Returns:
List of warnings for orphaned references.
- Parameters:
con (Connection)
- Return type:
Analysis 00 — Broken-Stick Regression¶
Analysis 01 — Ketosis¶
Subclinical ketosis risk scoring from multi-sensor fusion.
Uses the v_analysis_ketosis view which joins daily milking
data (HerdePlus MLP test-day results), smaXtec rumen metrics,
water intake, and disease ground truth.
The analysis:
Extracts days with MLP test-day data (FPR available).
Computes a composite ketosis risk score from:
- Fat-to-protein ratio (FPR > 1.4 = energy deficit)
- Rumination index (lower = reduced feed intake)
- Milk yield deviation from rolling cow mean
- Rumen pH (low pH + high FPR = metabolic confusion)
Validates against disease records (ground truth).
Trains a Random Forest classifier and reports feature importance and cross-validated performance.
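Two of the inputs to the risk score can be sketched directly from the description above: the fat-to-protein ratio with its 1.4 threshold, and the deviation of milk yield from a rolling per-cow mean. The function names and the 7-day window are assumptions made for illustration; the text does not say how the components are weighted into the composite score, so that step is not shown.

```python
from statistics import mean

FPR_THRESHOLD = 1.4  # from the text: FPR > 1.4 indicates an energy deficit


def fat_protein_ratio(fat_pct, protein_pct):
    """Fat-to-protein ratio of one test-day milk sample."""
    return fat_pct / protein_pct


def yield_deviation(yields, window=7):
    """Deviation of each day's yield from the mean of the preceding days.

    `yields` is one cow's daily milk yield in chronological order.
    Returns None where fewer than `window` earlier days are available.
    """
    out = []
    for i, y in enumerate(yields):
        if i < window:
            out.append(None)
        else:
            out.append(y - mean(yields[i - window:i]))
    return out
```

For example, a sample with 4.9 % fat and 3.2 % protein has an FPR above the threshold, and a drop from 30 kg to 24 kg after a stable period shows up as a negative deviation.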
Usage:
python -m digimuh.analysis_01_ketosis --db cow.db --out results/ketosis
References
Oetzel (2013): FPR thresholds for subclinical ketosis.
Kaufman et al. (2016) J Dairy Sci 99:5604–18: rumination time association with subclinical ketosis.
- digimuh.analysis_01_ketosis.load_ketosis_data(con)[source]¶
Load ketosis analysis view and add derived features.
- Parameters:
con – Active database connection with views created.
- Returns:
DataFrame with one row per animal per MLP test day, enriched with rolling-mean deviations and risk scores.
- Return type:
- digimuh.analysis_01_ketosis.train_ketosis_classifier(df, out_dir)[source]¶
Train a Random Forest to classify ketosis risk.
Uses FPR > 1.4 as the positive label (subclinical ketosis indicator) and evaluates against disease records where available.
- Parameters:
df (DataFrame) – DataFrame from load_ketosis_data().
out_dir (Path) – Directory for saving results.
- Returns:
Dict with performance metrics and feature importances.
- Return type:
- digimuh.analysis_01_ketosis.plot_ketosis_overview(df, out_dir)[source]¶
Generate overview plots for ketosis analysis.
- Parameters:
df (DataFrame) – DataFrame from load_ketosis_data().
out_dir (Path) – Output directory for figures.
- Return type:
None
Analysis 03 — Heat Stress¶
Per-animal heat stress response modelling.
Uses the v_analysis_heat_stress view which combines daily
smaXtec rumen data, DWD weather, gouna respiration, and milking
production.
The analysis:
Builds per-animal Z-scored rumen temperature following the approach of the NZ smaXtec study (JDS Communications, 2024): each cow’s temperature distribution is scaled to a common mean and SD before thresholding.
Fits per-animal thermoregulatory dose-response curves: rumen_temp_z = f(THI) using sigmoidal regression. The inflection point and slope characterise each animal’s heat tolerance.
Computes a daily heat load index fusing rumen temp, respiration rate, activity suppression, and water intake.
Quantifies the production impact: milk yield loss per unit of heat load.
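The first step above, per-animal Z-scoring, puts every cow's rumen temperature on a common scale so that one threshold can be applied to all animals. A minimal standard-library sketch follows; `zscore_per_animal` and the sample readings are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def zscore_per_animal(records):
    """Z-score rumen temperature within each animal.

    records: list of (animal_id, rumen_temp) tuples.
    Returns (animal_id, rumen_temp, z) tuples in the input order; z is None
    when an animal has fewer than two readings or zero spread.
    """
    by_animal = defaultdict(list)
    for animal_id, temp in records:
        by_animal[animal_id].append(temp)

    # Per-animal mean and sample standard deviation.
    stats = {}
    for animal_id, temps in by_animal.items():
        if len(temps) >= 2:
            stats[animal_id] = (mean(temps), stdev(temps))

    out = []
    for animal_id, temp in records:
        mu, sd = stats.get(animal_id, (None, 0.0))
        z = (temp - mu) / sd if sd else None
        out.append((animal_id, temp, z))
    return out


records = [("A", 39.0), ("A", 39.5), ("A", 40.0),
           ("B", 38.0), ("B", 38.4), ("B", 38.8)]
scored = zscore_per_animal(records)
```

Cow A runs about one degree warmer than cow B, yet the warmest reading of each animal receives roughly the same Z-score.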
Usage:
python -m digimuh.analysis_03_heat_stress --db cow.db --out results/heat
References
Identifying and predicting heat stress events for grazing dairy cows using rumen temperature boluses. JDS Comm. 2024.
- digimuh.analysis_03_heat_stress.load_heat_data(con)[source]¶
Load heat stress view and add per-animal Z-scored temperature.
- Parameters:
con – Database connection with views.
- Returns:
DataFrame with per-animal Z-scored rumen temperature and a composite heat load index.
- Return type:
- digimuh.analysis_03_heat_stress.sigmoid(x, L, k, x0, b)[source]¶
Four-parameter sigmoid: L / (1 + exp(-k*(x - x0))) + b.
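The formula can be written out in a few lines, which also makes the role of each parameter visible: `b` is the lower asymptote, `b + L` the upper asymptote, `x0` the midpoint, and `L * k / 4` the slope at the midpoint. The parameter values in the example are invented for illustration.

```python
import math


def sigmoid(x, L, k, x0, b):
    """Four-parameter sigmoid: L / (1 + exp(-k * (x - x0))) + b."""
    return L / (1.0 + math.exp(-k * (x - x0))) + b


# Hypothetical parameters: response spans -0.5 to 1.5 with midpoint at THI 68.
L, k, x0, b = 2.0, 0.3, 68.0, -0.5

at_midpoint = sigmoid(x0, L, k, x0, b)    # equals b + L / 2
far_below = sigmoid(0.0, L, k, x0, b)     # approaches b
far_above = sigmoid(140.0, L, k, x0, b)   # approaches b + L
```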
- digimuh.analysis_03_heat_stress.fit_dose_response(df, min_days=30)[source]¶
Fit per-animal sigmoid dose-response: rumen_temp_z = f(THI).
The inflection point (x0) is the THI at which the fitted curve rises most steeply, i.e. the midpoint of the cow’s temperature response; it serves as the animal’s personal heat tolerance threshold.
- Parameters:
df (DataFrame) – DataFrame from load_heat_data().
min_days (int) – Minimum number of observation days per animal.
- Returns:
DataFrame with one row per animal and columns animal_id, thi_threshold (x0), slope (k), n_days.
- Return type:
DataFrame
- digimuh.analysis_03_heat_stress.compute_production_impact(df)[source]¶
Estimate milk yield loss attributable to heat load.
Bins the heat_load_index into quartiles and computes mean milk yield per bin.
- Parameters:
df (DataFrame) – DataFrame from load_heat_data().
- Returns:
Summary DataFrame with heat load bins and mean production.
- Return type:
Analysis 06 — Digestive¶
Rumen mechanical–chemical–production coupling analysis.
Uses v_analysis_digestive which joins daily smaXtec motility/pH
with HerdePlus MLP milk composition.
The key insight: reticulorumen contraction patterns (motility) drive mixing, mixing drives fermentation rate, fermentation determines the volatile fatty acid profile, and VFA ratios directly shape milk fat and protein. This pipeline has a multi-day lag.
The analysis:
Computes time-lagged cross-correlations between daily motility metrics and the next available MLP test-day values.
Builds a digestive efficiency score from the motility–pH coupling: animals where motility and pH co-vary tightly have well-functioning rumens.
Tests whether digestive efficiency predicts milk composition at the next MLP test day.
Usage:
python -m digimuh.analysis_06_digestive --db cow.db --out results/digestive
- digimuh.analysis_06_digestive.load_digestive_data(con)[source]¶
Load digestive analysis view.
- Parameters:
con – Database connection with views.
- Returns:
DataFrame with daily motility/pH and sparse MLP composition.
- Return type:
- digimuh.analysis_06_digestive.compute_lagged_correlations(df, predictor_cols, target_cols, max_lag_days=14)[source]¶
Compute cross-correlations between daily rumen metrics and MLP test-day values at various time lags.
For each animal, MLP test days are identified (non-null target), and the mean of each predictor over the preceding N days is correlated with the target value.
- Parameters:
- Returns:
lag × predictor × target → correlation.
- Return type:
DataFrame
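The lagging scheme described for this function (average the predictor over the N days before each test day, then correlate with the test-day value) can be sketched for a single animal and a single predictor/target pair. The function names and the synthetic data are hypothetical, and the real function additionally loops over animals, predictors, targets, and lags.

```python
from math import nan, sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient; nan if either series is constant."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return nan
    return sxy / sqrt(sxx * syy)


def lagged_correlation(predictor, target, lag_days):
    """Correlate the mean predictor over the `lag_days` days before each
    test day with the target measured on that test day.

    predictor: daily values; target: same length, None on non-test days.
    """
    xs, ys = [], []
    for day, t in enumerate(target):
        if t is None or day < lag_days:
            continue
        xs.append(mean(predictor[day - lag_days:day]))
        ys.append(t)
    if len(xs) < 3:
        return nan  # too few test days for a meaningful correlation
    return pearson(xs, ys)


# Synthetic example: the target depends linearly on the preceding 3-day mean.
predictor = [(i * 7) % 11 for i in range(40)]
target = [None] * 40
for test_day in (10, 20, 30, 39):
    target[test_day] = 2.0 * mean(predictor[test_day - 3:test_day]) + 1.0

r_lag3 = lagged_correlation(predictor, target, lag_days=3)
```

Because the synthetic target was built from the 3-day mean, the correlation at a 3-day lag is essentially perfect; with real data one would scan lags up to `max_lag_days` and look for the lag with the strongest association.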
- digimuh.analysis_06_digestive.compute_digestive_efficiency(df, window=7)[source]¶
Compute a per-animal rolling digestive efficiency score.
Efficiency is defined as the strength of coupling between motility (contraction interval) and rumen pH over a rolling window. In a well-functioning rumen, shorter contraction intervals (faster mixing) correspond to lower pH (more active fermentation), so contraction frequency and pH are negatively correlated.
Analysis 11 — Circadian¶
Circadian rhythm analysis as a general-purpose welfare biomarker.
Uses v_analysis_circadian which provides hourly aggregates of
rumen temperature, activity, and rumination per animal per day,
joined with disease ground truth.
Biological rationale: healthy ruminants exhibit strong ~24h rhythms in core body temperature (nadir early morning, peak late afternoon), activity (bimodal: dawn and dusk feeding bouts), and rumination (complementary to activity — peaks at rest). Circadian amplitude collapse or phase shift is a well-established early marker of sickness in human chronobiology but barely explored in cattle.
The analysis:
For each animal-day, fits a single-harmonic Fourier model (24h period) to the hourly profile of each signal.
Extracts amplitude (strength of rhythm) and acrophase (time of peak) as daily biomarkers.
Computes a Circadian Disruption Index (CDI) = deviation from the animal’s own healthy-period baseline.
Tests whether CDI elevation precedes clinical disease onset.
Usage:
python -m digimuh.analysis_11_circadian --db cow.db --out results/circadian
- digimuh.analysis_11_circadian.load_circadian_data(con)[source]¶
Load hourly circadian data.
- Parameters:
con – Database connection with views.
- Returns:
DataFrame with hourly temp/activity/rumination per animal-day.
- Return type:
- digimuh.analysis_11_circadian.fit_circadian_harmonic(hours, values)[source]¶
Fit a single 24h-period harmonic to hourly data.
Model: y(t) = A * cos(2π/24 * t - φ) + M
Uses the closed-form DFT approach (no iterative fitting): compute the first Fourier coefficient at frequency 1/24h.
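For evenly spaced samples covering one full day, the closed-form solution follows from the first Fourier coefficient: with ω = 2π/24, the cosine and sine projections give a = A·cos φ and b = A·sin φ, so A = √(a² + b²) and φ = atan2(b, a). The sketch below assumes exactly that sampling (for example hours 0 to 23); `fit_harmonic_24h` is a hypothetical name, not the library function.

```python
import math


def fit_harmonic_24h(hours, values):
    """Fit y(t) = A * cos(2*pi/24 * t - phi) + M via the first Fourier coefficient.

    Assumes the samples cover one full day evenly (e.g. hours 0..23).
    Returns (amplitude A, phase phi in radians within [0, 2*pi), mesor M).
    """
    n = len(values)
    omega = 2.0 * math.pi / 24.0
    mesor = sum(values) / n
    # Projections onto the cosine and sine of the 24 h frequency.
    a = 2.0 / n * sum(v * math.cos(omega * t) for t, v in zip(hours, values))
    b = 2.0 / n * sum(v * math.sin(omega * t) for t, v in zip(hours, values))
    amplitude = math.hypot(a, b)
    phase = math.atan2(b, a) % (2.0 * math.pi)
    return amplitude, phase, mesor


# Synthetic temperature profile: mesor 39.2, amplitude 0.4, peak at 16:00.
omega = 2.0 * math.pi / 24.0
hours = list(range(24))
values = [39.2 + 0.4 * math.cos(omega * (t - 16)) for t in hours]

amplitude, phase, mesor = fit_harmonic_24h(hours, values)
acrophase_hour = phase / omega  # time of peak, in hours
```

With missing or unevenly spaced hours the projections are no longer orthogonal, and a least-squares fit on the cosine and sine regressors would be the safer choice.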
- digimuh.analysis_11_circadian.extract_circadian_features(df)[source]¶
Extract circadian features for each animal-day.
For each of temperature, activity, and rumination, computes the 24h Fourier amplitude, acrophase, and relative amplitude.
- digimuh.analysis_11_circadian.compute_disruption_index(features, baseline_days=30)[source]¶
Compute Circadian Disruption Index (CDI) per animal-day.
CDI measures how far each day’s circadian parameters deviate from the animal’s own baseline (the first baseline_days of healthy-period data). It is a Mahalanobis-like distance across the amplitude, phase, and mesor of all three signals.
- Parameters:
features (DataFrame) – DataFrame from extract_circadian_features().
baseline_days (int) – Number of initial healthy days to define the per-animal baseline.
- Returns:
DataFrame with animal_id, day, cdi.
- Return type:
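One simple reading of "Mahalanobis-like" is a distance with a diagonal covariance: standardise each feature against the animal's baseline mean and SD, then take the Euclidean norm of the resulting Z-scores. The sketch below makes that assumption explicit; it is not necessarily how `compute_disruption_index` is implemented, and `disruption_index` and the sample data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def disruption_index(days, baseline_days=30):
    """Distance of each day's feature vector from the animal's own baseline.

    days: chronological list of per-day feature vectors (lists of floats)
    for one animal. The first `baseline_days` entries define the baseline.
    Assumes a diagonal covariance, i.e. features are treated as independent.
    """
    baseline = days[:baseline_days]
    n_features = len(days[0])
    mu = [mean([d[j] for d in baseline]) for j in range(n_features)]
    sd = [stdev([d[j] for d in baseline]) for j in range(n_features)]

    out = []
    for d in days:
        # Skip features with zero baseline spread to avoid dividing by zero.
        z2 = [((d[j] - mu[j]) / sd[j]) ** 2 for j in range(n_features) if sd[j] > 0]
        out.append(sqrt(sum(z2)))
    return out


# Four baseline days followed by one day with a strongly shifted second feature.
days = [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0], [2.0, 12.0], [2.0, 20.0]]
cdi = disruption_index(days, baseline_days=4)
```

Phase is a circular quantity, so a production version would need to handle wrap-around (for example 23:30 versus 00:30) before standardising it.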
Analysis 12 — Motility Entropy¶
Reticulorumen contraction entropy as a novel welfare biomarker.
Uses v_analysis_motility which extracts raw motility time series
(contraction interval and pulse width) from smaXtec derived data.
Biological rationale: in a healthy rumen, reticulorumen contractions are quasi-periodic with modest beat-to-beat variability (analogous to healthy heart rate variability). Pathological states — acidosis, inflammation, impaction — disrupt this regularity. Very low entropy (rigid, uncoupled contractions) and very high entropy (chaotic, disorganised contractions) both indicate dysfunction.
This is directly analogous to heart rate variability (HRV) analysis in cardiology, applied to the rumen motor complex. As far as we know, this approach has not been published.
The analysis:
Computes sample entropy and permutation entropy of the contraction interval (mot_period) series in sliding windows.
Derives daily summary statistics: mean entropy, entropy SD, and entropy trend (slope).
Correlates entropy features with concurrent rumen pH, rumination index, and disease status.
Tests whether entropy changes precede clinical diagnosis.
Usage:
python -m digimuh.analysis_12_motility_entropy --db cow.db --out results/entropy
- digimuh.analysis_12_motility_entropy.sample_entropy(x, m=2, r=None)[source]¶
Compute sample entropy of a time series.
Sample entropy (SampEn) quantifies the regularity of a signal. Lower values indicate more self-similarity (regularity); higher values indicate more complexity/randomness.
- Parameters:
- Returns:
Sample entropy value. Returns np.nan if the series is too short or constant.
- Return type:
References
Richman JS, Moorman JR. Am J Physiol Heart Circ Physiol. 2000;278:H2039–49.
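A direct, unoptimised sketch of sample entropy in the sense of Richman and Moorman is given below. It assumes the common conventions of a Chebyshev distance and a default tolerance of 0.2 times the standard deviation; the documented function's default for `r` is not stated here, so that choice is an assumption, as is the name `sample_entropy_sketch`.

```python
import math
from statistics import pstdev


def sample_entropy_sketch(x, m=2, r=None):
    """Sample entropy ln(B / A), where B and A count template pairs of
    length m and m + 1 that match within tolerance r (self-matches excluded).

    Returns nan if the series is too short, constant, or has no matches.
    """
    n = len(x)
    if n < m + 2:
        return math.nan
    sd = pstdev(x)
    if sd == 0:
        return math.nan
    if r is None:
        r = 0.2 * sd  # common convention; assumed, not taken from DigiMuh

    def count_matches(length):
        # Use the same n - m start positions for both template lengths.
        templates = [x[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                dist = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                if dist <= r:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return math.nan
    return math.log(b / a)
```

The double loop is quadratic in the series length, which is acceptable for windows of a few hundred readings but would need a vectorised or tree-based variant for long series.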
- digimuh.analysis_12_motility_entropy.permutation_entropy(x, order=3, delay=1, normalize=True)[source]¶
Compute permutation entropy of a time series.
Permutation entropy (PE) captures the complexity of a signal based on the ordinal patterns of consecutive values. It is robust to noise and monotonic transformations.
- Parameters:
- Returns:
Permutation entropy value.
- Return type:
References
Bandt C, Pompe B. Phys Rev Lett. 2002;88:174102.
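Permutation entropy in the sense of Bandt and Pompe reduces each window of `order` values to its ordinal pattern and takes the Shannon entropy of the pattern frequencies. The following is a minimal sketch using the same parameter names as the signature above; the name `permutation_entropy_sketch` and the choice of natural logarithms are assumptions.

```python
import math
from collections import Counter


def permutation_entropy_sketch(x, order=3, delay=1, normalize=True):
    """Shannon entropy of ordinal patterns of length `order`.

    With normalize=True the result is divided by ln(order!), giving a
    value between 0 (one pattern only) and 1 (all patterns equally likely).
    """
    if order < 2:
        raise ValueError("order must be at least 2")
    n_windows = len(x) - (order - 1) * delay
    if n_windows < 1:
        return math.nan

    patterns = Counter()
    for i in range(n_windows):
        window = [x[i + k * delay] for k in range(order)]
        # Ordinal pattern: positions of the window sorted by value.
        pattern = tuple(sorted(range(order), key=window.__getitem__))
        patterns[pattern] += 1

    probs = [count / n_windows for count in patterns.values()]
    h = sum(p * math.log(1.0 / p) for p in probs)
    if normalize:
        h /= math.log(math.factorial(order))
    return h
```

Because only the rank order inside each window matters, rescaling the signal by any strictly increasing function leaves the result unchanged, which is the robustness property mentioned above.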
- digimuh.analysis_12_motility_entropy.compute_daily_entropy(con, window_size=50)[source]¶
Compute daily motility entropy features per animal.
For each animal-day, collects the mot_period readings, computes sample entropy and permutation entropy, and derives summary statistics.
- digimuh.analysis_12_motility_entropy.compute_predisease_trends(entropy_df, lookback_days=7)[source]¶
Test whether entropy changes before disease onset.
For each disease event, computes the mean entropy in the lookback_days before diagnosis and compares to the animal’s healthy-period baseline.