Pipeline ======== This page describes the algorithmic details of each STAG pipeline stage. For code examples, see :doc:`usage`. Data collection --------------- Thirty-six adult red deer stags (*Cervus elaphus*) were each fitted with two battery-powered tri-axial accelerometers (±16 g, 50 Hz; TechnoSmart Europe): - **Head unit** (axyTrek) — mounted at the base of the antler pedicle; includes a 0.5 Hz GPS logger. - **Ear unit** (axy) — attached to the ear. Stags were monitored for 48 h in a paddock with *ad libitum* food and water. Simultaneous video (30 Hz) served as ground truth. Feature set ----------- Six accelerometer axes (three head, three ear) are standardised to zero mean and unit variance. GPS-derived speed and tortuosity are retained only for coarse ground-truthing due to the 100× sampling-rate mismatch between accelerometers (50 Hz) and GPS (0.5 Hz). **Tortuosity** is defined as the arc–chord ratio over three consecutive GPS fixes: .. math:: \\tau = \\frac{\\|\\vec{p}_1 - \\vec{p}_0\\| + \\|\\vec{p}_2 - \\vec{p}_1\\|}{\\|\\vec{p}_2 - \\vec{p}_0\\|} A straight trajectory yields τ = 1; higher values indicate more sinuous paths. Clustering ---------- *k*-means (Lloyd's algorithm, squared Euclidean distance) partitions the z-scored feature vectors into *k* prototypical movements. The GPU implementation uses RAPIDS cuML for scalability. **Model selection** evaluates k = 2–50 (24 settings × 200 independent runs = 48 000 fits) using two criteria: 1. **Cluster quality** — Calinski–Harabasz index (higher is better). 2. **Cluster stability** — contiguous leave-out: a block of 1 − *r* of the time series is removed (r ∈ {0.50, 0.75, 0.90, 1.00}), the block is slid in 2 % steps, and centroid drift is measured via Hungarian assignment. Low drift = stable solution. k = 8 was selected as a joint compromise of quality and stability. Behavioural prototypes ---------------------- The eight prototypical movements (PM₀–PM₇) fall into three categories: - **Inactive** (PM₀, PM₁, PM₃) — lying / standing, ± rumination / panting. 65.5 % of the time budget. - **Grazing** (PM₆, PM₇) — grazing while walking, and stationary grazing. - **Ear flicks** (PM₂, PM₄, PM₅) — rapid ear movements in response to irritants (e.g. flies), each < 1 s. Hidden Markov Model ------------------- The prototype label sequence is modelled as a first-order Markov chain. **Super-prototypes** are frequent triplets of successive labels exceeding a probability threshold, representing compound behaviours: - *Grazing cycle*: stationary graze → step → stationary graze (the most frequent triplet). - *Ear-flick bout*: PM₂ → PM₅ → PM₄ (a top-five triplet, condensing a flurry of ear movements into a single event). Circadian analysis ------------------ Classified data are aggregated into hourly bins over a 24 h period (second day only, after acclimation). Grazing peaks in morning, midday, and evening; inactive states dominate overnight. Ear flicks show a marked daytime peak, correlating with insect activity. On-animal deployment -------------------- Classification reduces to a nearest-centroid operation. On an Arduino Uno (16 MHz ATmega328P) the C implementation achieves 4.3 × 10⁸ classifications per second — four orders of magnitude faster than the 50 Hz sensor sampling rate. Memory footprint is a few kilobytes.