API Reference
stag.sync — Sensor synchronisation
Sensor data synchronisation for head and ear accelerometers.
This module provides the BetterDataSync class, which aligns
tri-axial accelerometer streams recorded on the head and ear of a deer
by detecting calibration-drop events (three controlled 1.5 m drops
recorded simultaneously by both loggers).
- class stag.sync.data_sync.BetterDataSync(deer_id, head_data, ear_data, window_dict, log=True, log_folder='', mkplot=False, plot_folder='')[source]
Bases: object
Synchronise head and ear accelerometer data via calibration drops.
- Parameters:
deer_id (str) – Identifier for the deer (e.g. "R1_D1").
head_data (pandas.DataFrame) – Accelerometer data from the head-mounted logger with columns 'X', 'Y', 'Z'.
ear_data (pandas.DataFrame) – Accelerometer data from the ear-mounted logger.
window_dict (dict) – Processing window with keys 'start' and 'end' (sample indices).
log (bool, optional) – Enable CSV logging. Default True.
log_folder (str, optional) – Directory for log files.
mkplot (bool, optional) – Generate diagnostic plots. Default False.
plot_folder (str, optional) – Directory for saved plots.
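The drop-based alignment can be sketched as follows, assuming a simple threshold detector on synthetic spike trains. `find_drop_onsets` is a hypothetical helper, not a method of BetterDataSync; the real class may align the streams differently.

```python
import numpy as np

def find_drop_onsets(signal, threshold=3.0, min_gap=50):
    """Return sample indices where the signal crosses the threshold,
    keeping only onsets separated by at least min_gap samples."""
    onsets = []
    for i in np.flatnonzero(signal > threshold):
        if not onsets or i - onsets[-1] >= min_gap:
            onsets.append(int(i))
    return onsets

# Synthetic head/ear traces: three drop spikes, with the ear logger
# lagging the head logger by 7 samples.
rng = np.random.default_rng(0)
head = rng.normal(0, 0.1, 1000)
ear = rng.normal(0, 0.1, 1000)
for t in (200, 400, 600):
    head[t] = 5.0
    ear[t + 7] = 5.0

lag = int(np.mean(np.subtract(find_drop_onsets(ear), find_drop_onsets(head))))
print(lag)  # 7
```

Once the lag is known, one stream can simply be shifted by that many samples before the two loggers' data are merged.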
Utility functions for accelerometer data preprocessing.
Helper functions used during sensor synchronisation, including z-score calibration, absolute-value transforms, column summation, and consecutive-difference computation.
- stag.sync.utils.correct_calibration(data, cols=None)[source]
Z-score the specified columns (zero mean, unit variance).
- Parameters:
data (pandas.DataFrame) – Input accelerometer data.
cols (list of str, optional) – Columns to standardise. Default
['X', 'Y', 'Z'].
- Returns:
Z-scored copy of the selected columns.
- Return type:
pandas.DataFrame
- stag.sync.utils.make_absolute(data)[source]
Return the element-wise absolute value of a DataFrame.
- Parameters:
data (pandas.DataFrame) – Input data.
- Returns:
DataFrame with absolute values.
- Return type:
pandas.DataFrame
- stag.sync.utils.sum_columns(data)[source]
Sum all columns row-wise.
- Parameters:
data (pandas.DataFrame) – Input data.
- Returns:
Row-wise sum.
- Return type:
pandas.Series
- stag.sync.utils.get_consecutive_differences(series)[source]
Compute first-order differences of a Series.
- Parameters:
series (pandas.Series) – Input time series.
- Returns:
Consecutive differences (length = original − 1).
- Return type:
numpy.ndarray
- stag.sync.utils.get_calibrated_absolute_accelleration(data, cols=None)[source]
One-step pipeline: z-score → absolute → sum.
- Parameters:
data (pandas.DataFrame) – Raw accelerometer data.
cols (list of str, optional) – Columns to process. Default
['X', 'Y', 'Z'].
- Returns:
Summed absolute z-scored acceleration.
- Return type:
pandas.Series
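The one-step pipeline is, in effect, plain pandas arithmetic. A minimal sketch (the ddof used for the z-score is an assumption):

```python
import pandas as pd

# Toy tri-axial data with the documented column names.
df = pd.DataFrame({"X": [1.0, 2.0, 3.0],
                   "Y": [4.0, 5.0, 6.0],
                   "Z": [7.0, 8.0, 9.0]})

z = (df - df.mean()) / df.std(ddof=0)   # z-score each column (ddof assumed)
summed = z.abs().sum(axis=1)            # row-wise sum of absolute values
print(summed.tolist())
```

The middle row sums to zero because each column's middle value equals that column's mean.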
stag.database — Database models and ingestion
Extract clustering-ready feature matrices from the STAG database.
Queries the SQLAlchemy database for synchronised accelerometer data
and exports it as a .npy array suitable for the k-means stage.
- class stag.database.make_cluster_data.DeerInfo(**kwargs)[source]
Bases: Base
Represents information about each deer, including identification and related data.
- Attributes:
deer_id (Integer): The primary key, autoincrementing.
repetition_number (String): Identifies the repetition sequence of the data collection for this deer.
deer_number (String): A unique identifier for the deer.
accelerometer_data (relationship): Links to associated AccelerometerData records.
trajectory_data (relationship): Links to associated TrajectoryData records.
- deer_id
- repetition_number
- deer_number
- accelerometer_data
- trajectory_data
- class stag.database.make_cluster_data.AccelerometerData(**kwargs)[source]
Bases: Base
Stores accelerometer data related to deer movement.
- Attributes:
data_id (Integer): The primary key, autoincrementing.
deer_id (Integer): Foreign key linking back to the DeerInfo.
X_head, Y_head, Z_head (Float): Accelerometer readings for the head.
X_ear, Y_ear, Z_ear (Float): Accelerometer readings for the ear.
deer_info (relationship): Back-reference to the associated DeerInfo.
- data_id
- deer_id
- X_head
- Y_head
- Z_head
- X_ear
- Y_ear
- Z_ear
- deer_info
- class stag.database.make_cluster_data.TrajectoryData(**kwargs)[source]
Bases: Base
Contains trajectory data including positional information and calculated features.
- Attributes:
data_id (Integer): The primary key, autoincrementing.
deer_id (Integer): Foreign key linking back to the DeerInfo.
pos_WGS84_lat, pos_WGS84_lon (Float): GPS coordinates in the WGS84 system.
pos_NZMG_x_meter, pos_NZMG_y_meter (Float): Position in the New Zealand Map Grid system.
pos_x_meter_filt, pos_y_meter_filt (Float): Filtered positional data.
abs_speed_mPs (Float): Absolute speed in meters per second.
tortuosity (Float): Calculated tortuosity of the movement path.
deer_info (relationship): Back-reference to the associated DeerInfo.
- data_id
- deer_id
- pos_WGS84_lat
- pos_WGS84_lon
- pos_NZMG_x_meter
- pos_NZMG_y_meter
- pos_x_meter_filt
- pos_y_meter_filt
- abs_speed_mPs
- tortuosity
- deer_info
- stag.database.make_cluster_data.open_session(database_url)[source]
Opens a session for the database.
- stag.database.make_cluster_data.get_deer_ids(session)[source]
Returns a list of all deer_ids in the database.
- stag.database.make_cluster_data.get_data_for_deer(session, deer_id)[source]
Fetches and concatenates accelerometer and trajectory data for a given deer_id.
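A rough sketch of the per-deer fetch, using stdlib sqlite3 in place of the module's SQLAlchemy session. The table and column names are assumptions based on the models above; the real query may join and concatenate differently.

```python
import sqlite3

# In-memory table mirroring the AccelerometerData model.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE accelerometer_data (
    data_id INTEGER PRIMARY KEY, deer_id INTEGER,
    X_head REAL, Y_head REAL, Z_head REAL,
    X_ear REAL, Y_ear REAL, Z_ear REAL);
INSERT INTO accelerometer_data VALUES (1, 7, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6);
""")

# Select only the feature columns for one deer_id.
rows = con.execute(
    "SELECT X_head, Y_head, Z_head, X_ear, Y_ear, Z_ear "
    "FROM accelerometer_data WHERE deer_id = ?", (7,)).fetchall()
print(rows)  # [(0.1, 0.2, 0.3, 0.4, 0.5, 0.6)]
```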
stag.gps — GPS trajectory analysis
Tortuosity and speed from raw GPS latitude/longitude.
Haversine-based distance calculation between consecutive GPS fixes, yielding arc-chord tortuosity ratios and absolute ground speed.
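A self-contained sketch of the haversine distance and arc-chord tortuosity described here (the function name and the GPS fixes are illustrative, not from the module):

```python
import math

def haversine_m(lat1, lon1, lat2, lon2, r=6371000.0):
    """Great-circle distance in metres between two WGS84 fixes."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dl = math.radians(lon2 - lon1)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# Arc-chord tortuosity: travelled path length over straight-line distance.
fixes = [(-43.0, 172.0), (-43.0005, 172.0005), (-43.0, 172.001)]
path = sum(haversine_m(*fixes[i], *fixes[i + 1]) for i in range(len(fixes) - 1))
chord = haversine_m(*fixes[0], *fixes[-1])
tortuosity = path / chord   # 1.0 for a straight track, larger when winding
print(round(tortuosity, 2))
```

Dividing each segment length by the fix interval gives the absolute ground speed in m/s.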
stag.clustering — k-means clustering and evaluation
GPU-accelerated k-means clustering with contiguous leave-out stability.
This module implements the core clustering stage of the STAG pipeline. It partitions z-scored accelerometer feature vectors into k prototypical movements using RAPIDS cuML k-means on GPU, evaluates cluster quality via the Calinski–Harabasz index, and supports a contiguous leave-out scheme for robustness analysis.
The script is designed for SLURM array-job submission: all parameters (k, deletion size, deletion position, random state) are accepted as command-line arguments.
Example
python -m stag.clustering.kmeans \
-t deer8 -nc 8 -ds 0 -dp 0 -rs 0 \
-df data/clust_data.npy -sd results/
- stag.clustering.kmeans.shrink_data(data, reduction_percent, cut_position_percent)[source]
Remove a contiguous block from the data for stability analysis.
Implements the circular leave-out scheme described in the paper: a block of reduction_percent % of the data starting at cut_position_percent % is excised, wrapping around if the block extends past the end of the array.
- Parameters:
data (numpy.ndarray) – Feature matrix of shape (n_samples, n_features).
reduction_percent (float) – Percentage of data to remove (0–100).
cut_position_percent (float) – Starting position of the cut as a percentage of total length.
- Returns:
Reduced feature matrix.
- Return type:
numpy.ndarray
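The circular excision can be sketched in a few lines of NumPy. `shrink` is a hypothetical stand-in for `shrink_data`; the rounding of percentages to sample counts is an assumption.

```python
import numpy as np

def shrink(data, reduction_percent, cut_position_percent):
    """Excise a contiguous block, wrapping past the end of the array."""
    n = len(data)
    k = int(n * reduction_percent / 100)          # samples to remove
    start = int(n * cut_position_percent / 100)   # first index of the cut
    cut = np.arange(start, start + k) % n         # circular index range
    return data[np.setdiff1d(np.arange(n), cut)]

x = np.arange(10).reshape(10, 1)
print(shrink(x, 30, 90).ravel().tolist())  # cut wraps over indices 9, 0, 1
```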
- stag.clustering.kmeans.generate_filename(parent_dir, tag, num_clusters, deletion_size, deletion_position)[source]
Build standardised output paths for centroids, labels, and metadata.
- Parameters:
parent_dir (str) – Root directory for output files.
tag (str) – Experiment tag.
num_clusters (int) – Number of clusters k.
deletion_size (int) – Deletion size (percent).
deletion_position (int) – Deletion position (percent).
- Returns:
Dictionary with keys 'centroids', 'labels', 'meta' mapping to their respective file paths.
- Return type:
dict
- stag.clustering.kmeans.save_output(centroids, labels, quality_score, data_file, reduction_percent, cut_position_percent, filenames, start_time, duration)[source]
Persist clustering results (centroids, labels, metadata JSON).
- Parameters:
centroids (numpy.ndarray) – Cluster centroids, shape (k, n_features).
labels (numpy.ndarray) – Per-sample cluster assignments.
quality_score (float) – Calinski–Harabasz index.
data_file (str) – Path to the input data file.
reduction_percent (float) – Deletion size used.
cut_position_percent (float) – Deletion position used.
filenames (dict) – Output paths from generate_filename().
start_time (datetime.datetime) – Analysis start timestamp.
duration (datetime.timedelta) – Wall-clock duration of the analysis.
- stag.clustering.kmeans.get_quality(labels, data_gpu_scaled)[source]
Compute the Calinski–Harabasz index for a clustering solution.
- Parameters:
labels (numpy.ndarray) – Cluster assignments.
data_gpu_scaled (cupy.ndarray or numpy.ndarray) – Standardised feature matrix (on GPU or CPU).
- Returns:
Calinski–Harabasz score, or NaN if only one cluster is populated.
- Return type:
float
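For reference, the Calinski–Harabasz index can be computed from scratch on CPU. This sketch mirrors what get_quality is described as doing, including the single-cluster NaN case, but is not the module's implementation:

```python
import numpy as np

def calinski_harabasz(X, labels):
    """Ratio of between- to within-cluster dispersion, scaled by
    degrees of freedom; NaN when only one cluster is populated."""
    n, ks = len(X), np.unique(labels)
    if len(ks) < 2:
        return float("nan")
    mean = X.mean(axis=0)
    between = sum(np.sum(labels == k) *
                  np.sum((X[labels == k].mean(axis=0) - mean) ** 2) for k in ks)
    within = sum(np.sum((X[labels == k] - X[labels == k].mean(axis=0)) ** 2)
                 for k in ks)
    return (between / (len(ks) - 1)) / (within / (n - len(ks)))

X = np.array([[0.0], [0.1], [10.0], [10.1]])
labels = np.array([0, 0, 1, 1])
print(calinski_harabasz(X, labels))  # two tight, well-separated clusters
```

Higher is better: tight clusters far apart give a large score, so the index can be compared across candidate values of k.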
- stag.clustering.kmeans.main(tag, n_clusters, deletion_size, deletion_position, random_state, data_file, save_dir)[source]
Run a single k-means clustering job.
- Parameters:
tag (str) – Experiment tag.
n_clusters (int) – Number of clusters.
deletion_size (int) – Percentage of contiguous data to leave out (0 = full data).
deletion_position (int) – Starting position of the leave-out block (percentage).
random_state (int) – Random seed for k-means initialisation.
data_file (str) – Path to the .npy feature matrix.
save_dir (str) – Root directory for output files.
stag.analysis — Behavioural sequence analysis
Behavioural sequence analysis from cluster label time series.
This module implements the LabelAnalyser, which takes the
per-time-step cluster assignments produced by the k-means stage and
computes:
- Percentage prevalence of each prototypical movement.
- Bout durations (mean ± SEM) for contiguous runs of the same label.
- A first-order transition matrix (the basis for HMM super-prototypes).
- Short-sequence filtering to merge spurious single-frame labels into adjacent bouts (after Braun & Geurten, 2010).
Results are saved as a single JSON file for downstream plotting and circadian analysis.
- class stag.analysis.label_analysis.LabelAnalyser(file_path, fps=50)[source]
Bases: object
Analyse cluster labels for behavioural statistics and transitions.
- Parameters:
file_path (str) – Path to the saved cluster-label file.
fps (int, optional) – Recording frame rate in frames per second. Default 50.
- IDX
The label array (modified in place by filterIDX()).
- Type:
numpy.ndarray
- filterIDX(cutoff)[source]
Merge short label runs into neighbouring bouts.
Sequences shorter than cutoff frames are absorbed by the adjacent bout (previous or next) that is longer, following the approach of Braun & Geurten (2010).
- Parameters:
cutoff (int) – Minimum bout length (in frames) to retain.
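A pure-Python sketch of the short-run merging. `merge_short_runs` is a hypothetical stand-in for `filterIDX()`; how ties between equal-length neighbours are broken is an assumption.

```python
def merge_short_runs(idx, cutoff):
    """Absorb label runs shorter than `cutoff` into the longer of the
    two neighbouring bouts (ties go to the following bout here)."""
    runs = []                                  # [label, start, length] per bout
    for i, lab in enumerate(idx):
        if runs and runs[-1][0] == lab:
            runs[-1][2] += 1
        else:
            runs.append([lab, i, 1])
    out = list(idx)
    for j, (lab, start, length) in enumerate(runs):
        if length < cutoff:
            prev_len = runs[j - 1][2] if j > 0 else -1
            next_len = runs[j + 1][2] if j + 1 < len(runs) else -1
            donor = runs[j - 1][0] if prev_len > next_len else runs[j + 1][0]
            out[start:start + length] = [donor] * length
    return out

# The single-frame label 1 is absorbed by the longer bout of label 2.
print(merge_short_runs([0, 0, 0, 1, 2, 2, 2, 2], cutoff=2))
```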
- get_percentage()[source]
Compute the prevalence of each label as a percentage.
- Returns:
Array of length cen_num with percentage values.
- Return type:
numpy.ndarray
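The prevalence computation amounts to a normalised bincount; a minimal sketch (not the method's actual code):

```python
import numpy as np

labels = np.array([0, 0, 1, 2, 2, 2])   # toy label series
cen_num = 3                              # number of prototypical movements
pct = np.bincount(labels, minlength=cen_num) / len(labels) * 100
print(pct.tolist())
```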
- get_transitions()[source]
Build the first-order transition matrix.
- Returns:
Square matrix of shape (cen_num, cen_num) where entry (i, j) counts transitions from label i to label j.
- Return type:
numpy.ndarray
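A first-order transition count matrix can be built by scanning consecutive label pairs; whether self-transitions are counted, as below, is an assumption about the actual method.

```python
import numpy as np

labels = np.array([0, 0, 1, 1, 2, 0])   # toy label series
cen_num = 3
T = np.zeros((cen_num, cen_num), dtype=int)
for a, b in zip(labels[:-1], labels[1:]):   # consecutive label pairs
    T[a, b] += 1
print(T.tolist())  # [[1, 1, 0], [0, 1, 1], [1, 0, 0]]
```

Row-normalising T turns the counts into the transition probabilities used downstream for the HMM super-prototypes.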
- save_results_to_json(file_path, durations, percentages, transitions)[source]
Write analysis results to a JSON file.
- Parameters:
file_path (str) – Output JSON path.
durations (list of tuple) – From get_mean_durations().
percentages (numpy.ndarray) – From get_percentage().
transitions (numpy.ndarray) – From get_transitions().
Feature-level statistics for z-scoring.
Computes per-column mean (mu) and standard deviation (sigma) from a
.npy feature matrix and saves them to CSV for use in standardisation
and de-standardisation of cluster centroids.
- stag.analysis.preprocessing.save_mu_sigma_from_npy(npy_file_path, csv_file_path)[source]
Loads a matrix from a .npy file, calculates mu and sigma for each column, and saves these values to a CSV file.
- Parameters:
npy_file_path (str) – Path to the .npy file containing the matrix.
csv_file_path (str) – Path to save the CSV file with mu and sigma values.
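A sketch of the mu/sigma export, writing to an in-memory buffer instead of a file; the ddof and the CSV layout are assumptions, not the function's documented behaviour.

```python
import csv
import io
import numpy as np

X = np.array([[1.0, 10.0],
              [3.0, 30.0]])
mu, sigma = X.mean(axis=0), X.std(axis=0)   # per-column mean and std

buf = io.StringIO()                          # stand-in for the CSV file
writer = csv.writer(buf)
writer.writerow(["column", "mu", "sigma"])
for i, (m, s) in enumerate(zip(mu, sigma)):
    writer.writerow([i, m, s])
print(mu.tolist(), sigma.tolist())  # [2.0, 20.0] [1.0, 10.0]
```

Stored this way, centroids can later be de-standardised with `centroid * sigma + mu`.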
stag.utils — Utility functions
CSV-formatted log handler for Python logging.
Provides CsvFormatter, a logging formatter that writes each
log record as a quoted CSV row (level, message).
- class stag.utils.csv_formatter.CsvFormatter[source]
Bases: Formatter
- format(record)[source]
Format the specified record as text.
The record’s attribute dictionary is used as the operand to a string formatting operation which yields the returned string. Before formatting the dictionary, a couple of preparatory steps are carried out. The message attribute of the record is computed using LogRecord.getMessage(). If the formatting string uses the time (as determined by a call to usesTime()), formatTime() is called to format the event time. If there is exception information, it is formatted using formatException() and appended to the message.
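A minimal formatter along these lines (a sketch, not the actual CsvFormatter implementation):

```python
import csv
import io
import logging

class CsvFormatterSketch(logging.Formatter):
    """Emit each log record as a quoted CSV row: level, message."""
    def format(self, record):
        buf = io.StringIO()
        csv.writer(buf, quoting=csv.QUOTE_ALL).writerow(
            [record.levelname, record.getMessage()])
        return buf.getvalue().strip()

# Build a record directly to show the formatted output.
rec = logging.LogRecord("stag", logging.INFO, "demo.py", 0,
                        "drop detected", None, None)
print(CsvFormatterSketch().format(rec))  # "INFO","drop detected"
```

In practice the formatter would be attached to a FileHandler via `handler.setFormatter(...)` so every record lands in the CSV log.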
Standardised output path generation for clustering results.
Generates a directory hierarchy and filenames for centroids, labels, and metadata JSON files based on experiment tag, k, deletion size, and deletion position.