API Reference

stag.sync — Sensor synchronisation

Sensor data synchronisation for head and ear accelerometers.

This module provides the BetterDataSync class, which aligns tri-axial accelerometer streams recorded on the head and ear of a deer by detecting calibration-drop events (three controlled 1.5 m drops recorded simultaneously by both loggers).

class stag.sync.data_sync.BetterDataSync(deer_id, head_data, ear_data, window_dict, log=True, log_folder='', mkplot=False, plot_folder='')[source]

Bases: object

Synchronise head and ear accelerometer data via calibration drops.

Parameters:
  • deer_id (str) – Identifier for the deer (e.g. "R1_D1").

  • head_data (pandas.DataFrame) – Accelerometer data from the head-mounted logger with columns 'X', 'Y', 'Z'.

  • ear_data (pandas.DataFrame) – Accelerometer data from the ear-mounted logger, with the same 'X', 'Y', 'Z' columns.

  • window_dict (dict) – Processing window with keys 'start' and 'end' (sample indices).

  • log (bool, optional) – Enable CSV logging. Default True.

  • log_folder (str, optional) – Directory for log files.

  • mkplot (bool, optional) – Generate diagnostic plots. Default False.

  • plot_folder (str, optional) – Directory for saved plots.

drops_dict

Detected calibration-drop timestamps after synchronisation.

Type:

dict

detect_drops(signal, prominence=5.0, distance=500)[source]

Detect calibration-drop peaks in a preprocessed signal.

Parameters:
  • signal (array-like) – Preprocessed (summed absolute z-scored) acceleration signal.

  • prominence (float, optional) – Minimum peak prominence. Default 5.0.

  • distance (int, optional) – Minimum samples between peaks. Default 500.

Returns:

Indices of detected peaks.

Return type:

numpy.ndarray
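
Peak detection with these parameters can be reproduced with scipy.signal.find_peaks; the sketch below runs it on a synthetic signal with three injected spikes (the use of find_peaks is an assumption — the actual implementation may differ):

```python
import numpy as np
from scipy.signal import find_peaks

# Synthetic preprocessed signal: baseline noise plus three sharp
# calibration-drop-like spikes well above the prominence threshold.
rng = np.random.default_rng(0)
signal = rng.normal(0.0, 0.3, size=5000)
for idx in (1000, 2000, 3000):
    signal[idx] += 20.0

# Same defaults as detect_drops: prominence >= 5.0, >= 500 samples apart.
peaks, _ = find_peaks(signal, prominence=5.0, distance=500)
print(peaks)  # the three injected spike positions
```

The prominence threshold rejects ordinary noise peaks, and the distance constraint prevents a single drop from producing multiple detections.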

run_synchronization()[source]

Execute the full synchronisation pipeline.

Returns:

Dictionary with 'head' and 'ear' drop indices if successful, None otherwise.

Return type:

dict or None

Utility functions for accelerometer data preprocessing.

Helper functions used during sensor synchronisation, including z-score calibration, absolute-value transforms, column summation, and consecutive-difference computation.

stag.sync.utils.correct_calibration(data, cols=None)[source]

Z-score the specified columns (zero mean, unit variance).

Parameters:
  • data (pandas.DataFrame) – Input accelerometer data.

  • cols (list of str, optional) – Columns to standardise. Default ['X', 'Y', 'Z'].

Returns:

Z-scored copy of the selected columns.

Return type:

pandas.DataFrame

stag.sync.utils.make_absolute(data)[source]

Return the element-wise absolute value of a DataFrame.

Parameters:

data (pandas.DataFrame) – Input data.

Returns:

DataFrame with absolute values.

Return type:

pandas.DataFrame

stag.sync.utils.sum_columns(data)[source]

Sum all columns row-wise.

Parameters:

data (pandas.DataFrame) – Input data.

Returns:

Row-wise sum.

Return type:

pandas.Series

stag.sync.utils.get_consecutive_differences(series)[source]

Compute first-order differences of a Series.

Parameters:

series (pandas.Series) – Input time series.

Returns:

Consecutive differences (length = original − 1).

Return type:

pandas.Series

stag.sync.utils.get_calibrated_absolute_accelleration(data, cols=None)[source]

One-step pipeline: z-score → absolute → sum.

Parameters:
  • data (pandas.DataFrame) – Raw accelerometer data.

  • cols (list of str, optional) – Columns to process. Default ['X', 'Y', 'Z'].

Returns:

Summed absolute z-scored acceleration.

Return type:

pandas.Series
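
In plain pandas, the transform chain documented above amounts to the following (a sketch of the equivalent computation, not the package internals):

```python
import numpy as np
import pandas as pd

def calibrated_absolute_acceleration(data, cols=("X", "Y", "Z")):
    """Sketch of the documented pipeline: z-score -> absolute -> row sum."""
    cols = list(cols)
    # correct_calibration: zero mean, unit variance per column
    zscored = (data[cols] - data[cols].mean()) / data[cols].std()
    # make_absolute + sum_columns: element-wise |.| then row-wise sum
    return zscored.abs().sum(axis=1)

rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(100, 3)), columns=["X", "Y", "Z"])
sig = calibrated_absolute_acceleration(df)
print(sig.shape)  # one non-negative value per sample
```

The result is the "summed absolute z-scored" signal that detect_drops expects as input.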

stag.database — Database models and ingestion

Extract clustering-ready feature matrices from the STAG database.

Queries the SQLAlchemy database for synchronised accelerometer data and exports it as a .npy array suitable for the k-means stage.

class stag.database.make_cluster_data.DeerInfo(**kwargs)[source]

Bases: Base

Represents information about each deer, including identification and related data.

Attributes:

  • deer_id (Integer) – The primary key, autoincrementing.

  • repetition_number (String) – Identifies the repetition sequence of the data collection for this deer.

  • deer_number (String) – A unique identifier for the deer.

  • accelerometer_data (relationship) – Links to associated AccelerometerData records.

  • trajectory_data (relationship) – Links to associated TrajectoryData records.

deer_id
repetition_number
deer_number
accelerometer_data
trajectory_data
class stag.database.make_cluster_data.AccelerometerData(**kwargs)[source]

Bases: Base

Stores accelerometer data related to deer movement.

Attributes:

  • data_id (Integer) – The primary key, autoincrementing.

  • deer_id (Integer) – Foreign key linking back to the DeerInfo.

  • X_head, Y_head, Z_head (Float) – Accelerometer readings for the head.

  • X_ear, Y_ear, Z_ear (Float) – Accelerometer readings for the ear.

  • deer_info (relationship) – Back-reference to the associated DeerInfo.

data_id
deer_id
X_head
Y_head
Z_head
X_ear
Y_ear
Z_ear
deer_info
class stag.database.make_cluster_data.TrajectoryData(**kwargs)[source]

Bases: Base

Contains trajectory data including positional information and calculated features.

Attributes:

  • data_id (Integer) – The primary key, autoincrementing.

  • deer_id (Integer) – Foreign key linking back to the DeerInfo.

  • pos_WGS84_lat, pos_WGS84_lon (Float) – GPS coordinates in the WGS84 system.

  • pos_NZMG_x_meter, pos_NZMG_y_meter (Float) – Position in the New Zealand Map Grid system.

  • pos_x_meter_filt, pos_y_meter_filt (Float) – Filtered positional data.

  • abs_speed_mPs (Float) – Absolute speed in metres per second.

  • tortuosity (Float) – Calculated tortuosity of the movement path.

  • deer_info (relationship) – Back-reference to the associated DeerInfo.

data_id
deer_id
pos_WGS84_lat
pos_WGS84_lon
pos_NZMG_x_meter
pos_NZMG_y_meter
pos_x_meter_filt
pos_y_meter_filt
abs_speed_mPs
tortuosity
deer_info
stag.database.make_cluster_data.open_session(database_url)[source]

Opens an SQLAlchemy session for the database at database_url and returns it.

stag.database.make_cluster_data.get_deer_ids(session)[source]

Returns a list of all deer_ids in the database.

stag.database.make_cluster_data.get_data_for_deer(session, deer_id)[source]

Fetches and concatenates accelerometer and trajectory data for a given deer_id.

stag.database.make_cluster_data.aggregate_all_data(session, deer_ids)[source]

Aggregates data for all deer_ids and interpolates over NaN values.
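
NaN interpolation over the concatenated streams can be done with pandas; a minimal sketch (the column names are illustrative, and linear interpolation is an assumption about the method used):

```python
import numpy as np
import pandas as pd

# Toy frame with gaps, mimicking concatenated accelerometer/trajectory data.
df = pd.DataFrame({
    "X_head": [0.1, np.nan, 0.3, 0.4],
    "abs_speed_mPs": [1.0, 1.2, np.nan, 1.6],
})

# Linear interpolation over interior NaNs; limit_direction="both" also
# fills leading/trailing gaps by extension.
filled = df.interpolate(limit_direction="both")
print(filled)
```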

stag.database.make_cluster_data.save_data_to_npy(data, filename)[source]

Saves the data to a .npy file.

stag.gps — GPS trajectory analysis

Tortuosity and speed from raw GPS latitude/longitude.

Haversine-based distance calculation between consecutive GPS fixes, yielding arc-chord tortuosity ratios and absolute ground speed.

stag.gps.tortuosity.calculate_tortuosity_and_speed(lat, lon, fps=0.5)[source]

Compute arc-chord tortuosity and absolute ground speed from raw latitude/longitude fixes sampled at fps Hz.

stag.gps.tortuosity.lat_lon_vec_to_meter_vec(lat1, lon1, lat2, lon2)[source]

Convert consecutive latitude/longitude fixes to metre displacements using the haversine formula.

stag.gps.tortuosity.extract_tort_and_speed(saved_loc_and_tort_filepath)[source]

Load previously computed tortuosity and speed values from the file at saved_loc_and_tort_filepath.
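
The haversine ground distance underlying this module can be sketched as follows (standard formula; the exact Earth radius used by the package is an assumption):

```python
import numpy as np

def haversine_m(lat1, lon1, lat2, lon2, radius_m=6371000.0):
    """Great-circle distance in metres between two WGS84 fixes."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_m * np.arcsin(np.sqrt(a))

# One degree of latitude is roughly 111 km:
print(haversine_m(0.0, 0.0, 1.0, 0.0))
```

With the default fps=0.5 (one fix every 2 s), absolute speed would then be the fix-to-fix distance divided by the 2 s interval.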

stag.clustering — k-means clustering and evaluation

GPU-accelerated k-means clustering with contiguous leave-out stability.

This module implements the core clustering stage of the STAG pipeline. It partitions z-scored accelerometer feature vectors into k prototypical movements using RAPIDS cuML k-means on GPU, evaluates cluster quality via the Calinski–Harabasz index, and supports a contiguous leave-out scheme for robustness analysis.

The script is designed for SLURM array-job submission: all parameters (k, deletion size, deletion position, random state) are accepted as command-line arguments.

Example

python -m stag.clustering.kmeans \
    -t deer8 -nc 8 -ds 0 -dp 0 -rs 0 \
    -df data/clust_data.npy -sd results/
stag.clustering.kmeans.shrink_data(data, reduction_percent, cut_position_percent)[source]

Remove a contiguous block from the data for stability analysis.

Implements the circular leave-out scheme described in the paper: a block of reduction_percent % of the data starting at cut_position_percent % is excised, wrapping around if the block extends past the end of the array.

Parameters:
  • data (numpy.ndarray) – Feature matrix of shape (n_samples, n_features).

  • reduction_percent (float) – Percentage of data to remove (0–100).

  • cut_position_percent (float) – Starting position of the cut as a percentage of total length.

Returns:

Reduced feature matrix.

Return type:

numpy.ndarray
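
The circular excision can be sketched in numpy (an illustrative reimplementation of the description above, not the package source):

```python
import numpy as np

def shrink_data_sketch(data, reduction_percent, cut_position_percent):
    """Remove a contiguous block of rows, wrapping past the array end."""
    n = len(data)
    n_cut = int(n * reduction_percent / 100)
    start = int(n * cut_position_percent / 100)
    keep = np.ones(n, dtype=bool)
    keep[(start + np.arange(n_cut)) % n] = False  # circular indices
    return data[keep]

data = np.arange(20).reshape(10, 2)      # 10 samples, 2 features
out = shrink_data_sketch(data, 30, 90)   # cut 3 samples starting at sample 9
print(out.shape)  # (7, 2): samples 9, 0, 1 removed, wrap-around
```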

stag.clustering.kmeans.generate_filename(parent_dir, tag, num_clusters, deletion_size, deletion_position)[source]

Build standardised output paths for centroids, labels, and metadata.

Parameters:
  • parent_dir (str) – Root directory for results.

  • tag (str) – Experiment tag (e.g. "deer8").

  • num_clusters (int) – Number of clusters (k).

  • deletion_size (int) – Deletion size percentage.

  • deletion_position (int) – Deletion position percentage.

Returns:

Dictionary with keys 'centroids', 'labels', 'meta' mapping to their respective file paths.

Return type:

dict

stag.clustering.kmeans.save_output(centroids, labels, quality_score, data_file, reduction_percent, cut_position_percent, filenames, start_time, duration)[source]

Persist clustering results (centroids, labels, metadata JSON).

Parameters:
  • centroids (numpy.ndarray) – Cluster centroids, shape (k, n_features).

  • labels (numpy.ndarray) – Per-sample cluster assignments.

  • quality_score (float) – Calinski–Harabasz index.

  • data_file (str) – Path to the input data file.

  • reduction_percent (float) – Deletion size used.

  • cut_position_percent (float) – Deletion position used.

  • filenames (dict) – Output paths from generate_filename().

  • start_time (datetime.datetime) – Analysis start timestamp.

  • duration (datetime.timedelta) – Wall-clock duration of the analysis.

stag.clustering.kmeans.get_quality(labels, data_gpu_scaled)[source]

Compute the Calinski–Harabasz index for a clustering solution.

Parameters:
  • labels (numpy.ndarray) – Cluster assignments.

  • data_gpu_scaled (cupy.ndarray or numpy.ndarray) – Standardised feature matrix (on GPU or CPU).

Returns:

Calinski–Harabasz score, or NaN if only one cluster is populated.

Return type:

float
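
The index itself is the ratio of between- to within-cluster dispersion; a CPU sketch in numpy, following the standard definition (not necessarily cuML's internals):

```python
import numpy as np

def calinski_harabasz(data, labels):
    """CH = (B / (k - 1)) / (W / (n - k)) over k populated clusters."""
    clusters = np.unique(labels)
    n, k = len(data), len(clusters)
    if k < 2:
        return float("nan")  # undefined with a single populated cluster
    mean = data.mean(axis=0)
    B = W = 0.0
    for c in clusters:
        members = data[labels == c]
        centroid = members.mean(axis=0)
        B += len(members) * np.sum((centroid - mean) ** 2)  # between-cluster
        W += np.sum((members - centroid) ** 2)              # within-cluster
    return (B / (k - 1)) / (W / (n - k))

# Two well-separated blobs score far higher than any mixed labelling.
rng = np.random.default_rng(0)
blobs = np.vstack([rng.normal(0, 0.1, (50, 2)), rng.normal(5, 0.1, (50, 2))])
good = np.repeat([0, 1], 50)
print(calinski_harabasz(blobs, good))
```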

stag.clustering.kmeans.main(tag, n_clusters, deletion_size, deletion_position, random_state, data_file, save_dir)[source]

Run a single k-means clustering job.

Parameters:
  • tag (str) – Experiment tag.

  • n_clusters (int) – Number of clusters.

  • deletion_size (int) – Percentage of contiguous data to leave out (0 = full data).

  • deletion_position (int) – Starting position of the leave-out block (percentage).

  • random_state (int) – Random seed for k-means initialisation.

  • data_file (str) – Path to the .npy feature matrix.

  • save_dir (str) – Root directory for output files.

stag.analysis — Behavioural sequence analysis

Behavioural sequence analysis from cluster label time series.

This module implements the LabelAnalyser, which takes the per-time-step cluster assignments produced by the k-means stage and computes:

  • Percentage prevalence of each prototypical movement.

  • Bout durations (mean ± SEM) for contiguous runs of the same label.

  • A first-order transition matrix (the basis for HMM super-prototypes).

  • Short-sequence filtering to merge spurious single-frame labels into adjacent bouts (after Braun & Geurten, 2010).

Results are saved as a single JSON file for downstream plotting and circadian analysis.

class stag.analysis.label_analysis.LabelAnalyser(file_path, fps=50)[source]

Bases: object

Analyse cluster labels for behavioural statistics and transitions.

Parameters:
  • file_path (str) – Path to a .npy file containing the integer label array.

  • fps (int, optional) – Sampling rate in Hz, used to convert bout lengths to seconds. Default 50.

IDX

The label array (modified in place by filterIDX()).

Type:

numpy.ndarray

fps

Frames per second.

Type:

int

cen_num

Number of unique cluster labels.

Type:

int

label_num

Total number of time steps.

Type:

int

filterIDX(cutoff)[source]

Merge short label runs into neighbouring bouts.

Sequences shorter than cutoff frames are absorbed by the adjacent bout (previous or next) that is longer, following the approach of Braun & Geurten (2010).

Parameters:

cutoff (int) – Minimum bout length (in frames) to retain.
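
The merging rule can be sketched with a run-length-encoding pass (an illustrative reimplementation of the described behaviour, not the package source; the tie-breaking toward the previous bout is an assumption):

```python
import numpy as np

def filter_short_runs(labels, cutoff):
    """Absorb runs shorter than `cutoff` into the longer adjacent bout."""
    labels = labels.copy()
    change = np.flatnonzero(np.diff(labels) != 0) + 1  # run boundaries
    starts = np.concatenate(([0], change))
    ends = np.concatenate((change, [len(labels)]))
    runs = list(zip(starts, ends))
    for i, (s, e) in enumerate(runs):
        if e - s >= cutoff:
            continue
        prev_len = runs[i - 1][1] - runs[i - 1][0] if i > 0 else -1
        next_len = runs[i + 1][1] - runs[i + 1][0] if i + 1 < len(runs) else -1
        if prev_len >= next_len and i > 0:
            labels[s:e] = labels[s - 1]           # absorb into previous bout
        elif i + 1 < len(runs):
            labels[s:e] = labels[runs[i + 1][0]]  # absorb into next bout
    return labels

# The lone '1' is shorter than cutoff and absorbed by the longer neighbour.
print(filter_short_runs(np.array([0, 0, 0, 1, 2, 2]), cutoff=2))
```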

get_percentage()[source]

Compute the prevalence of each label as a percentage.

Returns:

Array of length cen_num with percentage values.

Return type:

numpy.ndarray

get_mean_durations()[source]

Compute mean bout duration and SEM for each label.

Returns:

Each tuple is (mean_seconds, sem_seconds).

Return type:

list of tuple of (float, float)

get_transitions()[source]

Build the first-order transition matrix.

Returns:

Square matrix of shape (cen_num, cen_num) where entry (i, j) counts transitions from label i to label j.

Return type:

numpy.ndarray
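
Counting consecutive label pairs yields this matrix; a numpy sketch (whether same-label steps land on the diagonal, as here, or only bout-to-bout transitions are counted is an assumption):

```python
import numpy as np

labels = np.array([0, 0, 1, 1, 2, 0])
cen_num = labels.max() + 1

# Count every step i -> j between consecutive time steps.
T = np.zeros((cen_num, cen_num), dtype=int)
np.add.at(T, (labels[:-1], labels[1:]), 1)
print(T)
```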

save_results_to_json(file_path, durations, percentages, transitions)[source]

Write analysis results to a JSON file.

Parameters:
  • file_path (str) – Path for the output JSON file.

  • durations (list) – Bout durations, as returned by get_mean_durations().

  • percentages (numpy.ndarray) – Label prevalences, as returned by get_percentage().

  • transitions (numpy.ndarray) – Transition matrix, as returned by get_transitions().

main(cutoff, save_path)[source]

Run the full label analysis pipeline.

Parameters:
  • cutoff (int) – Minimum bout length for filtering (in frames).

  • save_path (str) – Path for the output JSON file.

Feature-level statistics for z-scoring.

Computes per-column mean (mu) and standard deviation (sigma) from a .npy feature matrix and saves them to CSV for use in standardisation and de-standardisation of cluster centroids.

stag.analysis.preprocessing.save_mu_sigma_from_npy(npy_file_path, csv_file_path)[source]

Loads a matrix from a .npy file, calculates mu and sigma for each column, and saves these values to a CSV file.

Parameters:
  • npy_file_path (str) – Path to the .npy file containing the matrix.

  • csv_file_path (str) – Path to save the CSV file with mu and sigma values.
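
The documented computation amounts to the following (a sketch; the CSV layout and the choice of population standard deviation, ddof=0, are assumptions):

```python
import os
import tempfile

import numpy as np
import pandas as pd

def save_mu_sigma(npy_file_path, csv_file_path):
    """Per-column mean and std of a .npy matrix, saved as a two-column CSV."""
    matrix = np.load(npy_file_path)
    stats = pd.DataFrame({"mu": matrix.mean(axis=0),
                          "sigma": matrix.std(axis=0)})  # ddof=0 by default
    stats.to_csv(csv_file_path, index=False)
    return stats

# Round-trip on a toy matrix via temporary files:
with tempfile.TemporaryDirectory() as tmp:
    npy = os.path.join(tmp, "feat.npy")
    np.save(npy, np.array([[1.0, 2.0], [3.0, 4.0]]))
    stats = save_mu_sigma(npy, os.path.join(tmp, "mu_sigma.csv"))
print(stats["mu"].tolist())  # [2.0, 3.0]
```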

stag.utils — Utility functions

CSV-formatted log handler for Python logging.

Provides CsvFormatter, a logging formatter that writes each log record as a quoted CSV row (level, message).

class stag.utils.csv_formatter.CsvFormatter[source]

Bases: Formatter

format(record)[source]

Format the specified record as a quoted CSV row.

The record's message is computed with LogRecord.getMessage() and written, together with the level name, as a fully quoted two-column CSV row (level, message).
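
A minimal formatter of this kind can be built on the standard csv module (a sketch of the described behaviour, not the package source):

```python
import csv
import io
import logging

class CsvFormatterSketch(logging.Formatter):
    """Emit each log record as a fully quoted CSV row: level, message."""

    def __init__(self):
        super().__init__()
        self._buffer = io.StringIO()
        self._writer = csv.writer(self._buffer, quoting=csv.QUOTE_ALL)

    def format(self, record):
        # Reuse one buffer per formatter; rewind and clear for each record.
        self._buffer.seek(0)
        self._buffer.truncate(0)
        self._writer.writerow([record.levelname, record.getMessage()])
        return self._buffer.getvalue().strip()

record = logging.LogRecord("stag", logging.INFO, __file__, 1, "sync ok", None, None)
print(CsvFormatterSketch().format(record))  # "INFO","sync ok"
```

Attached to a FileHandler, this produces a log file that pandas.read_csv can parse directly.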

Standardised output path generation for clustering results.

Generates a directory hierarchy and filenames for centroids, labels, and metadata JSON files based on experiment tag, k, deletion size, and deletion position.

stag.utils.filename_generator.generate_filename(parent_dir, tag, num_clusters, deletion_size, deletion_position)[source]

Build standardised output paths for centroids, labels, and metadata JSON files from the experiment tag, k, deletion size, and deletion position.