arista.preprocess

class arista.preprocess.Recording(frame, time_s, sensor_t_c, target_t_c, drive_t_c, dfbf, dfbf_drift_corrected=None, drift_method='none', recording_date=None)[source]

Bases: object

Frame-aligned, optionally drift-corrected Ca²⁺ recording.

One row per imaging frame. Matches the column layout of the samples table in [[Database Schema]] so the ingester can bulk-insert directly.

dfbf_drift_corrected is None whenever drift_method == "none".

recording_date is the calendar date the recording started, extracted from the first sensor MAT epoch during alignment. The ingester uses it to populate animals.recording_date.

Parameters:

frame (np.ndarray)
time_s (np.ndarray)
sensor_t_c (np.ndarray)
target_t_c (np.ndarray)
drive_t_c (np.ndarray | None)
dfbf (np.ndarray)
dfbf_drift_corrected (np.ndarray | None)
drift_method (str)
recording_date (str | None)

dfbf: np.ndarray

dfbf_drift_corrected: np.ndarray | None = None

drift_method: str = 'none'

drive_t_c: np.ndarray | None

frame: np.ndarray

property n_frames: int

recording_date: str | None = None

sensor_t_c: np.ndarray

target_t_c: np.ndarray

time_s: np.ndarray

to_dataframe()[source]

Return the recording as a pandas DataFrame with DB-schema columns.

Return type:

pandas.DataFrame

class arista.preprocess.SensorRecord(epoch_time, frame, sensor_t_c, target_t_c, drive_t_c)[source]

Bases: object

Raw MATLAB sensor record — continuously logged 5-column array.

Each MAT file holds a data matrix whose columns are, in order: epoch_time (MATLAB serial datenum), frame index (1-based), sensor_T (°C, measured), target_T (°C, set-point) and drive_T (°C, applied to the Peltier element). The log rate is higher than the imaging rate, so multiple sensor rows share the same frame number; arista.preprocess.align.collapse_to_frames() collapses them to one row per frame.

Parameters:

epoch_time (numpy.ndarray)
frame (numpy.ndarray)
sensor_t_c (numpy.ndarray)
target_t_c (numpy.ndarray)
drive_t_c (numpy.ndarray)

drive_t_c: numpy.ndarray

epoch_time: numpy.ndarray

frame: numpy.ndarray

property n_samples: int

sensor_t_c: numpy.ndarray

target_t_c: numpy.ndarray
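
The epoch_time column stores MATLAB serial datenums, and Recording.recording_date is described as being derived from the first of them. The sketch below shows one common way to do that conversion with the standard library only. The helper names datenum_to_datetime and recording_date_from_epoch are hypothetical and are not part of arista.preprocess; the actual extraction code may differ.

```python
from datetime import datetime, timedelta

# MATLAB's datenum 1 is 1 January of year 0, while Python's ordinal 1 is
# 1 January of year 1, so the two scales differ by 366 days.
_MATLAB_TO_PY_ORDINAL_OFFSET = 366


def datenum_to_datetime(datenum: float) -> datetime:
    """Convert a MATLAB serial datenum to a naive Python datetime."""
    whole_days = int(datenum)
    day_fraction = datenum - whole_days
    return (
        datetime.fromordinal(whole_days - _MATLAB_TO_PY_ORDINAL_OFFSET)
        + timedelta(days=day_fraction)
    )


def recording_date_from_epoch(first_epoch: float) -> str:
    """Hypothetical helper: ISO calendar date of the first sensor sample."""
    return datenum_to_datetime(first_epoch).date().isoformat()


print(recording_date_from_epoch(730486.5))  # 2000-01-01
```

The result is timezone-naive; whether the acquisition PC logged local time or UTC is not stated in this reference.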

arista.preprocess.apply_drift(recording, fit)[source]

Subtract a fit from the ΔF/F trace and return a new Recording.

If fit is None the recording is returned with drift_method = "none" and dfbf_drift_corrected = None (i.e. drift correction explicitly not applied — the original dfbf column remains the source of truth).

Parameters:

recording (Recording) – An aligned Recording from arista.preprocess.align.assemble_recording().
fit (DriftFit | None) – The chosen DriftFit, or None.

Returns:

A new Recording with dfbf_drift_corrected and drift_method filled in.

Return type:

Recording

arista.preprocess.assemble_recording(fiji, sensor)[source]

Run stages A + B + C and return a frame-aligned Recording.

Drift correction is not applied here — that is arista.preprocess.drift’s job. The returned recording has dfbf_drift_corrected = None and drift_method = "none".

The output length is the intersection of the Fiji frame range and the (post-collapse, post-interp) sensor frame range. Any Fiji frames without a matching sensor frame are dropped silently — this matches legacy behaviour and only ever clips a handful of trailing frames in well-formed recordings.

Parameters:

fiji (FijiRecording) – Fiji ΔF/F₀ trace.
sensor (SensorRecord) – Raw sensor record.

Returns:

A Recording aligned to Fiji’s frame numbers.

Return type:

Recording
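
The frame-range intersection described for assemble_recording() can be illustrated with plain NumPy. This is a minimal sketch on made-up arrays, not the library's implementation; all variable names are hypothetical.

```python
import numpy as np

# Hypothetical frame indices: Fiji exported frames 0..9, while the sensor
# log (after collapsing and interpolation) only covers frames 0..7.
fiji_frame = np.arange(10)
fiji_dfbf = np.linspace(0.0, 0.9, 10)
sensor_frame = np.arange(8)
sensor_t_c = np.full(8, 20.0)

# intersect1d returns the sorted common frames plus their positions in
# each input, which lets both sides be sliced to the same length.
common, fiji_idx, sensor_idx = np.intersect1d(
    fiji_frame, sensor_frame, return_indices=True
)

aligned_dfbf = fiji_dfbf[fiji_idx]
aligned_t_c = sensor_t_c[sensor_idx]

# Number of Fiji frames without a sensor match (dropped from the output).
n_dropped = len(fiji_frame) - len(common)
```

Because the drop is silent, a caller that cares about clipped frames could compare n_dropped against a tolerance before continuing.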

arista.preprocess.collapse_to_frames(sensor)[source]

Group the continuous sensor log by frame, take the per-frame mean.

Sensor rows where frame == 0 are pre-stimulus calibration data and are dropped (matching the frame > 0 cut in the legacy pipeline). The remaining frame numbers are decremented by 1 so the result is 0-indexed, aligning with Fiji’s frame numbering.

Parameters:

sensor (SensorRecord) – Raw SensorRecord straight from arista.preprocess.io.read_sensor_mat().

Returns:

A DataFrame indexed by integer 0-based frame number, with columns epoch_time, sensor_t_c, target_t_c, drive_t_c. NaN-padded over any missing intermediate frames so interpolate_missing_frames() can fill them.

Return type:

pandas.DataFrame
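
The collapse described above (drop frame 0, shift to 0-based, average per frame, NaN-pad gaps) can be sketched with pandas as follows. The input table is synthetic, and starting the reindexed range at the smallest observed frame is an assumption; the real function may anchor the range differently.

```python
import numpy as np
import pandas as pd

# Hypothetical raw log: two calibration rows (frame 0), then frames 1, 2
# and 4. Frame 3 was never logged.
raw = pd.DataFrame(
    {
        "frame": [0, 0, 1, 1, 2, 2, 4, 4],
        "sensor_t_c": [19.0, 19.0, 20.0, 20.2, 21.0, 21.2, 23.0, 23.2],
    }
)

kept = raw[raw["frame"] > 0].copy()       # drop pre-stimulus calibration rows
kept["frame"] -= 1                        # 1-based -> 0-based
per_frame = kept.groupby("frame").mean()  # one row per frame

# Reindex to a contiguous range so the missing frame shows up as NaN.
full_index = pd.RangeIndex(
    int(per_frame.index.min()), int(per_frame.index.max()) + 1, name="frame"
)
per_frame = per_frame.reindex(full_index)
```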

arista.preprocess.correct_drift(recording, method='auto')[source]

Convenience: fit all candidates, pick the best, apply it.

Intended for headless, batch, and CI use. Matches the default behaviour that arista-preprocess drift --method auto will expose at the CLI level in Phase 3.

Parameters:

recording (Recording)
method (Literal['linear', 'poly', 'exp', 'none', 'auto'])

Return type:

Recording

arista.preprocess.fit_all(t, y)[source]

Compute linear, poly and exp fits; return them in a dict by method name.

The exponential fit may fail to converge on flat traces; in that case it is omitted from the returned dict (rather than raising) so the AIC chooser can still pick between linear and poly.

Parameters:

t (numpy.ndarray)
y (numpy.ndarray)

Return type:

dict[str, DriftFit]

arista.preprocess.fit_exponential(t, y)[source]

a·exp(-b·t) + c fit. Mirrors pytci’s fitExp bounds + p0.

Parameters:

t (numpy.ndarray)
y (numpy.ndarray)

Return type:

DriftFit
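
A fit of the a·exp(-b·t) + c model can be sketched with scipy.optimize.curve_fit. The starting guess and bounds below are illustrative choices, not the values pytci or arista.preprocess use, and the sketch returns bare parameters rather than a DriftFit.

```python
import numpy as np
from scipy.optimize import curve_fit


def exp_decay(t, a, b, c):
    """Model a * exp(-b * t) + c."""
    return a * np.exp(-b * t) + c


# Synthetic bleaching trace with known parameters and a little noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 100.0, 500)
y = exp_decay(t, 0.5, 0.03, 0.1) + rng.normal(0.0, 0.005, t.size)

# Starting guess and bounds are illustrative assumptions.
p0 = (y[0] - y[-1], 1.0 / t[-1], y[-1])
bounds = ([-np.inf, 0.0, -np.inf], [np.inf, np.inf, np.inf])

try:
    params, _ = curve_fit(exp_decay, t, y, p0=p0, bounds=bounds)
    fitted = exp_decay(t, *params)
except RuntimeError:
    # curve_fit raises RuntimeError when the optimiser does not converge.
    params, fitted = None, None
```

Catching the non-convergence case mirrors the behaviour documented for fit_all(), where a failed exponential fit is omitted rather than raised.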

arista.preprocess.fit_linear(t, y)[source]

Degree-1 polyfit on the first and last _LINEAR_TAIL_FRAMES frames.

Matches pytci’s fitLinear: the fit is trained on pre + post stimulus tails only, then evaluated over the whole trace. This deliberately ignores the stimulus-evoked excursions so the linear component captures photobleach drift rather than the response.

Parameters:

t (numpy.ndarray)
y (numpy.ndarray)

Return type:

DriftFit
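
The tails-only strategy can be sketched with numpy.polyfit. The function name fit_linear_tails and its tail_frames argument are hypothetical stand-ins; the value of _LINEAR_TAIL_FRAMES is not given in this reference.

```python
import numpy as np


def fit_linear_tails(t, y, tail_frames):
    """Fit a line to the first and last `tail_frames` samples only,
    then evaluate it over the whole trace."""
    tails = np.r_[0:tail_frames, len(t) - tail_frames:len(t)]
    slope, intercept = np.polyfit(t[tails], y[tails], deg=1)
    return slope * t + intercept


# Synthetic trace: slow linear drift plus a response in the middle.
t = np.arange(200, dtype=float)
drift = 0.5 - 0.001 * t
response = np.where((t >= 80) & (t < 120), 1.0, 0.0)
y = drift + response

baseline = fit_linear_tails(t, y, tail_frames=30)
corrected = y - baseline
```

Because the response window lies outside both tails, the fitted baseline recovers the drift and the subtraction leaves the response untouched.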

arista.preprocess.fit_polynomial(t, y, degree=4)[source]

Polynomial fit of order degree over the whole trace (default degree=4, matching the pytci default).

Parameters:

t (numpy.ndarray)
y (numpy.ndarray)
degree (int)

Return type:

DriftFit

arista.preprocess.interpolate_missing_frames(per_frame)[source]

Linearly interpolate NaN values in a per-frame dataframe.

Replicates aristaSingleCellData.interpolateMissingFrames: any NaN values left after arista.preprocess.align.collapse_to_frames() has reindexed the per-frame table to a contiguous integer range are filled by DataFrame.interpolate(method="linear").

Parameters:

per_frame (pandas.DataFrame) – DataFrame whose index is the integer frame number. Typical columns are epoch_time / sensor_t_c / target_t_c / drive_t_c but the function is agnostic.

Returns:

A new DataFrame with NaNs filled. The index is preserved.

Return type:

pandas.DataFrame

arista.preprocess.is_broken_sensor(sensor, min_rows=1000)[source]

Return True if a sensor MAT file looks truncated.

The heuristic mirrors pytci: a full recording logs the sensor at a much higher rate than the imaging frame rate, so an honest MAT file has thousands of rows. Anything substantially smaller is almost certainly a partially-written file from a crashed acquisition.

Parameters:

sensor (SensorRecord) – Sensor record from arista.preprocess.io.read_sensor_mat().
min_rows (int) – Threshold below which we declare the file broken (default 1000, matching pytci).

Return type:

bool

arista.preprocess.load_template(stimulus_name)[source]

Substitute a median-template sensor trace for a broken MAT.

Not yet implemented in v0.1; calling it raises an error that points to the sprint plan rather than silently returning bogus data.

Parameters:

stimulus_name (str) – Canonical stimulus name (e.g. "adaptation").

Return type:

SensorRecord

arista.preprocess.pick_best(fits, method='auto')[source]

Select one fit from a fit_all() result.

Parameters:

fits (dict[str, DriftFit]) – Mapping method_name → DriftFit as returned by fit_all().
method (Literal['linear', 'poly', 'exp', 'none', 'auto']) – Either "auto" (pick lowest AIC), or one of "linear" / "poly" / "exp" to force that fit, or "none" to apply no correction.

Returns:

The chosen DriftFit, or None if method == "none".

Raises:

ValueError – If method is not a valid choice, or if a forced method is requested but missing from fits.

Return type:

DriftFit | None
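
The "auto" choice picks the fit with the lowest AIC. How DriftFit computes its AIC is not documented here; the sketch below assumes the common least-squares form n·ln(RSS/n) + 2k and uses plain NumPy polynomial fits as stand-ins for the real candidates.

```python
import numpy as np


def aic_least_squares(y, fitted, n_params):
    """AIC for a least-squares fit with Gaussian residuals, up to a constant."""
    n = len(y)
    rss = float(np.sum((y - fitted) ** 2))
    return n * np.log(rss / n) + 2 * n_params


# Synthetic trace with strongly curved drift and a little noise.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 300)
y = 0.4 * (1.0 - t) ** 4 + rng.normal(0.0, 0.01, t.size)

candidates = {}
for name, degree in (("linear", 1), ("poly", 4)):
    coeffs = np.polyfit(t, y, deg=degree)
    fitted = np.polyval(coeffs, t)
    candidates[name] = aic_least_squares(y, fitted, n_params=degree + 1)

# Lowest AIC wins; the 2k term penalises the extra polynomial coefficients.
best = min(candidates, key=candidates.get)
```

On this curved trace the degree-4 fit reduces the residual by far more than its parameter penalty, so it is selected over the line.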

arista.preprocess.read_fiji_csv(path)[source]

Read a Fiji ΔF/F₀ ROI export into a FijiRecording.

Accepts any of the three header conventions documented in the module docstring. Header is required (no headerless CSV support; that would silently re-interpret the first frame as a column name and corrupt downstream alignment).

Parameters:

path (Path | str) – Path to the CSV file.

Returns:

Frozen FijiRecording with frame and dfbf as numpy arrays.

Raises:

ValueError – If neither a frame-like nor a value-like column is present, or if frame indices are not monotonically increasing.

Return type:

FijiRecording

arista.preprocess.read_recording_csv(path)[source]

Re-read a canonical write_recording_csv() output.

Parameters:

path (Path | str)

Return type:

Recording

arista.preprocess.read_sensor_mat(path)[source]

Read a MATLAB temperature_data_*.mat sensor record.

The MAT file must contain a top-level variable data shaped (n_samples, 5): [epoch_time, frame, sensor_T, target_T, drive_T].

Parameters:

path (Path | str) – Path to the .mat file.

Returns:

Frozen SensorRecord with the five columns as separate arrays.

Raises:

KeyError – If the MAT file lacks a data variable.
ValueError – If the data matrix is not 5 columns wide.

Return type:

SensorRecord
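
The shape contract above can be checked with scipy.io.loadmat. This is a sketch of the validation logic only: load_sensor_columns is a hypothetical helper that returns a dict instead of a SensorRecord, and the demo file is synthetic. Note that loadmat does not read MATLAB v7.3 (HDF5) files.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat


def load_sensor_columns(path):
    """Hypothetical reader: return the five columns of `data` as a dict."""
    mat = loadmat(path)
    if "data" not in mat:
        raise KeyError("MAT file has no top-level 'data' variable")
    data = np.asarray(mat["data"], dtype=float)
    if data.ndim != 2 or data.shape[1] != 5:
        raise ValueError(f"expected (n_samples, 5), got {data.shape}")
    names = ("epoch_time", "frame", "sensor_t_c", "target_t_c", "drive_t_c")
    return {name: data[:, i] for i, name in enumerate(names)}


# Round-trip a tiny synthetic file to exercise the reader.
demo = np.column_stack(
    [np.full(4, 730486.5), [0, 1, 1, 2], [20.0] * 4, [21.0] * 4, [22.0] * 4]
)
with tempfile.TemporaryDirectory() as tmp:
    mat_path = os.path.join(tmp, "temperature_data_demo.mat")
    savemat(mat_path, {"data": demo})
    columns = load_sensor_columns(mat_path)
```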

arista.preprocess.write_recording_csv(recording, path)[source]

Persist a Recording to disk as a canonical CSV.

Column order matches the samples table in [[Database Schema]] so arista-ingest can COPY-style load without remapping. Two #-prefixed header lines carry recording-level provenance:

# drift_method: <method> — which drift correction was applied
# recording_date: <YYYY-MM-DD> — calendar date of frame 0, omitted when the recording carries no date

Parameters:

recording (Recording) – The recording to write.
path (Path | str) – Destination CSV path.

Returns:

The resolved path the file was written to.

Return type:

Path
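
A file with #-prefixed provenance lines can be read back with pandas alone. The sketch below uses made-up file contents with a shortened, illustrative column set, and is not the read_recording_csv() implementation.

```python
import io

import pandas as pd

# Hypothetical file contents; the data column names are illustrative.
csv_text = (
    "# drift_method: linear\n"
    "# recording_date: 2000-01-01\n"
    "frame,time_s,dfbf\n"
    "0,0.0,0.10\n"
    "1,0.5,0.12\n"
)

# Collect leading "# key: value" lines into a provenance dict.
provenance = {}
for line in csv_text.splitlines():
    if not line.startswith("#"):
        break
    key, _, value = line.lstrip("# ").partition(":")
    provenance[key.strip()] = value.strip()

# comment="#" makes pandas skip the provenance lines entirely.
samples = pd.read_csv(io.StringIO(csv_text), comment="#")
```

Since the recording_date line is omitted when no date is available, a reader along these lines would use provenance.get("recording_date") rather than indexing the key directly.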

Modules

`align`	Frame-align a sensor record against a Fiji ΔF/F trace.
`drift`	Drift-correction fits and chooser.
`interpolate`	Fill DAQ-dropped frames via linear interpolation.
`io`	File I/O for raw inputs and preprocessed outputs.
`template_rescue`	Detect a broken / truncated sensor MAT file.