arista.preprocess

Headless preprocessing pipeline — public API re-exports.

class arista.preprocess.DriftFit(method, fitted, residual_ssq, aic, params=<factory>)[source]

Bases: object

A fitted drift model plus its evaluation on the full trace.

Parameters:
aic: float
fitted: numpy.ndarray
method: Literal['linear', 'poly', 'exp']
params: dict[str, Any]
residual_ssq: float
class arista.preprocess.FijiRecording(frame, dfbf)[source]

Bases: object

Raw Fiji ROI export — one ΔF/F₀ value per imaging frame.

Parameters:
dfbf: numpy.ndarray
frame: numpy.ndarray
property n_frames: int
class arista.preprocess.Recording(frame, time_s, sensor_t_c, target_t_c, drive_t_c, dfbf, dfbf_drift_corrected=None, drift_method='none', recording_date=None)[source]

Bases: object

Frame-aligned, optionally drift-corrected Ca²⁺ recording.

One row per imaging frame. Matches the column layout of the samples table in [[Database Schema]] so the ingester can bulk-insert directly.

dfbf_drift_corrected is None whenever drift_method == "none".

recording_date is the calendar date the recording started, extracted from the first sensor MAT epoch during alignment. The ingester uses it to populate animals.recording_date.

Parameters:
  • frame (np.ndarray)

  • time_s (np.ndarray)

  • sensor_t_c (np.ndarray)

  • target_t_c (np.ndarray)

  • drive_t_c (np.ndarray | None)

  • dfbf (np.ndarray)

  • dfbf_drift_corrected (np.ndarray | None)

  • drift_method (str)

  • recording_date (str | None)

dfbf: np.ndarray
dfbf_drift_corrected: np.ndarray | None = None
drift_method: str = 'none'
drive_t_c: np.ndarray | None
frame: np.ndarray
property n_frames: int
recording_date: str | None = None
sensor_t_c: np.ndarray
target_t_c: np.ndarray
time_s: np.ndarray
to_dataframe()[source]

Return the recording as a pandas DataFrame with DB-schema columns.

Return type:

pandas.DataFrame

class arista.preprocess.SensorRecord(epoch_time, frame, sensor_t_c, target_t_c, drive_t_c)[source]

Bases: object

Raw MATLAB sensor record — continuously logged 5-column array.

Each MAT file holds a data matrix whose columns are, in order: epoch_time (MATLAB serial datenum), frame index (1-based!), sensor_T (°C, actually measured), target_T (°C, set-point) and drive_T (°C, applied to the Peltier element). The log rate is higher than the imaging rate so multiple sensor rows share the same frame number; arista.preprocess.align() collapses them.

Parameters:
drive_t_c: numpy.ndarray
epoch_time: numpy.ndarray
frame: numpy.ndarray
property n_samples: int
sensor_t_c: numpy.ndarray
target_t_c: numpy.ndarray
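
The epoch_time column holds MATLAB serial datenums, and Recording.recording_date is derived from the first of them. A minimal sketch of that conversion, assuming the usual 366-day offset between MATLAB datenums and Python ordinals; datenum_to_datetime is a hypothetical helper for illustration and is not part of arista.preprocess:

```python
from datetime import datetime, timedelta


def datenum_to_datetime(datenum: float) -> datetime:
    """Convert a MATLAB serial datenum to a naive Python datetime.

    Hypothetical helper for illustration; not part of arista.preprocess.
    """
    whole_days = int(datenum)
    day_fraction = datenum - whole_days
    # MATLAB counts days from year 0000, Python ordinals from 0001-01-01,
    # hence the 366-day offset.
    return (
        datetime.fromordinal(whole_days)
        + timedelta(days=day_fraction)
        - timedelta(days=366)
    )


# MATLAB's datenum(1970, 1, 1) is 719529, so noon on that day is 719529.5.
first_epoch = 719529.5
recording_date = datenum_to_datetime(first_epoch).date().isoformat()
```

The ISO string format is an assumption based on the YYYY-MM-DD header written by write_recording_csv().
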
arista.preprocess.apply_drift(recording, fit)[source]

Subtract a fit from the ΔF/F trace and return a new Recording.

If fit is None the recording is returned with drift_method = "none" and dfbf_drift_corrected = None (i.e. drift correction explicitly not applied — the original dfbf column remains the source of truth).

Parameters:
  • recording (Recording)

  • fit (DriftFit | None)

Returns:

A new Recording with dfbf_drift_corrected and drift_method filled in.

Return type:

Recording
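
The subtract-and-copy behaviour described above can be sketched with a small stand-in dataclass. MiniRecording and apply_drift_sketch are hypothetical names, and the stand-in carries only the three fields the sketch needs; the real Recording has more.

```python
from __future__ import annotations

from dataclasses import dataclass, replace

import numpy as np


@dataclass(frozen=True)
class MiniRecording:
    """Hypothetical stand-in carrying only the fields this sketch needs."""

    dfbf: np.ndarray
    dfbf_drift_corrected: np.ndarray | None = None
    drift_method: str = "none"


def apply_drift_sketch(rec, method, fitted):
    """Return a new record; the input is never mutated."""
    if fitted is None:
        # No correction: dfbf stays the source of truth.
        return replace(rec, dfbf_drift_corrected=None, drift_method="none")
    return replace(
        rec, dfbf_drift_corrected=rec.dfbf - fitted, drift_method=method
    )


raw = MiniRecording(dfbf=np.array([1.0, 1.5, 2.0]))
corrected = apply_drift_sketch(raw, "linear", np.array([1.0, 1.0, 1.0]))
untouched = apply_drift_sketch(raw, None, None)
```

Returning a copy keeps the raw dfbf trace available next to the corrected one, which is what lets a later stage compare the two.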

arista.preprocess.assemble_recording(fiji, sensor)[source]

Run stages A + B + C and return a frame-aligned Recording.

Drift correction is not applied here — that is arista.preprocess.drift’s job. The returned recording has dfbf_drift_corrected = None and drift_method = "none".

The output covers the intersection of the Fiji frame range and the sensor frame range (after collapsing and interpolation). Any Fiji frames without a matching sensor frame are dropped silently; this matches legacy behaviour and, in well-formed recordings, only ever clips a handful of trailing frames.

Parameters:
  • fiji (FijiRecording)

  • sensor (SensorRecord)

Returns:

A Recording aligned to Fiji’s frame numbers.

Return type:

Recording
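
The frame-range intersection described above might look like the following. This is a sketch on made-up toy arrays using numpy.intersect1d; it is not taken from the arista source, and the variable names are placeholders.

```python
import numpy as np

# Hypothetical toy data: Fiji exported frames 0..5, the sensor covers 0..3.
fiji_frames = np.arange(6)
fiji_dfbf = np.array([0.10, 0.12, 0.30, 0.28, 0.15, 0.11])
sensor_frames = np.arange(4)

# Keep only the frames present on both sides.
common, fiji_idx, sensor_idx = np.intersect1d(
    fiji_frames, sensor_frames, return_indices=True
)
aligned_dfbf = fiji_dfbf[fiji_idx]

# Frames that exist in Fiji but have no sensor counterpart are dropped.
n_dropped = fiji_frames.size - common.size
```

Because the drop is silent, a caller that cares about trailing frames could compare n_frames before and after assembly.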

arista.preprocess.collapse_to_frames(sensor)[source]

Group the continuous sensor log by frame and take the per-frame mean.

Sensor rows where frame == 0 are pre-stimulus calibration data and are dropped (matching the frame > 0 cut in the legacy pipeline). Frame numbers are then decremented by 1 so the result is 0-indexed, aligning with Fiji’s frame numbering.

Parameters:

sensor (SensorRecord) – Raw SensorRecord straight from arista.preprocess.io.read_sensor_mat().

Returns:

A DataFrame indexed by integer 0-based frame number, with columns epoch_time, sensor_t_c, target_t_c, drive_t_c. NaN-padded over any missing intermediate frames so interpolate_missing_frames() can fill them.

Return type:

pandas.DataFrame
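
As a rough illustration of the steps above (drop frame 0, shift to 0-based, average per frame, NaN-pad gaps), here is a pandas sketch on a toy log. The single-column table and the choice to reindex from the smallest to the largest observed frame are assumptions, not the library's confirmed behaviour.

```python
import numpy as np
import pandas as pd

# Hypothetical toy log: frame 0 is calibration, frame 2 was never logged.
log = pd.DataFrame(
    {
        "frame": [0, 0, 1, 1, 3, 3],
        "sensor_t_c": [20.0, 20.0, 21.0, 23.0, 25.0, 27.0],
    }
)

per_frame = (
    log[log["frame"] > 0]                    # drop pre-stimulus rows
    .assign(frame=lambda d: d["frame"] - 1)  # 1-based -> 0-based
    .groupby("frame")
    .mean()                                  # one row per imaging frame
)

# Reindex to a contiguous range so missing frames show up as NaN rows.
full_range = pd.RangeIndex(
    int(per_frame.index.min()), int(per_frame.index.max()) + 1, name="frame"
)
per_frame = per_frame.reindex(full_range)
```

The NaN row left at frame 1 is exactly what interpolate_missing_frames() is meant to fill.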

arista.preprocess.correct_drift(recording, method='auto')[source]

Convenience: fit all candidates, pick the best, apply it.

For headless, batch and CI use. Matches the default behaviour that arista-preprocess drift --method auto will expose at the CLI level in Phase 3.

Parameters:
  • recording (Recording)

  • method (Literal['linear', 'poly', 'exp', 'none', 'auto'])

Return type:

Recording

arista.preprocess.fit_all(t, y)[source]

Compute linear, poly and exp fits; return them in a dict by method name.

The exponential fit may fail to converge on flat traces; in that case it is omitted from the returned dict (rather than raising) so the AIC chooser can still pick between linear and poly.

Parameters:
Return type:

dict[str, DriftFit]

arista.preprocess.fit_exponential(t, y)[source]

Fit the model a·exp(-b·t) + c. Mirrors the bounds and initial guess (p0) of pytci’s fitExp.

Parameters:
Return type:

DriftFit

arista.preprocess.fit_linear(t, y)[source]

Degree-1 polyfit on the first and last _LINEAR_TAIL_FRAMES frames.

Matches pytci’s fitLinear: the fit is trained on pre + post stimulus tails only, then evaluated over the whole trace. This deliberately ignores the stimulus-evoked excursions so the linear component captures photobleach drift rather than the response.

Parameters:
Return type:

DriftFit
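
A minimal sketch of the tails-only idea using numpy.polyfit. The tail_frames argument stands in for the private _LINEAR_TAIL_FRAMES constant, whose actual value is not documented here, and fit_linear_tails is a hypothetical name; the real function returns a DriftFit rather than a tuple.

```python
import numpy as np


def fit_linear_tails(t, y, tail_frames):
    """Fit a line to the first and last `tail_frames` samples only.

    `tail_frames` is a stand-in for the module's _LINEAR_TAIL_FRAMES.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    # Indices of the pre- and post-stimulus tails.
    tails = np.r_[0:tail_frames, t.size - tail_frames:t.size]
    slope, intercept = np.polyfit(t[tails], y[tails], deg=1)
    fitted = slope * t + intercept  # evaluated over the whole trace
    return slope, intercept, fitted


t = np.arange(100, dtype=float)
drift = 0.01 * t + 1.0
response = np.where((t >= 40) & (t < 60), 2.0, 0.0)  # stimulus-evoked bump
slope, intercept, fitted = fit_linear_tails(t, drift + response, tail_frames=20)
```

Because the bump sits entirely outside the two tails, the recovered line follows the drift and is unaffected by the response.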

arista.preprocess.fit_polynomial(t, y, degree=4)[source]

Polynomial fit of order degree over the whole trace (default 4, matching pytci’s default).

Parameters:
Return type:

DriftFit

arista.preprocess.interpolate_missing_frames(per_frame)[source]

Linearly interpolate NaN values in a per-frame dataframe.

Replicates aristaSingleCellData.interpolateMissingFrames: any NaN values left after arista.preprocess.align.collapse_to_frames() has reindexed the per-frame table to a contiguous integer range are filled by DataFrame.interpolate(method="linear").

Parameters:

per_frame (pandas.DataFrame) – DataFrame whose index is the integer frame number. Typical columns are epoch_time / sensor_t_c / target_t_c / drive_t_c but the function is agnostic.

Returns:

A new DataFrame with NaNs filled. The index is preserved.

Return type:

pandas.DataFrame
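
The fill step amounts to a single pandas call. A small sketch on a toy per-frame table with one missing frame (the values are made up):

```python
import numpy as np
import pandas as pd

per_frame = pd.DataFrame(
    {
        "sensor_t_c": [22.0, np.nan, 26.0],
        "target_t_c": [22.0, np.nan, 22.0],
    },
    index=pd.RangeIndex(3, name="frame"),
)

# Linear interpolation fills interior NaNs column by column.
filled = per_frame.interpolate(method="linear")
```

The call returns a new DataFrame, so per_frame still holds its NaNs afterwards.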

arista.preprocess.is_broken_sensor(sensor, min_rows=1000)[source]

Return True if a sensor MAT file looks truncated.

The heuristic mirrors pytci: a full recording logs the sensor at a much higher rate than the imaging frame rate, so an honest MAT file has thousands of rows. Anything substantially smaller is almost certainly a partially-written file from a crashed acquisition.

Parameters:
Return type:

bool

arista.preprocess.load_template(stimulus_name)[source]

Substitute a median-template sensor trace for a broken MAT.

Not yet implemented in v0.1. Calling it raises an error that points to the sprint plan rather than silently returning bogus data.

Parameters:

stimulus_name (str) – Canonical stimulus name (e.g. "adaptation").

Return type:

SensorRecord

arista.preprocess.pick_best(fits, method='auto')[source]

Select one fit from a fit_all() result.

Parameters:
  • fits (dict[str, DriftFit]) – Mapping method_name → DriftFit, as returned by fit_all().

  • method (Literal['linear', 'poly', 'exp', 'none', 'auto']) – Either "auto" (pick lowest AIC), or one of "linear" / "poly" / "exp" to force that fit, or "none" to apply no correction.

Returns:

The chosen DriftFit, or None if method == "none".

Raises:

ValueError – If method is not a valid choice, or if a forced method is requested but missing from fits.

Return type:

DriftFit | None
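
To make the "auto" rule concrete, here is a sketch of lowest-AIC selection. The AIC formula is the common Gaussian least-squares form, n·ln(RSS/n) + 2k; whether arista computes DriftFit.aic in exactly this way is an assumption, and both function names are hypothetical.

```python
import math


def aic_least_squares(residual_ssq, n_samples, n_params):
    """Gaussian least-squares AIC, up to an additive constant."""
    return n_samples * math.log(residual_ssq / n_samples) + 2 * n_params


def pick_lowest_aic(aic_by_method):
    """Return the method name with the smallest AIC."""
    if not aic_by_method:
        raise ValueError("no fits to choose from")
    return min(aic_by_method, key=aic_by_method.get)


# Hypothetical numbers: poly fits slightly better but pays for 5 parameters.
n = 200
candidates = {
    "linear": aic_least_squares(4.00, n, n_params=2),
    "poly": aic_least_squares(3.95, n, n_params=5),
}
best = pick_lowest_aic(candidates)
```

With these toy numbers the small gain in residual does not offset the extra parameters, so the linear fit wins.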

arista.preprocess.read_fiji_csv(path)[source]

Read a Fiji ΔF/F₀ ROI export into a FijiRecording.

Accepts any of the three header conventions documented in the module docstring. A header is required: headerless CSVs are not supported, because the first frame would be silently reinterpreted as a column name and corrupt downstream alignment.

Parameters:

path (Path | str) – Path to the CSV file.

Returns:

Frozen FijiRecording with frame and dfbf as numpy arrays.

Raises:

ValueError – If neither a frame-like nor a value-like column is present, or if frame indices are not monotonically increasing.

Return type:

FijiRecording

arista.preprocess.read_recording_csv(path)[source]

Re-read a canonical write_recording_csv() output.

Parameters:

path (Path | str)

Return type:

Recording

arista.preprocess.read_sensor_mat(path)[source]

Read a MATLAB temperature_data_*.mat sensor record.

The MAT file must contain a top-level variable data shaped (n_samples, 5): [epoch_time, frame, sensor_T, target_T, drive_T].

Parameters:

path (Path | str) – Path to the .mat file.

Returns:

Frozen SensorRecord with the five columns as separate arrays.

Raises:
  • KeyError – If the MAT file lacks a data variable.

  • ValueError – If the data matrix is not 5 columns wide.

Return type:

SensorRecord

arista.preprocess.write_recording_csv(recording, path)[source]

Persist a Recording to disk as a canonical CSV.

Column order matches the samples table in [[Database Schema]] so arista-ingest can COPY-style load without remapping. Two #-prefixed header lines carry recording-level provenance:

  • # drift_method: <method> — which drift correction was applied

  • # recording_date: <YYYY-MM-DD> — calendar date of frame 0, omitted when the recording carries no date

Parameters:
  • recording (Recording) – The recording to write.

  • path (Path | str) – Destination CSV path.

Returns:

The resolved path the file was written to.

Return type:

Path
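
The #-prefixed provenance lines can be read back with plain pandas. The sketch below assumes a two-column body and a made-up date purely for illustration; the real file carries the full samples-table columns, and read_recording_csv() is the supported way to load it.

```python
import io

import pandas as pd

# Hypothetical file content; the column names are placeholders.
text = (
    "# drift_method: linear\n"
    "# recording_date: 2024-03-01\n"
    "frame,dfbf\n"
    "0,0.10\n"
    "1,0.12\n"
)

# Collect the leading "# key: value" lines into a dict.
provenance = {}
for line in text.splitlines():
    if not line.startswith("#"):
        break
    key, _, value = line[1:].partition(":")
    provenance[key.strip()] = value.strip()

# With comment="#", pandas ignores fully commented lines before the header.
samples = pd.read_csv(io.StringIO(text), comment="#")
```

Keeping provenance in comment lines means generic CSV tools can still load the table, as long as they are told to skip lines starting with #.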

Modules

align

Frame-align a sensor record against a Fiji ΔF/F trace.

drift

Drift-correction fits and chooser.

interpolate

Fill DAQ-dropped frames via linear interpolation.

io

File I/O for raw inputs and preprocessed outputs.

template_rescue

Detect a broken / truncated sensor MAT file.