arista.ingest.parsers.alex

Walk an Alex-layout tree and yield IngestRecord instances.

The expected layout is the flat form produced by arista-preprocess (and committed for the in-repo subset):

<root>/<genotype>/<animal_label>/<fiji>.csv

where:

<genotype> is the canonical strain (641 / 605 / nomp_C)
<animal_label> matches arista.ingest.metadata.parse_animal_label() (e.g. WT_02_m, nompC_01_f)
<fiji> is any Fiji ROI filename accepted by arista.constants.parse_fiji_filename() (l_CC01 / r_HC02 / CC_01 / CC01 / lower-case variants)
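Here is a minimal sketch of how the two naming rules above might be matched. The regexes and the helpers parse_fiji_stem() and parse_animal_stem() are hypothetical. They assume three things: a Fiji stem is an optional l_/r_ side prefix, then a two-letter cell type, then an optional underscore and a number; an animal label is <prefix>_<number>_<sex>; and hemisphere is returned as "l"/"r". The real parse_fiji_filename() and parse_animal_label() may accept other forms and return different types.

```python
from __future__ import annotations

import re

# Hypothetical patterns: the real rules live in parse_fiji_filename() and
# parse_animal_label() and may differ.
FIJI_RE = re.compile(
    r"^(?:(?P<side>[lr])_)?(?P<cell>[a-z]{2})_?(?P<num>\d+)$", re.IGNORECASE
)
ANIMAL_RE = re.compile(r"^(?P<prefix>[A-Za-z0-9]+)_(?P<num>\d+)_(?P<sex>[mf])$")


def parse_fiji_stem(stem: str) -> tuple[str | None, str, int]:
    """Return (hemisphere, cell_type_code, cell_number) for a Fiji ROI stem."""
    m = FIJI_RE.match(stem)
    if m is None:
        raise ValueError(f"not a Fiji ROI name: {stem!r}")
    side = m.group("side")
    hemisphere = side.lower() if side else None  # no l_/r_ prefix: side unknown
    # Upper-case the cell type so lower-case variants normalise to CC / HC.
    return hemisphere, m.group("cell").upper(), int(m.group("num"))


def parse_animal_stem(label: str) -> tuple[str, int, str]:
    """Return (label prefix, animal_number, sex) for a label such as WT_02_m."""
    m = ANIMAL_RE.match(label)
    if m is None:
        raise ValueError(f"not an animal label: {label!r}")
    return m.group("prefix"), int(m.group("num")), m.group("sex")
```

Under these assumptions, parse_fiji_stem("r_HC02") gives ("r", "HC", 2) and parse_animal_stem("nompC_01_f") gives ("nompC", 1, "f").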

The deep HCS layout (<genotype>/<date>/<exp>/Arista_<side>/) is detected but not parsed. Its CSVs are surfaced in the skipped list, meaning they are yielded with record=None and a reason. A parser for this layout is planned for the next ingest sprint.
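One way to tell the flat layout from the deep HCS layout is to count path components relative to the source root. The classify_layout() helper below is hypothetical. It assumes that deep HCS CSVs sit somewhere below an Arista_<side>/ directory at depth four, and its return labels are illustrative only.

```python
from pathlib import PurePosixPath


def classify_layout(rel: PurePosixPath) -> str:
    """Classify a CSV path given relative to the source root (hypothetical)."""
    parts = rel.parts
    if rel.suffix.lower() != ".csv":
        return "not_csv"
    if len(parts) == 3:
        return "flat"  # <genotype>/<animal_label>/<fiji>.csv
    if len(parts) >= 5 and parts[3].startswith("Arista_"):
        return "deep_hcs"  # <genotype>/<date>/<exp>/Arista_<side>/...
    return "unknown"
```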

Stimulus protocol: Alex’s 641 sessions all use ascAmp per his records; this parser defaults to that and the CLI exposes a flag to override per ingest run.

Functions

discover_alex_records(source_root, *[, ...])

Yield one DiscoveryResult per CSV under source_root.

Classes

DiscoveryResult(csv_path, record, reason)

Outcome of discover_alex_records() for one CSV path.

IngestRecord(researcher_name, strain_name, ...)

One ingest-ready unit: dimension lookups + samples in one bundle.

class arista.ingest.parsers.alex.DiscoveryResult(csv_path, record, reason)

Bases: object

Outcome of discover_alex_records() for one CSV path.

Parameters:

csv_path (Path)
record (IngestRecord | None)
reason (str | None)

csv_path: Path

reason: str | None

record: IngestRecord | None
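A consumer such as the CLI can split a stream of DiscoveryResult objects into ingestable records and skipped paths. The sketch below is hypothetical. It uses a stand-in dataclass with the same three fields and a partition() helper, and it assumes that record is set exactly when the path was parsed.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import Any, Iterable


@dataclass(frozen=True)
class DiscoveryResultSketch:
    # Mirrors DiscoveryResult's three fields; record is typed loosely here.
    csv_path: Path
    record: Any | None
    reason: str | None


def partition(results: Iterable[DiscoveryResultSketch]):
    """Split results into (ingestable records, skipped (path, reason) pairs)."""
    records, skipped = [], []
    for r in results:
        if r.record is not None:
            records.append(r.record)
        else:
            # Skipped paths keep their reason so they can be reported to the user.
            skipped.append((r.csv_path, r.reason))
    return records, skipped
```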

class arista.ingest.parsers.alex.IngestRecord(researcher_name, strain_name, recording_date, sex, animal_number, arista_suffix, cell_type_code, cell_number, hemisphere, stimulus_name, fps, n_samples, duration_s, drift_method, samples_df, source_csv, notes=None)

Bases: object

One ingest-ready unit: dimension lookups + samples in one bundle.

The orchestrator consumes a stream of these and translates each one into one animals row (or a lookup of an existing animal), one recordings row, and N samples rows.

Parameters:

researcher_name (str)
strain_name (str)
recording_date (str)
sex (str)
animal_number (int)
arista_suffix (str | None)
cell_type_code (str)
cell_number (int)
hemisphere (str | None)
stimulus_name (str)
fps (float)
n_samples (int)
duration_s (float)
drift_method (str)
samples_df (pandas.DataFrame)
source_csv (Path)
notes (str | None)

animal_number: int

arista_suffix: str | None

cell_number: int

cell_type_code: str

drift_method: str

duration_s: float

fps: float

hemisphere: str | None

n_samples: int

notes: str | None = None

recording_date: str

researcher_name: str

samples_df: pandas.DataFrame

sex: str

source_csv: Path

stimulus_name: str

strain_name: str
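The orchestrator step described above (one animals row or lookup, one recordings row, N samples rows) might look roughly like the following. Everything in this example is a hypothetical sketch. RecordSketch carries only a subset of IngestRecord's fields, and samples is a list of dicts rather than a pandas.DataFrame. The animal key and the row column names are also assumptions, not the real schema.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class RecordSketch:
    # Hypothetical subset of IngestRecord; samples stands in for samples_df.
    researcher_name: str
    strain_name: str
    sex: str
    animal_number: int
    recording_date: str
    cell_type_code: str
    cell_number: int
    stimulus_name: str
    fps: float
    samples: list[dict[str, Any]]


def to_rows(rec: RecordSketch, animal_ids: dict[tuple, int]):
    """Return (new animal row or None, recording row, sample rows)."""
    # Assumed natural key for an animal; the real schema may differ.
    key = (rec.researcher_name, rec.strain_name, rec.sex, rec.animal_number)
    new_animal = None
    if key not in animal_ids:
        # First sighting: allocate an id and emit an animals row.
        animal_ids[key] = len(animal_ids) + 1
        new_animal = {"animal_id": animal_ids[key], "strain": rec.strain_name,
                      "sex": rec.sex, "number": rec.animal_number}
    # Otherwise the existing animal id is reused (the "lookup" case).
    recording = {
        "animal_id": animal_ids[key],
        "date": rec.recording_date,
        "cell": f"{rec.cell_type_code}{rec.cell_number:02d}",
        "stimulus": rec.stimulus_name,
        "fps": rec.fps,
    }
    sample_rows = [{"sample_index": i, **s} for i, s in enumerate(rec.samples)]
    return new_animal, recording, sample_rows
```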

arista.ingest.parsers.alex.discover_alex_records(source_root, *, stimulus_name='ascAmp')

Yield one DiscoveryResult per CSV under source_root.

Only the flat <root>/<genotype>/<animal_label>/<fiji>.csv layout is parsed. CSVs that do not match it are yielded with record=None and a populated reason, and the CLI surfaces them in the startup banner.

Parameters:

source_root (Path) – Root directory (e.g. preprocessed_output/alex/).
stimulus_name (str) – Stimulus protocol to assign to every record. Defaults to ascAmp per Alex’s 641 sessions.

Yields:

DiscoveryResult instances in deterministic alphabetical order.

Return type:

Iterator[DiscoveryResult]
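The discovery contract has three parts: every CSV yields a result, only the flat layout is parsed, and output comes in a deterministic order. The sketch below illustrates that contract. Found, discover_from_paths() and discover_sketch() are hypothetical, a plain dict stands in for IngestRecord, and the skip-reason text is invented. The real function builds full IngestRecord objects.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path, PurePath
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Found:
    # Hypothetical stand-in for DiscoveryResult; a dict replaces IngestRecord.
    csv_path: PurePath
    record: dict | None
    reason: str | None


def discover_from_paths(
    source_root: PurePath,
    csv_paths: Iterable[PurePath],
    *,
    stimulus_name: str = "ascAmp",
) -> Iterator[Found]:
    """Yield one Found per CSV path, sorted so the order is deterministic."""
    for csv_path in sorted(csv_paths):
        parts = csv_path.relative_to(source_root).parts
        if len(parts) != 3:
            # Not <genotype>/<animal_label>/<fiji>.csv: report it, don't parse it.
            yield Found(csv_path, None, f"not flat layout ({len(parts)} path parts)")
            continue
        genotype, animal_label, _ = parts
        record = {
            "strain_name": genotype,
            "animal_label": animal_label,
            "fiji_stem": csv_path.stem,
            "stimulus_name": stimulus_name,  # same protocol for every record
        }
        yield Found(csv_path, record, None)


def discover_sketch(source_root: Path, *, stimulus_name: str = "ascAmp") -> Iterator[Found]:
    """Walk the real filesystem and delegate to discover_from_paths()."""
    return discover_from_paths(
        source_root, source_root.rglob("*.csv"), stimulus_name=stimulus_name
    )
```

Separating the path source (rglob) from the classification keeps the logic testable without touching the filesystem. Sorting the Path objects gives the same order on every run.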