arista.ingest

Database ingestion — discover preprocessed CSVs, insert into SQLite.

class arista.ingest.AnimalLabel(strain_prefix, animal_number, sex, arista_suffix)

Bases: object

Parsed animal-directory components.

Parameters:
  • strain_prefix (str)

  • animal_number (int)

  • sex (str)

  • arista_suffix (str | None)

strain_prefix

The leading token of the directory name (e.g. 'WT' or 'nompC'). Informational only: the canonical strain comes from the genotype directory one level up.

Type:

str

animal_number

The 1-based animal-of-the-day integer.

Type:

int

sex

'm' / 'f' / 'u'.

Type:

str

arista_suffix

'b' for the second arista on the same fly; None otherwise. Robert’s f02b convention is the inspiration; Alex uses this rarely.

Type:

str | None

animal_number: int
arista_suffix: str | None
sex: str
strain_prefix: str

class arista.ingest.DiscoveryResult(csv_path, record, reason)

Bases: object

Outcome of discover_alex_records() for one CSV path.

Parameters:
  • csv_path (Path)

  • record (IngestRecord | None)

  • reason (str | None)

csv_path: Path
reason: str | None
record: IngestRecord | None

class arista.ingest.IngestRecord(researcher_name, strain_name, recording_date, sex, animal_number, arista_suffix, cell_type_code, cell_number, hemisphere, stimulus_name, fps, n_samples, duration_s, drift_method, samples_df, source_csv, notes=None)

Bases: object

One ingest-ready unit: dimension lookups + samples in one bundle.

The orchestrator consumes a stream of these and translates each into one animals row (or lookup), one recordings row, and N samples rows.

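Parameters:
  • researcher_name (str)

  • strain_name (str)

  • recording_date (str)

  • sex (str)

  • animal_number (int)

  • arista_suffix (str | None)

  • cell_type_code (str)

  • cell_number (int)

  • hemisphere (str | None)

  • stimulus_name (str)

  • fps (float)

  • n_samples (int)

  • duration_s (float)

  • drift_method (str)

  • samples_df (pandas.DataFrame)

  • source_csv (Path)

  • notes (str | None)
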
animal_number: int
arista_suffix: str | None
cell_number: int
cell_type_code: str
drift_method: str
duration_s: float
fps: float
hemisphere: str | None
n_samples: int
notes: str | None = None
recording_date: str
researcher_name: str
samples_df: pandas.DataFrame
sex: str
source_csv: Path
stimulus_name: str
strain_name: str

class arista.ingest.IngestStats(inserted_recordings=0, skipped_duplicates=0, errors=0, inserted_samples=0)

Bases: object

Result tally for one ingest run.

Parameters:
  • inserted_recordings (int)

  • skipped_duplicates (int)

  • errors (int)

  • inserted_samples (int)

as_dict()

Return type:

dict[str, int]

errors: int = 0
inserted_recordings: int = 0
inserted_samples: int = 0
skipped_duplicates: int = 0

arista.ingest.discover_alex_records(source_root, *, stimulus_name='ascAmp')

Yield one DiscoveryResult per CSV under source_root.

Walks <source_root>/<genotype>/<animal>/<fiji>.csv only. Paths that do not match this layout are yielded with record=None and a populated reason; the CLI surfaces them in the startup banner.

Parameters:
  • source_root (Path) – Root directory (e.g. preprocessed_output/alex/).

  • stimulus_name (str) – Stimulus protocol to assign to every record. Defaults to ascAmp per Alex’s 641 sessions.

Yields:

DiscoveryResult instances in deterministic alphabetical order.

Return type:

Iterator[DiscoveryResult]
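The walk described above can be illustrated with a minimal sketch. It is not the arista implementation: WalkResult and walk_three_level are hypothetical names, the record is reduced to a plain dict, and the only check applied is the path depth below source_root.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import Iterator


@dataclass(frozen=True)
class WalkResult:
    """Hypothetical stand-in for DiscoveryResult (record is a plain dict here)."""

    csv_path: Path
    record: dict | None
    reason: str | None


def walk_three_level(source_root: Path) -> Iterator[WalkResult]:
    """Yield one result per CSV under source_root, in sorted path order."""
    for csv_path in sorted(source_root.rglob("*.csv")):
        parts = csv_path.relative_to(source_root).parts
        if len(parts) != 3:
            # Not <genotype>/<animal>/<file>.csv: report it instead of dropping it.
            yield WalkResult(csv_path, None, f"unexpected depth {len(parts)}")
            continue
        genotype, animal, _ = parts
        yield WalkResult(csv_path, {"genotype": genotype, "animal": animal}, None)
```

Yielding rejected paths with a reason, rather than silently skipping them, is what lets a caller list every ignored file before the ingest starts.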

arista.ingest.discover_laurin_records(source_root)

Yield one DiscoveryResult per CSV under ms-thesis/result/.

The expected tree is a flat <source_root>/*.csv with no subdirectories — Laurin’s massiveAligner writes all outputs into one result/ folder regardless of strain or stimulus.

Parameters:

source_root (Path) – Path to ms-thesis/result/.

Return type:

Iterator[DiscoveryResult]

arista.ingest.discover_robert_records(source_root)

Yield one DiscoveryResult per TXT under Compiled_data_pickled/.

The expected tree is <source_root>/<genotype>/*.txt where <genotype> is one of CantonS / NompC3_NSybLexALexOpGCamp6 / NompC-HeterozControl / NompCPbac / NompCRescue / NompCOverExpression / NompCGal4-Ctrl-NCBG / NompCGal4-Ctrl-WTBG / UASNompC-Ctrl-NCBG / NSybLexALexOpGCamp6 / ColdAdapt / HotAdapt / AristaBending.

Files outside that pattern are yielded with record=None and a populated reason so the CLI can surface them.

Parameters:

source_root (Path) – Path to Compiled_data_pickled/ (or an equivalent tree of <genotype>/<txt>).

Return type:

Iterator[DiscoveryResult]

arista.ingest.ingest_one(conn, record)

Insert one IngestRecord and commit.

Returns:

Triple (recording_id, was_new, n_samples_inserted). was_new is False when the recording’s natural key was already present (re-ingest skipped); in that case samples are also not re-inserted.

Parameters:
  • conn (Connection)

  • record (IngestRecord)

Return type:

tuple[int, bool, int]
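The skip-on-duplicate behaviour can be sketched with the standard sqlite3 module. Everything schema-related here is an assumption made for illustration: the two tables, the use of source_csv as the natural key, and the helper name insert_once are hypothetical and do not describe the real arista schema.

```python
import sqlite3

# Hypothetical schema: the real arista tables and natural key are not shown here.
SCHEMA = """
CREATE TABLE IF NOT EXISTS recordings (
    id INTEGER PRIMARY KEY,
    source_csv TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS samples (
    recording_id INTEGER NOT NULL REFERENCES recordings(id),
    t REAL NOT NULL,
    value REAL NOT NULL
);
"""


def insert_once(conn, source_csv, samples):
    """Return (recording_id, was_new, n_samples_inserted)."""
    row = conn.execute(
        "SELECT id FROM recordings WHERE source_csv = ?", (source_csv,)
    ).fetchone()
    if row is not None:
        # Natural key already present: skip the recording and its samples.
        return row[0], False, 0
    cur = conn.execute(
        "INSERT INTO recordings (source_csv) VALUES (?)", (source_csv,)
    )
    recording_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO samples (recording_id, t, value) VALUES (?, ?, ?)",
        [(recording_id, t, v) for t, v in samples],
    )
    conn.commit()
    return recording_id, True, len(samples)
```

Calling insert_once twice with the same source_csv returns the same recording id both times, with was_new set to False and a sample count of 0 on the second call.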

arista.ingest.ingest_stream(conn, records)

Ingest every record from records. Survives per-record failures.

Errors are logged and counted; the orchestrator does not abort the whole run on one bad recording, so a corpus-wide ingest can complete even if a handful of files are malformed.

Parameters:
  • conn (Connection)

  • records – the IngestRecord instances to ingest

Return type:

IngestStats
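A minimal sketch of the failure-tolerant loop, under stated assumptions: Tally is a hypothetical stand-in that mirrors the IngestStats fields, and run_stream takes the per-record ingest function as an argument (ingest_fn) so the example runs without a database. The real function takes a connection instead.

```python
import logging
from dataclasses import asdict, dataclass

log = logging.getLogger(__name__)


@dataclass
class Tally:
    """Hypothetical stand-in mirroring the IngestStats fields."""

    inserted_recordings: int = 0
    skipped_duplicates: int = 0
    errors: int = 0
    inserted_samples: int = 0

    def as_dict(self):
        return asdict(self)


def run_stream(records, ingest_fn):
    """Call ingest_fn(record) for each record; count failures, never abort."""
    tally = Tally()
    for record in records:
        try:
            _, was_new, n_samples = ingest_fn(record)
        except Exception:
            # Log with traceback, count the failure, and move on.
            log.exception("ingest failed for %r", record)
            tally.errors += 1
            continue
        if was_new:
            tally.inserted_recordings += 1
            tally.inserted_samples += n_samples
        else:
            tally.skipped_duplicates += 1
    return tally
```

Catching the exception inside the loop body, rather than around the loop, is what keeps one malformed file from ending the run.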

arista.ingest.parse_animal_label(label)

Parse an animal-directory name into structured fields.

Returns None if label does not match the expected pattern, so callers can filter() without try/except boilerplate.

Parameters:

label (str) – Directory base-name, e.g. 'WT_02_m' or 'nompC_01_f' or 'WT_02b_m'.

Returns:

An AnimalLabel if the name parses, else None.

Return type:

AnimalLabel | None
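For illustration, the return-None-on-mismatch contract could look like the sketch below. The regular expression is an assumption inferred only from the three example labels above; Label and parse_label_sketch are hypothetical names, and the real pattern may accept or reject other inputs.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed pattern: <prefix>_<number><optional 'b'>_<sex>, inferred from the
# examples 'WT_02_m', 'nompC_01_f', and 'WT_02b_m'. The real regex may differ.
_LABEL_RE = re.compile(
    r"(?P<prefix>[A-Za-z0-9]+)_(?P<num>\d+)(?P<suffix>b)?_(?P<sex>[mfu])"
)


@dataclass(frozen=True)
class Label:
    """Hypothetical stand-in with the same four fields as AnimalLabel."""

    strain_prefix: str
    animal_number: int
    sex: str
    arista_suffix: Optional[str]


def parse_label_sketch(label: str) -> Optional[Label]:
    """Return a Label if the whole name matches the assumed pattern, else None."""
    m = _LABEL_RE.fullmatch(label)
    if m is None:
        return None
    # An unmatched optional group yields None, which maps onto arista_suffix.
    return Label(m["prefix"], int(m["num"]), m["sex"], m["suffix"])
```

Because a mismatch returns None rather than raising, a caller can write filter(None, map(parse_label_sketch, names)) to keep only the directory names that parse.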

arista.ingest.prepare_db(conn)

Apply schema + seeds. Safe to call on a fresh or populated DB.

Parameters:

conn (Connection)

Return type:

None
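One common way to make a setup step safe on both fresh and populated databases is to pair CREATE TABLE IF NOT EXISTS with INSERT OR IGNORE for seed rows. The sketch below shows that pattern only; the researchers table, its seed values, and the name prepare_sketch are hypothetical and are not taken from the arista schema.

```python
import sqlite3

# Hypothetical schema and seed rows; the real arista tables are not shown here.
SCHEMA_SQL = """
CREATE TABLE IF NOT EXISTS researchers (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
"""
SEED_RESEARCHERS = [("alex",), ("laurin",), ("robert",)]


def prepare_sketch(conn: sqlite3.Connection) -> None:
    """Create tables if missing and insert seed rows that are not yet present."""
    conn.executescript(SCHEMA_SQL)
    # The UNIQUE constraint plus OR IGNORE makes repeated seeding a no-op.
    conn.executemany(
        "INSERT OR IGNORE INTO researchers (name) VALUES (?)", SEED_RESEARCHERS
    )
    conn.commit()
```

Running prepare_sketch a second time on the same connection leaves the row count unchanged.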

Modules

metadata

Filename + directory-name parsers that recover dimension-table fields.

orchestrator

Insert IngestRecord instances into an arista SQLite DB.

parsers

Per-source parsers for preprocessed Ca²⁺ recordings.