arista.ingest
Database ingestion — discover preprocessed CSVs, insert into SQLite.
- class arista.ingest.AnimalLabel(strain_prefix, animal_number, sex, arista_suffix)[source]
Bases: object
Parsed animal-directory components.
- strain_prefix
The leading token (e.g. 'WT'/'nompC') present in the directory name. Informational only: the canonical strain comes from the genotype directory one level up.
- class arista.ingest.DiscoveryResult(csv_path, record, reason)[source]
Bases: object
Outcome of discover_alex_records() for one CSV path.
- Parameters:
csv_path (Path)
record (IngestRecord | None)
reason (str | None)
- record: IngestRecord | None
- class arista.ingest.IngestRecord(researcher_name, strain_name, recording_date, sex, animal_number, arista_suffix, cell_type_code, cell_number, hemisphere, stimulus_name, fps, n_samples, duration_s, drift_method, samples_df, source_csv, notes=None)[source]
Bases: object
One ingest-ready unit: dimension lookups + samples in one bundle.
The orchestrator consumes a stream of these and translates each into one animals row (or lookup), one recordings row, and N samples rows.
- Parameters:
researcher_name (str)
strain_name (str)
recording_date (str)
sex (str)
animal_number (int)
arista_suffix (str | None)
cell_type_code (str)
cell_number (int)
hemisphere (str | None)
stimulus_name (str)
fps (float)
n_samples (int)
duration_s (float)
drift_method (str)
samples_df (pandas.DataFrame)
source_csv (Path)
notes (str | None)
- samples_df: pandas.DataFrame
- class arista.ingest.IngestStats(inserted_recordings=0, skipped_duplicates=0, errors=0, inserted_samples=0)[source]
Bases: object
Result tally for one ingest run.
- arista.ingest.discover_alex_records(source_root, *, stimulus_name='ascAmp')[source]
Yield one DiscoveryResult per CSV under source_root.
Walks <root>/<genotype>/<animal>/<fiji>.csv only. Paths that do not match the flat layout are yielded with record=None and a populated reason; the CLI surfaces them in the startup banner.
- Yields:
DiscoveryResult instances in deterministic alphabetical order.
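The walk described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: ResultSketch and discover_sketch are hypothetical names, the record is a plain dict standing in for IngestRecord, and using rglob to find off-layout files is a guess at how mismatches get noticed.

```python
import tempfile
from pathlib import Path
from typing import Iterator, NamedTuple, Optional


class ResultSketch(NamedTuple):
    """Hypothetical stand-in for DiscoveryResult."""
    csv_path: Path
    record: Optional[dict]  # stand-in for IngestRecord
    reason: Optional[str]


def discover_sketch(source_root: Path) -> Iterator[ResultSketch]:
    # sorted() gives a deterministic order regardless of filesystem order.
    for csv_path in sorted(source_root.rglob("*.csv")):
        parts = csv_path.relative_to(source_root).parts
        if len(parts) != 3:
            # Off-layout path: yield it with a reason instead of raising.
            reason = f"expected <genotype>/<animal>/<file>.csv, got depth {len(parts)}"
            yield ResultSketch(csv_path, None, reason)
            continue
        genotype, animal, _name = parts
        yield ResultSketch(csv_path, {"strain_name": genotype, "animal": animal}, None)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "WT" / "WT_02_m").mkdir(parents=True)
    (root / "WT" / "WT_02_m" / "cell1.csv").write_text("t,f\n")
    (root / "stray.csv").write_text("t,f\n")
    results = list(discover_sketch(root))
    skipped = [r.csv_path.name for r in results if r.record is None]
```

Yielding rejected paths rather than dropping them lets a caller report every skipped file together with its reason before ingestion starts.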
- arista.ingest.discover_laurin_records(source_root)[source]
Yield one DiscoveryResult per CSV under ms-thesis/result/.
The expected tree is a flat <source_root>/*.csv with no subdirectories: Laurin’s massiveAligner writes all outputs into one result/ folder regardless of strain or stimulus.
- Parameters:
source_root (Path) – Path to ms-thesis/result/.
- arista.ingest.discover_robert_records(source_root)[source]
Yield one DiscoveryResult per TXT under Compiled_data_pickled/.
The expected tree is <source_root>/<genotype>/*.txt where <genotype> is one of CantonS / NompC3_NSybLexALexOpGCamp6 / NompC-HeterozControl / NompCPbac / NompCRescue / NompCOverExpression / NompCGal4-Ctrl-NCBG / NompCGal4-Ctrl-WTBG / UASNompC-Ctrl-NCBG / NSybLexALexOpGCamp6 / ColdAdapt / HotAdapt / AristaBending.
Files outside that pattern are yielded with record=None and a populated reason so the CLI can surface them.
- Parameters:
source_root (Path) – Path to Compiled_data_pickled/ (or an equivalent tree of <genotype>/<txt>).
- arista.ingest.ingest_one(conn, record)[source]
Insert one IngestRecord and commit.
- Returns:
Triple (recording_id, was_new, n_samples_inserted). was_new is False when the recording’s natural key was already present (re-ingest skipped); in that case samples are also not re-inserted.
- Parameters:
conn (Connection)
record (IngestRecord)
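The skip-on-duplicate behaviour can be illustrated with a small sqlite3 sketch. Everything here is an assumption made for illustration: the two-table schema, the single natural_key text column, and the ingest_one_sketch helper are hypothetical and do not reflect the package's real schema or natural key.

```python
import sqlite3


def ingest_one_sketch(conn, natural_key, samples):
    """Return (recording_id, was_new, n_samples_inserted) for one recording."""
    row = conn.execute(
        "SELECT id FROM recordings WHERE natural_key = ?", (natural_key,)
    ).fetchone()
    if row is not None:
        # Already ingested: report the existing id and insert nothing.
        return row[0], False, 0
    cur = conn.execute(
        "INSERT INTO recordings (natural_key) VALUES (?)", (natural_key,)
    )
    recording_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO samples (recording_id, idx, value) VALUES (?, ?, ?)",
        [(recording_id, i, v) for i, v in enumerate(samples)],
    )
    conn.commit()
    return recording_id, True, len(samples)


# Hypothetical schema, reduced to the columns the sketch needs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE recordings (id INTEGER PRIMARY KEY, natural_key TEXT UNIQUE);
CREATE TABLE samples (recording_id INTEGER, idx INTEGER, value REAL);
""")
first = ingest_one_sketch(conn, "alex/WT_02_m/cell1", [0.1, 0.2, 0.3])
second = ingest_one_sketch(conn, "alex/WT_02_m/cell1", [0.1, 0.2, 0.3])
```

Returning the existing id on a duplicate keeps re-runs idempotent: the second call reports was_new as False and leaves the samples table untouched.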
- arista.ingest.ingest_stream(conn, records)[source]
Ingest every record from records. Survives per-record failures.
Errors are logged and counted; the orchestrator does not abort the whole run on one bad recording so a corpus-wide ingest can complete even if a handful of files are malformed.
- Parameters:
conn (Connection)
records (Iterable[IngestRecord])
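A minimal sketch of the log-and-count pattern described above, assuming the per-record insert returns the same triple as ingest_one. StatsSketch mirrors the IngestStats field names; ingest_stream_sketch and fake_ingest_one are hypothetical, and whether the real orchestrator rolls back a failed record is not covered here.

```python
import logging
from dataclasses import dataclass

log = logging.getLogger("ingest_sketch")


@dataclass
class StatsSketch:
    """Hypothetical tally with the same field names as IngestStats."""
    inserted_recordings: int = 0
    skipped_duplicates: int = 0
    errors: int = 0
    inserted_samples: int = 0


def ingest_stream_sketch(records, ingest_one):
    stats = StatsSketch()
    for record in records:
        try:
            _rec_id, was_new, n_samples = ingest_one(record)
        except Exception:
            # Log with traceback, count the failure, and move on.
            log.exception("failed to ingest %r", record)
            stats.errors += 1
            continue
        if was_new:
            stats.inserted_recordings += 1
            stats.inserted_samples += n_samples
        else:
            stats.skipped_duplicates += 1
    return stats


def fake_ingest_one(record):
    # Stand-in for the real insert: "bad" fails, "dup" is a duplicate.
    if record == "bad":
        raise ValueError("malformed file")
    if record == "dup":
        return 1, False, 0
    return 1, True, 10


stats = ingest_stream_sketch(["a", "bad", "dup", "b"], fake_ingest_one)
```

Catching the exception inside the loop is what lets the run continue past a malformed record; the tally still shows how many records failed.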
- arista.ingest.parse_animal_label(label)[source]
Parse an animal-directory name into structured fields.
Returns None if label does not match the expected pattern, so callers can filter() without try/except boilerplate.
- Parameters:
label (str) – Directory base-name, e.g. 'WT_02_m' or 'nompC_01_f' or 'WT_02b_m'.
- Returns:
An AnimalLabel if the name parses, else None.
- Return type:
AnimalLabel | None
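A minimal sketch of a parser with this None-on-mismatch contract, assuming a <prefix>_<number><optional letter>_<sex> naming scheme inferred from the three example labels. The regular expression, the SketchLabel tuple, and the reading of the trailing 'b' in 'WT_02b_m' as the arista suffix are assumptions, not the library's actual pattern.

```python
import re
from typing import NamedTuple, Optional


class SketchLabel(NamedTuple):
    """Hypothetical stand-in mirroring the AnimalLabel fields."""
    strain_prefix: str
    animal_number: int
    sex: str
    arista_suffix: Optional[str]


# Assumed pattern: prefix, digits, optional lowercase letter, then m or f.
_LABEL_RE = re.compile(
    r"^(?P<prefix>[A-Za-z0-9]+)_(?P<num>\d+)(?P<suffix>[a-z])?_(?P<sex>[mf])$"
)


def parse_label_sketch(label: str) -> Optional[SketchLabel]:
    m = _LABEL_RE.match(label)
    if m is None:
        return None
    # m["suffix"] is None when the optional group did not participate.
    return SketchLabel(m["prefix"], int(m["num"]), m["sex"], m["suffix"])


labels = ["WT_02_m", "nompC_01_f", "WT_02b_m", "notes"]
# filter(None, ...) drops the None results without try/except.
parsed = list(filter(None, map(parse_label_sketch, labels)))
```

Because a non-matching name yields None rather than an exception, directory listings that mix animal folders with unrelated entries can be filtered in one pass.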
- arista.ingest.prepare_db(conn)[source]
Apply schema + seeds. Safe to call on a fresh or populated DB.
- Parameters:
conn (Connection)
- Return type:
None
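One common way to make a schema-plus-seeds step safe to re-run is CREATE TABLE IF NOT EXISTS combined with INSERT OR IGNORE. The sketch below shows that pattern only; the stimuli table, its columns, and prepare_db_sketch are hypothetical, and the source does not confirm that prepare_db works this way. The seed value 'ascAmp' is borrowed from the default stimulus_name above.

```python
import sqlite3

# Hypothetical one-table schema; IF NOT EXISTS makes it re-runnable.
SCHEMA = """
CREATE TABLE IF NOT EXISTS stimuli (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
"""
SEED_STIMULI = ["ascAmp"]


def prepare_db_sketch(conn):
    conn.executescript(SCHEMA)
    # OR IGNORE skips rows that would violate the UNIQUE constraint.
    conn.executemany(
        "INSERT OR IGNORE INTO stimuli (name) VALUES (?)",
        [(name,) for name in SEED_STIMULI],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
prepare_db_sketch(conn)  # fresh database
prepare_db_sketch(conn)  # already populated: no error, no duplicate seed
n_rows = conn.execute("SELECT COUNT(*) FROM stimuli").fetchone()[0]
```

Calling the function twice leaves exactly one seed row, which is the property the "fresh or populated" wording asks for.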
Modules
- Filename + directory-name parsers that recover dimension-table fields.
- Insert
- Per-source parsers for preprocessed Ca²⁺ recordings.