arista.ingest.orchestrator

Insert IngestRecord instances into an arista SQLite DB.

The orchestrator runs once per ingest invocation. It is idempotent: re-running it over a corpus that is already in the database is a no-op, because every INSERT uses OR IGNORE and every dimension table is protected by a natural-key UNIQUE constraint (see [[Database Schema]]).
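
The idempotency described above comes from INSERT OR IGNORE and a UNIQUE constraint working together. The following is a minimal sketch against an in-memory database; the dim_example table and its label column are hypothetical stand-ins, not the actual arista schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical dimension table; the UNIQUE constraint plays the natural-key role.
conn.execute(
    "CREATE TABLE dim_example (id INTEGER PRIMARY KEY, label TEXT NOT NULL UNIQUE)"
)


def insert_label(conn, label):
    """Insert a row if its natural key is new; return True when a row was added."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO dim_example (label) VALUES (?)", (label,)
    )
    # rowcount is 0 when the UNIQUE constraint caused the insert to be ignored.
    return cur.rowcount == 1


first = insert_label(conn, "example")
second = insert_label(conn, "example")  # same natural key, silently skipped
row_count = conn.execute("SELECT COUNT(*) FROM dim_example").fetchone()[0]
```

The second call neither raises nor adds a row, so repeating a whole ingest leaves the table unchanged.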

Provenance: each ingested CSV is registered in source_files with its SHA-256 digest and byte size. The recordings.processed_file_id FK ties the recording back to the preprocessed CSV. The original Fiji ROI and sensor MAT files will be added in a later sprint, once arista.preprocess.write_recording_csv() also persists those source paths.
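
As an illustration of the provenance step, the sketch below hashes a file and registers it once. Only the source_files table name comes from the text above; the column names (path, sha256, byte_size), the UNIQUE constraint on the digest, and the register_source_file helper are assumptions made for this example.

```python
import hashlib
import os
import sqlite3
import tempfile


def register_source_file(conn, path):
    """Store path, sha256 and byte size; return the row id (existing or new)."""
    with open(path, "rb") as fh:
        data = fh.read()
    digest = hashlib.sha256(data).hexdigest()
    conn.execute(
        "INSERT OR IGNORE INTO source_files (path, sha256, byte_size) "
        "VALUES (?, ?, ?)",
        (path, digest, len(data)),
    )
    row = conn.execute(
        "SELECT id FROM source_files WHERE sha256 = ?", (digest,)
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
# Assumed column layout, for illustration only.
conn.execute(
    "CREATE TABLE source_files ("
    "id INTEGER PRIMARY KEY, path TEXT NOT NULL, "
    "sha256 TEXT NOT NULL UNIQUE, byte_size INTEGER NOT NULL)"
)

payload = b"t,value\n0,1.5\n"
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(payload)
    csv_path = tmp.name

file_id = register_source_file(conn, csv_path)
same_id = register_source_file(conn, csv_path)  # re-registering is a no-op
stored = conn.execute("SELECT sha256, byte_size FROM source_files").fetchall()
os.remove(csv_path)
```

A recordings row could then point at file_id through a foreign key such as processed_file_id.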

Functions

ingest_one(conn, record)

Insert one IngestRecord and commit.

ingest_stream(conn, records)

Ingest every record from records.

prepare_db(conn)

Apply schema + seeds.

Classes

IngestStats([inserted_recordings, ...])

Result tally for one ingest run.

class arista.ingest.orchestrator.IngestStats(inserted_recordings=0, skipped_duplicates=0, errors=0, inserted_samples=0)

Bases: object

Result tally for one ingest run.

Parameters:
  • inserted_recordings (int)

  • skipped_duplicates (int)

  • errors (int)

  • inserted_samples (int)

as_dict()
Return type:

dict[str, int]

errors: int = 0
inserted_recordings: int = 0
inserted_samples: int = 0
skipped_duplicates: int = 0
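
The signature and defaults above are consistent with a plain dataclass. A minimal sketch, assuming that is how the class is built (the documentation does not confirm it); IngestStatsSketch is a stand-in name and as_dict() is implemented here with dataclasses.asdict:

```python
from __future__ import annotations

from dataclasses import asdict, dataclass


@dataclass
class IngestStatsSketch:
    """Stand-in for IngestStats; field order follows the documented signature."""

    inserted_recordings: int = 0
    skipped_duplicates: int = 0
    errors: int = 0
    inserted_samples: int = 0

    def as_dict(self) -> dict[str, int]:
        # Plain dict of counters, convenient for logging or JSON output.
        return asdict(self)


stats = IngestStatsSketch(inserted_recordings=2, inserted_samples=500)
stats.skipped_duplicates += 1
summary = stats.as_dict()
```

Because the fields are mutable counters, a caller can increment them during a run and serialize the tally once at the end.
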

arista.ingest.orchestrator.ingest_one(conn, record)

Insert one IngestRecord and commit.

Returns:

Triple (recording_id, was_new, n_samples_inserted). was_new is False when the recording’s natural key was already present (re-ingest skipped); in that case samples are also not re-inserted.

Parameters:
  • conn (Connection)

  • record (IngestRecord)

Return type:

tuple[int, bool, int]
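
To make the return triple concrete, here is a sketch of a function with the same return shape. It is not the arista implementation: ingest_one_sketch, its natural_key and samples arguments, and both table layouts are hypothetical.

```python
import sqlite3


def ingest_one_sketch(conn, natural_key, samples):
    """Hypothetical stand-in returning (recording_id, was_new, n_samples_inserted)."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO recordings (natural_key) VALUES (?)", (natural_key,)
    )
    was_new = cur.rowcount == 1
    recording_id = conn.execute(
        "SELECT id FROM recordings WHERE natural_key = ?", (natural_key,)
    ).fetchone()[0]
    n_samples = 0
    if was_new:  # samples are written only for a newly inserted recording
        conn.executemany(
            "INSERT INTO samples (recording_id, value) VALUES (?, ?)",
            [(recording_id, value) for value in samples],
        )
        n_samples = len(samples)
    conn.commit()
    return recording_id, was_new, n_samples


conn = sqlite3.connect(":memory:")
# Assumed minimal schema, for illustration only.
conn.executescript(
    """
    CREATE TABLE recordings (
        id INTEGER PRIMARY KEY,
        natural_key TEXT NOT NULL UNIQUE
    );
    CREATE TABLE samples (
        recording_id INTEGER NOT NULL REFERENCES recordings(id),
        value REAL
    );
    """
)

first = ingest_one_sketch(conn, "rec-001", [0.1, 0.2, 0.3])
again = ingest_one_sketch(conn, "rec-001", [0.1, 0.2, 0.3])
```

The repeated call returns the same recording id with was_new set to False and a sample count of 0, matching the re-ingest behaviour described above.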

arista.ingest.orchestrator.ingest_stream(conn, records)

Ingest every record from records. Survives per-record failures.

Errors are logged and counted; the orchestrator does not abort the whole run on one bad recording so a corpus-wide ingest can complete even if a handful of files are malformed.

Parameters:
  • conn (Connection)

  • records (iterable of IngestRecord)

Return type:

IngestStats
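
The failure-tolerant loop described above can be sketched as follows. This is an illustration of the pattern only: ingest_stream_sketch, the ingest_fn callback, and fake_ingest are hypothetical, and a plain dict stands in for IngestStats.

```python
import logging

logger = logging.getLogger("ingest_sketch")


def ingest_stream_sketch(records, ingest_fn):
    """Apply ingest_fn to each record, counting outcomes instead of aborting."""
    stats = {
        "inserted_recordings": 0,
        "skipped_duplicates": 0,
        "errors": 0,
        "inserted_samples": 0,
    }
    for record in records:
        try:
            _recording_id, was_new, n_samples = ingest_fn(record)
        except Exception:
            # Log with traceback, count the failure, move on to the next record.
            logger.exception("ingest failed for record %r", record)
            stats["errors"] += 1
            continue
        if was_new:
            stats["inserted_recordings"] += 1
            stats["inserted_samples"] += n_samples
        else:
            stats["skipped_duplicates"] += 1
    return stats


seen = set()


def fake_ingest(record):
    """Toy ingest function: None is malformed, repeats count as duplicates."""
    if record is None:
        raise ValueError("malformed record")
    if record in seen:
        return (1, False, 0)
    seen.add(record)
    return (len(seen), True, 10)


result = ingest_stream_sketch(["a", None, "a", "b"], fake_ingest)
```

One malformed record increments errors without preventing the later record "b" from being ingested. A database-backed version would presumably also roll back any partial transaction before continuing.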

arista.ingest.orchestrator.prepare_db(conn)

Apply schema + seeds. Safe to call on a fresh or populated DB.

Parameters:

conn (Connection)

Return type:

None
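
One common way to make schema and seed application safe on both fresh and populated databases is CREATE TABLE IF NOT EXISTS combined with INSERT OR IGNORE for seed rows. The sketch below assumes that approach; whether prepare_db works this way is not stated here, and the units table and its seed values are hypothetical.

```python
import sqlite3

# Hypothetical schema and seed data, for illustration only.
SCHEMA_SKETCH = """
CREATE TABLE IF NOT EXISTS units (
    id INTEGER PRIMARY KEY,
    symbol TEXT NOT NULL UNIQUE
);
"""
SEED_UNITS = [("s",), ("Hz",)]


def prepare_db_sketch(conn):
    """Create missing tables and insert missing seed rows; repeatable."""
    conn.executescript(SCHEMA_SKETCH)
    conn.executemany("INSERT OR IGNORE INTO units (symbol) VALUES (?)", SEED_UNITS)
    conn.commit()


conn = sqlite3.connect(":memory:")
prepare_db_sketch(conn)  # fresh database
prepare_db_sketch(conn)  # already populated: no error, no duplicate seeds
seed_rows = conn.execute("SELECT symbol FROM units ORDER BY id").fetchall()
```

Calling the function twice leaves exactly one copy of each seed row.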