arista.ingest.orchestrator
Insert IngestRecord instances into an arista SQLite DB.
The orchestrator runs once per ingest invocation. It is idempotent:
re-running it over a corpus that is already in the database is a no-op,
because every INSERT uses OR IGNORE and natural-key UNIQUE
constraints protect every dimension table (see
[[Database Schema]]).
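The idempotency argument above can be illustrated with a minimal sketch using Python's standard sqlite3 module. The `subjects` table, its columns, and the `insert_subject` helper are hypothetical stand-ins, not part of the arista schema; only the INSERT OR IGNORE plus UNIQUE pattern is taken from the description.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical dimension table; the real schema is described in [[Database Schema]].
conn.execute(
    "CREATE TABLE subjects (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)"
)


def insert_subject(conn, name):
    """Insert a row by natural key; a duplicate key is silently ignored."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO subjects (name) VALUES (?)", (name,)
    )
    # rowcount is 1 when a row was written and 0 when the insert was ignored.
    return cur.rowcount == 1


first = insert_subject(conn, "mouse-01")
second = insert_subject(conn, "mouse-01")  # same natural key: no-op
row_count = conn.execute("SELECT COUNT(*) FROM subjects").fetchone()[0]
```

Running the same insert twice leaves a single row, which is the property that makes a repeated ingest harmless.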
Provenance: each ingested CSV is registered in source_files with
its SHA-256 digest and byte size. The recordings.processed_file_id FK ties
the recording back to the preprocessed CSV; the original Fiji ROI
and sensor MAT will be added in a later sprint, once
arista.preprocess.write_recording_csv() also persists those
source paths.
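As a rough illustration of the provenance step, the sketch below hashes a file and records its digest and size. The column names (`path`, `sha256`, `n_bytes`), the UNIQUE constraint on the digest, and the `register_source_file` helper are assumptions made for this example; only the table name source_files and the idea of storing a sha256 and a byte size come from the description above.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path


def register_source_file(conn, path):
    """Store path, sha256 and byte size; return the row id (new or existing)."""
    data = Path(path).read_bytes()
    digest = hashlib.sha256(data).hexdigest()
    conn.execute(
        "INSERT OR IGNORE INTO source_files (path, sha256, n_bytes) "
        "VALUES (?, ?, ?)",
        (str(path), digest, len(data)),
    )
    row = conn.execute(
        "SELECT id FROM source_files WHERE sha256 = ?", (digest,)
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
# Hypothetical layout of source_files, assumed for this sketch only.
conn.execute(
    "CREATE TABLE source_files ("
    " id INTEGER PRIMARY KEY, path TEXT NOT NULL,"
    " sha256 TEXT NOT NULL UNIQUE, n_bytes INTEGER NOT NULL)"
)

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "recording.csv"
    csv_path.write_bytes(b"t,signal\n0,1.0\n")
    file_id = register_source_file(conn, csv_path)
    same_id = register_source_file(conn, csv_path)  # re-registering is a no-op
```

A recording row could then point at `file_id` through a foreign key such as recordings.processed_file_id.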
Functions
| ingest_one | Insert one IngestRecord and commit. |
| ingest_stream | Ingest every record from records. |
| prepare_db | Apply schema + seeds. |
Classes
| IngestStats | Result tally for one ingest run. |
- class arista.ingest.orchestrator.IngestStats(inserted_recordings=0, skipped_duplicates=0, errors=0, inserted_samples=0)
Bases: object
Result tally for one ingest run.
- arista.ingest.orchestrator.ingest_one(conn, record)
Insert one IngestRecord and commit.
- Returns:
Triple (recording_id, was_new, n_samples_inserted). was_new is False when the recording’s natural key was already present (re-ingest skipped); in that case samples are also not re-inserted.
- Parameters:
conn (Connection)
record (IngestRecord)
- Return type:
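The return contract of ingest_one can be sketched as follows. This is not the library's implementation: `ingest_one_sketch`, the two-table layout, and the `natural_key` column are hypothetical, and the real function takes an IngestRecord rather than a key and a list of samples.

```python
import sqlite3


def ingest_one_sketch(conn, key, samples):
    """Return (recording_id, was_new, n_samples_inserted) for one recording."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO recordings (natural_key) VALUES (?)", (key,)
    )
    was_new = cur.rowcount == 1
    recording_id = conn.execute(
        "SELECT id FROM recordings WHERE natural_key = ?", (key,)
    ).fetchone()[0]
    n_inserted = 0
    if was_new:
        # Samples are written only for a recording that was not yet present.
        conn.executemany(
            "INSERT INTO samples (recording_id, value) VALUES (?, ?)",
            [(recording_id, v) for v in samples],
        )
        n_inserted = len(samples)
    conn.commit()
    return recording_id, was_new, n_inserted


conn = sqlite3.connect(":memory:")
# Hypothetical minimal schema, assumed for this sketch only.
conn.executescript(
    """
    CREATE TABLE recordings (
        id INTEGER PRIMARY KEY,
        natural_key TEXT NOT NULL UNIQUE
    );
    CREATE TABLE samples (
        recording_id INTEGER NOT NULL REFERENCES recordings(id),
        value REAL
    );
    """
)
first = ingest_one_sketch(conn, "rec-A", [0.1, 0.2, 0.3])
again = ingest_one_sketch(conn, "rec-A", [0.1, 0.2, 0.3])
```

The second call finds the natural key already present, so it reports `was_new` as False and inserts no samples.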
- arista.ingest.orchestrator.ingest_stream(conn, records)
Ingest every record from records. Survives per-record failures.
Errors are logged and counted; the orchestrator does not abort the whole run on one bad recording, so a corpus-wide ingest can complete even if a handful of files are malformed.
- Parameters:
conn (Connection)
records (Iterable[IngestRecord])
- Return type:
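The failure-tolerant loop described for ingest_stream might look roughly like the sketch below. The `Stats` dataclass mirrors the field names in the IngestStats signature, but `ingest_stream_sketch`, the `ingest_fn` callback, and `fake_ingest` are hypothetical; the real function takes a connection and IngestRecord instances.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("ingest_sketch")


@dataclass
class Stats:
    # Field names follow the IngestStats signature documented on this page.
    inserted_recordings: int = 0
    skipped_duplicates: int = 0
    errors: int = 0
    inserted_samples: int = 0


def ingest_stream_sketch(records, ingest_fn):
    """Ingest each record; log and count failures instead of aborting."""
    stats = Stats()
    for record in records:
        try:
            _recording_id, was_new, n_samples = ingest_fn(record)
        except Exception:
            logger.exception("ingest failed for %r", record)
            stats.errors += 1
            continue
        if was_new:
            stats.inserted_recordings += 1
            stats.inserted_samples += n_samples
        else:
            stats.skipped_duplicates += 1
    return stats


seen = set()


def fake_ingest(record):
    """Hypothetical stand-in for ingest_one: None plays a malformed record."""
    if record is None:
        raise ValueError("malformed record")
    was_new = record not in seen
    seen.add(record)
    return len(seen), was_new, (2 if was_new else 0)


stats = ingest_stream_sketch(["a", None, "a", "b"], fake_ingest)
```

The malformed record increments `errors` and the loop continues, so the later records are still ingested.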
- arista.ingest.orchestrator.prepare_db(conn)
Apply schema + seeds. Safe to call on a fresh or populated DB.
- Parameters:
conn (Connection)
- Return type:
None
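One way a function like prepare_db can be safe on both a fresh and a populated database is to combine CREATE TABLE IF NOT EXISTS with INSERT OR IGNORE for seed rows. The sketch below assumes that approach; the `units` table, the seed values, and `prepare_db_sketch` are hypothetical, and the actual schema and seeds are not shown on this page.

```python
import sqlite3

# Hypothetical schema and seed data, assumed for this sketch only.
SCHEMA_SQL = """
CREATE TABLE IF NOT EXISTS units (
    id INTEGER PRIMARY KEY,
    symbol TEXT NOT NULL UNIQUE
);
"""
SEED_UNITS = [("mV",), ("s",)]


def prepare_db_sketch(conn):
    """Create missing tables and seed rows; a second call changes nothing."""
    conn.executescript(SCHEMA_SQL)
    conn.executemany(
        "INSERT OR IGNORE INTO units (symbol) VALUES (?)", SEED_UNITS
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
prepare_db_sketch(conn)  # fresh database
prepare_db_sketch(conn)  # populated database: no error, no duplicates
n_units = conn.execute("SELECT COUNT(*) FROM units").fetchone()[0]
```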