The Metaweave pipeline is a Python ETL that consumes the form’s emails, decrypts them, and upserts to PostgreSQL. It runs as a single CLI — python -m src.main — with no daemon, no queue, no message broker. Designed to be triggered on a schedule.
Data flow
Stages
Fetcher
Microsoft Graph API. OAuth2 client credentials. Filters unread emails by subject pattern. Marks as read after fetch.
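A minimal sketch of the fetcher stage, using only the standard library: acquire an app-only token via the OAuth2 client-credentials flow, list unread messages through Microsoft Graph, filter by subject, and mark each as read. The subject pattern, mailbox address, and credential placeholders are assumptions; the real code may use a Graph SDK instead of raw urllib.

```python
# Fetcher sketch (assumptions: subject regex, mailbox, credential names).
import json
import re
import urllib.parse
import urllib.request

SUBJECT_RE = re.compile(r"^Metaweave report\b", re.IGNORECASE)  # assumed pattern

def matches_subject(subject: str) -> bool:
    """True if the email subject matches the expected report pattern."""
    return SUBJECT_RE.search(subject) is not None

def get_token(tenant_id: str, client_id: str, client_secret: str) -> str:
    """OAuth2 client-credentials flow against Azure AD (app-only token)."""
    data = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": "https://graph.microsoft.com/.default",
    }).encode()
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    with urllib.request.urlopen(urllib.request.Request(url, data=data)) as resp:
        return json.load(resp)["access_token"]

def fetch_unread(token: str, mailbox: str) -> list[dict]:
    """List unread messages in the shared mailbox, keep matching subjects."""
    url = (f"https://graph.microsoft.com/v1.0/users/{mailbox}/messages"
           "?$top=50&$filter=" + urllib.parse.quote("isRead eq false"))
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        messages = json.load(resp)["value"]
    return [m for m in messages if matches_subject(m["subject"])]

def mark_read(token: str, mailbox: str, message_id: str) -> None:
    """PATCH the message so a re-run will not re-process it."""
    req = urllib.request.Request(
        f"https://graph.microsoft.com/v1.0/users/{mailbox}/messages/{message_id}",
        data=json.dumps({"isRead": True}).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="PATCH",
    )
    urllib.request.urlopen(req).close()

if __name__ == "__main__":
    token = get_token("TENANT_ID", "CLIENT_ID", "CLIENT_SECRET")  # placeholders
    for msg in fetch_unread(token, "fleet@example.com"):
        print(msg["subject"])
        mark_read(token, "fleet@example.com", msg["id"])
```

Marking-as-read only after a successful fetch is what makes scheduled re-runs safe (see "Designed for" below).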
Parser
Extracts the encrypted block between BEGIN/END markers. AES-128-CBC + PKCS7 + base64. Falls back to regex parsing for hand-edited emails.
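The parser's two steps can be sketched as follows. The BEGIN/END marker text, key handling, and IV-prepended ciphertext layout are assumptions; the decryption helper relies on the third-party cryptography package (imported lazily so the extraction helper works without it).

```python
# Parser sketch (assumptions: marker strings, IV prepended to ciphertext).
import base64
import re

BLOCK_RE = re.compile(
    r"-----BEGIN ENCRYPTED REPORT-----\s*(.*?)\s*-----END ENCRYPTED REPORT-----",
    re.DOTALL,
)

def extract_block(body: str) -> bytes:
    """Pull the base64 text between the markers and decode it to bytes."""
    m = BLOCK_RE.search(body)
    if m is None:
        raise ValueError("no encrypted block found in email body")
    return base64.b64decode(m.group(1))

def decrypt_block(blob: bytes, key: bytes) -> bytes:
    """AES-128-CBC decrypt with PKCS7 unpadding (16-byte key)."""
    # Lazy import: extraction works even without the cryptography package.
    from cryptography.hazmat.primitives import padding
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    iv, ciphertext = blob[:16], blob[16:]  # assumed: IV is the first 16 bytes
    decryptor = Cipher(algorithms.AES(key), modes.CBC(iv)).decryptor()
    padded = decryptor.update(ciphertext) + decryptor.finalize()
    unpadder = padding.PKCS7(128).unpadder()
    return unpadder.update(padded) + unpadder.finalize()
```

The regex fallback mentioned above would kick in when extract_block raises, scanning the body for loose key: value pairs instead.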
Mapper
92 scalar fields + 11 arrays → SQLAlchemy model instances. DMS → decimal coords, datetime parsing, type coercion.
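Two of the mapper's conversions, sketched below: DMS-to-decimal coordinates and UTC datetime parsing. The exact input shapes and the timestamp format string are assumptions.

```python
# Mapper sketch (assumptions: DMS arrives as separate components,
# timestamps arrive as "YYYY-MM-DD HH:MM" in UTC).
from datetime import datetime, timezone

def dms_to_decimal(degrees: int, minutes: int, seconds: float,
                   hemisphere: str) -> float:
    """Degrees/minutes/seconds + N/S/E/W -> signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere.upper() in ("S", "W") else value

def parse_report_datetime(text: str) -> datetime:
    """Parse the form's timestamp as timezone-aware UTC."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M").replace(tzinfo=timezone.utc)
```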
Writer
Upsert Vessel by IMO, upsert Voyage by (vessel, voyage_number), delete-then-insert Report by (vessel, report_type, report_datetime_utc) with CASCADE.
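The delete-then-insert semantics for Report rows can be illustrated with a small sqlite3 sketch; the real writer targets PostgreSQL through SQLAlchemy, where the CASCADE also removes dependent array rows. Table and column names here are assumptions.

```python
# Writer sketch: replace-by-natural-key in one transaction.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE report (
    vessel_id INTEGER, report_type TEXT, report_datetime_utc TEXT,
    speed_knots REAL,
    UNIQUE (vessel_id, report_type, report_datetime_utc))""")

def write_report(conn, vessel_id, report_type, dt_utc, speed_knots):
    """Delete any prior row with the same natural key, then insert."""
    with conn:  # one transaction: delete + insert commit (or roll back) together
        conn.execute(
            "DELETE FROM report WHERE vessel_id=? AND report_type=? "
            "AND report_datetime_utc=?",
            (vessel_id, report_type, dt_utc),
        )
        conn.execute(
            "INSERT INTO report VALUES (?, ?, ?, ?)",
            (vessel_id, report_type, dt_utc, speed_knots),
        )

write_report(conn, 1, "noon", "2024-05-01T12:00:00Z", 12.5)
write_report(conn, 1, "noon", "2024-05-01T12:00:00Z", 13.0)  # a correction
rows = conn.execute("SELECT speed_knots FROM report").fetchall()
```

This is why re-submissions are safe: the corrected report simply replaces the original row rather than duplicating it.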
Designed for
- Scheduled invocation — typically every 5–15 minutes via cron or a serverless scheduler
- Idempotent reads — marks emails as read after fetch, so a re-run won’t re-process
- Safe re-submissions — corrections replace previous rows by design (CASCADE delete + insert)
- Single mailbox per fleet — one shared Outlook mailbox is the input; the script loops every unread email
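A cron entry matching the cadence above might look like this; the install path, virtualenv location, and log file are assumptions.

```shell
# crontab entry: run the pipeline every 10 minutes, append output to a log
*/10 * * * * cd /opt/metaweave && .venv/bin/python -m src.main >> /var/log/metaweave.log 2>&1
```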
Not designed for
- High volume (>10,000 emails/run) — sequential processing, single thread
- Multi-tenant — one Azure app registration → one mailbox → one Postgres database
- Real-time delivery — there’s a polling delay between Submit and ingest (driven by your cron cadence)
See also
- Installation — venv, dependencies, Cloud SQL setup
- Running — python -m src.main flags and modes
- ETL stages — fetcher / parser / mapper / writer in detail
- Data model — the 17 SQLAlchemy tables
- Bootstrap scripts — excel_to_history.py and generate_metaweave_test_history.py (in history-tools/)
- Configuration — environment variables