All configuration is via environment variables, loaded with python-dotenv. config.py reads the local .env in the pipeline directory; shell variables take precedence.

Required variables

Microsoft Graph (Outlook fetch)

Variable               Purpose
AZURE_TENANT_ID        Your Azure AD tenant ID, used to build the OAuth2 authority URL
AZURE_CLIENT_ID        App registration’s client ID
AZURE_CLIENT_SECRET    App registration’s client secret
OUTLOOK_USER_EMAIL     Mailbox address the script reads from (e.g. metaweave-forms@yourcompany.com)
The app registration needs application permissions (not delegated) on Microsoft Graph:
  • Mail.Read
  • Mail.ReadWrite (to mark messages as read)
If any of the four variables above is missing, the fetcher fails on its first call with a ValueError, for example ValueError: AZURE_TENANT_ID is required.
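A minimal sketch of that fail-fast check, assuming the fetcher validates all four variables before requesting a token. The helper name load_graph_settings is hypothetical and not taken from the pipeline; the authority URL uses the standard Microsoft identity platform format.

```python
import os

GRAPH_VARS = (
    "AZURE_TENANT_ID",
    "AZURE_CLIENT_ID",
    "AZURE_CLIENT_SECRET",
    "OUTLOOK_USER_EMAIL",
)


def load_graph_settings(env=None):
    """Return the Graph settings, raising ValueError on the first missing one.

    Hypothetical helper: the real fetcher may structure this differently.
    """
    env = os.environ if env is None else env
    settings = {}
    for name in GRAPH_VARS:
        value = env.get(name, "").strip()
        if not value:
            raise ValueError(f"{name} is required")
        settings[name] = value
    # OAuth2 authority for the client-credentials flow against this tenant.
    settings["authority"] = (
        "https://login.microsoftonline.com/" + settings["AZURE_TENANT_ID"]
    )
    return settings
```

Checking every variable up front means a misconfigured deployment stops before any network call is made.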

PostgreSQL (Cloud SQL)

Variable                              Purpose
GOOGLE_SERVICE_ACCOUNT_BASE64         Base64-encoded service account JSON. Decoded in-memory and passed to Cloud SQL Connector
CLOUD_SQL_INSTANCE_CONNECTION_NAME    project:region:instance, identifying your Cloud SQL instance
POSTGRES_USER                         Database user
POSTGRES_PASSWORD                     Database password (shell-escape $ as \$)
POSTGRES_DB                           Database name
Cloud SQL Connector authenticates through Google IAM, so the database needs no public IP. The service account needs the Cloud SQL Client role (roles/cloudsql.client), which grants the cloudsql.instances.connect permission. If any of the variables above is missing, the writer fails with a Cloud SQL Connector error when the first session is opened.
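The in-memory decoding step might look roughly like this. It is a sketch using only the standard library; the function name decode_service_account is an assumption, and how the pipeline hands the result to the connector is not shown on this page.

```python
import base64
import json


def decode_service_account(encoded):
    """Decode GOOGLE_SERVICE_ACCOUNT_BASE64 into a dict without touching disk.

    Hypothetical helper for illustration only.
    """
    try:
        raw = base64.b64decode(encoded.strip(), validate=True)
        info = json.loads(raw)
    except ValueError as exc:
        # binascii.Error, UnicodeDecodeError and JSONDecodeError all
        # derive from ValueError, so one handler covers them.
        raise ValueError(
            "GOOGLE_SERVICE_ACCOUNT_BASE64 is not valid base64-encoded JSON"
        ) from exc
    if not isinstance(info, dict) or info.get("type") != "service_account":
        raise ValueError("decoded JSON is not a service account key")
    return info
```

The returned dict has the shape accepted by google.oauth2.service_account.Credentials.from_service_account_info, so the key never needs to be written to a file.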

Optional variables

Encryption key

Variable      Default             Purpose
MW_AES_KEY    mw7k2x9p4q8n3v5h    16-byte UTF-8 string used as AES-128-CBC key + IV
If you change this in the pipeline, you must also change it in the form’s CryptoJS config — the two must match exactly, byte-for-byte. The key length must remain 16 bytes (AES-128). To rotate the key:
  1. Generate a new 16-character ASCII string
  2. Update MW_AES_KEY in the pipeline env
  3. Update the form’s embedded key constant
  4. Re-export the configured HTML for each vessel
  5. Distribute the new HTML to the fleet
  6. Existing emails encrypted with the old key will fail decryption — keep both keys for the transition window if needed
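One way to handle the transition window in step 6 is to try the new key first and fall back to the old one. The sketch below assumes a caller-supplied decrypt(ciphertext, key_bytes) function that raises ValueError on failure; both helper names are hypothetical, and the pipeline's actual decryption routine is not shown here.

```python
def validate_aes_key(key):
    """Return the key as bytes, insisting on exactly 16 bytes (AES-128)."""
    key_bytes = key.encode("utf-8")
    if len(key_bytes) != 16:
        raise ValueError(f"MW_AES_KEY must be 16 bytes, got {len(key_bytes)}")
    return key_bytes


def decrypt_with_fallback(ciphertext, keys, decrypt):
    """Try each key in order and return the first successful plaintext.

    `decrypt` is an assumed callable (ciphertext, key_bytes) -> plaintext
    that raises ValueError when the key does not fit.
    """
    # Validate every key up front so a malformed key is reported
    # immediately rather than being mistaken for a decryption failure.
    candidates = [validate_aes_key(key) for key in keys]
    last_error = None
    for key_bytes in candidates:
        try:
            return decrypt(ciphertext, key_bytes)
        except ValueError as exc:
            last_error = exc
    raise ValueError("no configured key could decrypt the message") from last_error
```

The length check counts bytes rather than characters, which is why step 1 asks for ASCII: a 16-character string containing non-ASCII characters encodes to more than 16 bytes. A wrong key in CBC mode can occasionally pass the padding check and yield garbage, so the supplied decrypt function should also confirm that the plaintext parses as expected.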

Other constants

These live in src/config.py as module constants (not env vars). Override by editing the file.

Markers

MARKER_BEGIN = "BEGIN MW FORM DATA"
MARKER_END = "END MW FORM DATA"
The parser looks for these strings to find the encrypted block. If the form’s markers change, update both sides.
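As an illustration, locating the block could be as simple as the following. This is a sketch, not the pipeline's parser: extract_block is a hypothetical name, and it assumes the payload sits directly between the two marker strings with no extra decoration around them.

```python
MARKER_BEGIN = "BEGIN MW FORM DATA"
MARKER_END = "END MW FORM DATA"


def extract_block(body):
    """Return the text between the markers, or None if either is absent."""
    start = body.find(MARKER_BEGIN)
    if start == -1:
        return None
    start += len(MARKER_BEGIN)
    # Search for the end marker only after the begin marker.
    end = body.find(MARKER_END, start)
    if end == -1:
        return None
    return body[start:end].strip()
```

Returning None for a missing marker lets the caller skip the message instead of attempting to decrypt unrelated text.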

Subject regex

SUBJECT_PATTERN = re.compile(
    r"Metaweave Forms:\s*(.+?)\s*-\s*(.+?)\s*-\s*(\d{2}\.\d{2}\.\d{4})"
)
Captures vessel name, report type, date (DD.MM.YYYY). Subjects that don’t match are filtered out by the fetcher.
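A short sketch of applying the pattern. The wrapper parse_subject is hypothetical, and using search() rather than match() is an assumption; the date is additionally parsed so that a shape-valid but impossible date is rejected.

```python
import re
from datetime import datetime

SUBJECT_PATTERN = re.compile(
    r"Metaweave Forms:\s*(.+?)\s*-\s*(.+?)\s*-\s*(\d{2}\.\d{2}\.\d{4})"
)


def parse_subject(subject):
    """Return (vessel, report_text, report_date) or None if there is no match."""
    match = SUBJECT_PATTERN.search(subject)
    if match is None:
        return None
    vessel, report_text, raw_date = match.groups()
    try:
        report_date = datetime.strptime(raw_date, "%d.%m.%Y").date()
    except ValueError:
        # Right shape, but not a real calendar date (e.g. 31.02.2025).
        return None
    return vessel, report_text, report_date
```

For the subject "Metaweave Forms: MV Example - Noon Report - 14.03.2025" this yields the vessel "MV Example", the report text "Noon Report", and 14 March 2025. Because the first group is non-greedy and the separator allows zero surrounding spaces, a vessel name containing a hyphen (say "Sea-Star") is split at its own hyphen, leaving "Sea" as the vessel and pushing the remainder into the report text.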

Report type mapping

REPORT_TYPE_MAP = {
    "Noon Report":         "NOON",
    "Arrival Notice":      "ARRIVAL",
    "Departure Notice":    "DEPARTURE",
    "Bunker Report":       "BUNKER",
    "Statement of Facts":  "SOF",
}
Maps the human-readable subject text to the canonical type stored in metaweave_report.report_type. Add new entries here when the form ships a new report type.
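The lookup itself can be sketched as below. The function canonical_report_type is hypothetical, and treating unknown text as an error is an assumption; this page does not say how the pipeline handles an unmapped report type.

```python
REPORT_TYPE_MAP = {
    "Noon Report":         "NOON",
    "Arrival Notice":      "ARRIVAL",
    "Departure Notice":    "DEPARTURE",
    "Bunker Report":       "BUNKER",
    "Statement of Facts":  "SOF",
}


def canonical_report_type(subject_text):
    """Map subject text to the stored report type; raise on unknown text."""
    try:
        # Exact, case-sensitive match after trimming whitespace.
        return REPORT_TYPE_MAP[subject_text.strip()]
    except KeyError:
        raise ValueError(f"unmapped report type: {subject_text!r}") from None
```

Failing loudly on unknown text makes a newly shipped report type visible immediately, which is the cue to add its entry to the map.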

Sample .env

# Microsoft Graph (Outlook)
AZURE_TENANT_ID=8a9c...e123
AZURE_CLIENT_ID=12345678-aaaa-bbbb-cccc-dddddddddddd
AZURE_CLIENT_SECRET=Wzy~aaaQ.oooHelloThereSecretValue
OUTLOOK_USER_EMAIL=metaweave-forms@yourcompany.com

# Encryption (default works for stock setup)
MW_AES_KEY=mw7k2x9p4q8n3v5h

# Cloud SQL
GOOGLE_SERVICE_ACCOUNT_BASE64=ewogICJ0eXBlIjogInNlcnZpY2VfYWNjb3VudCIsCiAgInByb2plY3RfaWQi...
CLOUD_SQL_INSTANCE_CONNECTION_NAME=my-gcp-project:asia-south1:emissions-db
POSTGRES_USER=metaweave_writer
POSTGRES_PASSWORD=somepassword
POSTGRES_DB=emissions
Encode the service account JSON with:
base64 -w0 service-account.json    # Linux
base64 -i service-account.json     # macOS
(or cat service-account.json | base64 | tr -d '\n')

How config is loaded

src/config.py loads dotenv from .env in the current working directory,
then reads each env var with os.getenv(). Shell vars override .env values.
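To make the precedence rule concrete, here is a stdlib-only stand-in for python-dotenv's default behaviour, where load_dotenv() leaves already-set variables untouched. It is not the pipeline's code: load_env_file is a hypothetical name, and the parser is deliberately simplified (no quoting, export prefixes, or variable interpolation).

```python
import os


def load_env_file(path, environ=None):
    """Load KEY=VALUE lines without overriding variables that already exist.

    Simplified stand-in for load_dotenv(override=False); illustration only.
    """
    environ = os.environ if environ is None else environ
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blanks, comments, and malformed lines
            name, _, value = line.partition("=")
            # setdefault keeps a value exported in the shell and only
            # fills in names that are not set yet.
            environ.setdefault(name.strip(), value.strip())
    return environ
```

With POSTGRES_USER already exported in the shell, the value in .env is ignored for that name while every other name in the file is still loaded.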

What’s not configurable

The pipeline does not support:
  • Multiple Outlook mailboxes per run (one mailbox per invocation)
  • Multiple databases per run
  • Different schema per environment (use different POSTGRES_DB)
  • Different table prefix (the metaweave_ prefix is hardcoded)
For multi-tenant deployments, run multiple instances with different env files and orchestrate via your scheduler.
