Configuration reference
airlock.yaml is the canonical config for one worker. It declares which
tables to export per user, how the tables relate, and what to mask. The
worker reads it once at boot; changes require a restart (no hot
reload).
Database credentials are never read from YAML — set DATABASE_URL
in the environment. The validator rejects database_url: in YAML
explicitly so it can't be added by mistake.
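The rule amounts to a load-time check: refuse the key in YAML, read the URL only from the environment. Below is a minimal sketch, assuming the YAML has already been parsed into a dict. `load_database_url` is a hypothetical name for illustration, not Airlock's API.

```python
import os
from typing import Mapping, Optional


def load_database_url(config: Mapping[str, object],
                      environ: Optional[Mapping[str, str]] = None) -> str:
    """Return the source-DB URL from the environment (hypothetical helper)."""
    if "database_url" in config:
        # Fail loudly rather than silently ignore it: credentials must not live in YAML.
        raise ValueError("database_url is not allowed in airlock.yaml; "
                         "set DATABASE_URL in the environment instead")
    env = os.environ if environ is None else environ
    url = env.get("DATABASE_URL")
    if not url:
        raise RuntimeError("DATABASE_URL is not set")
    return url
```

Passing `environ` explicitly keeps the check testable without touching the real process environment.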
Top-level fields
| Field | Type | Default | Description |
|---|---|---|---|
| root_table | string | users | The table that defines a "snapshot owner". One row in this table = one snapshot. |
| root_filter_col | string | id | Column on root_table that the agent uses as the user identifier (typically external_id). |
| tables | list | [] | Tables to include in each snapshot. The worker sorts them topologically at boot (parents before children). See Tables. |
| data_dir | string | /dev/shm/airlock | Where per-user .duckdb files are written. Default is RAM-backed tmpfs on Linux; override on macOS dev hosts. |
| snapshot_ttl_s | int | 300 | Per-snapshot freshness window in seconds. After this, the next MCP call re-exports synchronously. 0 = re-export every call. |
| hints | list | [] | Free-text domain hints surfaced to the LLM as part of get_schema. |
| sample_queries | list | [] | Named example SQL queries surfaced to the LLM. See Sample queries and hints. |
| tenant_isolation | object | null | Multi-tenant safety guard. See Tenant isolation. |
| control_plane | object | null | Tunnel mode config. See Control plane. Without this block, the worker runs in direct FastMCP HTTP mode on localhost:8000. |
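For tooling built around the file, the defaults and the snapshot_ttl_s rule can be summarized as a plain dataclass. This is a sketch only: `AirlockConfig` and `snapshot_is_stale` are hypothetical names, and whether the worker compares with `>` or `>=` at exactly the TTL boundary is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AirlockConfig:
    # Defaults mirror the top-level fields table (hypothetical class).
    root_table: str = "users"
    root_filter_col: str = "id"
    tables: list = field(default_factory=list)
    data_dir: str = "/dev/shm/airlock"
    snapshot_ttl_s: int = 300
    hints: list = field(default_factory=list)
    sample_queries: list = field(default_factory=list)
    tenant_isolation: Optional[dict] = None
    control_plane: Optional[dict] = None

    @property
    def direct_mode(self) -> bool:
        # No control_plane block means direct FastMCP HTTP mode.
        return self.control_plane is None


def snapshot_is_stale(age_s: float, ttl_s: int) -> bool:
    """True if the next MCP call should re-export before answering."""
    if ttl_s == 0:
        return True  # 0 = re-export on every call
    return age_s > ttl_s  # boundary comparison is an assumption
```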
Tables
Each table entry declares one source-DB table that should appear in every per-user snapshot. The list is topologically sorted at boot so parents and via-targets are exported before their dependents.
```yaml
tables:
  - name: users
    mask_columns:
      email: hash
      phone: redact
      ssn: "null"
  - name: accounts
    parent: users
    fk: user_id
  - name: transactions
    parent: accounts
    fk: account_id
    exclude_columns: [raw_payload]
  - name: merchants
    via: transactions
    via_fk: merchant_id
```
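The boot-time sort can be pictured as an ordinary topological sort in which each table depends on its parent and its via table. A minimal sketch using Python's standard `graphlib`; `export_order` is a hypothetical name, and treating a `via` list as several dependencies is an assumption.

```python
from graphlib import TopologicalSorter


def export_order(tables: list[dict]) -> list[str]:
    """Order table entries so parents and via-targets come first (sketch)."""
    graph: dict[str, set[str]] = {}
    for t in tables:
        deps: set[str] = set()
        if t.get("parent"):
            deps.add(t["parent"])
        via = t.get("via")
        if isinstance(via, str):
            deps.add(via)
        elif isinstance(via, list):
            deps.update(via)  # assumption: each listed via table is a dependency
        graph[t["name"]] = deps
    # static_order() raises graphlib.CycleError on circular references.
    return list(TopologicalSorter(graph).static_order())
```

Fed the example above in any order, this yields users before accounts, accounts before transactions, and transactions before merchants.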
Table fields
| Field | Type | Description |
|---|---|---|
| name | string | Source-DB table name. Required. |
| pk | string | Primary key column name. Default: id. |
| parent | string | The table this is a forward-FK child of. Pair with fk. |
| fk | string | Column on this table that references parent.pk. Pair with parent. |
| via | string or list | The table whose rows pull this one in via reverse FK (e.g. merchants is pulled in by transactions.merchant_id). Pair with via_fk. |
| via_fk | string | Column on via that references this table's pk. |
| columns | list | Whitelist of columns to project. Empty = SELECT *. |
| exclude_columns | list | Columns to drop. Triggers a source-DESCRIBE to enumerate the rest. |
| mask_columns | object | column → policy. See Masking. |
| local_table | string | Override the local DuckDB table name (default: same as name). |
| source_table | string | Override the source-DB table name (default: same as name). Useful when local + source names differ. |
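How columns and exclude_columns combine into a projection can be sketched as follows. The function is hypothetical: the source DESCRIBE output is passed in as a plain list, and letting an explicit whitelist win when both fields are set is an assumption, not documented behavior.

```python
from typing import Optional, Sequence


def projection(columns: Sequence[str],
               exclude_columns: Sequence[str],
               described: Optional[Sequence[str]] = None) -> str:
    """Build the SELECT list for one table entry (hypothetical helper)."""
    if columns:
        # An explicit whitelist wins (assumption when both are set).
        return ", ".join(columns)
    if exclude_columns:
        if described is None:
            # Excluding needs the full column list from a source DESCRIBE first.
            raise ValueError("exclude_columns needs the source DESCRIBE output")
        dropped = set(exclude_columns)
        return ", ".join(c for c in described if c not in dropped)
    return "*"
```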
Edge types
There are two ways a non-root table joins the export graph:
- Forward FK (parent + fk): rows on this table whose fk is IN the set of parent.pk values already exported. Use when this table is the "many" side relative to its parent.
- Reverse FK (via + via_fk): rows on this table whose pk is referenced by via.via_fk. Use when the via table holds the FK pointing at this one (e.g. transactions.merchant_id pulls in merchants).
If a table has neither, the export skips it and logs a warning.
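The two edge types reduce to two filter shapes. The sketch below renders them as SQL WHERE clauses, assuming the filter runs against the already-exported local copies of the parent or via table. `edge_filter` is a hypothetical name, and the sketch handles only a single via table.

```python
from typing import Optional


def edge_filter(table: dict, pk_of: dict[str, str]) -> Optional[str]:
    """Return the WHERE clause that pulls `table` into a snapshot (sketch).

    pk_of maps table name -> primary key column; missing entries default to "id".
    Returns None when the table declares neither edge type.
    """
    pk = table.get("pk", "id")
    if table.get("parent") and table.get("fk"):
        parent = table["parent"]
        # Forward FK: this table's fk points at parent rows already exported.
        return f"{table['fk']} IN (SELECT {pk_of.get(parent, 'id')} FROM {parent})"
    if table.get("via") and table.get("via_fk"):
        via = table["via"]
        # Reverse FK: the via table's via_fk points at this table's pk.
        return f"{pk} IN (SELECT {table['via_fk']} FROM {via})"
    return None  # the worker skips such a table and logs a warning
```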
Masking
mask_columns declares per-column PII policies. They run as UPDATE
statements against the local DuckDB snapshot post-copy — your source
database is never modified.
| Policy | Effect | Use for |
|---|---|---|
| hash | sha256 hex of the value | Joinable PII (emails, names, account numbers) |
| redact | Replaced with "***REDACTED***" | Free-text PII (phone, address) |
| null | Replaced with NULL (must be quoted in YAML) | Things the agent should never see (SSN, tokens, password hashes) |
Quote "null" in YAML — unquoted null parses as None and the
masker can't tell what you meant.
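Because masking is a set of UPDATEs on the local copy, the mapping from policy to SQL is small. The sketch below generates those statements, assuming DuckDB's `sha256()` for the hash policy; `mask_statements` and `POLICIES` are hypothetical names. It also shows why quoting matters: an unquoted `null` reaches the masker as Python `None`, which this sketch rejects.

```python
POLICIES = {
    # policy -> SQL expression; {col} is replaced by the quoted column name.
    "hash": "sha256(CAST({col} AS VARCHAR))",  # DuckDB sha256() returns hex text
    "redact": "'***REDACTED***'",
    "null": "NULL",
}


def mask_statements(table: str, mask_columns: dict) -> list[str]:
    """Turn a mask_columns mapping into UPDATEs for the local snapshot (sketch)."""
    stmts = []
    for col, policy in mask_columns.items():
        if policy not in POLICIES:
            # An unquoted `null` in YAML arrives here as None, not "null".
            raise ValueError(f"{table}.{col}: unknown mask policy {policy!r}")
        expr = POLICIES[policy].format(col=f'"{col}"')
        stmts.append(f'UPDATE "{table}" SET "{col}" = {expr}')
    return stmts
```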
Tenant isolation
For schemas where multiple tenants share a Postgres shard (e.g. one
users table holds both Acme's and Beta's users), the
tenant_isolation block prevents an export from accidentally reaching
across tenants:
```yaml
tenant_isolation:
  column: tenant_id
  required: true  # default; export refuses to run without --tenant-filter
```
When set with required: true, the airlock export CLI requires
--tenant-filter <value> and injects it into root_filters for every
per-user export. A user identifier alone can never match rows from
another tenant.
The on-demand export path (per-MCP-call) reads tenant_id from the
worker's control_plane.tenant_id so this guard is always in effect.
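The guard can be pictured as one extra equality filter that is always added next to the user identifier. A minimal sketch: `build_root_filters` is a hypothetical name, and representing root_filters as a column-to-value dict is an assumption about its shape.

```python
from typing import Optional


def build_root_filters(root_filter_col: str, user_id: str,
                       tenant_isolation: Optional[dict],
                       tenant_filter: Optional[str]) -> dict[str, str]:
    """Combine the user identifier with the tenant guard, if configured (sketch)."""
    filters = {root_filter_col: user_id}
    if tenant_isolation:
        required = tenant_isolation.get("required", True)  # documented default
        if tenant_filter is None:
            if required:
                raise ValueError("--tenant-filter is required when "
                                 "tenant_isolation.required is true")
        else:
            # Both conditions must match, so a user id can't cross tenants.
            filters[tenant_isolation["column"]] = tenant_filter
    return filters
```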
Control plane
```yaml
control_plane:
  cp_url: "wss://cp.airlocklabs.ai/v1/tunnel"
  cp_public_key_b64: "EL//8+W+Dy…"
  worker_id: "w_acme_prod"
  tenant_id: "t_acme"
  private_key_path: "/etc/airlock/worker_ed25519.pem"
```
| Field | Type | Description |
|---|---|---|
| cp_url | string | The control plane's tunnel WebSocket URL. |
| cp_public_key_b64 | string | Base64-encoded Ed25519 public key of the CP. The worker pins this at install time so that a compromised DNS record can't let an impostor pose as the CP. The value is the same for every customer. |
| worker_id | string | Stable identifier for this worker. The CP uses it to route audit records and admin frames. |
| tenant_id | string | Stable identifier for your tenant. The CP uses it for routing; the worker uses it as the audience claim on every JWT it verifies. |
| private_key_path | string | Path to the worker's Ed25519 private key, generated locally by airlock register. Mode 0600 recommended. Never transmitted. |
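Two checks implied by this table are easy to express with the standard library: the audience comparison on a JWT's decoded claims, and the recommended 0600 mode on the key file. This is a sketch with hypothetical function names. It covers only the `aud` comparison (a string or a list, per RFC 7519); verifying the Ed25519 signature against the pinned key needs a crypto library and is out of scope here.

```python
import os
import stat
from typing import Mapping


def audience_matches(claims: Mapping[str, object], tenant_id: str) -> bool:
    """Check a decoded JWT's `aud` claim against this worker's tenant_id."""
    aud = claims.get("aud")
    if isinstance(aud, str):
        return aud == tenant_id
    if isinstance(aud, list):
        return tenant_id in aud
    return False  # missing or malformed audience is rejected


def key_file_is_private(path: str) -> bool:
    """True if no group or other permission bits are set (e.g. mode 0600)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```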
Without this block the worker runs in direct FastMCP HTTP mode on
AIRLOCK_HOST:AIRLOCK_PORT (default localhost:8000) — useful for
local dev, never for production.
Sample queries and hints
```yaml
hints:
  - Amounts are in cents (divide by 100 for dollars)
  - Use transacted_date for date filtering
sample_queries:
  - name: Monthly spending
    sql: |
      SELECT category, COUNT(*) AS txn_count, …
```
Both surface to the LLM as part of get_schema. Hints are free text;
sample queries are paired name + SQL. Use them to encode domain
knowledge that's hard to infer from column names alone (units,
date conventions, denormalizations the agent shouldn't have to
re-discover on every chat).
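If you generate these lists from another source, a pre-commit check that each hint is non-empty text and each sample query carries both a name and SQL catches the common mistakes before airlock validate runs. This is a hypothetical helper, not part of the airlock CLI; the duplicate-name rule is an extra assumption.

```python
def check_llm_context(hints: list, sample_queries: list) -> list[str]:
    """Return human-readable problems with hints / sample_queries (sketch)."""
    problems = []
    for i, h in enumerate(hints):
        if not isinstance(h, str) or not h.strip():
            problems.append(f"hints[{i}] must be non-empty text")
    seen = set()
    for i, q in enumerate(sample_queries):
        # Each sample query is a name + SQL pair.
        if not isinstance(q, dict) or not q.get("name") or not q.get("sql"):
            problems.append(f"sample_queries[{i}] needs both name and sql")
            continue
        if q["name"] in seen:
            problems.append(f"sample_queries[{i}]: duplicate name {q['name']!r}")
        seen.add(q["name"])
    return problems
```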
Playground prompt (per-tenant system prompt)
The console's playground builds the agent's system prompt from a generic body plus an optional tenant-specific override:
```yaml
playground_prompt: |
  You are a clinical analyst reviewing one patient's chart. Surface
  abnormal labs, medication changes, and encounter trends. Cite the
  table + column you queried; never invent values.
```
The prompt reaches the playground through the manifest returned by
get_schema. Set it and the operator playground frames the agent's
persona for your vertical; leave it unset and the playground falls back
to the generic "data-analyst" body. You can edit it live in the console
at Config → Edit airlock.yaml; the schema-aware editor autocompletes
the field and validates it on save.
Environment variables
These override or supplement the YAML at runtime:
| Env var | Default | Effect |
|---|---|---|
| DATABASE_URL | (none) | Required for export. postgres://… is supported today. mysql://… is on the roadmap (see Source databases below). Never read from YAML. |
| AIRLOCK_DATA_DIR | /dev/shm/airlock | Default data_dir if YAML doesn't set one. macOS dev hosts must override (no /dev/shm). |
| AIRLOCK_QUERY_TIMEOUT | 5 | Seconds before a single SQL query is interrupted. |
| AIRLOCK_MAX_ROWS | 500 | Row cap on execute_sql results. |
| AIRLOCK_EXPORT_CONCURRENCY | 16 | Global semaphore on concurrent exports. Limits how many source-DB connections are open at once during cold-start storms. |
| AIRLOCK_METRICS_PORT | 9090 | Prometheus scrape port. 0 disables. |
| AIRLOCK_HOST | localhost | Direct-mode FastMCP bind host. |
| AIRLOCK_PORT | 8000 | Direct-mode FastMCP port. |
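The precedence rules in this table can be summarized in a few lines: a data_dir set in YAML wins, AIRLOCK_DATA_DIR is only the fallback, and a metrics port of 0 turns the endpoint off. A sketch with a hypothetical `runtime_settings` name; parsing the timeout as a float (fractional seconds) is an assumption.

```python
import os
from typing import Mapping, Optional


def runtime_settings(yaml_cfg: Mapping[str, object],
                     environ: Optional[Mapping[str, str]] = None) -> dict:
    """Resolve env-driven settings with their documented defaults (sketch)."""
    env = os.environ if environ is None else environ
    metrics_port = int(env.get("AIRLOCK_METRICS_PORT", "9090"))
    return {
        # YAML data_dir wins; the env var only supplies the default.
        "data_dir": yaml_cfg.get("data_dir") or env.get("AIRLOCK_DATA_DIR", "/dev/shm/airlock"),
        "query_timeout_s": float(env.get("AIRLOCK_QUERY_TIMEOUT", "5")),
        "max_rows": int(env.get("AIRLOCK_MAX_ROWS", "500")),
        "export_concurrency": int(env.get("AIRLOCK_EXPORT_CONCURRENCY", "16")),
        "metrics_port": metrics_port or None,  # 0 disables the scrape endpoint
        "host": env.get("AIRLOCK_HOST", "localhost"),
        "port": int(env.get("AIRLOCK_PORT", "8000")),
    }
```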
Source databases
Airlock today supports PostgreSQL as the source database. Set
DATABASE_URL to a postgres:// (or postgresql://) URL and the
worker connects with psycopg2 over a read-only role.
```shell
export DATABASE_URL='postgresql://airlock_reader:<password>@<host>:5432/prod'
```
MySQL (on roadmap)
MySQL / MariaDB support is not shipped yet. The export driver
infrastructure is in place via DuckDB's mysql extension, but the
on-demand snapshot enrichment paths are postgres-only and we have not
yet validated the MySQL flow against a real source. If you need MySQL,
email [email protected] so we can prioritize correctly.
The intended shape, once shipped, is identical to Postgres apart from the URL scheme:
```shell
# Roadmap: not supported today
export DATABASE_URL='mysql://airlock_reader:<password>@<host>:3306/prod'
```
airlock.yaml itself stays the same: tables, mask_columns,
tenant_isolation, etc. are driver-agnostic.
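A preflight script can reject an unsupported URL before the worker starts. A minimal sketch using the standard library's `urllib.parse`; `source_driver` is a hypothetical name and the error wording is illustrative.

```python
from urllib.parse import urlsplit

SUPPORTED = {"postgres", "postgresql"}  # shipped today
ROADMAP = {"mysql"}                     # planned, not supported yet


def source_driver(database_url: str) -> str:
    """Map DATABASE_URL's scheme to a driver, refusing unsupported ones (sketch)."""
    scheme = urlsplit(database_url).scheme.lower()
    if scheme in SUPPORTED:
        return "postgres"
    if scheme in ROADMAP:
        raise NotImplementedError("mysql:// sources are on the roadmap, not supported yet")
    raise ValueError(f"unsupported DATABASE_URL scheme: {scheme!r}")
```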
Validating a config
```shell
airlock validate --config airlock.yaml
```
On a valid file it exits 0 and prints JSON {"ok": true, "errors": []}; on an
invalid one it exits 1 with structured errors. The console's Config page runs
the same parse path before persisting any write: the validator is the
only gate between an operator typo and a worker that can't restart.
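To use the same gate in CI, you can wrap the CLI call. This sketch assumes the JSON report is printed on stdout in both the success and failure cases; the shape of individual error entries is not documented here, so they are passed through untouched. `validate_config` and `parse_validate_output` are hypothetical names.

```python
import json
import subprocess


def parse_validate_output(returncode: int, stdout: str) -> list:
    """Return the error list from `airlock validate` output (empty if valid)."""
    # Assumption: the JSON report is on stdout for both exit codes.
    report = json.loads(stdout) if stdout.strip() else {}
    if returncode == 0 and report.get("ok") is True:
        return []
    return report.get("errors") or [f"airlock validate exited {returncode}"]


def validate_config(path: str = "airlock.yaml") -> list:
    """Run the documented CLI and return its errors (empty list means valid)."""
    proc = subprocess.run(
        ["airlock", "validate", "--config", path],
        capture_output=True, text=True, check=False,
    )
    return parse_validate_output(proc.returncode, proc.stdout)
```

A CI step can then fail the build whenever `validate_config()` returns a non-empty list.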
Editing config from the console
The console's Config page round-trips airlock.yaml over the
operator-token-gated admin channel. The worker validates each PUT by
parsing on a temp file, atomic-renames into place on success, and logs
config_dirty=true reload_required so you know to roll the deployment.
Workers continue serving the previous config until restart. There is intentionally no hot reload — config writes that change masking policies are too consequential to apply silently.
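The validate-then-rename step can be sketched with the standard library. `write_config_atomically` and its `validate` callback are hypothetical, and the real worker's temp-file naming and error handling may differ. A failed validation leaves the live file, and therefore the config the worker will load on its next restart, untouched.

```python
import os
import tempfile
from typing import Callable


def write_config_atomically(path: str, text: str,
                            validate: Callable[[str], None]) -> None:
    """Validate `text` from a temp file, then atomically replace `path` (sketch).

    `validate` should raise on an invalid file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file in the same directory so os.replace stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".airlock-", suffix=".yaml")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(text)
            f.flush()
            os.fsync(f.fileno())
        validate(tmp)
        os.replace(tmp, path)  # atomic rename on POSIX within one filesystem
    except BaseException:
        os.unlink(tmp)  # never leave a half-written candidate behind
        raise
```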