Concepts

A short glossary of the terms that show up across the rest of the docs. If a sentence elsewhere uses one of these words and you're not sure what it means, the answer is here. Each entry is one paragraph plus a citation back to the code or another doc when relevant.

If you're new to Airlock, the Why Airlock page is the shortest path to the pitch, and the Architecture page is the full mental model. This page is the dictionary.

Control plane vs. console

Airlock has two services that we host (or that you self-host alongside the worker). People confuse them constantly, so:

Control plane (CP, cp.airlocklabs.ai) is the API. A Python/FastAPI service that workers, the operator console, and (future) Terraform talk to. It relays MCP tool-call envelopes between agents and workers, persists tenant + audit metadata, and hands out enrollment tokens. Stripe analogy: api.stripe.com.
Console (console.airlocklabs.ai) is the operator UI. A Next.js app that humans log in to. Operators use it to manage tenants, view audit logs, edit airlock.yaml, and run the playground. It's a thin client over CP: it never holds raw customer data and only renders what CP returns. Stripe analogy: dashboard.stripe.com.

When a doc says "CP," it means the API. When it says "console," it means the UI. They're separate processes on separate hosts.

Worker

The worker is the only piece of Airlock that runs inside your VPC. It's a Python process that:

Holds your source-DB connection string (DATABASE_URL). The control plane has no field for this and explicitly rejects database_url in airlock.yaml — see worker/airlock/config.py lines 1–6 and 65–82.
Exports DuckDB snapshots on first query, with masking applied at export time so raw rows never enter the snapshot file.
Serves MCP tool calls (get_schema, execute_sql, null_rates) over a long-lived WSS tunnel to CP.
Generates its own Ed25519 keypair on first boot; only the public half ever leaves the worker host.

The worker is what makes "your data stays in your VPC" honest. It's also why deployment is "drop a binary on a host" rather than "give us your DB credentials."
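The split between "connection string in the worker's environment" and "rejected in airlock.yaml" can be sketched as below. This is a minimal illustration, not the code in worker/airlock/config.py: the function names `validate_config` and `load_database_url`, the case-insensitive key match, and the error messages are all assumptions made for the example.

```python
import os

# Assumption: the key is matched case-insensitively at the top level only.
FORBIDDEN_KEYS = {"database_url"}


def validate_config(config: dict) -> dict:
    """Reject a parsed airlock.yaml mapping that carries a connection string."""
    for key in config:
        if str(key).lower() in FORBIDDEN_KEYS:
            raise ValueError(
                f"{key!r} is not allowed in airlock.yaml; "
                "set DATABASE_URL in the worker's environment instead"
            )
    return config


def load_database_url(env=None) -> str:
    """Read the source-DB connection string from the environment only."""
    env = os.environ if env is None else env
    url = env.get("DATABASE_URL")
    if not url:
        raise RuntimeError("DATABASE_URL must be set on the worker host")
    return url
```

Keeping the secret out of the config file means airlock.yaml can be edited in the console without the connection string ever being part of what CP stores.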

Tenant

A tenant is the admin boundary inside your Airlock organization. Most customers run one tenant ("Acme Corp"); larger customers split prod and staging into two tenants so configuration changes can be reviewed independently. Each tenant has its own API keys, its own worker(s), its own audit log.

A tenant is not a per-end-user boundary — that's a snapshot. One tenant typically serves many snapshots.

Snapshot

A snapshot is one DuckDB file on the worker's disk, named <snapshot_id>.duckdb (where the id is the root_filter_col value of a row in your root_table; see Per-row scoping below). It's lazily exported on the first MCP tool call for that snapshot, lives in tmpfs, and gets garbage-collected when its TTL expires.

This is the core privacy property: an agent answering questions about Alice physically cannot see Bob's rows, because Bob's rows are not in Alice's snapshot file.

CP only stores <sid>.meta.json metadata about a snapshot (column counts, mask summary). The actual .duckdb file never leaves the worker — CP cannot exfiltrate snapshots even if compromised, because it doesn't have them.
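The TTL-based garbage collection mentioned above could look roughly like the following. It is a sketch under stated assumptions: the function name `collect_expired` is hypothetical, and measuring age from the file's modification time is a guess, since this page does not say what the TTL is counted from.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import List, Optional


def collect_expired(snapshot_dir: Path, ttl_seconds: float,
                    now: Optional[float] = None) -> List[str]:
    """Delete *.duckdb files older than ttl_seconds; return the removed names."""
    now = time.time() if now is None else now
    removed = []
    for path in sorted(snapshot_dir.glob("*.duckdb")):
        # Assumption: age is measured from the last modification time.
        if now - path.stat().st_mtime > ttl_seconds:
            path.unlink()
            removed.append(path.name)
    return removed


# Demo in a throwaway directory: alice's snapshot is an hour old, bob's is fresh.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "alice.duckdb").write_bytes(b"")
    (root / "bob.duckdb").write_bytes(b"")
    an_hour_ago = time.time() - 3600
    os.utime(root / "alice.duckdb", (an_hour_ago, an_hour_ago))

    removed = collect_expired(root, ttl_seconds=600)
    remaining = sorted(p.name for p in root.iterdir())
```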

Per-row scoping (`root_table` / `root_filter_col`)

Per-row scoping is what makes snapshots per-end-user rather than per-tenant. It's declared in airlock.yaml:

root_table: users
root_filter_col: external_id

When the agent asks about row id alice, the worker materializes a snapshot containing only the rows of users (and joined tables) that match external_id = 'alice'. Default examples: users.external_id for fintech, patients.external_id for healthcare.
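To make the scoping concrete, here is a small sketch of the filter step. It uses the standard-library sqlite3 module as a stand-in for both the Postgres source and the DuckDB snapshot, so it illustrates the idea only. The function name `export_snapshot` is hypothetical, and joined tables and masking are left out.

```python
import sqlite3


def export_snapshot(source, root_table, root_filter_col, snapshot_id):
    """Copy only the rows keyed by snapshot_id into a fresh database."""
    snapshot = sqlite3.connect(":memory:")
    # Table and column names come from trusted config (airlock.yaml);
    # the id itself is always bound as a parameter.
    cursor = source.execute(
        f'SELECT * FROM "{root_table}" WHERE "{root_filter_col}" = ?',
        (snapshot_id,),
    )
    columns = [desc[0] for desc in cursor.description]
    rows = cursor.fetchall()

    column_list = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    snapshot.execute(f'CREATE TABLE "{root_table}" ({column_list})')
    snapshot.executemany(
        f'INSERT INTO "{root_table}" VALUES ({placeholders})', rows
    )
    snapshot.commit()
    return snapshot


# Stand-in source database with two end users.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE users (external_id TEXT, email TEXT)")
source.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "alice@example.com"), ("bob", "bob@example.com")],
)

alice_snapshot = export_snapshot(source, "users", "external_id", "alice")
visible = alice_snapshot.execute("SELECT external_id FROM users").fetchall()
```

Any query run against `alice_snapshot` can only ever return Alice's row, because Bob's row was never copied.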

(Internal: the worker calls this row id a subject id when querying Postgres for the distinct values of root_filter_col. Operators only ever see "snapshot id" in the UI; the "subject" is the person or entity whose row the snapshot is keyed by.)

Mask policy

A mask policy is a per-column transform applied at snapshot-export time, declared per-table in airlock.yaml under mask_columns:. Three strategies ship today:

hash — deterministic SHA-based hash. Joinable across tables (so the agent can still answer "did this user appear in table X?") but not reversible without the source DB.
null — the column is wiped. Agents see NULL. Use this for data the agent should never see at all (SSNs, raw card numbers).
redact — free-text PII removal (phone numbers, emails inside longer strings). Pattern-based; the surrounding text is preserved.

Masks are applied before rows hit the snapshot file. The unmasked row never lands on disk. See Configuration for the full reference.
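The three strategies can be sketched as plain functions applied to each row before it is written. This is an illustration, not the worker's implementation: SHA-256 is one possible "SHA-based" choice, the regular expressions and the "[REDACTED]" placeholder are assumptions, and `apply_masks` is a hypothetical helper.

```python
import hashlib
import re

# Assumed patterns; real redaction rules are likely broader.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def mask_hash(value):
    """Deterministic digest: equal inputs give equal outputs, so joins still work."""
    if value is None:
        return None
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()


def mask_null(value):
    """Wipe the column entirely."""
    return None


def mask_redact(value):
    """Remove emails and phone numbers but keep the surrounding text."""
    if value is None:
        return None
    text = EMAIL_RE.sub("[REDACTED]", str(value))
    return PHONE_RE.sub("[REDACTED]", text)


STRATEGIES = {"hash": mask_hash, "null": mask_null, "redact": mask_redact}


def apply_masks(row: dict, mask_columns: dict) -> dict:
    """Return a masked copy of row; columns without a policy pass through."""
    return {
        col: STRATEGIES[mask_columns[col]](val) if col in mask_columns else val
        for col, val in row.items()
    }
```

For example, `apply_masks({"ssn": "123-45-6789", "plan": "pro"}, {"ssn": "null"})` keeps `plan` untouched and replaces `ssn` with `None`.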

Egress block

The worker has a SQL parser allow-list. Anything that would reach out of the snapshot (DuckDB's read_csv_auto against an HTTP URL, COPY ... TO a network destination, httpfs reads, and so on) is rejected before execution. The audit event for a blocked attempt shows outcome: egress_blocked.

Egress blocks are visible in the live audit feed in the console. They're the most concrete demonstration that "the agent can ask anything but only get back what we let it have."
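The shape of an allow-list check can be sketched as follows. This is deliberately simplistic: it uses regular expressions rather than a real SQL parser, and the allowed statements, allowed functions, and the `check_sql` name are all hypothetical. A production check would walk a parsed syntax tree and would not be fooled by parentheses inside string literals.

```python
import re

# Hypothetical allow-lists; the worker's real lists are not documented here.
ALLOWED_STATEMENTS = {"select", "with"}
ALLOWED_FUNCTIONS = {"count", "sum", "avg", "min", "max",
                     "coalesce", "lower", "upper"}
# Keywords that may precede "(" without being function calls.
KEYWORDS = {"in", "as", "and", "or", "not", "exists", "from",
            "where", "on", "over", "values", "select", "join"}

CALL_RE = re.compile(r"\b([A-Za-z_][A-Za-z0-9_]*)\s*\(")
REMOTE_RE = re.compile(r"[a-z][a-z0-9+.-]*://", re.IGNORECASE)


def check_sql(sql: str) -> str:
    """Return "ok" or "egress_blocked" for a single SQL statement."""
    stripped = sql.strip()
    first_word = stripped.split(None, 1)[0].lower() if stripped else ""
    if first_word not in ALLOWED_STATEMENTS:
        return "egress_blocked"
    # Any URL-like literal is treated as an attempt to leave the snapshot.
    if REMOTE_RE.search(stripped):
        return "egress_blocked"
    # Every call-like token must be a keyword or an allow-listed function.
    for name in CALL_RE.findall(stripped):
        lowered = name.lower()
        if lowered not in KEYWORDS and lowered not in ALLOWED_FUNCTIONS:
            return "egress_blocked"
    return "ok"
```

The point of an allow-list over a deny-list is that an unfamiliar function such as read_parquet is rejected by default instead of needing its own rule.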

Audit log

The audit log is an append-only JSONL file on CP at /state/audit.jsonl. Each event captures metadata only: tool name, snapshot id, latency, outcome (ok / egress_blocked / worker_unavailable / etc.). It does not capture SQL text or result rows — those exist only in CP process memory while a request is in flight (see Mode A vs Mode B) and are discarded on response.
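A metadata-only event of this kind can be sketched as one JSON object per line. The field names (`tool`, `snapshot_id`, `latency_ms`, `outcome`) and the helper names are assumptions for illustration; the example writes to an in-memory buffer rather than /state/audit.jsonl.

```python
import io
import json


def make_audit_line(tool: str, snapshot_id: str,
                    latency_ms: float, outcome: str) -> str:
    """Serialize one audit event. The signature has no slot for SQL or rows."""
    event = {
        "tool": tool,
        "snapshot_id": snapshot_id,
        "latency_ms": latency_ms,
        "outcome": outcome,
    }
    return json.dumps(event, sort_keys=True)


def append_event(log, line: str) -> None:
    """Append-only write: one event per line, never rewritten."""
    log.write(line + "\n")


# In-memory stand-in for the log file.
log = io.StringIO()
append_event(log, make_audit_line("execute_sql", "alice", 12.5, "ok"))
append_event(log, make_audit_line("execute_sql", "alice", 3.1, "egress_blocked"))

events = [json.loads(line) for line in log.getvalue().splitlines()]
```

Because the serializer only accepts the four metadata fields, SQL text and result rows cannot end up in the log by accident.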

Mode A vs Mode B

Every payload between worker and CP rides in an "envelope" with a scheme field.

Mode A — scheme="plain" — the deployed default. Base64-JSON over TLS-protected WSS. CP unwraps the envelope, sees SQL text + masked result rows in process memory, relays to the agent. Audit captures metadata only.
Mode B — scheme="x25519-chacha20poly1305" — end-to-end encrypted between the agent client shim and the worker. CP becomes a payload-opaque relay; it can route but not read. The envelope shape is shipped; key exchange + client shim are pending.
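The difference between the two modes, as seen from the relay, can be sketched like this. Only the `scheme` field and its two values come from this page; the `body` field name and the `wrap_plain` / `unwrap` helpers are assumptions, and no encryption is implemented here.

```python
import base64
import json


def wrap_plain(payload: dict) -> dict:
    """Mode A: base64-encoded JSON, readable by whoever unwraps it."""
    raw = json.dumps(payload).encode("utf-8")
    return {"scheme": "plain", "body": base64.b64encode(raw).decode("ascii")}


def unwrap(envelope: dict) -> dict:
    """What a relay can do: decode Mode A, but only route anything else."""
    scheme = envelope.get("scheme")
    if scheme != "plain":
        # For an encrypted scheme the relay holds no key, so the body is opaque.
        raise ValueError(f"cannot read payload with scheme {scheme!r}")
    return json.loads(base64.b64decode(envelope["body"]))


envelope = wrap_plain({"tool": "execute_sql", "sql": "SELECT 1"})
decoded = unwrap(envelope)
```

In Mode A the relay recovers the full payload, which is exactly the exposure the caveat below describes. An envelope tagged x25519-chacha20poly1305 would fall into the error branch.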

Honesty caveat: today, masked query results pass through the Airlock relay (CP) in process memory only. They're never written to disk, never logged, and discarded on response — but a hypothetical CP compromise during a live request could observe them. Mode B closes that gap. Until Mode B ships, the truthful one-liner is "raw data never leaves your VPC; masked results pass through CP in memory only, never persisted, with end-to-end encryption planned."

See Security for the full threat model.

Self-hosted vs hosted

The worker always runs in your infrastructure. That's the whole point — your DB credentials and raw rows never enter Airlock-hosted infra.

CP and console can be either:

Hosted by us: cp.airlocklabs.ai + console.airlocklabs.ai. Same Docker images as self-hosted, run by Airlock on Fly.io.
Self-hosted by you: same Docker images, deployed next to your worker(s) with a different AIRLOCK_CP_URL. For customers whose compliance posture requires it. The product surface is identical. See Self-hosting the control plane for when to pick this and how to deploy.

In both cases, the worker → CP boundary is the same long-lived WSS tunnel with the same Ed25519 handshake. (Argon2id is the hash CP uses on inbound ak_live_* API keys — a separate auth path on the agent → CP edge.)

Where this fits

The architecture is summarized in Architecture, and the deploy walkthrough is in Quickstart. For threat-model-level detail, see Security. For the full machine-readable shape of airlock.yaml, see Configuration.
