Deployment

Production deployment with Caddy + basic auth

For real-world operation, use the production stack under deployment/ in the repo. It puts a Caddy reverse proxy in front of the Rust service with:

  • Automatic TLS via Let’s Encrypt (HTTP/2 and HTTP/3 on by default)
  • Basic auth on the UI and API
  • Postgres and ClickHouse on the internal docker network — no published ports
  • ClickHouse memory-capped at ~2 GB (see deployment/clickhouse-config.xml)

Setup:

cd deployment
cp .env.example .env
$EDITOR .env            # set domain, ACME email, bcrypt hash, DB passwords, KEK
docker compose up -d

deployment/README.md is the authoritative source for setup, user management, password rotation, backups, and troubleshooting.

Authentication boundary

The Rust service ships an in-binary auth stack (GitHub OAuth + opaque API tokens; magic-link sign-in is gated by config). The native auth is the boundary: a basic-auth layer in front of it would prompt users twice. Single-tenant deploys behave the same way: sign up as the first user and the operator surface is yours.

/healthz and /readyz are intentionally exposed without auth so uptime probes, load balancers, and orchestrators can hit them. /metrics on the public domain returns 404 — scrape it on the internal docker network instead.

The public status page (/status, /status/*, /api/public/*, /static/*, /robots.txt, /favicon.ico) is also unauthenticated by design — see Public status surface below.
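
The path rules above can be summarised as a small classifier. This is only an illustrative sketch of the policy. The real matching lives in the Caddyfile, and the names `EdgePolicy` and `classify` are hypothetical, not identifiers from the repo.

```rust
/// How the edge treats a request path on the public domain.
/// Illustrative only: the production rules are Caddy matchers.
#[derive(Debug, PartialEq, Eq)]
pub enum EdgePolicy {
    /// Served without credentials (probes and the public status surface).
    Public,
    /// Not reachable on the public domain; the edge answers 404.
    Hidden,
    /// Everything else goes through the service's own auth.
    Authenticated,
}

/// Paths that are public only on an exact match.
const PUBLIC_EXACT: &[&str] = &["/healthz", "/readyz", "/status", "/robots.txt", "/favicon.ico"];
/// Prefixes whose whole subtree is public.
const PUBLIC_PREFIXES: &[&str] = &["/status/", "/api/public/", "/static/"];

/// Classify a request path (without query string).
pub fn classify(path: &str) -> EdgePolicy {
    if path == "/metrics" {
        return EdgePolicy::Hidden;
    }
    if PUBLIC_EXACT.contains(&path) || PUBLIC_PREFIXES.iter().any(|p| path.starts_with(p)) {
        return EdgePolicy::Public;
    }
    EdgePolicy::Authenticated
}
```

Matching on `/status/` rather than `/status` as a prefix keeps an unrelated path such as `/statusx` out of the public set.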

See Authentication for the in-binary flow.

Email provider (Resend)

Transactional email (invitations, magic-link sign-in) goes through the EmailSender trait. Production uses Resend; dev and test default to the log provider, which writes the action URL to the tracing log so you can copy-paste it into a browser.

Setup:

  1. Create a Resend account and verify your sending domain. Resend will give you DKIM and DMARC records to add to DNS.

  2. Generate an API key with emails.send permission only.

  3. Configure the service:

    [email]
    provider = "resend"
    from_name = "Acme Status"
    from_address = "no-reply@status.acme.test"
    
    [email.resend]
    api_key = "re_…"
    

    Or via env: UPTIMEPAGE_EMAIL__PROVIDER=resend, UPTIMEPAGE_EMAIL__RESEND__API_KEY=re_….

  4. auth.public_base_url must be set to the externally-reachable origin (e.g. https://status.acme.test); the value is embedded in the links the recipient receives.

The factory rejects boot when provider = "resend" is set without a non-empty API key — fail-fast over send-time surprise.
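
A minimal sketch of that fail-fast check, assuming a config shape that mirrors the TOML above. The type and function names (`EmailConfig`, `validate_email_config`, `EmailConfigError`) are hypothetical, and treating a whitespace-only key as empty is an assumption of this sketch, not documented behaviour.

```rust
/// Which provider `[email] provider` selects.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum EmailProvider {
    Log,
    Resend,
}

/// Hypothetical in-memory form of the `[email]` section.
#[derive(Debug, Clone)]
pub struct EmailConfig {
    pub provider: EmailProvider,
    /// `[email.resend] api_key`; `None` when the table is absent.
    pub resend_api_key: Option<String>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum EmailConfigError {
    MissingResendApiKey,
}

/// Validate at boot so a bad config stops startup instead of failing on the first send.
pub fn validate_email_config(cfg: &EmailConfig) -> Result<(), EmailConfigError> {
    match cfg.provider {
        // The log provider needs no credentials.
        EmailProvider::Log => Ok(()),
        // Assumption: a key made only of whitespace counts as empty.
        EmailProvider::Resend => match cfg.resend_api_key.as_deref().map(str::trim) {
            Some(key) if !key.is_empty() => Ok(()),
            _ => Err(EmailConfigError::MissingResendApiKey),
        },
    }
}
```

Calling a check like this before the HTTP listener starts means a missing key shows up as a startup error in the logs rather than as an invitation that never arrives.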

Public status surface

The Caddyfile carries an @public matcher that short-circuits basic_auth for the public status paths and adds a per-IP rate limit (60 req/min) via the caddy-ratelimit plugin. The stock caddy:2-alpine image doesn’t include that plugin, so the production deployment uses a custom image, custom-caddy:2, built with xcaddy:

docker build -t custom-caddy:2 - <<'EOF'
FROM caddy:2-builder AS builder
RUN xcaddy build --with github.com/mholt/caddy-ratelimit

FROM caddy:2-alpine
COPY --from=builder /usr/bin/caddy /usr/bin/caddy
EOF

Then point the caddy service in deployment/docker-compose.yml at custom-caddy:2. Full procedure (including the opt-out path that drops the rate-limit block) is in deployment/README.md.

The same custom image carries two more per-IP zones: auth_endpoints (10/min on /auth/*, /api/v1/me, invitation accept) and org_creation (3 per 24 h on POST /api/v1/orgs). These zones are the edge tier. The per-org and per-user budgets that the service enforces from each org’s plan form a second tier, described in Quotas & rate limits. The two are complementary: behind the proxy the app sees only the proxy as its peer, so per-IP limits have to be enforced at the edge.
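
To make the idea of a per-IP zone concrete, here is a simplified sliding-window limiter. It is a sketch of the general technique, not the caddy-ratelimit plugin's implementation, and the `Zone` type is hypothetical. Time is passed in explicitly so the behaviour is easy to reason about.

```rust
use std::collections::{HashMap, VecDeque};
use std::net::IpAddr;
use std::time::{Duration, Instant};

/// One rate-limit zone: at most `max_events` per `window`, tracked per client IP.
pub struct Zone {
    max_events: usize,
    window: Duration,
    hits: HashMap<IpAddr, VecDeque<Instant>>,
}

impl Zone {
    pub fn new(max_events: usize, window: Duration) -> Self {
        Self { max_events, window, hits: HashMap::new() }
    }

    /// Returns true if a request from `ip` at `now` is allowed, and records it if so.
    pub fn allow(&mut self, ip: IpAddr, now: Instant) -> bool {
        let log = self.hits.entry(ip).or_default();
        // Drop timestamps that have aged out of the window.
        while let Some(&oldest) = log.front() {
            if now.duration_since(oldest) >= self.window {
                log.pop_front();
            } else {
                break;
            }
        }
        if log.len() < self.max_events {
            log.push_back(now);
            true
        } else {
            false
        }
    }
}
```

Under this model the three zones described above would be `Zone::new(60, Duration::from_secs(60))` for the public status paths, `Zone::new(10, Duration::from_secs(60))` for auth_endpoints, and `Zone::new(3, Duration::from_secs(24 * 60 * 60))` for org_creation.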

Per-org subdomains (SaaS)

When tenancy.subdomain_public_routes = true, each org’s page is served at {slug}.{public_status.base_domain} (apex-wildcard shape). That needs:

  • a wildcard DNS record *.{domain} pointing at the host (plus explicit A/AAAA records for any operator subdomain — app, mail, etc. — which take precedence over the wildcard);
  • a wildcard TLS cert for *.{domain}. HTTP-01 can’t validate a wildcard, so the custom Caddy image also bundles caddy-dns/hetzner and solves the ACME DNS-01 challenge using a HETZNER_DNS_API_TOKEN (zone-edit scope) from .env. The operator host (app.{domain}) is kept on its own per-host HTTP-01 cert in a separate Caddyfile block so a wildcard-key compromise does not reach the operator surface.

The wildcard means a new org’s page works the moment its owner enables it — no per-org DNS or cert step. The end-to-end runbook (Hetzner zone setup, token scope, building the image, verifying the wildcard cert) is in deployment/README.md. The model — host routing, branding, opt-in gating, cookie scoping — is in Per-org status pages.
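
The host-to-org mapping implied by {slug}.{public_status.base_domain} can be sketched as follows. This is an illustration under stated assumptions, not the service's routing code: the function name `org_slug` and the `reserved` list of operator subdomains are hypothetical, and bracketed IPv6 hosts are not handled.

```rust
/// Extract the org slug from a request host of the form `{slug}.{base_domain}`.
/// Returns None for the apex, for reserved operator subdomains, for nested
/// subdomains, and for hosts outside `base_domain`.
pub fn org_slug(host: &str, base_domain: &str, reserved: &[&str]) -> Option<String> {
    // Drop an optional `:port` and normalise case; DNS names are case-insensitive.
    let host = host.split(':').next()?.to_ascii_lowercase();
    let base = base_domain.to_ascii_lowercase();
    // Require the exact suffix `.{base_domain}` so `evilacme.test` does not match `acme.test`.
    let slug = host.strip_suffix(base.as_str())?.strip_suffix('.')?;
    // A wildcard certificate covers exactly one label, so reject nested names.
    if slug.is_empty() || slug.contains('.') || reserved.contains(&slug) {
        return None;
    }
    Some(slug.to_string())
}
```

With a base domain of `acme.test` and `["app", "mail"]` reserved, `demo.acme.test` resolves to the slug `demo`, while `app.acme.test` and the apex resolve to nothing and fall through to the operator routes.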

For the operator workflow (enabling components, narrating incidents, scheduling maintenance) see Public status page.

Docker

docker compose up -d brings up Postgres 17, ClickHouse 26.3, and the monitor on the same network. Compose env vars wire the monitor to the stack:

UPTIMEPAGE_STORAGE__POSTGRES__URL: postgres://monitor:monitor@postgres:5432/monitor
UPTIMEPAGE_STORAGE__CLICKHOUSE__URL: http://clickhouse:8123
UPTIMEPAGE_STORAGE__CLICKHOUSE__USER: monitor
UPTIMEPAGE_STORAGE__CLICKHOUSE__PASSWORD: monitor
UPTIMEPAGE_OBSERVABILITY__LOG_FORMAT: json

The runtime image is gcr.io/distroless/static-debian12:nonroot, chosen for a minimal attack surface: no shell and no glibc. It carries a static musl binary built in rust:1-alpine. The final image is 16 MB, and both the uptimepage and loadtest binaries fit in it.

Bind addresses

Defaults are loopback (127.0.0.1:8080 API, 127.0.0.1:9090 metrics). Override via env for non-loopback exposure:

UPTIMEPAGE_SERVER__API_BIND=0.0.0.0:8080 \
UPTIMEPAGE_SERVER__METRICS_BIND=0.0.0.0:9090 \
./uptimepage
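
The env var names on this page follow a visible pattern: an `UPTIMEPAGE_` prefix, then config keys joined by a double underscore per nesting level. The sketch below spells that mapping out and adds a loopback check for bind values. Both helpers are hypothetical, and the dotted key `server.api_bind` is inferred from the naming pattern rather than taken from the config reference.

```rust
use std::net::{AddrParseError, SocketAddr};

/// Map an env var name to a dotted config path,
/// e.g. `UPTIMEPAGE_EMAIL__RESEND__API_KEY` -> `email.resend.api_key`.
/// Returns None when the name lacks the `UPTIMEPAGE_` prefix.
pub fn env_key_to_config_path(key: &str) -> Option<String> {
    let rest = key.strip_prefix("UPTIMEPAGE_")?;
    if rest.is_empty() {
        return None;
    }
    // `__` separates nesting levels; a single `_` stays part of the key name.
    Some(rest.to_ascii_lowercase().replace("__", "."))
}

/// Parse a bind value such as `127.0.0.1:8080` and report whether it is loopback-only.
pub fn is_loopback_bind(value: &str) -> Result<bool, AddrParseError> {
    let addr: SocketAddr = value.parse()?;
    Ok(addr.ip().is_loopback())
}
```

A check like `is_loopback_bind` is a cheap way to log a warning at startup when an override exposes a port beyond the local host.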

There is no built-in auth on the API port. Front it with a proxy or keep it on a private network. The ready-made Caddy stack under deployment/ does this for you.

Metrics shipping (Grafana Cloud)

The Prometheus /metrics endpoint can be shipped to Grafana Cloud by a Grafana Alloy sidecar. It is opt-in: the compose stack only starts it under the metrics profile (docker compose --profile metrics up -d), so the default deployment is unchanged. Credentials are read from .env (gitignored) and never written into deployment/config.alloy.

deployment/README.md (“Metrics”) is the authoritative setup guide, including how to obtain the Grafana Cloud URL/token, the internal-network bind, the ready-made dashboard, and how to verify ingestion.

Migrations

  • Postgres: migrations/postgres/*.sql, applied at startup via sqlx::migrate! (tracked in _sqlx_migrations)
  • ClickHouse: migrations/clickhouse/*.sql, applied idempotently via CREATE … IF NOT EXISTS at startup

No external migrator is needed. The app owns the schema lifecycle of both stores and applies their migrations itself at startup.

Resource sizing

  • checker.max_concurrent_checks caps simultaneous in-flight checks
  • Per-check memory: small (a tokio task + an in-flight hyper request + bookkeeping)
  • The practical ceiling is set by file descriptors and ephemeral ports, not RAM
  • At 50k concurrent checks against external targets, RSS sits around 200-400 MB depending on response sizes
  • The optional metrics profile adds a Grafana Alloy container (~100 MB RSS plus a small bounded remote-write WAL volume) — account for it when sizing the host if you enable it
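
A cap like checker.max_concurrent_checks is, in general terms, a counting semaphore: a check takes a permit before it starts and returns it when it finishes. The sketch below shows the technique with the standard library only. It is not the service's implementation (which runs on tokio and is not shown on this page), and `CheckLimiter` and `Permit` are hypothetical names.

```rust
use std::sync::{Arc, Condvar, Mutex};

/// Counting semaphore: at most `limit` permits are out at any time.
pub struct CheckLimiter {
    in_flight: Mutex<usize>,
    freed: Condvar,
    limit: usize,
}

/// Held for the duration of one check; returns its slot when dropped.
pub struct Permit(Arc<CheckLimiter>);

impl CheckLimiter {
    pub fn new(limit: usize) -> Arc<Self> {
        Arc::new(Self { in_flight: Mutex::new(0), freed: Condvar::new(), limit })
    }

    /// Take a permit without blocking; None when the cap is reached.
    pub fn try_acquire(self: &Arc<Self>) -> Option<Permit> {
        let mut n = self.in_flight.lock().unwrap();
        if *n < self.limit {
            *n += 1;
            Some(Permit(Arc::clone(self)))
        } else {
            None
        }
    }

    /// Block the calling thread until a permit is free.
    pub fn acquire(self: &Arc<Self>) -> Permit {
        let mut n = self.in_flight.lock().unwrap();
        while *n >= self.limit {
            n = self.freed.wait(n).unwrap();
        }
        *n += 1;
        Permit(Arc::clone(self))
    }

    /// Number of permits currently held.
    pub fn in_flight(&self) -> usize {
        *self.in_flight.lock().unwrap()
    }
}

impl Drop for Permit {
    fn drop(&mut self) {
        // Release the slot, then wake one waiter blocked in `acquire`.
        *self.0.in_flight.lock().unwrap() -= 1;
        self.0.freed.notify_one();
    }
}
```

Tying the release to `Drop` means a check that fails or panics still gives its slot back, so the cap cannot leak permits over time.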

Graceful shutdown

The binary listens for SIGINT and SIGTERM, cancels the scheduler and batcher via a shared CancellationToken, awaits both background tasks, and exits within 10 s. The batcher’s cancel branch drains any pending results before returning. A warning is logged if the deadline is exceeded.
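
The shutdown sequence can be modelled with the standard library alone. This is a thread-based analogue for illustration, not the service's code: the real binary uses async tasks and a shared CancellationToken, signal handling is omitted here, and `run_batcher` and `shutdown` are hypothetical names. An `AtomicBool` stands in for the token and a channel stands in for the result queue.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::sync::Arc;
use std::thread;
use std::time::{Duration, Instant};

/// Batcher loop: collect results until cancelled, then drain what is still queued.
pub fn run_batcher(rx: Receiver<u64>, cancel: Arc<AtomicBool>) -> Vec<u64> {
    let mut flushed = Vec::new();
    loop {
        if cancel.load(Ordering::Acquire) {
            // Cancel branch: drain pending results before returning.
            flushed.extend(rx.try_iter());
            return flushed;
        }
        match rx.recv_timeout(Duration::from_millis(10)) {
            Ok(result) => flushed.push(result),
            Err(RecvTimeoutError::Timeout) => {}
            Err(RecvTimeoutError::Disconnected) => return flushed,
        }
    }
}

/// Signal cancellation, then wait for the worker up to `deadline`.
/// Returns the flushed results, or None if the deadline passed first.
pub fn shutdown(
    cancel: &AtomicBool,
    worker: thread::JoinHandle<Vec<u64>>,
    deadline: Duration,
) -> Option<Vec<u64>> {
    cancel.store(true, Ordering::Release);
    let start = Instant::now();
    while !worker.is_finished() {
        if start.elapsed() >= deadline {
            eprintln!("warning: shutdown deadline exceeded");
            return None;
        }
        thread::sleep(Duration::from_millis(5));
    }
    worker.join().ok()
}
```

The point of the drain step is that results queued before the cancel signal are still written out, so a routine restart does not lose the last batch of check results.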