
Quotas & rate limits

Every organization is bound to a plan. The plan is the single source of truth for resource quotas and per-minute rate budgets: the limit a request is enforced against is the same number the API reports back. Adding a paid tier later is one row in the plans table plus a UI page; nothing in the enforcement path changes.

The free plan

Shipped with the product and seeded by the first migration. It is generous for a small team, yet bounded enough that abuse stays cheap to absorb on a small VM.

Quota                      Free     Meaning
max_targets                10       Monitored targets in the org
min_check_interval_secs    60       Plan-side floor on a target’s check interval, in seconds. The effective floor is max(this, kind_min), where kind_min is 3600 for tls_cert / domain_expiry and 10 for http / tcp / dns.
retention_days             90       Informational only: actual check-result retention is the flat ClickHouse table TTL (90 days for every org), not this column.
max_members                5        Active members in the org
max_pending_invitations    10       Outstanding (unaccepted) invitations
max_api_tokens_per_user    5        API tokens a single user may hold
max_status_pages           1        Public status pages the org can run
max_public_components      10       Distinct monitors published across all of the org’s pages (a monitor on several pages counts once)
max_maintenance_windows    20       Scheduled maintenance windows
max_notification_channels  20       Notification channels (Slack/webhook/Telegram/WhatsApp/SMS/…) in the org
max_logo_size_bytes        1048576  Status-page logo upload ceiling (1 MiB)

Rate budget (per minute)   Free     Category
api_writes_per_minute      600      POST/PATCH/DELETE on /api/v1/*
api_reads_per_minute       6000     GET/HEAD/OPTIONS on /api/v1/*
bulk_ops_per_minute        30       /api/v1/targets/bulk*
test_now_per_minute        60       POST /api/v1/targets/test + the notification-channel test endpoints
check_now_per_minute       60       POST /api/v1/targets/{id}/check-now

How quotas are enforced

A resource quota is checked atomically at the write, not by a check-then-act in the handler. The friendly handler-side pre-check exists only to produce a clean error on the common, uncontended path; the race-safe guarantee is in the store:

  • Targets: the count bound lives inside the INSERT (single and bulk), which is handed the same max_targets the handler pre-check uses. When an org sits at limit - 1 and several creates race, the count settles at exactly limit, never more.
  • Members: the membership insert runs under a per-org advisory lock, counts, and rolls itself back if it crossed max_members. Re-adding an existing member stays a no-op.
  • Pending invitations: dedupe and the pending cap are enforced in one transaction under the same per-org lock; parallel duplicate-email invites yield exactly one row.
  • Public components: flipping a target public is checked against max_public_components on create, bulk, and PATCH, so “create private, then edit public” cannot bypass the cap.
  • API tokens: the same count-in-INSERT pattern, scoped per user and bounded by max_api_tokens_per_user.
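The store-side guarantee for members can be illustrated with an in-memory stand-in. This is a minimal sketch, not the actual Postgres code: `OrgStore`, `add_member`, and `race_adds` are hypothetical names, and a per-org `Mutex` plays the role of the per-org advisory lock. Because the count and the insert happen while holding the same lock, two concurrent adds can never both observe `limit - 1`.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical in-memory stand-in for the members table.
/// The inner per-org Mutex models the per-org advisory lock.
#[derive(Default)]
pub struct OrgStore {
    orgs: Mutex<HashMap<u64, Arc<Mutex<Vec<u64>>>>>,
}

#[derive(Debug, PartialEq)]
pub enum InsertError {
    QuotaExceeded { current: usize, limit: usize },
}

impl OrgStore {
    /// Fetch (or create) the lock guarding one org's member list.
    fn org_lock(&self, org_id: u64) -> Arc<Mutex<Vec<u64>>> {
        let mut orgs = self.orgs.lock().unwrap();
        orgs.entry(org_id).or_default().clone()
    }

    /// Count and insert under one lock: no check-then-act window.
    pub fn add_member(&self, org_id: u64, user_id: u64, max_members: usize) -> Result<(), InsertError> {
        let lock = self.org_lock(org_id);
        let mut members = lock.lock().unwrap();
        if members.contains(&user_id) {
            return Ok(()); // re-adding an existing member is a no-op
        }
        if members.len() >= max_members {
            return Err(InsertError::QuotaExceeded { current: members.len(), limit: max_members });
        }
        members.push(user_id);
        Ok(())
    }

    pub fn member_count(&self, org_id: u64) -> usize {
        let lock = self.org_lock(org_id);
        let n = lock.lock().unwrap().len();
        n
    }
}

/// Race `attempts` distinct users into one org; returns how many were admitted.
pub fn race_adds(store: Arc<OrgStore>, org_id: u64, attempts: u64, max_members: usize) -> usize {
    let handles: Vec<_> = (0..attempts)
        .map(|user_id| {
            let store = Arc::clone(&store);
            thread::spawn(move || store.add_member(org_id, user_id, max_members).is_ok())
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).filter(|ok| *ok).count()
}
```

Racing 20 adds into an org with max_members = 5 admits exactly 5. In the real store the same property comes from the advisory lock (members, invitations) or from the count bound inside the INSERT (targets, tokens).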

Exceeding a resource quota returns 422:

{
  "error": {
    "code": "QUOTA_EXCEEDED",
    "message": "max_targets limit reached: 10 of 10 used on the free plan.",
    "field": null,
    "details": { "quota": "max_targets", "current": 10, "limit": 10, "plan": "free" },
    "trace_id": null
  }
}
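For reference, the envelope can be rendered without dependencies. This is a minimal sketch: the `QuotaExceeded` type and its methods are hypothetical, the JSON is assembled by hand (a real handler would more likely use a serializer such as serde_json), and the inputs are assumed to be plain identifiers that need no JSON escaping.

```rust
/// Hypothetical carrier for the fields of a QUOTA_EXCEEDED rejection.
pub struct QuotaExceeded<'a> {
    pub quota: &'a str,
    pub current: u64,
    pub limit: u64,
    pub plan: &'a str,
}

impl QuotaExceeded<'_> {
    /// HTTP status used for resource-quota rejections.
    pub const STATUS: u16 = 422;

    /// Human-readable message in the format shown above.
    pub fn message(&self) -> String {
        format!(
            "{} limit reached: {} of {} used on the {} plan.",
            self.quota, self.current, self.limit, self.plan
        )
    }

    /// Unified error envelope; `{{` / `}}` are literal braces in format!.
    pub fn to_json(&self) -> String {
        format!(
            r#"{{"error":{{"code":"QUOTA_EXCEEDED","message":"{}","field":null,"details":{{"quota":"{}","current":{},"limit":{},"plan":"{}"}},"trace_id":null}}}}"#,
            self.message(),
            self.quota,
            self.current,
            self.limit,
            self.plan
        )
    }
}
```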

The pending-invitation cap is the one exception to this error code: it predates the unified envelope and returns 409 INVITATIONS_LIMIT instead. The cap itself is enforced identically (atomic, never overshoots).

A sub-minimum check interval gets its own 422, MIN_CHECK_INTERVAL, enforced on create and PATCH, single and bulk, so a target created at the floor cannot later be edited below it. The floor is max(plan.min_check_interval_secs, kind_min): the per-kind value (3600 s for tls_cert / domain_expiry, 10 s for the rest) applies regardless of plan tier, because polling an expiry probe more often than once an hour yields no additional signal.
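The floor computation is small enough to state directly. In this sketch `CheckKind`, `effective_floor`, and `validate_interval` are hypothetical names, and the `(code, floor)` error tuple is an assumed shape rather than the real 422 details; the same check would run on create and PATCH, single and bulk.

```rust
/// Check kinds named in this section.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CheckKind {
    Http,
    Tcp,
    Dns,
    TlsCert,
    DomainExpiry,
}

impl CheckKind {
    /// Per-kind minimum interval in seconds, independent of plan tier.
    pub fn kind_min_secs(self) -> u32 {
        match self {
            CheckKind::TlsCert | CheckKind::DomainExpiry => 3600,
            CheckKind::Http | CheckKind::Tcp | CheckKind::Dns => 10,
        }
    }
}

/// Effective floor = max(plan.min_check_interval_secs, kind_min).
pub fn effective_floor(plan_min_secs: u32, kind: CheckKind) -> u32 {
    plan_min_secs.max(kind.kind_min_secs())
}

/// Reject intervals below the effective floor (hypothetical error shape).
pub fn validate_interval(
    requested_secs: u32,
    plan_min_secs: u32,
    kind: CheckKind,
) -> Result<u32, (&'static str, u32)> {
    let floor = effective_floor(plan_min_secs, kind);
    if requested_secs < floor {
        Err(("MIN_CHECK_INTERVAL", floor))
    } else {
        Ok(requested_secs)
    }
}
```

On the free plan (60 s), an http target’s floor is 60 s while a tls_cert target’s is 3600 s; a plan with a 5 s floor still cannot push a dns target below 10 s.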

Rate limiting

The application enforces two tiers, both keyed on the authenticated subject (never the TCP peer): (org, category) and (user, category). Both are checked, and the org tier fires first because it protects shared resources. The per-minute budget comes from the org’s plan. The request category is derived from the path and method, checked in this order:

  • path contains /bulk → bulk_ops
  • path ends with /test → test_now
  • path ends with /check-now → check_now
  • otherwise GET/HEAD/OPTIONS → api_reads, else → api_writes
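The category rules translate into a first-match-wins function. This is a sketch under assumptions: `Category` and `categorize` are hypothetical names, the method is expected in uppercase, and the path is assumed to arrive without a query string.

```rust
/// Rate-limit categories; each maps to one *_per_minute budget.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Category {
    BulkOps,
    TestNow,
    CheckNow,
    ApiReads,
    ApiWrites,
}

/// Apply the rules in the listed order; the first match wins.
pub fn categorize(method: &str, path: &str) -> Category {
    if path.contains("/bulk") {
        Category::BulkOps
    } else if path.ends_with("/test") {
        Category::TestNow
    } else if path.ends_with("/check-now") {
        Category::CheckNow
    } else if matches!(method, "GET" | "HEAD" | "OPTIONS") {
        Category::ApiReads
    } else {
        Category::ApiWrites
    }
}
```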

Exceeding a budget returns 429 with a Retry-After header:

{
  "error": {
    "code": "RATE_LIMITED",
    "message": "Too many requests.",
    "field": null,
    "details": { "scope": "per_org_api_writes", "retry_after_secs": 30 },
    "trace_id": null
  }
}

The limiter is a governor cell per (scope, category) key in a DashMap. A janitor evicts entries that have been idle past a threshold, so the map stays bounded by the number of active tenants rather than by request volume; the janitor’s lifetime is bound to the limiter, so a refactor cannot silently drop the sweep and leak the map. Unauthenticated requests fall through untouched: per-IP limiting for those (auth endpoints, org creation, the public status surface) is the reverse proxy’s job; see Deployment.
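The limiter’s behaviour can be approximated with the standard library alone. This sketch is a stand-in, not the governor crate: each cell runs a simplified GCRA (generic cell rate algorithm) with a burst equal to the per-minute budget, a plain `HashMap` behind `&mut self` replaces the concurrent `DashMap`, the janitor is an explicit `evict_idle` call rather than a background task, both tiers are assumed to share one budget, and the `per_user_*` scope name is an assumption. `Limiter`, `check`, and `evict_idle` are hypothetical names.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

const WINDOW: Duration = Duration::from_secs(60);

/// Simplified GCRA cell: `tat` is the theoretical arrival time.
struct Cell {
    tat: Instant,
    last_seen: Instant,
}

/// Key: (scope such as "org:7" or "user:3", category name).
type Key = (String, &'static str);

#[derive(Default)]
pub struct Limiter {
    cells: HashMap<Key, Cell>,
}

impl Limiter {
    /// One GCRA step; Err carries the retry-after duration.
    fn check_key(&mut self, key: Key, per_minute: u32, now: Instant) -> Result<(), Duration> {
        let emission = WINDOW / per_minute; // per_minute >= 1 is guaranteed at config load
        let cell = self.cells.entry(key).or_insert(Cell { tat: now, last_seen: now });
        cell.last_seen = now;
        let new_tat = cell.tat.max(now) + emission;
        let horizon = now + WINDOW; // allows a burst of `per_minute` requests
        if new_tat > horizon {
            return Err(new_tat - horizon); // rejected: TAT is left unchanged
        }
        cell.tat = new_tat;
        Ok(())
    }

    /// Org tier first (shared resources), then the user tier.
    pub fn check(
        &mut self,
        org: u64,
        user: u64,
        category: &'static str,
        per_minute: u32,
        now: Instant,
    ) -> Result<(), (String, Duration)> {
        self.check_key((format!("org:{org}"), category), per_minute, now)
            .map_err(|d| (format!("per_org_{category}"), d))?;
        self.check_key((format!("user:{user}"), category), per_minute, now)
            .map_err(|d| (format!("per_user_{category}"), d))
    }

    /// Janitor sweep: drop cells idle longer than `idle`; returns how many were evicted.
    pub fn evict_idle(&mut self, idle: Duration, now: Instant) -> usize {
        let before = self.cells.len();
        self.cells.retain(|_, c| now.duration_since(c.last_seen) <= idle);
        before - self.cells.len()
    }

    pub fn len(&self) -> usize {
        self.cells.len()
    }
}
```

With a budget of 3 per minute, three requests at the same instant pass, the fourth is rejected with scope per_org_api_writes and a 20 s retry-after (60 s / 3), and the same request 20 s later passes again.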

Checks themselves are not rate-limited — the scheduler path never enters this middleware, so monitoring throughput is unaffected.

Every quota / rate-limit / abuse rejection is recorded to the append-only quota_events table (event, quota_name, details, hashed IP) as fire-and-forget — it never blocks the response. It is the data source for abuse review.
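One way to keep recording off the response path is a bounded channel drained by a background writer. This sketch swaps in `std::sync::mpsc::sync_channel` as the mechanism (the actual implementation may spawn a task instead); `QuotaEvent` and `EventSink` are hypothetical, and the IP is assumed to be hashed by the caller. `try_send` returns immediately, so a full buffer or a stopped writer drops the event instead of delaying the rejection.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Hypothetical row shape mirroring the quota_events columns.
#[derive(Debug, Clone, PartialEq)]
pub struct QuotaEvent {
    pub event: &'static str,
    pub quota_name: Option<&'static str>,
    pub details: String,
    pub ip_hash: String, // already hashed by the caller
}

/// Request-path handle; cloning it is cheap.
#[derive(Clone)]
pub struct EventSink {
    tx: SyncSender<QuotaEvent>,
}

impl EventSink {
    /// Returns the sink plus the receiver a background writer would drain into quota_events.
    pub fn new(capacity: usize) -> (Self, Receiver<QuotaEvent>) {
        let (tx, rx) = sync_channel(capacity);
        (EventSink { tx }, rx)
    }

    /// Never blocks: returns false when the event had to be dropped.
    pub fn record(&self, ev: QuotaEvent) -> bool {
        match self.tx.try_send(ev) {
            Ok(()) => true,
            Err(TrySendError::Full(_)) | Err(TrySendError::Disconnected(_)) => false,
        }
    }
}
```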

Usage transparency

Endpoint                        Returns
GET /api/v1/orgs/{id}/usage     Plan + current vs limit for every org-scoped quota, policy values, rate budgets, feature flags. Member-gated (a non-member gets the same 404 as GET /orgs/{id}).
GET /api/v1/me/usage            The caller’s api_tokens and owned_orgs current/limit.

The operator UI surfaces the same numbers at /settings/usage as progress bars (an unlimited self-host limit renders as ∞). Reported limit == enforced limit by construction: both read the same plan and the same count query.
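A progress-bar label only needs `current` and `limit`. In this sketch `Usage` is hypothetical, and an unlimited self-host limit is assumed to be represented as `None`; how the API actually encodes "unlimited" is not specified here.

```rust
/// Hypothetical usage row; `limit: None` models an unlimited self-host plan.
pub struct Usage {
    pub quota: &'static str,
    pub current: u64,
    pub limit: Option<u64>,
}

impl Usage {
    /// Label for the progress bar, e.g. "3 / 10" or "3 / ∞".
    pub fn label(&self) -> String {
        match self.limit {
            Some(limit) => format!("{} / {}", self.current, limit),
            None => format!("{} / ∞", self.current),
        }
    }

    /// Fill fraction in [0, 1]; an unlimited quota never fills.
    pub fn fraction(&self) -> f64 {
        match self.limit {
            Some(0) => 1.0, // defensive: limits below 1 should never be stored
            Some(limit) => (self.current as f64 / limit as f64).min(1.0),
            None => 0.0,
        }
    }
}
```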

Anti-abuse

Two deny-lists, applied when a target is created, bulk-created, updated, or test-run. A block is a 400, audited to quota_events with event = abuse_blocked.

  • URL patterns — a case-insensitive regex set of attack-recon paths (exposed VCS dirs, .env, credential paths, admin panels, WordPress xmlrpc pingback, Spring actuator, backup/dump extensions, …). A match is 400 URL_PATTERN_BLOCKED / ABUSE_BLOCKED. The shipped patterns and the compiled fallback are kept byte-identical by a drift guard.
  • Domains — a YAML deny-list (config/abuse_denylist.yaml) matched hierarchically: listing example.com also blocks eu.status.example.com. It carries the operator’s own domain (don’t monitor yourself) and competing uptime/status providers (monitoring another monitor forms a load-amplification chain). A match is 400 DOMAIN_DENYLISTED. Dedicated monitoring SaaS are listed at the apex; multi-tenant status-page hosts are listed narrowly so legitimate vendor-status checks are not over-blocked.
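Hierarchical matching means an entry blocks itself and any host ending in a dot followed by the entry. A minimal sketch, assuming the YAML has already been parsed into a list of domain strings; `is_denylisted` is a hypothetical name. Requiring the dot keeps `notexample.com` from matching `example.com`.

```rust
/// An entry blocks itself and every subdomain beneath it.
/// Comparison is ASCII case-insensitive; a trailing root dot is ignored.
pub fn is_denylisted(host: &str, denylist: &[&str]) -> bool {
    let host = host.trim_end_matches('.').to_ascii_lowercase();
    denylist.iter().any(|entry| {
        let entry = entry.trim_end_matches('.').to_ascii_lowercase();
        // Match on a label boundary only: "x.example.com" yes, "notexample.com" no.
        host == entry || host.ends_with(&format!(".{entry}"))
    })
}
```

Listing a narrower host leaves its parent unaffected, which is what allows multi-tenant status-page hosts to be listed narrowly.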

Both lists load once at startup; changes need a restart in this release. A bad regex or malformed YAML produces a clean startup config error, never a crash loop.

Configuration

[quotas]
plan_cache_ttl_secs  = 300   # org→plan cache; a plans-table edit takes effect within this window
usage_cache_ttl_secs = 10

A plans-table change stays invisible until the cached entry’s TTL elapses (a cache hit costs zero DB round-trips on the hot path); the next lookup after that refetches the plan.
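The TTL semantics can be sketched as a small read-through cache. `PlanCache` and its `get` method are hypothetical, the `fetch` closure stands in for the Postgres query, and the clock is passed in explicitly so the behaviour is deterministic.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical org→plan cache keyed by org id.
pub struct PlanCache<P: Clone> {
    ttl: Duration,
    entries: HashMap<u64, (P, Instant)>,
}

impl<P: Clone> PlanCache<P> {
    pub fn new(ttl: Duration) -> Self {
        PlanCache { ttl, entries: HashMap::new() }
    }

    /// A hit inside the TTL never calls `fetch`; the first lookup after
    /// expiry refetches, which is when a plans-table edit becomes visible.
    pub fn get(&mut self, org_id: u64, now: Instant, fetch: impl FnOnce(u64) -> P) -> P {
        if let Some((plan, fetched_at)) = self.entries.get(&org_id) {
            if now.duration_since(*fetched_at) < self.ttl {
                return plan.clone();
            }
        }
        let plan = fetch(org_id);
        self.entries.insert(org_id, (plan.clone(), now));
        plan
    }
}
```

With plan_cache_ttl_secs = 300, a lookup 299 s after the first fetch still returns the cached plan, and the lookup at 300 s refetches and sees the edit.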

Single-tenant deploys raise limits the same way SaaS does: edit (or INSERT) the plans row the org is assigned to, or attach a plan_overrides row carrying the cap fields you want to raise. There is no config-side override knob; every quota lives in Postgres, so the audit trail covers both modes.

Every numeric quota, rate, and interval is validated at config load: a value below 1 is rejected with an error naming the offending field, rather than surfacing later as a panic during router or limiter construction.
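Load-time validation can be as simple as walking the numeric fields and failing on the first value below 1. A minimal sketch: `ConfigError` and `validate_positive` are hypothetical, values are modelled as `i64` for simplicity, and the field names used in the usage are the two [quotas] keys shown above.

```rust
use std::fmt;

/// Names the offending field so the operator can fix it directly.
#[derive(Debug)]
pub struct ConfigError {
    pub field: String,
    pub value: i64,
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "config field `{}` must be >= 1 (got {})", self.field, self.value)
    }
}

impl std::error::Error for ConfigError {}

/// Reject the first value below 1, in declaration order.
pub fn validate_positive(fields: &[(&str, i64)]) -> Result<(), ConfigError> {
    for &(field, value) in fields {
        if value < 1 {
            return Err(ConfigError { field: field.to_string(), value });
        }
    }
    Ok(())
}
```

A zero budget caught here would otherwise reach an expression such as `Duration::from_secs(60) / 0`, which panics.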

The reverse-proxy per-IP tiers (auth endpoints, org creation, public surface) are documented in Deployment.