Rate limits + quotas

Every accepted ingest request flows through two gates: a per-second rate limit (token bucket) and a monthly event quota. Both are per-workspace. They are independent — a workspace can be rate-limited without exhausting its quota, and vice versa.

Per-second rate limit

The limiter is a token bucket with refill rate RATE_LIMIT_RPS and bucket capacity RATE_LIMIT_BURST. Defaults: 100 rps, 200 burst.
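
As an illustration of the mechanism, here is a minimal token-bucket sketch using the default values. It is not the service's implementation: the class name, the injectable clock, the one-token-per-request cost, and the choice to start with a full bucket are all assumptions made for the example.

```python
import time


class TokenBucket:
    """Token bucket: refills at `rate` tokens/second, holds at most `burst` tokens."""

    def __init__(self, rate: float, burst: float, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self.tokens = burst        # assumption: the bucket starts full
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Credit tokens for the elapsed time, capped at the bucket capacity.
        self.tokens = min(self.burst, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1       # assumption: each request costs one token
            return True
        return False               # an empty bucket corresponds to the 429 case


# A fake clock keeps the demonstration deterministic.
now = [0.0]
bucket = TokenBucket(rate=100, burst=200, clock=lambda: now[0])
accepted = sum(bucket.allow() for _ in range(250))   # burst of 250 at t=0
now[0] += 0.5                                        # half a second passes
refilled = sum(bucket.allow() for _ in range(100))   # another 100 requests
```

With these numbers the first burst admits 200 requests and rejects 50, and half a second of refill admits 50 more, which shows how the burst capacity absorbs short spikes while the refill rate bounds sustained throughput.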

When the bucket is empty, the request returns:

HTTP/1.1 429 Too Many Requests
Retry-After: 1
X-Ratelimit-Reason: per_second_rate_limit
Content-Type: application/json

{ "error": { "code": "rate_limit_exceeded", "message": "per-second rate limit exceeded" } }

The SDK retry orchestrator treats 429 as transient and backs off (see Retry). The Retry-After header is always 1 for this gate, because one second of refill at RATE_LIMIT_RPS puts tokens back in the bucket.

Monthly quota

A per-workspace monthly event ceiling. Default: 10_000_000 events per calendar month. Two thresholds apply:

Soft ceiling (QUOTA_SOFT_PCT, default 80%): the request still succeeds but a warning header is attached so operators can set up alerts. Header: X-Ratelimit-Reason: monthly_quota_soft.
Hard ceiling (100%): subsequent requests return:

HTTP/1.1 402 Payment Required
X-Ratelimit-Reason: monthly_quota_exceeded
Content-Type: application/json

{ "error": { "code": "monthly_quota_exceeded", "message": "workspace monthly event quota exhausted" } }
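
The two thresholds can be sketched as a small classification function. This is only an illustration under assumptions: the names quota_state and QuotaState are hypothetical, and whether the real gate compares with "at or above" or "strictly above" at each boundary is not stated here, so the sketch assumes "at or above".

```python
from enum import Enum


class QuotaState(Enum):
    OK = "ok"
    SOFT = "monthly_quota_soft"        # request succeeds, warning header attached
    HARD = "monthly_quota_exceeded"    # request is rejected with 402


def quota_state(events_used: int, monthly_ceiling: int, soft_pct: int = 80) -> QuotaState:
    """Classify month-to-date usage against the soft and hard ceilings."""
    if events_used >= monthly_ceiling:
        return QuotaState.HARD
    # Integer arithmetic avoids float rounding exactly at the soft boundary.
    if events_used * 100 >= monthly_ceiling * soft_pct:
        return QuotaState.SOFT
    return QuotaState.OK


below = quota_state(7_999_999, 10_000_000)
warned = quota_state(8_000_000, 10_000_000)
blocked = quota_state(10_000_000, 10_000_000)
```

The enum values reuse the X-Ratelimit-Reason strings so that a caller could attach the header directly from the returned state.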

The SDK retry orchestrator treats 402 as permanent for the rest of the month: retrying is pointless when the quota will not reset until midnight on the 1st. This is a deliberate choice to prevent retry storms from sites whose plan is over its limit.
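
For clients that do not use the SDK, the same policy can be approximated in a few lines. The function below is a hypothetical helper, not the SDK's API; it assumes the caller passes headers under their canonical names and it deliberately leaves every status other than 429 and 402 out of scope.

```python
from typing import Mapping, Optional


def retry_delay(status: int, headers: Mapping[str, str]) -> Optional[float]:
    """Return seconds to wait before retrying, or None if no retry should happen."""
    if status == 402:
        # Monthly quota exhausted: treated as permanent, so never retry.
        return None
    if status == 429:
        # Per-second limit: honour Retry-After, falling back to 1 second.
        try:
            return float(headers.get("Retry-After", "1"))
        except ValueError:
            return 1.0
    # Other statuses are outside this sketch; report "no retry".
    return None


delay_429 = retry_delay(429, {"Retry-After": "1"})
delay_402 = retry_delay(402, {"X-Ratelimit-Reason": "monthly_quota_exceeded"})
```

Keeping the 402 branch first makes the "never retry on quota exhaustion" rule hard to bypass by accident when further status codes are added later.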

Per-workspace overrides

Defaults set via env are platform-wide. Per-workspace overrides live in the workspaces row:

Column	Purpose
`rate_limit_rps`	Per-second refill rate. NULL falls back to env default.
`rate_limit_burst`	Bucket capacity. NULL falls back to env default.
`quota_monthly_events`	Monthly hard ceiling. NULL falls back to env default.

Operators set these via direct SQL.
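
The sketch below shows what such an override and the NULL fallback might look like. It uses an in-memory SQLite database as a stand-in because the production database is not named here; the id column, the ws_demo workspace, the override values, and the use of COALESCE to apply the env defaults are all assumptions for illustration.

```python
import sqlite3

# Platform-wide defaults, as they would come from the environment.
ENV_DEFAULTS = {"rps": 100, "burst": 200, "quota": 10_000_000}

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE workspaces (
           id TEXT PRIMARY KEY,            -- hypothetical key column
           rate_limit_rps INTEGER,
           rate_limit_burst INTEGER,
           quota_monthly_events INTEGER
       )"""
)
conn.execute("INSERT INTO workspaces (id) VALUES ('ws_demo')")

# The kind of statement an operator might run to raise one workspace's limits.
conn.execute(
    "UPDATE workspaces SET rate_limit_rps = ?, rate_limit_burst = ? WHERE id = ?",
    (500, 1000, "ws_demo"),
)


def effective_limits(conn: sqlite3.Connection, workspace_id: str) -> dict:
    """Resolve a workspace's limits; NULL columns fall back to the env defaults."""
    row = conn.execute(
        "SELECT COALESCE(rate_limit_rps, ?), COALESCE(rate_limit_burst, ?), "
        "COALESCE(quota_monthly_events, ?) FROM workspaces WHERE id = ?",
        (ENV_DEFAULTS["rps"], ENV_DEFAULTS["burst"], ENV_DEFAULTS["quota"], workspace_id),
    ).fetchone()
    return {"rps": row[0], "burst": row[1], "quota": row[2]}


limits = effective_limits(conn, "ws_demo")
```

Here the two overridden columns take the operator's values while quota_monthly_events, left NULL, resolves to the platform default.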

What gets counted

Every accepted event counts against the monthly quota, including events that the bot filter or the internal-traffic filter subsequently drops. In Drop mode these events pass the rate limiter first and are dropped only afterwards, so they still count. Events that end up as dead-letter queue (DLQ) rows also count.

What does NOT count:

Requests rejected at auth or consent (these 4xx rejections never reach the counter).
Requests to the health and metrics endpoints (/healthz, /metrics), which bypass the limiter entirely.

Performance

The token-bucket check is benchmarked at p99 < 1 µs, well inside the 5 ms ingest budget. The quota counter is incremented once per accepted event inside the same database transaction as the storage write — no extra round trip.
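
To illustrate what "same transaction" means in practice, the sketch below writes an event and bumps a usage counter atomically. It again uses in-memory SQLite as a stand-in, and the events and quota_usage tables, their columns, and the store_event helper are hypothetical names invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (workspace_id TEXT, payload TEXT)")
conn.execute(
    "CREATE TABLE quota_usage (workspace_id TEXT PRIMARY KEY, events_used INTEGER NOT NULL)"
)
conn.execute("INSERT INTO quota_usage VALUES ('ws_demo', 0)")
conn.commit()


def store_event(conn: sqlite3.Connection, workspace_id: str, payload: str) -> None:
    # `with conn` commits both statements together, or rolls both back on error,
    # so the counter can never drift from the number of stored events.
    with conn:
        conn.execute("INSERT INTO events VALUES (?, ?)", (workspace_id, payload))
        conn.execute(
            "UPDATE quota_usage SET events_used = events_used + 1 WHERE workspace_id = ?",
            (workspace_id,),
        )


store_event(conn, "ws_demo", '{"name": "page_view"}')
used = conn.execute(
    "SELECT events_used FROM quota_usage WHERE workspace_id = 'ws_demo'"
).fetchone()[0]
stored = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

Because the increment shares the storage write's transaction, a failed write also undoes the count.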

Monitoring

Alerts to consider:

Rate-limit drops sustained for more than 5 min for a single workspace — usually a rogue client or an under-tuned default. The syntarie_events_dropped_total{reason="rate_limit"} metric carries no per-workspace label by design (cardinality), so you must cross-check against operator logs to identify the workspace.
Soft ceiling exceeded — alert per-workspace and contact the customer. Catching this before the hard ceiling avoids end-of-month surprises.
Hard ceiling exceeded — alert per-workspace; the customer’s events are not flowing.

Each rate-limit decision emits a structured log line at debug level that carries the workspace id, which supports forensic analysis.