Rate Limiting a Voice Agent With One Postgres UPDATE
A real-time voice agent's daily call cap has to be checked before the greeting plays, survive concurrent calls, and roll over at midnight. At Taritas we do all three in one Postgres UPDATE: a CASE expression handles the midnight rollover, the WHERE clause enforces the cap, and RETURNING reports which case fired. One round trip, no race condition, no scheduled job.
Published · Updated · Supreet Tare
All names, numbers, and identifiers in this post are anonymized. The patterns are real.
The requirement
One of our voice agents answers a public-sector phone line. Each tenant gets a daily call cap, 10 by default, adjustable from an admin panel without a deploy. The cap exists for cost control: a stuck loop, a misconfigured dialer, or a runaway bot should not be able to burn a month’s LLM budget in one afternoon.
The requirements are simple to list and annoying to implement correctly:
- The cap is per tenant, not global.
- The counter rolls over at midnight in the client’s timezone.
- Two calls arriving in the same instant cannot both slip in when the counter is one below the cap.
- A database problem must never wrongly reject a caller. If we cannot tell whether the cap is exceeded, we accept the call.
- The check runs before the greeting plays. Otherwise the caller hears a warm welcome followed by an apology, which is worse than an honest rejection upfront.
The designs that do not work
SELECT then UPDATE. Two round trips with a race window between them. Two concurrent callers both read 9 of 10, both decide they are under the cap, both increment, and you have 11 active calls against a cap of 10. On a phone line with bursty business-hours traffic, this is a daily event, not a theoretical one.
SELECT then UPDATE with an optimistic check. Adding WHERE consumed = <observed> to the update closes the race but opens a retry loop: when the update misses because another caller got there first, you try again. Unbounded retries under high concurrency, which is exactly when you least want them, on a path that must finish before the greeting.
An application-level lock. An in-memory lock does not cover pod-to-pod concurrency in a multi-process deployment. A Redis lock works, but it adds a dependency and a failure mode to the rate-limit path. Postgres is already on the call’s critical path for tenant config, so its row-level atomicity costs nothing extra.
The design that shipped
One UPDATE statement does the cap check, the increment, and the midnight rollover:
UPDATE tenant_config
SET
consumed_calls = CASE
WHEN last_reset_date IS DISTINCT FROM ((NOW() AT TIME ZONE 'America/Toronto')::date)
THEN 1
ELSE consumed_calls + 1
END,
last_reset_date = (NOW() AT TIME ZONE 'America/Toronto')::date,
updated_at = NOW()
WHERE tenant_id = %s
AND (
last_reset_date IS DISTINCT FROM ((NOW() AT TIME ZONE 'America/Toronto')::date)
OR consumed_calls < max_calls_per_day
)
RETURNING consumed_calls, max_calls_per_day
Postgres serializes concurrent UPDATEs on the same row, so atomicity is free. Reading the statement piece by piece:
IS DISTINCT FROMis null-safe inequality. On the very first call ever,last_reset_dateis null; a naive<comparison against null returns null and silently breaks the rollover branch.IS DISTINCT FROMtreats null as “different,” so the first call correctly takes the reset path.- The
CASEin the SET clause is the lazy rollover. If the stored reset date is not today in the client’s timezone, the counter resets to 1 (this call is the first of the new day). Otherwise it increments. - The
WHEREclause is the cap. The row matches when either the day has rolled over, or it has not and the counter is under the cap. If neither holds, the UPDATE touches nothing. RETURNINGdistinguishes the outcomes in the same round trip. A row back means the slot was reserved. No row means the cap is hit, and one optional follow-up SELECT fetches the counts purely for accurate logging.
The Python wrapper enforces requirement 4:
async def try_take_call_slot(tenant_id, pool):
"""Returns (allowed, consumed_after, max_per_day). Fail-open by policy:
if the DB cannot answer, the call proceeds."""
if pool is None:
return True, 0, 0
try:
async with pool.connection() as conn:
async with conn.cursor() as cur:
await asyncio.wait_for(cur.execute(SQL, (tenant_id,)), timeout=1.0)
row = await asyncio.wait_for(cur.fetchone(), timeout=1.0)
if row is not None:
return True, int(row[0]), int(row[1])
# No row updated: cap reached. Fetch counts for logging only.
...
return False, consumed, max_per_day
except asyncio.TimeoutError:
return True, 0, 0 # fail-open
except Exception:
return True, 0, 0 # fail-open
Every failure path returns “allowed”: no connection pool, a 1.0 second timeout, any other exception, or a missing tenant row. That is not sloppiness, it is written policy. The rate limit is a cost guardrail, not a security boundary. A wrongly rejected caller costs more than one extra LLM call, and “database outage AND this exact call would have exceeded the cap” is a vanishingly small intersection.
The call site
The check runs in the call entrypoint, before the greeting. On rejection, the agent plays a short canned message that hands the caller the office’s direct number and asks them to call back tomorrow, then closes the room. The post-call cleanup already skips analytics for calls with zero user turns, so rejected calls never pollute the call-history data.
Cost-control checks belong before the user-facing pipeline starts. A caller who hears “thanks for calling” and then gets turned away experienced a worse product than one who got an honest limit message in the first second.
One detail worth copying: the rate-limit query deliberately bypasses the tenant-config cache (a 60 second TTL covers prompts and voice settings) and reads the live value every call, so an admin raising the cap sees the change take effect immediately.
Hardening still on the list
- A counter-runaway alert: if consumed ever exceeds twice the cap, something upstream is wrong (clock skew, a failed rollover, a lowered cap).
- A warning band near the cap, say 80 percent, so an admin can raise the limit before real callers get rejected. The numbers are already returned on every call; nothing acts on them yet.
- Per-tenant timezones. Today every tenant shares one timezone and the SQL hardcodes it; a second region needs it parameterized.
- A metric on rate-limit timeouts per hour, because sustained DB slowness currently means silently exceeding the cap via fail-open.
Key takeaways
- SELECT-then-UPDATE race conditions are a daily reality on bursty phone traffic, not an edge case. Push the check into one atomic statement.
- Lazy rollover beats scheduled rollover: resetting the counter on the first call of the new day removes a cron job, its monitoring, and its failure window.
- Fail-open versus fail-closed is a product decision, not an accident of error handling. Decide it, write it down, and make the code match.
- Run cost-control checks before the greeting, not after.
What this means if you are an IT services firm
If your clients run a voice agent, ask your team three questions: when does the rate-limit check run, what happens when it cannot run, and is that failure mode the one you would choose deliberately? If the answers involve a SELECT followed by an UPDATE, or a cron job that resets counters at midnight, this pattern is a one-afternoon improvement. This is the kind of work we do behind IT services firms, under their brand.
Reading this because a client asked for voice AI? That is the conversation we are built for. What taritas does for partners.