Production engineering POST-2 5 min read

Rate Limiting a Voice Agent With One Postgres UPDATE

Q: What happens at the exact moment of midnight?

The timezone conversion evaluates at query execution time, so the boundary is sharp. A call arriving a millisecond before midnight gets the old day, a call a millisecond after gets the new day, and each concurrent UPDATE evaluates the boundary independently. There is no race window.

Q: Should a rate-limit check fail open or fail closed?

It depends on what the check protects. A cost-control guardrail should fail open: wrongly rejecting a real caller costs more than one extra call beyond the cap. A security boundary should fail closed. The important part is making the choice explicitly and writing it down, not drifting into whichever your error handling happens to do.

Q: How do you change the cap without a deploy?

The cap lives in a tenant configuration row that an admin UI can update. The rate-limit query reads the live value on every call instead of the cached tenant config, so a change takes effect on the very next call.

Q: Why does the rollover happen inside the UPDATE instead of a midnight cron job?

Lazy rollover removes a moving part. A scheduled reset job can fail to fire, needs monitoring, and leaves a window between midnight and the reset. A CASE expression that resets the counter on the first call of the new day runs on a code path that is already touching the row, so there is nothing extra to operate.

A real-time voice agent's daily call cap has to be checked before the greeting plays, survive concurrent calls, and roll over at midnight. At taritas we do all three in one Postgres UPDATE: a CASE expression handles the midnight rollover, the WHERE clause enforces the cap, and RETURNING reports which case fired. One round trip, no race condition, and no scheduled job to reset counters. The reason it is one statement and not three is that a read-then-write check has a race window under concurrent calls where two callers can both pass a cap of ten per day; doing the decision inside a single atomic UPDATE closes that window in the database itself. We pair it with a 1.0-second query timeout and a documented fail-open policy, so a slow or unreachable database lets a real caller through rather than rejecting them. The transferable rule: when a check must be correct under concurrency, push it into one atomic write.

Published June 12, 2026 · Updated June 24, 2026 · Supreet Tare

All names, numbers, and identifiers in this post are anonymized. The patterns are real.

Diagram of three callers hitting a per-tenant daily call cap: the first is allowed at 9 of 10, the second is rejected at the cap, and the third arrives after midnight and resets the counter to 1, all through the same single UPDATE statement

What does a voice agent’s rate limit actually need to do?

One of our voice agents answers a public-sector phone line. Each tenant gets a daily call cap, 10 by default, adjustable from an admin panel without a deploy. The cap exists for cost control: a stuck loop, a misconfigured dialer, or a runaway bot should not be able to burn a month’s LLM budget in one afternoon.

The requirements are simple to list and annoying to implement correctly:

The cap is per tenant, not global.
The counter rolls over at midnight in the client’s timezone.
Two calls arriving in the same instant cannot both slip in when the counter is one below the cap.
A database problem must never wrongly reject a caller. If we cannot tell whether the cap is exceeded, we accept the call.
The check runs before the greeting plays. Otherwise the caller hears a warm welcome followed by an apology, which is worse than an honest rejection upfront.

Why do the obvious rate-limit designs fail under concurrency?

SELECT then UPDATE. Two round trips with a race window between them. Two concurrent callers both read 9 of 10, both decide they are under the cap, both increment, and you have 11 active calls against a cap of 10. On a phone line with bursty business-hours traffic, this is a daily event, not a theoretical one.

SELECT then UPDATE with an optimistic check. Adding WHERE consumed = <observed> to the update closes the race but opens a retry loop: when the update misses because another caller got there first, you try again. Unbounded retries under high concurrency, which is exactly when you least want them, on a path that must finish before the greeting.

An application-level lock. An in-memory lock does not cover pod-to-pod concurrency in a multi-process deployment. A Redis lock works, but it adds a dependency and a failure mode to the rate-limit path. Postgres is already on the call’s critical path for tenant config, so its row-level atomicity costs nothing extra.

How do you rate limit a voice agent with one Postgres UPDATE?

One UPDATE statement does the cap check, the increment, and the midnight rollover:

UPDATE tenant_config
SET
  consumed_calls = CASE
    WHEN last_reset_date IS DISTINCT FROM ((NOW() AT TIME ZONE 'America/Toronto')::date)
      THEN 1
    ELSE consumed_calls + 1
  END,
  last_reset_date = (NOW() AT TIME ZONE 'America/Toronto')::date,
  updated_at = NOW()
WHERE tenant_id = %s
  AND (
    last_reset_date IS DISTINCT FROM ((NOW() AT TIME ZONE 'America/Toronto')::date)
    OR consumed_calls < max_calls_per_day
  )
RETURNING consumed_calls, max_calls_per_day

Postgres serializes concurrent UPDATEs on the same row, so atomicity is free. Reading the statement piece by piece:

IS DISTINCT FROM is null-safe inequality. On the very first call ever, last_reset_date is null; a naive < comparison against null returns null and silently breaks the rollover branch. IS DISTINCT FROM treats null as “different,” so the first call correctly takes the reset path.
The CASE in the SET clause is the lazy rollover. If the stored reset date is not today in the client’s timezone, the counter resets to 1 (this call is the first of the new day). Otherwise it increments.
The WHERE clause is the cap. The row matches when either the day has rolled over, or it has not and the counter is under the cap. If neither holds, the UPDATE touches nothing.
RETURNING distinguishes the outcomes in the same round trip. A row back means the slot was reserved. No row means the cap is hit, and one optional follow-up SELECT fetches the counts purely for accurate logging.

The Python wrapper enforces requirement 4:

async def try_take_call_slot(tenant_id, pool):
    """Returns (allowed, consumed_after, max_per_day). Fail-open by policy:
    if the DB cannot answer, the call proceeds."""
    if pool is None:
        return True, 0, 0
    try:
        async with pool.connection() as conn:
            async with conn.cursor() as cur:
                await asyncio.wait_for(cur.execute(SQL, (tenant_id,)), timeout=1.0)
                row = await asyncio.wait_for(cur.fetchone(), timeout=1.0)
                if row is not None:
                    return True, int(row[0]), int(row[1])
                # No row updated: cap reached. Fetch counts for logging only.
                ...
                return False, consumed, max_per_day
    except asyncio.TimeoutError:
        return True, 0, 0   # fail-open
    except Exception:
        return True, 0, 0   # fail-open

Every failure path returns “allowed”: no connection pool, a 1.0 second timeout, any other exception, or a missing tenant row. That is not sloppiness, it is written policy. The rate limit is a cost guardrail, not a security boundary. A wrongly rejected caller costs more than one extra LLM call, and “database outage AND this exact call would have exceeded the cap” is a vanishingly small intersection.

The call site

The check runs in the call entrypoint, before the greeting. On rejection, the agent plays a short canned message that hands the caller the office’s direct number and asks them to call back tomorrow, then closes the room. The post-call cleanup already skips analytics for calls with zero user turns, so rejected calls never pollute the call-history data.

Cost-control checks belong before the user-facing pipeline starts. A caller who hears “thanks for calling” and then gets turned away experienced a worse product than one who got an honest limit message in the first second.

One detail worth copying: the rate-limit query deliberately bypasses the tenant-config cache (a 60 second TTL covers prompts and voice settings) and reads the live value every call, so an admin raising the cap sees the change take effect immediately.

Hardening still on the list

A counter-runaway alert: if consumed ever exceeds twice the cap, something upstream is wrong (clock skew, a failed rollover, a lowered cap).
A warning band near the cap, say 80 percent, so an admin can raise the limit before real callers get rejected. The numbers are already returned on every call; nothing acts on them yet.
Per-tenant timezones. Today every tenant shares one timezone and the SQL hardcodes it; a second region needs it parameterized.
A metric on rate-limit timeouts per hour, because sustained DB slowness currently means silently exceeding the cap via fail-open.

Key takeaways

SELECT-then-UPDATE race conditions are a daily reality on bursty phone traffic, not an edge case. Push the check into one atomic statement.
Lazy rollover beats scheduled rollover: resetting the counter on the first call of the new day removes a cron job, its monitoring, and its failure window.
Fail-open versus fail-closed is a product decision, not an accident of error handling. Decide it, write it down, and make the code match.
Run cost-control checks before the greeting, not after.

What this means if you are an IT services firm

If your clients run a voice agent, ask your team three questions: when does the rate-limit check run, what happens when it cannot run, and is that failure mode the one you would choose deliberately? If the answers involve a SELECT followed by an UPDATE, or a cron job that resets counters at midnight, this pattern is a one-afternoon improvement. This is the kind of work we do behind IT services firms, under their brand.

Voice Agent Responsiveness: The Latency Callers Feel, and a Fix That Broke Silently

Production engineering July 6, 2026 5 min read

How Long Does It Take to Deploy a Voice AI Receptionist?

 PROJECT taritas.com/blog 
 DWG POST-2 
 REV 1.0 
 DATE 2026-06-24