Field notes June 30, 2026 8 min read Debugging a Voice Agent's Behavior, Not Its Code: The First Live Call You debug a voice agent's behavior by listening to a real call, not by reading its code. At taritas, the first live call of a chronic-pain education agent passed every text test yet still failed out loud. It talked too long, answered its own questions, and dropped its changed footing after a crisis moment. None of that was a code bug. The fix was four edits to the system prompt: a numeric length cap, a stop rule that ends a turn at a question, a rule that keeps crisis awareness alive for the rest of the session, and one reminder repeated at the end of the prompt. We added no new branches. The one true code bug, a name the avatar invented, came from a hardcoded string the prompt could not see. The lesson is to sort failures by root cause first: behavior problems belong in the prompt, and identity problems belong in the code.
→ Production engineering June 24, 2026 8 min read Semantic Turn Detection: How a Voice Agent Knows You're Done Turn detection is how a voice agent decides you have finished speaking. Once it decides, it replies. The simplest method is a silence timer. The agent waits a fixed pause after you stop, then treats the turn as over. That one number forces a bad trade. At taritas we lowered the silence window from 1.5 to 1.0 seconds. We wanted lower latency, and replies did come faster. But shorter silence had a cost. Our speech-to-text began splitting one spoken sentence into fragments whenever a caller paused. More fragments meant more turns marked interrupted. That stress exposed a bug that put the call's opening greeting on a later turn. The honest fix is to stop deciding turns by silence alone. Semantic turn detection is the upgrade. It is a small model that reads the words so far and judges, by meaning, whether you are done. On LiveKit it runs next to voice-activity detection, and the trade mostly goes away.
→ Production engineering June 24, 2026 7 min read How to Integrate Retell AI With Google Calendar for Booking Retell AI does not ship a Google Calendar booking toggle, so the integration is a custom function: the agent calls a webhook on your backend mid-conversation, and your backend talks to the Google Calendar API. The demo version of that is a weekend of work. The production version turns on three details the happy path skips. First, latency: the function runs during a live call, so the round trip has to stay near two seconds or the caller hears silence, which means keeping the on-call work small and letting the agent speak while it runs. Second, idempotency: voice tool calls get retried, so the booking has to carry a deterministic event id that turns a duplicate insert into a no-op instead of a double booking. Third, time zones: a caller who says three o'clock means a wall-clock time, so every event needs an explicit IANA time zone or you book the wrong hour. Get those three right and the booking holds on a real phone line.
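The idempotency and time-zone details can be sketched as follows. This is a minimal illustration, not the integration itself: the function names and the choice of key fields (call id, calendar id, start time) are hypothetical. It assumes the Calendar API accepts a client-supplied event `id` made of lowercase `a`-`v` and digits, which a SHA-256 hex digest satisfies, and that a repeated insert with the same id is rejected as a conflict instead of creating a second event.

```python
import hashlib


def booking_event_id(call_id: str, calendar_id: str, start_local: str) -> str:
    """Derive the same event id for every retry of the same booking request."""
    key = f"{call_id}|{calendar_id}|{start_local}".encode("utf-8")
    # Hex digits (0-9, a-f) are a subset of the base32hex alphabet (0-9, a-v).
    return hashlib.sha256(key).hexdigest()


def build_event(call_id: str, calendar_id: str, summary: str,
                start_local: str, end_local: str, tz_name: str) -> dict:
    """Build an event body with wall-clock times and an explicit IANA zone."""
    return {
        "id": booking_event_id(call_id, calendar_id, start_local),
        "summary": summary,
        # Local wall-clock time plus a named zone, so "three o'clock" stays
        # three o'clock in the caller's zone.
        "start": {"dateTime": start_local, "timeZone": tz_name},
        "end": {"dateTime": end_local, "timeZone": tz_name},
    }


first = build_event("call-123", "primary", "Consultation",
                    "2026-07-02T15:00:00", "2026-07-02T15:30:00",
                    "America/Toronto")
retry = build_event("call-123", "primary", "Consultation",
                    "2026-07-02T15:00:00", "2026-07-02T15:30:00",
                    "America/Toronto")
```

Because the id depends only on the request, a retried tool call produces an identical body, and the backend can treat the resulting conflict as "already booked".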
→ Build decisions June 23, 2026 7 min read Why We Rejected the Realtime Voice API: Unit Economics From a Live Deployment Because a realtime speech-to-speech API costs more per minute than the price the customer pays. At taritas we measured a realtime API at about 25 to 35 cents per minute against a standard price of 18 cents per minute, while our cascade pipeline runs near 3 cents per minute in variable cost. A voice agent is priced by the minute, so any architecture that costs more per minute than the line charges cannot scale into profit. The cascade also carries a fixed infrastructure base of about 1,900 Canadian dollars a month, which amortizes from 2.28 dollars per minute at one tenant down to about 9 cents at twenty-five, crossing below the 18 cent price as customers share it. That is why the decision is reversible rather than ideological: if realtime pricing ever drops below the line, the same arithmetic flips. Capability was never the blocker. Arithmetic was.
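The amortization argument can be worked through in a few lines. This is a sketch under stated assumptions: every tenant is assumed to use the same monthly minutes (the volume implied by 1,900 a month costing 2.28 per minute at one tenant), and all figures are treated as being in one currency. The function names are illustrative.

```python
FIXED_MONTHLY = 1900.0           # fixed infrastructure base per month (from the text)
FIXED_PER_MIN_ONE_TENANT = 2.28  # fixed cost per minute at one tenant (from the text)
VARIABLE_PER_MIN = 0.03          # cascade variable cost per minute (from the text)
PRICE_PER_MIN = 0.18             # line price per minute (from the text)
REALTIME_PER_MIN = (0.25, 0.35)  # measured realtime API range (from the text)

# Assumption: monthly minutes per tenant, implied by the two fixed-cost figures.
MINUTES_PER_TENANT = FIXED_MONTHLY / FIXED_PER_MIN_ONE_TENANT


def cost_per_minute(tenants: int) -> float:
    """Cascade cost per minute: shared fixed base plus variable cost."""
    fixed = FIXED_MONTHLY / (tenants * MINUTES_PER_TENANT)
    return fixed + VARIABLE_PER_MIN


def break_even_tenants() -> int:
    """Smallest tenant count at which cost per minute drops below the price."""
    tenants = 1
    while cost_per_minute(tenants) >= PRICE_PER_MIN:
        tenants += 1
    return tenants


# The realtime API never crosses the line: its cheapest minute exceeds the price.
realtime_can_profit = min(REALTIME_PER_MIN) < PRICE_PER_MIN
```

Under these assumptions the cascade crosses below the 18 cent price at 16 tenants, while the realtime API stays above it at any volume because its cost is per minute with nothing to amortize.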
→ Procurement and compliance June 23, 2026 7 min read Can a Voice AI Pass a Healthcare Security Review? We Ran One on Launch Day Yes, if it is built for one. On a healthcare launch, taritas ran a structured security review against the live code and found one high-severity authentication bypass and two medium issues. All three were fixed and verified with tests the same day, and the client's acceptance package included the report with the real findings still in it. What made the result credible to the procurement reviewer was not a clean report but the evidence beside each fix: 51 tests passing and 0 failing, plus a curl reproduction of the exploit that now returns 401 instead of 201. The transferable pattern for an IT services firm is that any endpoint your agent calls back into is an authentication surface, and the shared secret guarding it must be compared in constant time. A voice AI passes a security review when the team can close findings fast and show its work, not when the report comes back empty.
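A constant-time comparison of a shared secret can be sketched with Python's standard library. The function name and the idea of reading the secret from a request header are assumptions for illustration; the reviewed code itself is not shown in the source.

```python
import hmac
from typing import Optional


def secret_matches(provided: Optional[str], expected: str) -> bool:
    """Compare a caller-supplied secret to the expected one in constant time."""
    if provided is None:
        # A missing header is a failed check, never a bypass.
        return False
    # compare_digest avoids returning early at the first differing byte,
    # which is what leaks timing information with a plain == comparison.
    return hmac.compare_digest(provided.encode("utf-8"),
                               expected.encode("utf-8"))
```

A callback handler would return 401 whenever `secret_matches` is false, before doing any other work.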
→ Procurement and compliance June 18, 2026 4 min read Data Residency for Voice AI: Why Your Region Decides Your Voice For a regulated voice AI deployment, data residency is not only a storage rule. It quietly decides which models and voices you can use, because a managed provider's catalog is not identical in every region: the flagship voice in the demo may not exist in the region your rules require. At taritas we hit this directly. The natural managed voice we wanted was not available in our client's required region, so we self-hosted an open model in-region to keep both residency and quality, then moved back once the managed voice reached that region in early 2026. The practical rule is to pin the region first, audit what is actually available there, and treat self-hosting in-region as the escape hatch when the managed catalog is thin. Budget for that escape hatch, because in-region self-hosting adds a GPU operations program that a managed voice would otherwise have absorbed for you.
→ Build decisions June 18, 2026 7 min read Chatterbox vs Azure Dragon HD: Choosing a Voice Agent's TTS in Production Text-to-speech is the decision that most shapes how a voice agent sounds and what it costs. At taritas we made the same choice three times for one production agent: a managed cloud voice, then self-hosted Chatterbox for quality, then back to a managed Azure Dragon HD voice once it cleared the bar. The honest comparison is that self-hosted Chatterbox wins on control and in-region flexibility but turns a line item into a GPU operations program, while managed Azure Dragon HD wins on cost and simplicity once its quality is good enough in your region. When the managed HD voice later cleared quality, dropping the GPU cut our text-to-speech cost by about 54 percent, so the right answer changed with the calendar. Neither option is universally correct: pick the cheapest voice that clears your naturalness bar in your region, and re-audition every major release rather than trusting an old verdict.
→ Procurement and compliance June 17, 2026 6 min read What Happens When Your Voice AI Breaks in Production The demo never shows you what happens at 2 a.m. when the agent stops answering and a real customer is on the line. Before buying voice AI, the operations questions decide more than the feature list: who is on call, how fast you hear about an outage, whether a failure degrades safely, and what the postmortem looks like. At taritas we run production voice agents with on-call rotations, a communication discipline kept separate from the debugging, guardrails that fail open, and blameless postmortems. The reason this matters is concrete: our worst outages were not exotic bugs but quiet defaults, a 15.0-second proxy timeout and a 100-calls-per-day cap left over from testing, each of which stopped real calls until someone found it. In an always-on, per-minute product, how you handle the break is what the customer remembers, so treat incident response as a feature you ship, not a thing you improvise.
→ Build decisions June 16, 2026 7 min read White-Label Voice AI for IT Services Firms: 7 Production Lessons At taritas we build white-label voice AI that regional IT services firms resell under their own brand, and the same lessons hold on every engagement. The ones that decide whether it works are rarely about the model: who owns the customer and the intellectual property, whether the per-minute price can carry the architecture, and the configuration and operational discipline that keeps a live agent up. Across production builds those lessons came with numbers: realtime APIs at 25 to 35 cents per minute against an 18 cent line price, a go-live outage from a 100-calls-per-day cap, a 15.0-second transfer timeout, a launch-day security review that fixed a high-severity auth bypass, and a text-to-speech rebuild that cut cost about 54 percent. The transferable point for an IT firm is that the demo is never the hard part: price the minute, own the boundaries, and treat the unglamorous failure modes as a known class.
→ Production engineering June 12, 2026 5 min read Three Attempts at Masking a Voice Agent's Thinking Silence After a caller stops talking, a production voice agent we run at taritas has about 2.2 to 2.5 seconds of dead air before its first audio frame. We made three serious attempts to mask it: a classifier-driven filler, a state-change-driven filler, and preemptive generation. All three are disabled in production today, each for a different structural reason, and the usable lesson is about speech queue ordering: the only filler insertion point that works is the one where the reply does not exist yet, because once the real reply is queued the filler either collides with it or arrives too late. What did ship was unglamorous and real: running the classifier and the knowledge-base lookup in parallel instead of in series cut about 0.6 seconds of median latency. The transferable point is that masking latency is harder than removing it, so spend the first effort on parallelizing serial steps before you try to paper over the gap.
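The change that did ship, running the classifier and the knowledge-base lookup in parallel, can be sketched with `asyncio`. The coroutine names, return values, and sleep durations are placeholders standing in for real network calls, not the production pipeline.

```python
import asyncio
from typing import List, Tuple


async def classify(utterance: str) -> str:
    """Placeholder for the intent classifier call."""
    await asyncio.sleep(0.05)
    return "question"


async def lookup_kb(utterance: str) -> List[str]:
    """Placeholder for the knowledge-base lookup."""
    await asyncio.sleep(0.05)
    return ["kb chunk"]


async def prepare_reply_inputs(utterance: str) -> Tuple[str, List[str]]:
    # gather starts both coroutines at once, so the wait is roughly the
    # slower of the two instead of their sum.
    intent, chunks = await asyncio.gather(classify(utterance),
                                          lookup_kb(utterance))
    return intent, chunks


result = asyncio.run(prepare_reply_inputs("what are your hours"))
```

This only helps when the lookup does not need the classifier's output; if it does, the two steps are genuinely serial.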
→ Production engineering June 12, 2026 5 min read Rate Limiting a Voice Agent With One Postgres UPDATE A real-time voice agent's daily call cap has to be checked before the greeting plays, survive concurrent calls, and roll over at midnight. At taritas we do all three in one Postgres UPDATE: a CASE expression handles the midnight rollover, the WHERE clause enforces the cap, and RETURNING reports which case fired. One round trip, no race condition, and no scheduled job to reset counters. The reason it is one statement and not three is that a read-then-write check has a race window under concurrent calls where two callers can both pass a cap of ten per day; doing the decision inside a single atomic UPDATE closes that window in the database itself. We pair it with a 1.0-second query timeout and a documented fail-open policy, so a slow or unreachable database lets a real caller through rather than rejecting them. The transferable rule: when a check must be correct under concurrency, push it into one atomic write.
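One possible shape for that statement, wrapped in the fail-open policy, is sketched below. The table and column names, the psycopg-style `%(tenant_id)s` placeholder, and the `run_query` callable (assumed to execute the statement under a 1-second timeout and return the first row or `None`) are all hypothetical; the actual statement is not shown in the source.

```python
from typing import Callable, Optional, Tuple

# CASE handles the midnight rollover, WHERE enforces the cap, and RETURNING
# reports the new count. No row back means the cap was already reached.
RATE_LIMIT_SQL = """
UPDATE call_limits
SET calls_today = CASE WHEN counter_date < CURRENT_DATE THEN 1
                       ELSE calls_today + 1 END,
    counter_date = CURRENT_DATE
WHERE tenant_id = %(tenant_id)s
  AND (counter_date < CURRENT_DATE OR calls_today < daily_cap)
RETURNING calls_today
"""

QueryRunner = Callable[[str, dict], Optional[Tuple[int]]]


def admit_call(run_query: QueryRunner, tenant_id: str) -> bool:
    """Return True if the call may proceed, failing open on database errors."""
    try:
        row = run_query(RATE_LIMIT_SQL, {"tenant_id": tenant_id})
    except Exception:
        # Documented fail-open policy: a slow or unreachable database
        # lets a real caller through.
        return True
    return row is not None
```

Because the increment and the cap check happen in one UPDATE, two concurrent calls serialize on the row and the second one re-evaluates the WHERE clause against the updated count.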
→ Production engineering June 12, 2026 7 min read A 15-Second Default Timeout Broke Our Voice AI's Call Transfers A production voice AI agent we run at taritas stopped transferring callers to staff. Every transfer failed with a 504 at exactly 15 seconds, Envoy Gateway's default route timeout, while the destination line had developed a 13-second post-dial delay. Because the agent's transfer ran as a background API call, each failed dial kept ringing the destination, so staff answered ghost calls with no one on the line. The fix was one scoped timeouts block on the HTTPRoute that raised the limit above the real post-dial delay. The deeper lesson is that the outage was a default nobody chose: 15.0 seconds is sensible for a web request and wrong for a phone transfer that legitimately takes longer to connect. When you put a proxy in front of telephony, enumerate every default timeout and cap on the path and set each one against real call behavior, because an unset default will pick the worst moment to enforce itself.
→ Build decisions January 24, 2025 6 min read Voice AI ROI in Customer Service: How to Build the Business Case The ROI of voice AI in customer service comes down to one comparison, measured per minute. What does a minute of the agent cost to run, and what does the minute it replaces cost you today? At taritas we have the cost side from production. On a cascade pipeline our variable cost runs near three cents per audio-minute, with fixed infrastructure on top that falls as more customers share it. The replaced cost is a human agent's loaded rate and handle time. The spread between them, multiplied by the calls you can safely automate, is the return. The honest part is which calls those are. Routine, high-volume questions automate well. The hard twenty percent still needs a person. Build the case on the routine calls and your own numbers, not on a vendor's headline percentage.
→ Production engineering January 24, 2025 6 min read Enterprise Voice AI Implementation: 5 Challenges We Hit in Production Most enterprise voice AI projects struggle in the same five places. At taritas we have shipped production voice agents for public-sector and healthcare clients, and these are the challenges that actually decided whether they worked. First, latency. A phone call gives you about one to two seconds before silence feels like a dropped line. Second, turn detection. Deciding when the caller has finished talking is harder than it looks, and a fixed silence timer fragments real speech. Third, integration. Our worst outages were defaults nobody set on purpose, like a proxy timeout and a call cap left in an admin panel. Fourth, security and compliance. Voice data is sensitive, and a real security review is a deliverable, not a checkbox. Fifth, cost. A voice agent is priced by the minute, so the per-minute math decides the architecture. None of these show up in a demo.
→ Field notes June 30, 2026 8 min read Why a Voice Agent Told Callers to Call the Line They Were On A voice agent kept telling callers to call the office, on the line they had already dialed to reach it. The cause was not a model error. The knowledge base was written for the web, where ending an answer with call our office is correct, because the reader is not yet on the phone. On a call to that same office, the model read the instruction out loud and sent the caller back to their own line. At taritas the first fix was a prompt rule: never recite the office number unless asked. On a small, cost-optimized model it failed about one call in twenty, because the model anchored on the chunk it was grounding on more than on the rule. The durable fix was structural: strip the office number from each knowledge-base chunk before the model ever sees it. A model cannot relay what it cannot see. The audit path keeps the original text, so grounding and evaluation still see the truth.
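The structural fix can be sketched as a preprocessing step that produces two views of each chunk. This is a minimal illustration: the regular expression covers only common North American phone formats, and the `Chunk` type and `prepare_chunk` name are hypothetical. A real knowledge base would likely also need the surrounding "call our office at" phrasing handled.

```python
import re
from dataclasses import dataclass

# Matches forms like (555) 010-0199, 555-010-0199, and 555.010.0199.
PHONE_RE = re.compile(r"\(?\b\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b")


@dataclass(frozen=True)
class Chunk:
    audit_text: str  # original text, kept for grounding checks and evaluation
    model_text: str  # what the model is allowed to see


def prepare_chunk(raw: str) -> Chunk:
    """Strip phone numbers from the model's view while keeping the original."""
    stripped = PHONE_RE.sub("", raw)
    stripped = re.sub(r"\s{2,}", " ", stripped)            # collapse gaps
    stripped = re.sub(r"\s+([.,;:!?])", r"\1", stripped)   # tidy punctuation
    return Chunk(audit_text=raw, model_text=stripped.strip())


raw = "Clinic hours are 9 to 5. For changes, call our office at (555) 010-0199."
chunk = prepare_chunk(raw)
```

Only `model_text` is passed to the model; `audit_text` stays on the audit path.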