The blog

Notes from production.

Voice AI engineering, written for people who have to ship it and defend it in a security review. New posts two to three times a week.

Latest Production engineering July 13, 2026 8 min read

Voice Agent Responsiveness: The Latency Callers Feel, and a Fix That Broke Silently

The latency a phone caller actually feels is not the language model's response time. It is the sum of voice-activity-detection silence, speech-to-text segmentation, and turn-detector patience, all of which run before the model starts. At taritas we cut that felt delay by about 800 milliseconds on one production agent by lowering the voice-activity silence window and layering it under a semantic turn detector that reads the words, not just the pause. A further speed trick, starting speech synthesis before the caller's turn was fully processed, looked like free latency and instead produced a silently wrong response, because it raced a flag our text-to-speech step depended on. Tune the layers that run before the model first, and test every latency win against the state it might race.

Read the post →

All posts

Build decisions July 13, 2026 8 min read

We Split One Prompt Into Two to Fix a Small LLM. Then We Deleted Both.

A small, cost-optimized LLM asked to do two jobs in one call, classify a caller's intent and generate a response, silently dropped hard formatting rules on roughly 1 in 30 turns once the combined prompt passed 500 lines. At taritas we fixed this by splitting the work into two focused calls, a classifier and a responder, and ran them so the added latency stayed under a second. Weeks later, on the same voice agent, we deleted the classifier anyway, not because splitting the jobs was wrong, but because a broader review found the whole layered pipeline sounded robotic and moved auditing to a step after the call ends. Small LLMs still prefer narrow jobs. The fastest way to find out whether you still need a layer is to remove it and watch what breaks.

Production engineering July 6, 2026 5 min read

How Long Does It Take to Deploy a Voice AI Receptionist?

A production voice AI agent at taritas went dark 30 minutes into its live launch, and it was not a code defect: an admin-panel soft rate limit of 100 calls a day, generous enough during weeks of testing, hit real launch-day volume and silently rejected every call after that. The deploy itself took hours; the actual risk showed up in the first hours of production traffic, in a setting nobody re-audited against launch volume. That is the honest answer to how long deploying a voice AI receptionist takes: the code ships fast, but the first day in production is where configs, caps, and quotas meet real callers for the first time. Budget a full config audit and a staffed watch window for day one, not just a deployment slot, because every numeric limit in a system (panel caps, quotas, business-hours windows) is production policy whether or not it appears in a code review.

Procurement and compliance July 6, 2026 8 min read

What Security Features Should an Enterprise Voice AI Have?

An enterprise voice AI needs security features at every hop of the call, because a voice pipeline has more moving parts than a web app: carrier, speech to text, language model, text to speech, and the backend that ties them together. The checklist we build to at taritas: encryption in transit and at rest on every hop, authenticated internal endpoints with constant-time secret checks, role-based access to the admin surface, tenant isolation in the knowledge base, redaction of sensitive strings before text reaches the model or the voice, per-turn audit logging with cost and latency metering, rate limits and business-hours gates enforced in code, and safety escalation that notifies a human exactly once, enforced at the database layer. Compliance controls like data residency and signed BAAs sit on top of the engineering, not in place of it. A serious vendor can show each feature working in production, including a security review report with real findings.

Build decisions July 4, 2026 9 min read

When to Build vs Buy Voice AI: What a 3/10 Client Review Taught Us

Buy a platform like Retell or Vapi when the call flows are standard and speed to launch matters more than control. Booking, lead capture, and FAQ agents rarely justify a custom build. Build custom when a requirement cannot live inside a platform: data residency, audit trails procurement will actually read, statutory business rules, or integrations the platform cannot reach. Then avoid the trap in the middle. At taritas we built a custom voice agent for a public-sector office with a classifier, a routing table, canned responses, and a dedicated sub-agent. It was deterministic, tested, and auditable, and the client rated it 3 out of 10 because callers could hear the layers. We deleted about 2,000 lines of that architecture, moved more than 60 business rules into one prompt file of about 150 lines, and kept the audit as a post-call eval. Custom should mean owning the requirements, not adding layers.

Procurement and compliance July 3, 2026 10 min read

Voice AI Subprocessors: Which Vendors Sign a BAA

A voice AI subprocessor is any outside vendor your agent hands data to: the telephony carrier, the speech-to-text engine, the language model, the text-to-speech engine, the cloud that hosts it, and often an orchestration platform on top. Under GDPR these are your subprocessors and each needs a data processing agreement. Under HIPAA the ones that touch protected health information are business associates and each needs a signed BAA. As of mid-2026 most serious vendors will sign, from Anthropic Claude and OpenAI for the model to Deepgram for speech-to-text, Azure and ElevenLabs for the voice, and Twilio for the carrier. The catch is that a BAA is almost never for the whole vendor. It is gated to a specific plan or edition and often to specific endpoints, so a vendor will sign for its API and not its consumer app. At taritas we keep a subprocessor register for every deployment: who touches the data, whether a BAA or DPA is on file, and which plan makes it valid.

Procurement and compliance July 3, 2026 12 min read

What HIPAA and GDPR Actually Require of a Voice AI

A voice AI is not HIPAA or GDPR compliant because a vendor says so. Compliance is a property of every layer that touches the data, decided when you design the system, not a badge you add later. A voice agent is a stack: a telephony carrier, a speech-to-text engine, a language model, a text-to-speech engine, and your own backend. Under HIPAA, every layer that sees protected health information needs a signed Business Associate Agreement, encryption in transit and at rest, role-based access, and an audit trail kept for six years. Under GDPR, every layer is a processor that needs a data processing agreement, a lawful basis, and data residency you can prove, and voice can become special-category biometric data the moment you identify a speaker by their voice. At taritas we build voice AI for regulated deployments, and the pattern that holds is simple: design for the security review on day one, because residency, retention, and audit are architecture decisions you cannot bolt on afterward.

Field notes June 30, 2026 8 min read

Why a Voice Agent Told Callers to Call the Line They Were On

A voice agent kept telling callers to call the office, on the line they had already dialed to reach it. The cause was not a model error. The knowledge base was written for the web, where ending an answer with call our office is correct, because the reader is not yet on the phone. On a call to that same office, the model read the instruction out loud and sent the caller back to their own line. At taritas the first fix was a prompt rule: never recite the office number unless asked. On a small, cost-optimized model it failed about one call in twenty, because the model anchored on the chunk it was grounding on more than on the rule. The durable fix was structural: strip the office number from each knowledge-base chunk before the model ever sees it. A model cannot relay what it cannot see. The audit path keeps the original text, so grounding and evaluation still see the truth.
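
A minimal sketch of that structural fix, assuming chunks are plain strings and the office number is known per deployment. The names `OFFICE_NUMBER`, `Chunk`, and `prepare_chunk`, the sample number, and the choice to drop the whole sentence that carries the number are all illustrative assumptions, not the production code.

```python
import re
from dataclasses import dataclass

# Hypothetical office number; a real deployment would load this from config.
OFFICE_NUMBER = re.compile(r"555[\s.-]?0100")

@dataclass(frozen=True)
class Chunk:
    original: str    # untouched text, kept for audit and evaluation
    model_text: str  # the only text the model is allowed to see

def prepare_chunk(text: str) -> Chunk:
    """Drop any sentence that carries the office number before grounding."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    kept = [s for s in sentences if not OFFICE_NUMBER.search(s)]
    return Chunk(original=text, model_text=" ".join(kept))

chunk = prepare_chunk(
    "Permits are renewed yearly. For questions, call our office at 555-0100."
)
print(chunk.model_text)
```

Keeping `original` next to `model_text` is what lets the audit path see the source text while the model never does.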

Field notes June 30, 2026 8 min read

Debugging a Voice Agent's Behavior, Not Its Code: The First Live Call

You debug a voice agent's behavior by listening to a real call, not by reading its code. At taritas, the first live call of a chronic-pain education agent passed every text test yet still failed out loud. It talked too long, answered its own questions, and, after a crisis moment, dropped the more careful footing it had shifted into. None of that was a code bug. The fix was four edits to the system prompt: a numeric length cap, a stop rule that ends a turn at a question, a rule that keeps crisis awareness alive for the rest of the session, and one reminder repeated at the end of the prompt. We added no new branches. The one true code bug, a name the avatar invented, came from a hardcoded string the prompt could not see. The lesson is to sort failures by root cause first: behavior problems belong in the prompt, and identity problems belong in the code.

Production engineering June 24, 2026 8 min read

Semantic Turn Detection: How a Voice Agent Knows You're Done

Turn detection is how a voice agent decides you have finished speaking. Once it decides, it replies. The simplest method is a silence timer. The agent waits a fixed pause after you stop, then treats the turn as over. That one number forces a bad trade. At taritas we lowered the silence window from 1.5 to 1.0 seconds. We wanted lower latency, and replies did come faster. But shorter silence had a cost. Our speech-to-text began splitting one spoken sentence into fragments whenever a caller paused. More fragments meant more turns marked interrupted. That stress exposed a bug that put the call's opening greeting on a later turn. The honest fix is to stop deciding turns by silence alone. Semantic turn detection is the upgrade. It is a small model that reads the words so far and judges, by meaning, whether you are done. On LiveKit it runs next to voice-activity detection, and the trade mostly goes away.
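
As a rough illustration of how a silence timer and a semantic score can be layered, here is a toy decision rule. It is not LiveKit's API: `TurnPolicy`, `turn_is_over`, the thresholds, and the `p_done` score (the probability a hypothetical semantic model assigns to "the caller is finished") are all assumptions made for the sketch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TurnPolicy:
    min_silence_s: float = 0.4   # hypothetical short voice-activity window
    max_silence_s: float = 1.5   # hypothetical fallback: long pause always ends the turn
    done_threshold: float = 0.7  # hypothetical cut-off on the semantic score

def turn_is_over(silence_s: float, p_done: float,
                 policy: TurnPolicy = TurnPolicy()) -> bool:
    """End the turn on a long pause, or on a short pause the words support."""
    if silence_s >= policy.max_silence_s:
        return True
    return silence_s >= policy.min_silence_s and p_done >= policy.done_threshold

# A short pause mid-sentence ("I need to book a...") should not end the turn.
print(turn_is_over(silence_s=0.5, p_done=0.2))
# The same pause after a complete request should.
print(turn_is_over(silence_s=0.5, p_done=0.9))
```

The point of the layering is that the silence window can be short without fragmenting speech, because a pause alone no longer ends the turn.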

Production engineering June 24, 2026 7 min read

How to Integrate Retell AI With Google Calendar for Booking

Retell AI does not ship a Google Calendar booking toggle, so the integration is a custom function: the agent calls a webhook on your backend mid-conversation, and your backend talks to the Google Calendar API. The demo version of that is a weekend of work. The production version turns on three details the happy path skips. First, latency: the function runs during a live call, so the round trip has to stay near two seconds or the caller hears silence, which means keeping the on-call work small and letting the agent speak while it runs. Second, idempotency: voice tool calls get retried, so the booking has to carry a deterministic event id that turns a duplicate insert into a no-op instead of a double booking. Third, time zones: a caller who says three o'clock means a wall-clock time, so every event needs an explicit IANA time zone or you book the wrong hour. Get those three right and the booking holds on a real phone line.
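
A small sketch of the idempotency and time-zone details, under stated assumptions: the function names, the choice of `call_id` plus slot start as the idempotency key, and the sample values are hypothetical. The event shape follows the Google Calendar event resource, where a client-supplied `id` is limited to lowercase `a` to `v` and digits, so a hex digest fits.

```python
import hashlib

def booking_event_id(call_id: str, slot_start_local: str) -> str:
    """Same call and same slot always map to the same event id."""
    key = f"{call_id}|{slot_start_local}".encode("utf-8")
    # Hex output uses only 0-9 and a-f, a subset of the characters allowed in ids.
    return hashlib.sha256(key).hexdigest()[:32]

def build_event(call_id: str, summary: str, start_local: str,
                end_local: str, tz_name: str) -> dict:
    """Build an event body with wall-clock times and an explicit IANA zone."""
    return {
        "id": booking_event_id(call_id, start_local),
        "summary": summary,
        "start": {"dateTime": start_local, "timeZone": tz_name},
        "end": {"dateTime": end_local, "timeZone": tz_name},
    }

event = build_event(
    call_id="call_123",                  # hypothetical call identifier
    summary="Consultation",
    start_local="2026-07-01T15:00:00",   # "three o'clock" as the caller means it
    end_local="2026-07-01T15:30:00",
    tz_name="America/Toronto",
)
print(event["id"])
```

With a deterministic id, a retried tool call sends the same insert again, and the backend can treat a duplicate-id conflict from the API as "already booked" instead of creating a second event.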

Build decisions June 23, 2026 7 min read

Why We Rejected the Realtime Voice API: Unit Economics From a Live Deployment

Because a realtime speech-to-speech API costs more per minute than the price the customer pays. At taritas we measured a realtime API at about 25 to 35 cents per minute against a standard price of 18 cents per minute, while our cascade pipeline runs near 3 cents per minute in variable cost. A voice agent is priced by the minute, so any architecture that costs more per minute than the line charges cannot scale into profit. The cascade also carries a fixed infrastructure base of about 1,900 Canadian dollars a month, which amortizes from 2.28 dollars per minute at one tenant down to about 9 cents at twenty-five, crossing below the 18 cent price as customers share it. That is why the decision is reversible rather than ideological: if realtime pricing ever drops below the line, the same arithmetic flips. Capability was never the blocker. Arithmetic was.
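
The amortization can be checked with a few lines of arithmetic. This sketch assumes every tenant brings the same monthly minutes, so the fixed share per minute is the one-tenant figure divided by the tenant count, and it treats all amounts as being in the same currency. The function names are illustrative.

```python
PRICE_PER_MIN = 0.18             # standard price charged per minute
VARIABLE_PER_MIN = 0.03          # cascade variable cost per minute
FIXED_PER_MIN_ONE_TENANT = 2.28  # fixed base spread over one tenant's minutes

def cost_per_minute(tenants: int) -> float:
    """Variable cost plus an even share of the fixed base."""
    return VARIABLE_PER_MIN + FIXED_PER_MIN_ONE_TENANT / tenants

def first_profitable_tenant_count(limit: int = 1000):
    """Smallest tenant count where cost per minute drops below the price."""
    for n in range(1, limit + 1):
        if cost_per_minute(n) < PRICE_PER_MIN:
            return n
    return None

for n in (1, 10, 25):
    print(n, round(cost_per_minute(n), 4))
print("break-even tenants:", first_profitable_tenant_count())
```

Under these assumptions the cascade crosses below the 18 cent price at sixteen tenants. A realtime API at 25 to 35 cents per minute never crosses it, because its variable cost alone is above the price.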

Procurement and compliance June 23, 2026 7 min read

Can a Voice AI Pass a Healthcare Security Review? We Ran One on Launch Day

Yes, if it is built for one. On a healthcare launch, taritas ran a structured security review against the live code and found one high-severity authentication bypass and two medium issues. All three were fixed and verified with tests the same day, and the client's acceptance package included the report with the real findings still in it. What made the result credible to the procurement reviewer was not a clean report but the evidence beside each fix: 51 tests passing and 0 failing, plus a curl reproduction of the exploit that now returns 401 instead of 201. The transferable pattern for an IT services firm is that any endpoint your agent calls back into is an authentication surface, and the shared secret guarding it must be compared in constant time. A voice AI passes a security review when the team can close findings fast and show its work, not when the report comes back empty.
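
For the constant-time comparison, a minimal sketch using Python's standard `hmac.compare_digest`. The function names and the mapping to 201 and 401 are illustrative assumptions modeled on the description above, not the reviewed code.

```python
import hmac
from typing import Optional

def is_authorized(presented_secret: Optional[str], expected_secret: str) -> bool:
    """Compare the caller's secret to the expected one in constant time."""
    if not presented_secret or not expected_secret:
        # A missing header or an unset server secret must never authenticate.
        return False
    return hmac.compare_digest(
        presented_secret.encode("utf-8"),
        expected_secret.encode("utf-8"),
    )

def status_for(presented_secret: Optional[str], expected_secret: str) -> int:
    """201 when the callback is accepted, 401 when the secret check fails."""
    return 201 if is_authorized(presented_secret, expected_secret) else 401

print(status_for(None, "s3cret"))
print(status_for("s3cret", "s3cret"))
```

A plain `==` can return as soon as the first differing character is found, which leaks timing information; `compare_digest` is designed to avoid that. Rejecting an empty expected secret also closes the case where a misconfigured server would accept an empty header.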

Procurement and compliance June 18, 2026 4 min read

Data Residency for Voice AI: Why Your Region Decides Your Voice

For a regulated voice AI deployment, data residency is not only a storage rule. It quietly decides which models and voices you can use, because a managed provider's catalog is not identical in every region: the flagship voice in the demo may not exist in the region your rules require. At taritas we hit this directly. The natural managed voice we wanted was not available in our client's required region, so we self-hosted an open model in-region to keep both residency and quality, then moved back once the managed voice reached that region in early 2026. The practical rule is to pin the region first, audit what is actually available there, and treat self-hosting in-region as the escape hatch when the managed catalog is thin. Budget for that escape hatch, because in-region self-hosting adds a GPU operations program that a managed voice would otherwise have absorbed for you.

Build decisions June 18, 2026 7 min read

Chatterbox vs Azure Dragon HD: Choosing a Voice Agent's TTS in Production

Text-to-speech is the decision that most shapes how a voice agent sounds and what it costs. At taritas we made the same choice three times for one production agent: a managed cloud voice, then self-hosted Chatterbox for quality, then back to a managed Azure Dragon HD voice once it cleared the bar. The honest comparison is that self-hosted Chatterbox wins on control and in-region flexibility but turns a line item into a GPU operations program, while managed Azure Dragon HD wins on cost and simplicity once its quality is good enough in your region. When the managed HD voice later cleared quality, dropping the GPU cut our text-to-speech cost by about 54 percent, so the right answer changed with the calendar. Neither option is universally correct: pick the cheapest voice that clears your naturalness bar in your region, and re-audition every major release rather than trusting an old verdict.

Procurement and compliance June 17, 2026 6 min read

What Happens When Your Voice AI Breaks in Production

The demo never shows you what happens at 2 a.m. when the agent stops answering and a real customer is on the line. Before buying voice AI, the operations questions decide more than the feature list: who is on call, how fast you hear about an outage, whether a failure degrades safely, and what the postmortem looks like. At taritas we run production voice agents with on-call rotations, a communication discipline kept separate from the debugging, guardrails that fail open, and blameless postmortems. The reason this matters is concrete: our worst outages were not exotic bugs but quiet defaults, a 15.0-second proxy timeout and a 100-calls-per-day cap left over from testing, each of which stopped real calls until someone found it. In an always-on, per-minute product, how you handle the break is what the customer remembers, so treat incident response as a feature you ship, not a thing you improvise.

Build decisions June 16, 2026 7 min read

White-Label Voice AI for IT Services Firms: 7 Production Lessons

At taritas we build white-label voice AI that regional IT services firms resell under their own brand, and the same lessons hold on every engagement. The ones that decide whether it works are rarely about the model: who owns the customer and the intellectual property, whether the per-minute price can carry the architecture, and the configuration and operational discipline that keeps a live agent up. Across production builds those lessons came with numbers: realtime APIs at 25 to 35 cents per minute against an 18 cent line price, a go-live outage from a 100-calls-per-day cap, a 15.0-second transfer timeout, a launch-day security review that fixed a high-severity auth bypass, and a text-to-speech rebuild that cut cost about 54 percent. The transferable point for an IT firm is that the demo is never the hard part: price the minute, own the boundaries, and treat the unglamorous failure modes as a known class.

Production engineering June 12, 2026 5 min read

Three Attempts at Masking a Voice Agent's Thinking Silence

After a caller stops talking, a production voice agent we run at taritas has about 2.2 to 2.5 seconds of dead air before its first audio frame. We made three serious attempts to mask it: a classifier-driven filler, a state-change-driven filler, and preemptive generation. All three are disabled in production today, each for a different structural reason, and the usable lesson is about speech queue ordering: the only filler insertion point that works is the one where the reply does not exist yet, because once the real reply is queued the filler either collides with it or arrives too late. What did ship was unglamorous and real: running the classifier and the knowledge-base lookup in parallel instead of in series cut about 0.6 seconds of median latency. The transferable point is that masking latency is harder than removing it, so spend the first effort on parallelizing serial steps before you try to paper over the gap.
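
The serial-to-parallel change can be sketched with `asyncio.gather`. The two coroutines below are stand-ins with artificial delays, and their names and return values are hypothetical. The pattern only applies when the knowledge-base lookup does not need the classifier's output.

```python
import asyncio

async def classify_intent(utterance: str) -> str:
    await asyncio.sleep(0.05)  # stand-in for the classifier call
    return "hours_question"

async def lookup_knowledge(utterance: str) -> list:
    await asyncio.sleep(0.05)  # stand-in for the knowledge-base lookup
    return ["The office is open 9 to 5 on weekdays."]

async def prepare_turn(utterance: str):
    """Start both steps at once and wait for both results."""
    intent, chunks = await asyncio.gather(
        classify_intent(utterance),
        lookup_knowledge(utterance),
    )
    return intent, chunks

intent, chunks = asyncio.run(prepare_turn("When are you open?"))
print(intent, chunks)
```

Run in series, the two waits add up; run together, the turn waits only for the slower of the two.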

Production engineering June 12, 2026 5 min read

Rate Limiting a Voice Agent With One Postgres UPDATE

A real-time voice agent's daily call cap has to be checked before the greeting plays, survive concurrent calls, and roll over at midnight. At taritas we do all three in one Postgres UPDATE: a CASE expression handles the midnight rollover, the WHERE clause enforces the cap, and RETURNING reports which case fired. One round trip, no race condition, and no scheduled job to reset counters. The reason it is one statement and not three is that a read-then-write check has a race window under concurrent calls where two callers can both pass a cap of ten per day; doing the decision inside a single atomic UPDATE closes that window in the database itself. We pair it with a 1.0-second query timeout and a documented fail-open policy, so a slow or unreachable database lets a real caller through rather than rejecting them. The transferable rule: when a check must be correct under concurrency, push it into one atomic write.
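
A sketch of what such a statement and its fail-open wrapper could look like. The table and column names (`call_limits`, `counter_day`, `calls_today`, `daily_cap`) are assumptions, `CURRENT_DATE` follows the database session's time zone, and `run_update` is a hypothetical helper that executes the statement with the query timeout and returns the single `RETURNING` row, or `None` when no row was updated.

```python
from typing import Callable, Optional, Sequence

RATE_LIMIT_SQL = """
UPDATE call_limits
   SET calls_today = CASE WHEN counter_day = CURRENT_DATE
                          THEN calls_today + 1  -- same day: count this call
                          ELSE 1                -- new day: roll the counter over
                     END,
       counter_day = CURRENT_DATE
 WHERE tenant_id = %(tenant_id)s
   AND (counter_day <> CURRENT_DATE OR calls_today < daily_cap)
RETURNING calls_today;
"""

def allow_call(
    run_update: Callable[[str, dict], Optional[Sequence]],
    tenant_id: str,
) -> bool:
    """True if the call may proceed; fails open on any database error."""
    try:
        row = run_update(RATE_LIMIT_SQL, {"tenant_id": tenant_id})
    except Exception:
        # Documented fail-open policy: a slow or unreachable database
        # lets the caller through rather than rejecting them.
        return True
    # A returned row means the UPDATE matched, so the call was counted.
    # No row means the WHERE clause rejected it because the cap was reached.
    return row is not None

print(allow_call(lambda sql, params: (5,), "tenant_a"))
print(allow_call(lambda sql, params: None, "tenant_a"))
```

Because the increment and the cap check happen in the same statement, two concurrent calls serialize on the row and cannot both read a stale count.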

Production engineering June 12, 2026 7 min read

A 15-Second Default Timeout Broke Our Voice AI's Call Transfers

A production voice AI agent we run at taritas stopped transferring callers to staff. Every transfer failed with a 504 at exactly 15 seconds, Envoy Gateway's default route timeout, while the destination line had developed a 13-second post-dial delay. Because the agent's transfer ran as a background API call, each failed dial kept ringing the destination, so staff answered ghost calls with no one on the line. The fix was one scoped timeouts block on the HTTPRoute that raised the limit above the real post-dial delay. The deeper lesson is that the outage was a default nobody chose: 15.0 seconds is sensible for a web request and wrong for a phone transfer that legitimately takes longer to connect. When you put a proxy in front of telephony, enumerate every default timeout and cap on the path and set each one against real call behavior, because an unset default will pick the worst moment to enforce itself.

Build decisions January 24, 2025 6 min read

Voice AI ROI in Customer Service: How to Build the Business Case

The ROI of voice AI in customer service comes down to one comparison, measured per minute. What does a minute of the agent cost to run, and what does the minute it replaces cost you today? At taritas we have the cost side from production. On a cascade pipeline our variable cost runs near three cents per audio-minute, with fixed infrastructure on top that falls as more customers share it. The replaced cost is a human agent's loaded rate and handle time. The spread between them, multiplied by the calls you can safely automate, is the return. The honest part is which calls those are. Routine, high-volume questions automate well. The hard twenty percent still needs a person. Build the case on the routine calls and your own numbers, not on a vendor's headline percentage.

Production engineering January 24, 2025 6 min read

Enterprise Voice AI Implementation: 5 Challenges We Hit in Production

Most enterprise voice AI projects struggle in the same five places. At taritas we have shipped production voice agents for public-sector and healthcare clients, and these are the challenges that actually decided whether they worked. First, latency. A phone call gives you about one to two seconds before silence feels like a dropped line. Second, turn detection. Deciding when the caller has finished talking is harder than it looks, and a fixed silence timer fragments real speech. Third, integration. Our worst outages were defaults nobody set on purpose, like a proxy timeout and a call cap left in an admin panel. Fourth, security and compliance. Voice data is sensitive, and a real security review is a deliverable, not a checkbox. Fifth, cost. A voice agent is priced by the minute, so the per-minute math decides the architecture. None of these show up in a demo.

Reading this because a client asked for voice AI?

That is the conversation we are built for. taritas engineers it behind your brand.

What taritas does for partners