Voice AI engineering, written for people who have to ship it and defend it in a security review. New posts two to three times a week.
Latest · Production engineering · 5 min read
Three Attempts at Masking a Voice Agent's Thinking Silence
After a caller stops talking, a production voice agent we run at Taritas leaves about 2.2 to 2.5 seconds of dead air before its first audio frame. We made three serious attempts to mask it: a classifier-driven filler, a state-change-driven filler, and preemptive generation. All three are disabled in production today, each for a different structural reason. The usable lesson is about speech queue ordering: the only filler insertion point that works is the one where the reply does not exist yet.
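To make the queue-ordering point concrete, here is a minimal sketch of a guard that admits a filler only while no reply audio exists. It assumes a strict first-in, first-out playback queue; the `Utterance` type, the `"filler"`/`"reply"` labels, and both function names are hypothetical and are not taken from the Taritas codebase.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Utterance:
    kind: str          # "filler" or "reply" (hypothetical labels)
    duration_s: float  # playback length in seconds


def try_enqueue_filler(queue: deque, filler: Utterance) -> bool:
    """Queue the filler only while no reply audio exists in the queue."""
    if any(u.kind == "reply" for u in queue):
        # Too late: appended, the filler would play after the reply;
        # pushed to the front, it would delay the reply.
        return False
    queue.append(filler)
    return True


def delay_before_reply(queue: deque) -> float:
    """Seconds of queued audio that play before the first reply utterance."""
    delay = 0.0
    for u in queue:
        if u.kind == "reply":
            break
        delay += u.duration_s
    return delay
```

Under this assumption, a filler admitted before the reply exists plays inside time the caller would otherwise spend in silence, while a filler offered once the reply is queued is rejected rather than reordered.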