Procurement and compliance POST-8 7 min read

Can a Voice AI Pass a Healthcare Security Review? We Ran One on Launch Day

Yes, if it is built for one. On a healthcare launch, taritas ran a structured security review against the live code and found one high-severity authentication bypass and two medium issues. All three were fixed and verified with tests the same day, and the client's acceptance package included the report with the real findings still in it. What made the result credible to the procurement reviewer was not a clean report but the evidence beside each fix: 51 tests passing and 0 failing, plus a curl reproduction of the exploit that now returns 401 instead of 201. The transferable pattern for an IT services firm is that any endpoint your agent calls back into is an authentication surface, and the shared secret guarding it must be compared in constant time. A voice AI passes a security review when the team can close findings fast and show its work, not when the report comes back empty.

Published · Updated · Supreet Tare

All names, numbers, and identifiers in this post are anonymized. The patterns are real.

Two-lane before and after diagram of a voice agent's callback surface. On the left, before the fix, an attacker and the voice agent both reach three API endpoints, and the attacker's write to the messages endpoint is unblocked because there is no authentication check. On the right, after the fix, a single constant-time verify-agent-secret check stands in front of all three endpoints, the attacker's requests bounce off with a 401, and the anonymous read returns only three harmless takeaway fields.

We shipped a healthcare voice agent and ran a full security review on the same day it went live. The review found a way for a stranger to write fake messages into a patient’s medical transcript, attributed to the AI, with no audit trail. We closed it before the afternoon was out, and then we did something that surprises people: we handed the client the report with the finding still in it. Here is what the review found, how the fixes worked, and why the report itself became part of what closed the deal.

The 30-second architecture

A voice agent, written in Python and built on LiveKit, talks to a patient on the phone. While the call runs, the agent calls back into the web application’s API to save the transcript, read session state, and write structured notes. Those callback endpoints are the agent’s hands inside the backend.

To prove a request really came from our agent and not from a random caller, the agent sends a shared secret in an HTTP header, x-agent-secret, and every callback endpoint is supposed to check it. The phrase “supposed to” is the whole story.
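To make that contract concrete, here is a minimal TypeScript sketch of what a callback request carries. The real agent is written in Python. The `buildCallbackRequest` helper, the `/api/messages` path, the base URL, and the payload fields are all hypothetical. Only the `x-agent-secret` header name comes from the system described here.

```typescript
// Hypothetical sketch of an agent-to-backend callback request.
// Only the x-agent-secret header name is taken from the article.
interface CallbackRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildCallbackRequest(
  baseUrl: string,
  path: string,
  payload: unknown,
  agentSecret: string,
): CallbackRequest {
  return {
    url: new URL(path, baseUrl).toString(),
    method: "POST",
    headers: {
      "content-type": "application/json",
      // Every callback carries the shared secret; the server must check it.
      "x-agent-secret": agentSecret,
    },
    body: JSON.stringify(payload),
  };
}

const req = buildCallbackRequest(
  "https://app.example.test",
  "/api/messages",
  { sessionId: "sess_123", role: "assistant", text: "Hello" },
  "example-secret",
);
console.log(req.url); // https://app.example.test/api/messages
```

Sending the header is only half of the contract. Nothing about this request forces the server to read it.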

How do you keep a security review from drowning in noise?

The review was a structured static pass over every server-side runtime path: all API routes, the auth middleware, the payment and call webhooks, knowledge-base ingestion and search, the crisis-email sender, the entire voice agent file, and its container image.

The part that made it useful was the triage filter. Only findings with a confidence of 8 out of 10 or higher were allowed to surface. Generic hardening notes, denial-of-service and rate-limit gaps, dependency vulnerabilities (covered by a separate software bill-of-materials scan), and “missing header” write-ups were all excluded up front. That filter is the difference between three things worth fixing and a forty-item wall of noise nobody acts on.
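The filter itself is mechanical. Below is a minimal sketch, assuming each finding arrives with a category label and a 0 to 10 confidence score. The `Finding` shape, the category names, and the sample findings are invented for illustration. They are not the review tool's actual schema or output.

```typescript
// Hypothetical triage filter: surface a finding only if its confidence is
// at least 8/10 and its category was not excluded up front.
type Category =
  | "auth-bypass"
  | "idor"
  | "timing"
  | "generic-hardening"
  | "dos-rate-limit"
  | "dependency-vuln"
  | "missing-header";

interface Finding {
  id: string;
  category: Category;
  confidence: number; // 0..10
}

const EXCLUDED: ReadonlySet<Category> = new Set<Category>([
  "generic-hardening",
  "dos-rate-limit",
  "dependency-vuln", // handled by the separate SBOM scan
  "missing-header",
]);

const MIN_CONFIDENCE = 8;

function triage(findings: readonly Finding[]): Finding[] {
  return findings.filter(
    (f) => f.confidence >= MIN_CONFIDENCE && !EXCLUDED.has(f.category),
  );
}

// Illustrative input only.
const sample: Finding[] = [
  { id: "f1", category: "auth-bypass", confidence: 9 },
  { id: "f2", category: "missing-header", confidence: 10 }, // excluded category
  { id: "f3", category: "idor", confidence: 6 }, // below threshold
  { id: "f4", category: "timing", confidence: 8 },
  { id: "f5", category: "dependency-vuln", confidence: 9 }, // SBOM scan's job
];

console.log(triage(sample).map((f) => f.id)); // [ 'f1', 'f4' ]
```

The exclusion list does the heavy lifting. A high-confidence "missing header" note still never reaches the report, because its category was ruled out before anyone read it.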

Three findings came back: one high, two medium.

ID   SEVERITY  CATEGORY                                STATUS
01   High      Authentication / authorization bypass   Resolved same day
02   Medium    Insecure direct object reference (PII)  Resolved same day
03   Medium    Non-constant-time secret comparison     Resolved same day

Finding 1: the route that took dictation from anyone (High)

The endpoint the agent uses to write transcript messages accepted a JSON message and inserted it straight into the messages table with no authentication check at all. Its sibling endpoint, the one that updates a message, did check x-agent-secret. The agent even sent the header on every call. But the write route never read it.

The exploit needs only one thing: a session identifier. Those leak constantly, through referrer headers, a shared screen, or a support link like /summary?session_id=.... Anyone who learned a session identifier could post arbitrary messages, labelled as if the assistant had said them, into a patient’s transcript. A doctor reviewing a flagged case would see injected text attributed to the AI, with nothing in the log to say it was fake.
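Below is a framework-free sketch of the asymmetry. It assumes a plain header map and an in-memory array in place of the messages table. The function names, the sample secret, and the session ID are hypothetical. The point is that the vulnerable route returns 201 to anyone holding a session identifier, while the gated route refuses the same forged write.

```typescript
import { timingSafeEqual } from "node:crypto";

// Hypothetical sketch of Finding 1; not the production route code.
type HeaderMap = Record<string, string | undefined>;
interface Message {
  sessionId: string;
  role: "assistant" | "user";
  text: string;
}

const AGENT_SECRET = "example-secret"; // illustrative value only

function secretOk(headers: HeaderMap): boolean {
  const a = Buffer.from(headers["x-agent-secret"] ?? "");
  const b = Buffer.from(AGENT_SECRET);
  if (a.length === 0 || a.length !== b.length) return false;
  return timingSafeEqual(a, b);
}

// Before: writes for anyone who knows a session ID.
function writeMessageVulnerable(_headers: HeaderMap, msg: Message, store: Message[]): number {
  store.push(msg);
  return 201;
}

// After: the same route, gated like its sibling update route.
function writeMessageGated(headers: HeaderMap, msg: Message, store: Message[]): number {
  if (!secretOk(headers)) return 401;
  store.push(msg);
  return 201;
}

// An attacker with a leaked session ID and no secret:
const forged: Message = { sessionId: "sess_123", role: "assistant", text: "fake" };
const store: Message[] = [];
console.log(writeMessageVulnerable({}, forged, store)); // 201
console.log(writeMessageGated({}, forged, store)); // 401
console.log(store.length); // 1 (only the vulnerable route wrote)
```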

Finding 2: the patient view that returned too much (Medium)

The endpoint that returns a session to an anonymous caller, meant to power a simple post-call summary page, returned the entire assessment object. That included pain locations, a severity score, red flags, current medications, previous treatments, and any feedback the patient had left.

None of that has a name attached, which is exactly the trap. Under Canadian privacy law, severity scores and symptom narratives still effectively count as protected health information here, because the same session identifier also links to a call recording and a billing email. Cross-correlate those and the patient is re-identified. The endpoint was leaking health data through a convenience route that returned everything because the summary page “might need it someday.”
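The sketch below shows why the missing name does not help. It assumes three illustrative record types keyed by the same session identifier. None of the type names, fields, or values reflect the real schema. The `reidentify` function is hypothetical and simply shows that one join on the session ID attaches the score to a contactable person.

```typescript
// Illustrative records that share a session ID (not the real schema).
interface Assessment { sessionId: string; severity: number; painLocations: string[] }
interface Recording { sessionId: string; audioUrl: string }
interface BillingRecord { sessionId: string; email: string }

interface Reidentified {
  email: string;
  severity: number;
  painLocations: string[];
  audioUrl?: string;
}

// One join on the session ID turns "anonymous" health data into a
// named, contactable patient record.
function reidentify(
  sessionId: string,
  assessments: Assessment[],
  recordings: Recording[],
  billing: BillingRecord[],
): Reidentified | null {
  const assessment = assessments.find((r) => r.sessionId === sessionId);
  const bill = billing.find((r) => r.sessionId === sessionId);
  if (!assessment || !bill) return null;
  const recording = recordings.find((r) => r.sessionId === sessionId);
  return {
    email: bill.email,
    severity: assessment.severity,
    painLocations: assessment.painLocations,
    audioUrl: recording?.audioUrl,
  };
}

const result = reidentify(
  "sess_123",
  [{ sessionId: "sess_123", severity: 7, painLocations: ["lower back"] }],
  [{ sessionId: "sess_123", audioUrl: "https://example.test/rec/sess_123" }],
  [{ sessionId: "sess_123", email: "patient@example.test" }],
);
console.log(result?.email, result?.severity); // patient@example.test 7
```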

Finding 3: the secret compared the wrong way (Medium)

Where the secret was checked, it was checked with an ordinary equality comparison: provided === expected. That comparison stops at the first character that differs, so the time it takes leaks how many leading characters were correct. An attacker willing to send enough requests can recover the secret one byte at a time by measuring response times. The same secret guards every privileged write path the agent uses, so recovering it once exposes all of them.
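The next example is a toy model of the leak. It counts character comparisons instead of timing them, because real measurements are noisy and an attacker averages many requests. JavaScript engines do not literally run this loop for `===`, but an early-exit comparison has the same shape. `earlyExitEquals` and the sample secret are hypothetical.

```typescript
// Toy model: an early-exit comparison does work proportional to the
// length of the matching prefix. "steps" stands in for response time.
function earlyExitEquals(provided: string, expected: string): { equal: boolean; steps: number } {
  let steps = 0;
  if (provided.length !== expected.length) return { equal: false, steps };
  for (let i = 0; i < expected.length; i++) {
    steps++;
    // Stops at the first mismatch, which is exactly what leaks.
    if (provided[i] !== expected[i]) return { equal: false, steps };
  }
  return { equal: true, steps };
}

const secret = "k7Qz"; // illustrative only
for (const guess of ["a000", "k000", "k700", "k7Q0"]) {
  console.log(guess, earlyExitEquals(guess, secret).steps);
}
// a000 1
// k000 2
// k700 3
// k7Q0 4
```

Each extra step confirms one more leading character. The attacker can therefore fix the known prefix and search a single position at a time, instead of guessing the whole secret at once.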

Why did the authentication gap go unnoticed?

All three findings share a theme: nothing fails when the caller is honest.

Finding 1 was sibling-route asymmetry. Two endpoints sat next to each other; one had the auth check and one did not, and because the real agent always sent the header, every legitimate request looked authenticated in the logs. The gap was invisible until someone asked what happens when the caller is not the agent.

Finding 3 was the default-equality trap. The obvious comparison is the wrong one for secrets, and it reads as correct in review because it returns the right answer. Finding 2 was scope creep in a convenience endpoint: the patient view returned the whole assessment object because the summary page might want a field from it later.
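The sibling-route question, what happens when the caller is not the agent, can be asked mechanically. Below is a sketch that assumes each callback route can be invoked as a function taking a header map and returning an HTTP status. The route names, the handlers, and the toy `hasSecret` check are hypothetical stand-ins. A real audit would send HTTP requests to a test server instead.

```typescript
// Hypothetical audit: call every agent callback route the way a stranger
// would (no x-agent-secret) and list the routes that do not refuse.
type HeaderMap = Record<string, string | undefined>;
type Handler = (headers: HeaderMap) => number; // returns an HTTP status

// Toy check for the sketch only; real code must compare in constant time.
const hasSecret = (h: HeaderMap): boolean => h["x-agent-secret"] === "example-secret";

const routes: Record<string, Handler> = {
  "POST /api/messages": () => 201, // the unguarded write route
  "PATCH /api/messages/:id": (h) => (hasSecret(h) ? 200 : 401),
  "GET /api/sessions/:id/state": (h) => (hasSecret(h) ? 200 : 401),
};

function unguardedRoutes(table: Record<string, Handler>): string[] {
  return Object.entries(table)
    .filter(([, handler]) => handler({}) !== 401)
    .map(([name]) => name);
}

console.log(unguardedRoutes(routes)); // [ 'POST /api/messages' ]
```

Run against every callback route, a check like this fails loudly the moment one sibling forgets the gate, no matter how honest the real agent's traffic looks.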

How do you fix an auth bypass in an agent callback?

Findings 1 and 3 closed with a single shared helper that compares the secret in constant time.

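// verify-agent-secret.ts
import crypto from "node:crypto";

// Returns true only if the request carries the agent's shared secret.
export function verifyAgentSecret(request: Request): boolean {
  const provided = request.headers.get("x-agent-secret") ?? "";
  const expected = process.env.AGENT_API_SECRET ?? "";
  // Fail closed when either side is missing, including an unset env var.
  if (!provided || !expected) return false;
  const a = Buffer.from(provided);
  const b = Buffer.from(expected);
  // timingSafeEqual throws on unequal lengths, so check first.
  // This early return can reveal the length, but not any characters.
  if (a.length !== b.length) return false;
  // Compares every byte, regardless of where the first mismatch is.
  return crypto.timingSafeEqual(a, b);
}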

Every endpoint the agent calls back into now gates on this helper:

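// Inside each callback route handler: accept the agent's secret, otherwise
// fall back to an admin session (getAdminUser and unauthorizedResponse are
// the app's own helpers, not shown here).
if (!verifyAgentSecret(request)) {
  const admin = await getAdminUser();
  if (!admin) return unauthorizedResponse();
}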
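One refinement is worth weighing, though it is not part of the shipped fix. `verifyAgentSecret` returns early when lengths differ, so timing can still reveal the secret's length, though none of its contents. A common variant hashes both values to fixed-length SHA-256 digests first, so `timingSafeEqual` always compares two 32-byte buffers. The function name `verifySecretHashed` is our assumption. So is taking the values as parameters rather than reading the request and environment, which keeps the sketch testable on its own.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Variant sketch (not the shipped helper): hash both sides to 32-byte
// SHA-256 digests so the comparison never branches on input length.
function digest(value: string): Buffer {
  return createHash("sha256").update(value, "utf8").digest();
}

function verifySecretHashed(provided: string | null, expected: string | undefined): boolean {
  // Missing values still fail closed.
  if (!provided || !expected) return false;
  return timingSafeEqual(digest(provided), digest(expected));
}

console.log(verifySecretHashed("example-secret", "example-secret")); // true
console.log(verifySecretHashed("short", "example-secret")); // false
```

For a long random secret the length leak is minor, so this is a judgment call rather than a required fix. The cost is one SHA-256 call per request.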

Finding 2 closed by trimming the anonymous view to exactly the three fields the public summary page renders, and nothing else.

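// Inside the route handler that serves the anonymous session view
// (NextResponse is imported from "next/server"). `extracted` is the
// session's full assessment object; anonymous callers get only the three
// fields the public summary page renders.
const publicExtracted = extracted
  ? {
      topicsDiscussed: extracted.topicsDiscussed ?? [],
      educationalTakeaways: extracted.educationalTakeaways ?? [],
      doctorSuggestions: extracted.doctorSuggestions ?? [],
    }
  : null;
return NextResponse.json({ extractedData: publicExtracted });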

Pain locations, severity scores, red flags, medications, prior treatments, and feedback now require an admin cookie or the agent secret. All remediation was completed within the same working day as the review.

Hardening

The fix came with controls so the gap cannot reopen quietly:

  • The helper is the single source of truth. No route reads the secret environment variable directly anymore, which we confirmed with a repository-wide search that returns only the helper file.
  • A regression test asserts the anonymous view leaks nothing. It checks that every sensitive field comes back undefined for a patient caller. A curl reproduction of the original exploit now returns 401 instead of 201.
  • The full suite runs green: 51 tests passing, 0 failing across the sessions, messages, token, payment, and flagged-case suites.
  • The deferred items are written into the same report as the next-phase backlog: row-level security in the database, content-security-policy headers, audit scans in continuous integration, rate limiting on the knowledge-base search, and an external penetration test before live billing.

That last list matters as much as the fixes. An honest gap list is what tells a procurement reviewer the team knows where the edges are.
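The regression check from the second bullet above can be sketched as plain assertions rather than in any particular test runner. The three public field names match the projection shown earlier. The sensitive field names, the `ExtractedData` type, and the sample values are hypothetical, since the post names those fields only in prose.

```typescript
// Hypothetical shape of the extracted assessment; only the first three
// field names come from the article's projection snippet.
interface ExtractedData {
  topicsDiscussed?: string[];
  educationalTakeaways?: string[];
  doctorSuggestions?: string[];
  painLocations?: string[];
  severityScore?: number;
  redFlags?: string[];
  currentMedications?: string[];
  previousTreatments?: string[];
  feedback?: string;
}

// Anonymous projection: the three takeaway fields and nothing else.
function toPublic(extracted: ExtractedData | null) {
  return extracted
    ? {
        topicsDiscussed: extracted.topicsDiscussed ?? [],
        educationalTakeaways: extracted.educationalTakeaways ?? [],
        doctorSuggestions: extracted.doctorSuggestions ?? [],
      }
    : null;
}

const SENSITIVE = [
  "painLocations", "severityScore", "redFlags",
  "currentMedications", "previousTreatments", "feedback",
] as const;

// Lists every sensitive field that is present (not undefined) in a view.
function leakedFields(view: object | null): string[] {
  if (view === null) return [];
  return SENSITIVE.filter((k) => (view as Record<string, unknown>)[k] !== undefined);
}

const sample: ExtractedData = {
  topicsDiscussed: ["sleep"],
  painLocations: ["lower back"],
  severityScore: 7,
  feedback: "helpful",
};
console.log(leakedFields(toPublic(sample))); // []
console.log(leakedFields(sample)); // [ 'painLocations', 'severityScore', 'feedback' ]
```

The second call is the sanity check. It proves the detector would have caught the original route, which returned the whole object.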

Key takeaways

Here is the short version. If you ship AI agents that call back into a web backend, the agent’s shared secret is a privileged credential and every callback endpoint is an authentication surface. Ask your team which of those endpoints actually verify the secret, and whether the comparison is constant-time. Treat any session-scoped health or personal text as protected data even when no name sits next to it. And keep the review honest by excluding the noise up front, so what survives is worth acting on the same day.

What this means if you are an IT services firm

Public-sector and healthcare buyers do not buy on a clean-looking report. They buy on evidence: where the data lives, how long it is kept, who else can touch it, and what the audit log records. The most useful artifact we produced was not the fix; it was the report with the real findings, the exact remediation, and the passing tests, handed to the client as a deliverable.

If your remediation loop is too slow to produce that kind of evidence, a security review becomes a thing to fear instead of a thing to sell. The question to ask before you offer voice AI to your own clients is simple: would you be comfortable handing the client the report with the findings still in it? If the answer is no, that is the work to do first, and it is the substance of how we work with partners.

Related questions
Can a voice AI platform pass a healthcare or government security review?
Yes, when the review is scoped to real runtime paths and the team can close findings fast. On one healthcare launch the review found one high and two medium issues, all fixed and test-verified within the day. What made it pass was not a zero-finding report. It was evidence of same-day remediation on top of controls that were already in place, like webhook signature checks, parameterised queries, input sanitisation, and non-root containers.
What does an authentication bypass look like in an agent callback architecture?
Usually it is an endpoint the agent writes to that quietly assumes only the agent knows the URL. Session identifiers leak through referrer headers, screen shares, and support tickets. Any endpoint a server-side agent calls must verify a shared secret or signed token, and that verification has to be constant-time so it cannot be guessed byte by byte through response timing.
Is pain or symptom data protected health information if no name is attached?
Under Canadian privacy law it effectively is. Severity scores and symptom narratives become re-identifiable when the same session identifier also reaches a recording or a billing email. Treat session-scoped health text as protected health information regardless of whether a direct identifier sits next to it.
How do you keep a security review from drowning in noise?
Set hard exclusions up front and a confidence threshold per finding. Generic hardening notes, denial-of-service and rate-limit gaps, and dependency vulnerabilities each have their own pipeline and do not belong in a code review. A review tuned to surface only high-confidence findings returns three things worth fixing instead of forty theoretical ones.
What should go in the acceptance package for a healthcare client?
The security report itself, with the findings included, the exact fixes inline, and the test evidence that proves each one is closed. Add an honest recommendations section that becomes the next-phase hardening backlog. Handing over real findings and their remediation builds more trust than a spotless report.

Reading this because a client asked for voice AI? That is the conversation we are built for. What taritas does for partners.

PROJECT taritas.com/blog
DWG POST-8
REV 1.0
DATE 2026-06-24