Mining Admissions Calls Without Breaking HIPAA or Part 2



If you work in behavioral health marketing, you’ve probably already figured out that admissions call recordings are the single best source of buyer-language intelligence your client has. The phrasing prospects actually use, the objections, the competitors they name, the order they raise concerns in. None of that shows up in a keyword tool.

So, many marketing teams find a way to listen. Sometimes it’s an agency sitting in on calls, sometimes it’s “de-identified” transcripts shared over Slack, sometimes it’s a marketing analyst with a login to the call platform. The output feeds keyword strategy, landing page copy, ad creative, and increasingly, prompts used to optimize for AI-generated search results.

The problem is that the typical setup doesn’t comply with either HIPAA or 42 CFR Part 2. And as of February 2026, the consequences for getting it wrong changed.1

This post walks through why the common setup is non-compliant, what the law actually requires, and what a version of call mining looks like that holds up.

Why everyone does it

Most B2C / B2B marketing teams rely on keyword research tools (Search Console, Ahrefs, Semrush, various AI visibility platforms) that surface the words people type into search engines. Those tools tell you what queries exist. They don’t tell you how your actual buyers think about their problem, what language they use when they’re scared, or which competitors come up in real deliberation.

Admissions calls contain all of that. In behavioral health, where the decision journey often takes months and involves family members, insurance anxiety, shame, and multiple failed attempts at other treatment, the distance between “keyword volume” and “buyer intent” is enormous. A well-mined call corpus can surface patterns like:

  • “I just need to get away from everything” appearing far more often than clinical terms
  • A consistent order of concerns (insurance, then length of stay, then can I bring my phone)
  • Specific competitor comparisons and how prospects frame them
  • Family-member objections that the patient themselves doesn’t voice

That output tells you what to build, what to say, and how to say it. It’s better intelligence than anything a third-party tool produces.

The catch is that the data it comes from is among the most protected types of information in American healthcare law.

Why the standard setup doesn’t work

A few common rationalizations show up repeatedly in the behavioral health marketing community. Each has a specific flaw.

A business associate agreement (BAA) is a contract that extends HIPAA obligations to a vendor performing services for a covered entity. It authorizes the vendor to handle PHI within the scope of a permitted use. It does not expand what uses are permitted in the first place.

Marketing is not health care operations under HIPAA. It has its own separate authorization regime under the Privacy Rule, which requires specific written patient authorization for most marketing uses of PHI.2 A standard BAA with a call recording vendor doesn’t grant marketing permission. It grants the vendor permission to host and process recordings as part of the treatment center’s operations.

What this means in practice: the BAA makes it legal for the platform to store the recordings. It does not make it legal for the marketing team to listen to them for creative research.

Relying on “de-identified” transcripts is often the most dangerous rationalization because it feels compliant. In practice, most “de-identification” means removing names, which is nowhere near the HIPAA standard.

The HIPAA de-identification standard has two paths.3

Safe Harbor requires removal of all 18 enumerated identifier categories, including names, dates (except year), geographic subdivisions smaller than state, phone numbers, email addresses, account numbers, full-face photos, biometric identifiers (which arguably include voice recordings themselves), and “any other unique identifying number, characteristic, or code.”

Expert Determination requires a qualified statistician to formally conclude and document that the re-identification risk is “very small.”

Admissions call transcripts are a worst-case environment for Safe Harbor. Callers volunteer identifying detail unprompted. “My husband died three weeks ago” contains a date element. “I work as the head nurse at the hospital in [small town]” contains a geographic subdivision plus a quasi-identifier strong enough to single out an individual. “Dr. Smith at [clinic] referred me” contains a staff name.

A single-pass redaction tool catches maybe 80% of identifiers on a good day. Getting to Safe Harbor on narrative clinical transcripts typically requires layered redaction (rule-based, NER, and a clinical de-id model), custom rules for your vertical, and a human QA review of a random sample.

Keeping the recordings in-house doesn’t solve the problem either. Use and disclosure are both regulated under HIPAA. A disclosure doesn’t require data leaving the building; an impermissible use inside the building is still impermissible. And “minimum necessary,” a core HIPAA principle, cuts against listening to identifiable recordings when de-identified output would serve the same purpose.

State wiretap consent (the “this call may be recorded for quality purposes” disclosure) is separate from HIPAA authorization and from Part 2 consent. Wiretap consent makes the recording itself lawful. HIPAA and Part 2 govern what you can then do with a lawful recording. These are different legal regimes with different standards, and consent to one does not satisfy the other.

What 42 CFR Part 2 adds on top

If the facility is a substance use disorder treatment program, which includes most residential rehabs, outpatient addiction programs, detox facilities, and medication-assisted treatment providers receiving federal assistance, Part 2 applies in addition to HIPAA, and it’s stricter.

The short version: HIPAA permits certain uses for “health care operations” without patient authorization. Part 2, historically, did not have an equivalent broad carve-out. The 2024 Final Rule (compliance date February 16, 2026) aligned Part 2 more closely with HIPAA but kept several key differences.1

The biggest change for this discussion is the de-identification standard. Under the old Part 2, de-identification was vaguely defined. Under the 2024 Final Rule, 42 CFR 2.16 now requires Part 2 programs to have policies and procedures for rendering patient identifying information de-identified in accordance with 45 CFR 164.514(b), the HIPAA De-Identification Standard.4 Once records meet that standard, they can be used without specific patient consent.

That’s the doorway. Properly de-identified call transcripts are out of Part 2 scope. Improperly de-identified ones are still fully protected records subject to Part 2’s consent requirements.

The penalty structure changed too. Under the old Part 2, violations were criminal-only and rarely pursued. The 2024 Final Rule aligned Part 2 penalties with HIPAA civil and criminal enforcement authorities.1 HIPAA civil penalties in 2025 range from $141 per violation at the lowest tier to $71,162 per violation at the “willful neglect corrected” tier, with annual caps as high as $2,134,831 for identical violations at the top tier.5 The HIPAA Breach Notification Rule now applies to breaches of unsecured Part 2 records the same way it applies to PHI.1

Two months of enforcement history isn’t enough to predict how aggressive OCR will be on Part 2 specifically, but the enforcement environment has changed.

The FTC is the other shoe

Even for behavioral health companies that think they’ve sidestepped HIPAA (direct-to-consumer telehealth, apps, platforms that argue they’re not “covered entities”), the FTC has spent the last three years establishing that unauthorized marketing use of behavioral health data is a Section 5 violation regardless of HIPAA status.

The FTC’s position is that in behavioral health, sensitive data cannot be disclosed to third parties for advertising purposes without affirmative express consent. A privacy policy mention isn’t enough. Cookie banners aren’t enough. The consent has to be specific, prominent, and opt-in.6,9,10

This matters for the call mining discussion because even if a treatment center could argue HIPAA and Part 2 didn’t apply to a particular marketing use, the FTC enforcement pattern is a second source of liability. The agency has made clear that behavioral health data is treated differently, and that marketing uses of it without clear, informed, specific consent are deceptive practices.

OCR has already made this argument about web tracking

A version of this story has already played out for web tracking. In 2022, OCR issued guidance saying Meta Pixel, Google Analytics, and similar tools deployed on healthcare sites could transmit PHI to ad platforms, triggering HIPAA obligations. In June 2024, the U.S. District Court for the Northern District of Texas vacated part of the guidance in American Hospital Association v. Becerra, specifically the portion stating HIPAA is triggered when an online technology connects an individual’s IP address with a visit to an unauthenticated public webpage addressing specific health conditions.11 OCR’s core position held: if tracking technology on a covered entity’s site transmits information tied to a user’s health condition or treatment-seeking, that’s a disclosure of PHI, which requires either a BAA with the tracking vendor or a HIPAA-compliant authorization from the user.11 OCR also explicitly states that “website banners that ask users to accept or reject a website’s use of tracking technologies, such as cookies, do not constitute a valid HIPAA authorization.”11

The logic applies directly to call mining:

  1. A technology (pixel, call recording, transcription) collects health information in the normal course of operations
  2. A downstream system (ad platform, analytics tool, marketing team) accesses that information for marketing purposes
  3. The organization argues that some combination of BAA, vendor relationship, or consent disclosure covers it
  4. OCR (or the FTC) disagrees, because the specific use (marketing) required its own authorization, which was never obtained

Treatment centers that have cleaned up their website tracking but haven’t thought about call mining have addressed half the problem.

What a workable setup looks like

The goal is to get to the buyer-language insight without the marketing team ever touching identifiable call data. The patterns live in the aggregate, not in any individual call.

Three tiers, with a one-way gate between them

Tier 1: Part 2 records (highest protection).
Raw audio and initial transcripts. Segregated infrastructure. Access limited to a small group with documented Part 2 training. Separate AD group, audit logging, MFA, encrypted at rest. Marketing has no access. Transcription happens here. Run Whisper or similar locally, not through a cloud API (even one with a BAA) unless that specific processing is documented in your Part 2 consent and compliance policies.

Tier 2: De-identification (one-way gate).
A documented, validated process runs against the Tier 1 transcripts and produces de-identified output. Layered approach: rule-based redaction (regex for dates, phone numbers, addresses), a named-entity-recognition pass, a clinical de-identification model (something like Philter, or a medical NER model from HuggingFace), and custom rules for your vertical such as common local competitor names, common insurer names, referring provider naming patterns, and rehab-specific quasi-identifiers.
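As a rough illustration of the first layer only, here is a minimal rule-based redaction sketch in Python. The patterns, the placeholder labels, and the function name redact_rule_based are all assumptions made for this example, not a validated rule set. It covers a few formats of email addresses, US phone numbers, and explicit dates, and nothing else.

```python
import re

# Illustrative patterns only (assumed for this sketch). Real coverage needs
# many more rules, plus the NER, clinical de-id, and human QA layers.
MONTH = (r"(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?|"
         r"Aug(?:ust)?|Sep(?:t(?:ember)?)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)")

PATTERNS = [
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("PHONE", re.compile(
        r"(?<!\d)(?:\+?1[\s.-]?)?(?:\(\d{3}\)|\d{3})[\s.-]?\d{3}[\s.-]?\d{4}(?!\d)")),
    ("DATE", re.compile(r"\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b")),
    ("DATE", re.compile(
        r"\b" + MONTH + r"\.?\s+\d{1,2}(?:st|nd|rd|th)?(?:,\s*\d{4})?\b",
        re.IGNORECASE)),
]


def redact_rule_based(text):
    """Replace pattern matches with [LABEL] placeholders and count hits per label."""
    counts = {}
    for label, pattern in PATTERNS:
        text, n = pattern.subn(f"[{label}]", text)
        counts[label] = counts.get(label, 0) + n
    return text, counts


if __name__ == "__main__":
    sample = ("Call me at (555) 123-4567 or jane.doe@example.com. "
              "I was discharged on 3/14/2024 and relapsed March 3rd.")
    redacted, hits = redact_rule_based(sample)
    print(redacted)
    print(hits)
```

Note what this layer cannot see: a relative date like “three weeks ago” or a job title tied to a small town passes straight through. That gap is the reason the later layers and the human QA pass exist. The per-label counts are there so each run leaves something to record in the QA log.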

Then a human QA pass. A designated Privacy Officer or trained reviewer spot-checks 10% or more of output against a Safe Harbor checklist. Log the QA.
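A minimal sketch of how the spot-check sample might be drawn and logged, assuming transcripts are tracked by string IDs. The function names, the record fields, and the fixed seed are hypothetical choices for illustration. Recording the seed alongside the batch makes the selection reproducible if someone later asks how the sample was picked.

```python
import math
import random
from datetime import datetime, timezone


def select_qa_sample(transcript_ids, fraction=0.10, seed=None):
    """Pick a random sample (at least one item) of transcripts for human review."""
    ids = sorted(transcript_ids)
    if not ids:
        return []
    # Round up so the sample never falls below the target fraction.
    k = min(len(ids), max(1, math.ceil(len(ids) * fraction)))
    return sorted(random.Random(seed).sample(ids, k))


def qa_log_record(reviewer, sample_ids, misses_found):
    """Build a log record for one QA pass (field names are assumptions)."""
    return {
        "reviewed_at": datetime.now(timezone.utc).isoformat(),
        "reviewer": reviewer,
        "sample_ids": list(sample_ids),
        "sample_size": len(sample_ids),
        "misses_found": misses_found,
    }


if __name__ == "__main__":
    batch = [f"t{i:03d}" for i in range(25)]
    sample = select_qa_sample(batch, seed=7)
    print(qa_log_record("privacy.officer", sample, misses_found=0))
```

Rounding up means a batch of 25 yields three transcripts for review, not two, which keeps the sample at “10% or more.”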

For real defensibility, commission a one-time Expert Determination from a qualified statistician under 45 CFR 164.514(b)(1). They review your process, sample output, and issue a written determination.3 This costs real money, typically low five figures, but it’s the single biggest risk reduction available and gives you documented legal cover.

Do not keep a mapping table between de-identified transcripts and original caller identity. HIPAA allows a re-identification code only under narrow conditions (45 CFR 164.514(c)), and a retained key gives anyone who obtains it a path back to the caller. Sever the link.
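One way to sever the link, sketched under the assumption that de-identified transcripts arrive as a list of strings: give each one a fresh random ID and shuffle the order, so neither the ID nor the position points back to a Tier 1 record. The function name assign_unlinkable_ids is hypothetical.

```python
import secrets


def assign_unlinkable_ids(transcripts):
    """Return {random_id: text} in shuffled order; no lookup table is kept."""
    shuffled = list(transcripts)
    # Shuffle so output position cannot be matched to Tier 1 ordering.
    secrets.SystemRandom().shuffle(shuffled)
    # IDs are random, not derived from caller data such as a hashed phone number.
    return {secrets.token_hex(8): text for text in shuffled}


if __name__ == "__main__":
    out = assign_unlinkable_ids(["transcript a", "transcript b", "transcript c"])
    for new_id, text in out.items():
        print(new_id, text)
```

The design choice that matters is that the ID comes from a random generator and not from a hash of the phone number or call timestamp. A hashed phone number can be reversed by hashing every possible number, which recreates the mapping table you meant to avoid.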

Tier 3: Analysis environment (de-identified only).
This is where the marketing work happens. De-identified transcripts move to an analysis environment. Local LLM (Ollama running Llama 3.1 or similar) for the analysis itself. You can use a cloud LLM with a BAA if it helps, but keeping it local is cheaper, simpler, and your compliance story is easier if no SUD-origin data ever leaves your walls even after de-identification.

The marketing team works with aggregated output (phrase frequencies, objection clusters, competitor mention patterns) rather than individual transcripts. Rule: no verbatim quotes from transcripts in outbound marketing content. Only paraphrased, synthesized phrasing that represents patterns across many callers. A single vivid anecdote can re-identify even from technically de-identified data.
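To make “patterns across many callers” concrete, here is a minimal sketch that counts how many transcripts contain each short phrase and drops anything below a minimum. The function name and the default threshold of five are assumptions for illustration. The threshold is not a regulatory number, so set yours with your Privacy Officer.

```python
import re
from collections import Counter


def phrase_transcript_counts(transcripts, n=3, min_transcripts=5):
    """Count n-word phrases by number of transcripts containing them.

    Phrases seen in fewer than min_transcripts transcripts are suppressed,
    so a detail unique to one caller never reaches the marketing output.
    """
    counts = Counter()
    for text in transcripts:
        words = re.findall(r"[a-z'’]+", text.lower())
        # A set, so one caller repeating a phrase still counts once.
        phrases = {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}
        counts.update(phrases)
    return {p: c for p, c in counts.most_common() if c >= min_transcripts}


if __name__ == "__main__":
    corpus = ["I just need to get away"] * 3 + [
        "my cousin the head nurse said get away"
    ]
    print(phrase_transcript_counts(corpus, n=2, min_transcripts=3))
```

Counting transcripts, not raw occurrences, is deliberate: it measures how many callers said something, not how often one caller repeated it. In the example, “head nurse” appears in only one transcript and is dropped, while “get away” survives with a count of four.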

The paper trail that makes it real

The setup above isn’t enough on its own. You also need:

  • Written policies describing each tier, who has access, and why
  • Updated Notice of Privacy Practices and Part 2 patient notice language consistent with the internal processing workflow
  • Documentation that de-identification meets Safe Harbor or Expert Determination
  • Audit logs showing who accessed Tier 1 data and why
  • Training records for everyone with Tier 1 access
  • A signed-off opinion from the Privacy Officer, Compliance Officer, and outside health privacy counsel that the specific workflow is permissible under your consent and NPP
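For the audit-log item above, a minimal sketch of what “who accessed Tier 1 data and why” could look like as an append-only JSON Lines file. The field names, the action list, and both function names are assumptions for illustration. A production setup would more likely rely on your platform’s own audit logging.

```python
import json
from datetime import datetime, timezone

# Hypothetical action vocabulary for Tier 1 records.
ALLOWED_ACTIONS = {"read", "transcribe", "deidentify", "delete"}


def audit_entry(user, record_id, action, reason):
    """Build one JSON audit line; refuses entries without a stated reason."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {action}")
    if not reason or not reason.strip():
        raise ValueError("a documented reason is required for Tier 1 access")
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "record_id": record_id,
        "action": action,
        "reason": reason.strip(),
    }, sort_keys=True)


def append_audit(path, line):
    """Append one entry to the log file; existing lines are never rewritten."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(line + "\n")


if __name__ == "__main__":
    print(audit_entry("reviewer1", "rec-0001", "read", "Quarterly de-id QA sample"))
```

Rejecting an entry with a blank reason is the point of the sketch: the log should capture the “why” at the moment of access, not reconstruct it later.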

Documentation is what makes a secure setup a compliant one. Without it, you have a locked room nobody can prove is being used correctly.

A simple test for the current state

If you want to assess whether your own call mining practice, or an agency’s, is operating compliantly, two questions will usually reveal it:

  1. On whose infrastructure does de-identification happen, and what standard does it meet, Safe Harbor or Expert Determination?
  2. Does anyone outside the clinical team ever hear or read identifiable caller content?

The common answers (“we use a platform with a BAA,” “our agency has access but they’re trained,” “we strip out names”) don’t survive contact with the actual regulatory standards.

Why this is worth getting right

Call mining produces real buyer-language insight, and that insight is getting more valuable as search shifts toward AI-driven interfaces that reward content matching how people actually express their problems. Treatment centers that figure out how to capture it legally will have an edge over those that either don’t do it or do it in a way that’s a hidden compliance liability.

The cost of doing it right is not huge. An in-house data engineer or consultant, a commitment to local tools, a one-time expert determination, and a few weeks of workflow documentation.

The cost of doing it wrong accumulates quietly. Maybe nothing happens for years. Maybe an OCR audit triggered by an unrelated complaint pulls the thread and finds a systematic marketing use of Part 2 records. Maybe the FTC notices a website privacy policy that claims HIPAA compliance while the marketing team routinely accesses identifiable call recordings. Penalties aligned with HIPAA now reach seven figures annually.5 For Cerebral’s former CEO, personal liability is already on the table.10

The best source of buyer intelligence is still in those call recordings, but accessing it responsibly needs to be the standard.


References

1. U.S. Department of Health & Human Services, Office for Civil Rights. Fact Sheet: 42 CFR Part 2 Final Rule. Published February 8, 2024; updated January 30, 2026. Compliance date February 16, 2026. https://www.hhs.gov/hipaa/for-professionals/regulatory-initiatives/fact-sheet-42-cfr-part-2-final-rule/index.html

2. HIPAA Privacy Rule, 45 CFR 164.508(a)(3) (marketing authorization requirement). See also OCR, Marketing. https://www.hhs.gov/hipaa/for-professionals/privacy/guidance/marketing/index.html

3. HIPAA Privacy Rule, 45 CFR 164.514(b); OCR, Guidance Regarding Methods for De-identification of Protected Health Information in Accordance with the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule. https://www.hhs.gov/hipaa/for-professionals/privacy/special-topics/de-identification/index.html

4. 42 CFR 2.16(a)(1)(i)(E) and (a)(1)(ii)(D) (Feb. 16, 2024), requiring Part 2 programs to have policies and procedures for “rendering patient identifying information de-identified in accordance with the requirements of 45 CFR 164.514(b) such that there is no reasonable basis to believe that the information can be used to identify a particular patient.” https://www.ecfr.gov/current/title-42/chapter-I/subchapter-A/part-2/subpart-B/section-2.16

5. HIPAA civil money penalty amounts for 2025, adjusted for inflation under 45 CFR Part 102. Tier 4 (willful neglect, not corrected): maximum $71,162 per violation, annual cap $2,134,831. See HHS OCR, Notification of Enforcement Discretion Regarding HIPAA Civil Money Penalties, 84 Fed. Reg. 18151 (Apr. 30, 2019), as adjusted. Summary at HIPAA Journal, HIPAA Violation Fines and Penalties. https://www.hipaajournal.com/what-are-the-penalties-for-hipaa-violations-7096/

6. Federal Trade Commission. FTC to Ban BetterHelp from Revealing Consumers’ Data, Including Sensitive Mental Health Information, to Facebook and Others for Targeted Advertising. March 2, 2023. https://www.ftc.gov/news-events/news/press-releases/2023/03/ftc-ban-betterhelp-revealing-consumers-data-including-sensitive-mental-health-information-facebook

7. In the Matter of BetterHelp, Inc., Complaint, Count VIII (Privacy Misrepresentation: HIPAA Certification), ¶¶ 65-69, 98-100. Federal Trade Commission. https://www.ftc.gov/system/files/ftc_gov/pdf/2023169-betterhelp-complaint_.pdf

8. Federal Trade Commission. FTC Enforcement Action to Bar GoodRx from Sharing Consumers’ Sensitive Health Info for Advertising. February 1, 2023. https://www.ftc.gov/news-events/news/press-releases/2023/02/ftc-enforcement-action-bar-goodrx-sharing-consumers-sensitive-health-info-advertising

9. Federal Trade Commission. Alcohol Addiction Treatment Firm will be Banned from Disclosing Health Data for Advertising to Settle FTC Charges that It Shared Data Without Consent (Monument, Inc.). April 11, 2024. https://www.ftc.gov/news-events/news/press-releases/2024/04/alcohol-addiction-treatment-firm-will-be-banned-disclosing-health-data-advertising-settle-ftc

10. Federal Trade Commission. Proposed FTC Order will Prohibit Telehealth Firm Cerebral from Using or Disclosing Sensitive Data for Advertising Purposes, and Require it to Pay $7 Million. April 15, 2024. https://www.ftc.gov/news-events/news/press-releases/2024/04/proposed-ftc-order-will-prohibit-telehealth-firm-cerebral-using-or-disclosing-sensitive-data

11. HHS Office for Civil Rights. Use of Online Tracking Technologies by HIPAA Covered Entities and Business Associates. Bulletin, revised March 18, 2024, with June 2024 vacatur notice. See also American Hospital Association v. Becerra, No. 4:23-cv-1110 (N.D. Tex. June 20, 2024). https://www.hhs.gov/hipaa/for-professionals/privacy/guidance/hipaa-online-tracking/index.html