Data Liquidity Is Not a Feature. It’s the Precondition for Everything Else.

Most healthcare data platforms solve for completeness. That’s the right first problem. But there’s a harder one behind it: whether the data can get to where it’s needed, in the shape it’s needed, at the moment a physician is making a decision.

Ramani Narayan · June 2026 · 6 min read

One of our customers described the state of healthcare IT data in three words: incomplete, dirty, and frozen.

Incomplete is the most visible problem — patient history scattered across EHRs, HIEs, labs, and facilities that don’t share records. The industry has spent a decade working on it. FHIR, interoperability mandates, HIE expansion: all aimed at completeness.

Dirty is the next layer — duplicate records, conflicting codes for the same clinical fact, narrative sitting unresolved alongside structured data. Ontology annotation — mapping every concept to SNOMED, RxNorm, LOINC — is the engineering response to dirty data. We covered this in the previous post in this series.

But the third failure — frozen — is the one that determines whether complete, clean data changes anything at the point of care. And it is the least discussed of the three.

Frozen means the data exists but cannot move. It is accurate. It may even be normalized. But it sits locked in a format, a system, or an architecture that requires a custom integration project, often months long, every time a new application needs to consume it.

Liquidity is what makes data usable on demand. The other two get you accurate data that sits there.

You can fix the first two problems and still end up with data nobody can use on demand. The industry has spent a decade building accurate, normalized records that still require months of custom integration work before any new application can consume them. That’s not a data quality failure. It’s a liquidity failure.

HIPAA Already Requires It

The minimum necessary standard is one of HIPAA’s foundational Privacy Rule requirements. It requires reasonable efforts to limit the protected health information that is used, disclosed, or requested to the minimum needed to accomplish the intended purpose. Not what the system can reach. Not what permissions allow. What the task requires. Every covered entity and business associate in the United States is bound by it.

This is not a new requirement. It has been in force since 2003. But it has a new and sharper implication in the age of AI.

When a clinical AI system sends the full patient record to a language model — every note, every lab, every problem list entry, everything the EHR has — it isn’t HIPAA-compliant by design. It’s exposing more protected health information than the task requires. The fact that the model only “sees” it briefly, or doesn’t store it, doesn’t resolve the minimum necessary requirement. The access itself is the violation.

The minimum necessary standard is not satisfied by limiting what the AI says in its output. It is satisfied by limiting what the AI sees as its input. That is an architecture requirement, not a prompt engineering requirement.

Liquid data — data that can be precisely scoped, filtered, and shaped before it reaches the model — is not just an efficiency argument. It is a compliance argument. A data layer that can deliver exactly the cardiovascular context relevant to this encounter, for this physician, with this patient, and nothing beyond that, is the only architecture that satisfies minimum necessary at the AI layer.

A frozen data layer has no concept of task-relevant scope. It can surface the full patient record or nothing. It cannot answer the question “give me exactly the PHI this specific clinical task requires and nothing beyond that” — because that query requires the data to be liquid, annotated, and queryable by clinical dimension. Without liquidity, minimum necessary compliance depends on the model guessing correctly about what to use and what to ignore. That is not a compliance strategy.

HIPAA’s minimum necessary standard doesn’t evaluate the model’s outputs. It asks what the model had access to going in. That’s an architectural question, and most clinical AI deployments today don’t have a good answer.
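To make "limit what the AI sees as its input" concrete, here is a minimal Python sketch of scoping before access. Everything in it is an assumption for illustration: the `RecordEntry` and `TaskScope` types, the `domain` tag, and the sample entries are hypothetical and do not describe any particular product's data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordEntry:
    entry_id: str
    domain: str  # hypothetical clinical-dimension tag, e.g. "cardiovascular"
    text: str


@dataclass(frozen=True)
class TaskScope:
    task: str
    allowed_domains: frozenset


def scoped_entries(record, scope):
    """Return only the entries whose domain the task scope allows."""
    return [e for e in record if e.domain in scope.allowed_domains]


def build_model_input(record, scope):
    # Filtering happens here, before any text is assembled for the model.
    # Out-of-scope entries never reach the string that becomes the prompt.
    return "\n".join(e.text for e in scoped_entries(record, scope))


# Made-up sample entries, for illustration only.
record = [
    RecordEntry("n1", "cardiovascular", "Echo: EF 35%"),
    RecordEntry("n2", "dermatology", "Eczema follow-up"),
    RecordEntry("n3", "renal", "Creatinine 1.4 mg/dL"),
]
scope = TaskScope("chf_follow_up", frozenset({"cardiovascular", "renal"}))
prompt_context = build_model_input(record, scope)
```

The point of the design is where the filter sits: the scope is applied in the data layer, so the dermatology note is absent from `prompt_context` rather than present and ignored.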

Ontology Is What Makes Liquidity Affordable

Liquidity without precision is just a bigger fire hose. The challenge isn’t getting data to flow — it’s getting the right slice to flow without someone having to manually define what “right” means each time a new query comes in. That turns out to be an ontology problem.

The data quality benefits of ontology are real, but they’re not the point here. What matters for liquidity is that once every clinical concept is mapped to a standard vocabulary, the system can scope a query by meaning rather than by code. That’s what makes automated scoping possible without a human in the loop for every new use case.

Without ontological grounding, scoping a patient record to the clinically relevant subset requires either a human expert to define the scope for each query, or a language model to read the entire record and filter it down. The first doesn’t scale. The second is expensive, slow, and puts unnecessary PHI in front of the model before it has a chance to filter it — the exact HIPAA problem just described.

With ontological grounding, because every Condition in the record resolves to SNOMED, every medication resolves to RxNorm with MED-RT indication annotations, and every lab resolves to LOINC, “give me all cardiovascular content for this patient” becomes a subtree filter rather than a chart review. The model gets what the query needs. The rest of the record stays out of the context window entirely.
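A subtree filter can be sketched in a few lines of Python. This is a toy under stated assumptions: the concept identifiers are placeholders rather than real SNOMED CT codes, and each concept has a single parent, whereas SNOMED CT allows multiple parents. A production system would more likely ask a terminology service for subsumption than walk a dictionary.

```python
# Toy is-a hierarchy: child -> parent.
# Placeholder identifiers, NOT real SNOMED CT codes.
PARENT = {
    "heart_failure": "cardiovascular_disorder",
    "atrial_fibrillation": "cardiovascular_disorder",
    "cardiovascular_disorder": "clinical_finding",
    "eczema": "skin_disorder",
    "skin_disorder": "clinical_finding",
}


def is_descendant_or_self(concept, ancestor):
    """Walk up the is-a chain until the ancestor or the root is reached."""
    current = concept
    while current is not None:
        if current == ancestor:
            return True
        current = PARENT.get(current)  # None once we step past the root
    return False


def subtree_filter(conditions, ancestor):
    """Keep only conditions whose concept sits under the given ancestor."""
    return [c for c in conditions if is_descendant_or_self(c["concept"], ancestor)]


# Made-up patient conditions, for illustration only.
patient_conditions = [
    {"concept": "heart_failure", "onset": "2019"},
    {"concept": "eczema", "onset": "2021"},
    {"concept": "atrial_fibrillation", "onset": "2022"},
]
cardio = subtree_filter(patient_conditions, "cardiovascular_disorder")
```

Here `cardio` holds the heart failure and atrial fibrillation entries, and the eczema entry is never selected. No one had to list the cardiovascular conditions by hand; the hierarchy does that work.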

| Task | Without ontological grounding | With ontological grounding |
| --- | --- | --- |
| Scope context to relevant conditions | Send full record to model; ask it to filter. Expensive. PHI exposure before filtering. | Query SNOMED subtree. Cardiovascular content returned as a precise, coded set. Model sees only what the task requires. |
| Include medication context | Include all medications and hope the model reasons about relevance. Noise competes with signal. | Query MED-RT indications. Only medications indicated for the relevant conditions included. Token count drops. Accuracy rises. |
| Surface relevant labs | Include all lab history. Model attends to recent values regardless of clinical relevance to the current task. | Query LOINC panel membership. Only labs relevant to the active condition family returned. Context is tight and purposeful. |
| Satisfy HIPAA minimum necessary | Not achievable. System accesses everything it can reach and filters after the fact. | Achievable. Scope is defined before access. The model sees only what the task requires. Provenance is recorded at query time. |
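The medication and provenance rows of the comparison can be sketched together. The Python below is a hedged illustration: the `INDICATIONS` mapping stands in for indication annotations and is hand-written here rather than drawn from MED-RT, and the shape of the provenance record is an assumption.

```python
from datetime import datetime, timezone

# Hypothetical indication annotations: medication -> condition labels.
# Hand-written for illustration; not extracted from MED-RT.
INDICATIONS = {
    "furosemide": {"heart_failure"},
    "metoprolol": {"heart_failure", "hypertension"},
    "hydrocortisone_cream": {"eczema"},
}


def medications_for(active_conditions, med_list):
    """Keep medications with at least one indication among the active conditions."""
    return [m for m in med_list if INDICATIONS.get(m, set()) & active_conditions]


def scoped_query(task, active_conditions, med_list):
    """Run the scoped query and record what was asked for and what came back."""
    result = medications_for(active_conditions, med_list)
    provenance = {
        "task": task,
        "scope": sorted(active_conditions),
        "returned": list(result),
        "queried_at": datetime.now(timezone.utc).isoformat(),
    }
    return result, provenance


meds, prov = scoped_query(
    "chf_follow_up",
    {"heart_failure"},
    ["furosemide", "hydrocortisone_cream", "metoprolol"],
)
```

Because the provenance entry is written in the same call that runs the query, the audit trail describes the scope that was actually applied, not one reconstructed afterward.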

Token costs matter more than they did two years ago, and they’re not going down. A context window stuffed with an unfiltered patient record — fourteen years of notes, every lab, every problem list entry — costs more per query and produces worse output than a scoped one. The irrelevant content doesn’t disappear; it competes with the relevant content for the model’s attention. Ontology-grounded scoping is the only way to fix this without someone reviewing every query for relevance before it runs.

A well-annotated, liquid data layer doesn’t just make AI more accurate. It makes every query cheaper. When the context window contains only what the query needed, inference is faster, the outputs are tighter, and that advantage compounds across every query the system runs.
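The compounding effect is simple arithmetic, and a short Python sketch makes it visible. Every number below is a placeholder chosen for illustration, not a measurement or a vendor price: the token counts, the per-1k-token price, and the query volume are all assumptions.

```python
def estimated_cost(context_tokens, queries, price_per_1k_tokens):
    """Input-token cost for a batch of queries that share the same context size."""
    return context_tokens / 1000 * price_per_1k_tokens * queries


def savings_fraction(full_tokens, scoped_tokens):
    """Fraction of input tokens avoided by sending the scoped context."""
    return 1 - scoped_tokens / full_tokens


# Placeholder values, NOT measurements.
FULL_RECORD_TOKENS = 40_000
SCOPED_TOKENS = 2_000
PRICE_PER_1K = 0.01
QUERIES = 1_000

full = estimated_cost(FULL_RECORD_TOKENS, QUERIES, PRICE_PER_1K)
scoped = estimated_cost(SCOPED_TOKENS, QUERIES, PRICE_PER_1K)
saved = savings_fraction(FULL_RECORD_TOKENS, SCOPED_TOKENS)
```

Whatever the real figures are for a given deployment, the structure is the same: cost scales linearly with context size and with query count, so a smaller context pays off on every call.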

What Liquid Data Makes Possible at the Point of Care

The abstract argument for liquidity becomes concrete when you look at what a well-built pre-visit intelligence layer must do.

Before the physician walks into the exam room, a clinical AI system running on liquid data can assemble a brief scoped to that specific encounter — this patient, this physician’s specialty, this visit type. The brief isn’t built from a template someone pre-configured. It’s assembled on demand, by querying the patient’s annotated, cross-system record and pulling only what’s clinically active for this context.

Take a CHF follow-up. The system pulls recent cardiology notes, BNP trend, echo findings, diuretic changes, renal function, potassium, weight trend, and any recent ED visits. It does not pull the dermatology note from six months ago, the childhood immunization record, or the orthopedic history. Those aren’t excluded because someone configured a filter for them. They’re excluded because they don’t sit on the cardiovascular or renal axis that a CHF follow-up activates — and the ontological annotation on the data is what lets the system know that, automatically, before the model sees anything.

The same logic applies across every visit type, and the required slice is different each time. A first visit needs breadth: referral reason, full condition history, medications, allergies, recent labs, outside records, and open care gaps. A follow-up needs recency: what changed since the last encounter, which decisions are still open, which labs or medications warrant attention. A chronic condition visit like a diabetes review needs a condition-centered lens: A1c trend, renal function, relevant medications, complications, and guideline gaps. Each of these is a different query against the same underlying record. None of them is a report someone built in advance. A frozen data layer can’t produce these slices. It either returns the full record — leaving the model to filter, at cost, with PHI exposure — or it returns a pre-built report that fits some visits and not others. Liquidity is what lets the system match the data to the encounter rather than the other way around.
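One way to picture "a different query against the same underlying record" is a declarative slice specification per visit type. The Python sketch below is an assumption about how that could be expressed, not a description of how any product derives its slices. The section names come from the visit types described above; the `SliceSpec` type and `build_query` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SliceSpec:
    sections: tuple
    since_last_encounter: bool = False      # recency lens for follow-ups
    condition_focus: Optional[str] = None   # condition-centered lens


# Hypothetical slice definitions mirroring the three visit types above.
VISIT_SLICES = {
    "first_visit": SliceSpec(
        sections=("referral_reason", "condition_history", "medications",
                  "allergies", "recent_labs", "outside_records", "care_gaps"),
    ),
    "follow_up": SliceSpec(
        sections=("changes", "open_decisions", "labs", "medications"),
        since_last_encounter=True,
    ),
    "diabetes_review": SliceSpec(
        sections=("a1c_trend", "renal_function", "relevant_medications",
                  "complications", "guideline_gaps"),
        condition_focus="diabetes",
    ),
}


def build_query(patient_id, visit_type):
    """Turn a visit type into a query description against the patient record."""
    spec = VISIT_SLICES[visit_type]
    return {
        "patient_id": patient_id,
        "sections": list(spec.sections),
        "since_last_encounter": spec.since_last_encounter,
        "condition_focus": spec.condition_focus,
    }


query = build_query("patient-123", "follow_up")  # "patient-123" is a made-up ID
```

Each spec describes what to ask of the record at encounter time. Nothing is rendered in advance, which is the difference between a query and a pre-built report.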

The Full Argument, in Order

Healthcare data fails in three ways: incomplete, dirty, and frozen. The first two are widely recognized problems with active industry investment. The third is underappreciated, and it is the one that determines whether the investment in the first two pays off at the point of care.

It helps to think of the three layers as sequential rather than independent. Completeness without cleanliness gives you a large, unreliable record. Add cleanliness and you have something accurate — but if it still requires a custom integration before any application can use it, you’ve solved the data problem and deferred the delivery problem. Liquidity is what closes that gap.

| Layer | What it solves | What breaks without it |
| --- | --- | --- |
| Cross-system integration | Completeness: every source visible, no blind spots | Clinical picture is partial. Investigation and synthesis operate on an incomplete record and produce confident answers built on invisible gaps. |
| Ontology annotation | Cleanliness and precision: every concept normalized, queryable by meaning not code | Scoping requires the model to filter, not a query to filter. PHI exposure before minimum necessary is satisfied. Token costs rise. Accuracy falls. |
| Data liquidity | Flowability: any use case can query the right slice without a bespoke pipeline | Each new application requires months of integration work. The system cannot scope a query to a clinical dimension that doesn’t exist as a queryable layer in the data. Minimum necessary cannot be satisfied by architecture. |

What This Means for Clinical AI Infrastructure

Every clinical AI application — the ambient scribe, the pre-visit brief, the prior auth agent, the coding assistant — runs on a data layer. The capability of that layer determines what the application can do, how much it costs to run, and whether it satisfies the regulatory requirements that apply to PHI in the United States.

A data layer that can’t scope to a task hands the whole record to every application that asks for it. That costs more per query, produces noisier outputs, and creates a HIPAA exposure that most teams haven’t fully thought through — because they’re focused on what the model says, not on what it was given access to. The data layer is not the interesting part of the stack to build. But it’s the part that determines whether the interesting parts work.

ThetaRho builds the liquid data layer, accessible to AI agents. RISA is the proof that it works in production, on athenahealth, for the independent and specialty practices that need the same clinical intelligence as the largest health systems — without a three-year integration project to get there.


This post is part of The Clarity Protocol, ThetaRho’s ongoing series on AI, clinical workflow, and healthcare data. The next piece closes the RISA framework with Act — what it means for clinical AI to move from assembling context to taking action, and the guardrails that make agentic action safe and auditable in a regulated environment.


ThetaRho (thetarho.ai) builds clinical AI infrastructure for healthcare organizations. RISA is our clinical intelligence platform — HIPAA-compliant, AICPA SOC certified, and live on the athenahealth Marketplace.


