Healthattica Journal·Entry 04 of 06

The regulator graph: resolving MoHAP, DHA, and the same body under three names

A regulator's name appears half a dozen ways across documents, vendor decks, and submissions. Resolving them to a single canonical entity - with country, scope, and authority as first-class - is what makes a regulatory landscape queryable instead of just searchable.

A regulatory affairs lead spends an afternoon trying to build a market-entry matrix for a Class IIa medical device across forty jurisdictions. The Excel grid takes shape quickly until the names start fighting each other. A vendor’s product label says NMPA. The same vendor’s regulatory submission says China NMPA (formerly CFDA). A consulting deck calls it the Chinese FDA. A clinical-trial registration says State Food and Drug Administration - which is the literal translation of the body’s name from two reorganisations ago. All four refer to the same regulator. None of them resolve to the same string. By the end of the afternoon, the matrix has four China rows, three of which the lead has to silently merge based on context she happens to know.

The matrix that goes to the board the next day looks complete. It is not. It is a search index in a spreadsheet, and the merging she did by hand will get redone by the next analyst, differently, on the next matrix. Multiply this by forty jurisdictions and a few hundred regulators across the relevant scopes, and the cost of treating regulator names as strings stops being a productivity tax and starts being a substrate-shaped hole.

This note is about the shape of that hole and what fills it. The fill is what we call the regulator graph - a substrate where every regulator is a single canonical entity with country, scope, authority, and alternate-labels as first-class fields, and where queries return one row per body, not one row per spelling.

01 / The four shapes of regulator confusion

The disambiguation problem is not one problem. It is four problems that look the same in a spreadsheet and that resolve differently in a substrate.

Same body, multiple names. A regulator’s official name in English, its name in the local language, its acronym, its informal name in trade press, and its historical name before the last reorganisation are commonly all in circulation simultaneously. The UAE Ministry of Health and Prevention is MoHAP, Ministry of Health and Prevention, the older Ministry of Health, the Arabic وزارة الصحة ووقاية المجتمع, and in some trade contexts simply the federal ministry. Five strings, one body. The substrate has to know they are the same.

Different bodies, similar names. MoHAP (federal, UAE) is not DHA (Dubai Health Authority, emirate-level). DHA (Dubai) is not DoH AD (Department of Health, Abu Dhabi). Each has its own jurisdiction, its own scope, and its own authority within that scope. A naive matrix that lists UAE: MoHAP in one row and UAE: DHA in another, with no relationship between them, treats them as alternatives. They are not alternatives - they cover different things. The substrate has to know they are distinct, and it has to know what each is responsible for.

Hierarchical bodies. A ministry contains a regulatory authority which contains a directorate which contains a division. The European Medicines Agency is part of a broader EU regulatory ecosystem with the European Commission above it and the national competent authorities of member states alongside it. A submission references EMA; a guidance document references the CHMP, which is one of the EMA’s scientific committees. The substrate has to know CHMP is a child of EMA, and that a query about EMA’s authority transitively includes CHMP’s mandate.

Renamed or merged bodies. China’s NMPA was CFDA, which was SFDA before that. Saudi Arabia’s SFDA is an unrelated body that happens to share the acronym. The UK’s MHRA was formed in 2003 by merging the Medicines Control Agency and the Medical Devices Agency. Documents from before a change use the old names. Documents from afterwards use the new ones. Submissions that span the change point use both. The substrate has to track the rename graph - what was renamed to what, when, with what scope continuity.

A regulator graph that does any of these four jobs has to do all four. They are the same engineering work, applied to different relationships.
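The rename relationship from the fourth shape can be sketched as a list of dated edges. This is a minimal illustration, not the substrate's actual model: the `Rename` type, the `name_in_force` helper, and the choice to key labels by country are assumptions made here. The China years (2013, 2018) are the commonly cited reorganisation years.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rename:
    """One dated edge in the rename graph (hypothetical shape)."""
    country: str      # country code, so CN "SFDA" and SA "SFDA" never collide
    old_label: str
    new_label: str
    year: int         # year the new name took effect


# China chain from the text: SFDA -> CFDA -> NMPA.
RENAMES = [
    Rename("CN", "SFDA", "CFDA", 2013),
    Rename("CN", "CFDA", "NMPA", 2018),
]


def name_in_force(country, label, year, renames):
    """Follow rename edges forward from `label`, stopping at the name
    that was in force in `year`."""
    by_old = {(r.country, r.old_label): r for r in renames}
    seen = {label}
    while (country, label) in by_old and by_old[(country, label)].year <= year:
        label = by_old[(country, label)].new_label
        if label in seen:
            raise ValueError("rename cycle in curated data")
        seen.add(label)
    return label


print(name_in_force("CN", "SFDA", 2010, RENAMES))  # name before any rename
print(name_in_force("CN", "SFDA", 2015, RENAMES))  # after the first rename
print(name_in_force("CN", "SFDA", 2020, RENAMES))  # after both renames
print(name_in_force("SA", "SFDA", 2020, RENAMES))  # different country, untouched
```

Keying the edges by country is what keeps the Saudi SFDA out of the Chinese lineage even though the acronym is identical.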

02 / What a regulator node actually contains

The substrate models every regulator as a single node with a stable identifier, and attaches everything else - labels, jurisdiction, scope, hierarchical relationships, name history - to that node as typed fields with provenance. A few structural decisions matter more than the rest.

The labels attached to a regulator are not a flat string array. Each label carries its kind - whether it is an acronym, a full name, a historical name, or an informal trade-press shorthand - and its language. The substrate distinguishes MoHAP (acronym) from Ministry of Health and Prevention (full name) from Ministry of Health (historical, with an end date attached) from the Arabic form, and a query against any one of them resolves to the same node. Importantly, Ministry of Health does not silently merge with the current full name; it is marked as historical, which preserves the rename history for documents that reference it.
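A typed label of this kind might look like the sketch below. The identifier `ae:mohap`, the field names, and the `resolve` helper are hypothetical, and the 2016 end date on the historical label is an assumption used for illustration rather than a curated value.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class LabelKind(Enum):
    ACRONYM = "acronym"
    FULL_NAME = "full_name"
    HISTORICAL = "historical"
    INFORMAL = "informal"


@dataclass(frozen=True)
class Label:
    text: str
    kind: LabelKind
    language: str                      # language tag such as "en" or "ar"
    valid_until: Optional[int] = None  # end year, set only on historical labels


MOHAP_ID = "ae:mohap"  # hypothetical stable identifier

LABELS = {
    MOHAP_ID: [
        Label("MoHAP", LabelKind.ACRONYM, "en"),
        Label("Ministry of Health and Prevention", LabelKind.FULL_NAME, "en"),
        # End year assumed for illustration only.
        Label("Ministry of Health", LabelKind.HISTORICAL, "en", valid_until=2016),
        Label("وزارة الصحة ووقاية المجتمع", LabelKind.FULL_NAME, "ar"),
    ],
}


def build_index(labels_by_node):
    """Map each case-folded label text to every (node id, label) carrying it."""
    index = {}
    for node_id, labels in labels_by_node.items():
        for label in labels:
            index.setdefault(label.text.casefold(), []).append((node_id, label))
    return index


def resolve(text, index):
    """Return all (node id, label) matches; an empty list means unknown."""
    return index.get(text.strip().casefold(), [])


INDEX = build_index(LABELS)
for node_id, label in resolve("ministry of health", INDEX):
    print(node_id, label.kind.value, label.valid_until)
```

Returning a list rather than a single node is deliberate in this sketch: a string like Ministry of Health is ambiguous across countries, and the caller should see every candidate along with the label kind that matched.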

Jurisdiction is structured, not free-text. A regulator is attached to a country (using the international country-code standard rather than a name), to a level - federal, sub-national, supranational - and where the level alone is insufficient, to explicit geographic-coverage notes that disambiguate which sub-national territories the body actually covers.
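As a rough sketch of structured jurisdiction, assuming the country-code standard is ISO 3166-1 alpha-2 and that sub-national coverage is expressed as ISO 3166-2 subdivision codes (both assumptions; the note does not name the standard), the fields could be modelled like this. The `Jurisdiction` type and its `covers` method are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    FEDERAL = "federal"
    SUB_NATIONAL = "sub_national"
    SUPRANATIONAL = "supranational"


@dataclass(frozen=True)
class Jurisdiction:
    country: str                       # two-letter country code (assumed ISO 3166-1)
    level: Level
    coverage: frozenset = frozenset()  # subdivision codes; empty means whole country

    def __post_init__(self):
        code = self.country
        if not (len(code) == 2 and code.isalpha() and code.isupper()):
            raise ValueError(f"not a two-letter country code: {code!r}")
        if self.level is Level.SUB_NATIONAL and not self.coverage:
            raise ValueError("a sub-national body must state which territories it covers")

    def covers(self, country, subdivision=None):
        """True if this jurisdiction applies in the given country/subdivision."""
        if country != self.country:
            return False
        return not self.coverage or subdivision in self.coverage


# UAE examples from the text; AE-DU and AE-AZ are the subdivision codes
# for Dubai and Abu Dhabi.
MOHAP = Jurisdiction("AE", Level.FEDERAL)
DHA = Jurisdiction("AE", Level.SUB_NATIONAL, frozenset({"AE-DU"}))
DOH_AD = Jurisdiction("AE", Level.SUB_NATIONAL, frozenset({"AE-AZ"}))

print(MOHAP.covers("AE", "AE-DU"), DHA.covers("AE", "AE-DU"), DOH_AD.covers("AE", "AE-DU"))
```

Validating at construction time means a free-text country name, or a sub-national body with no stated territory, fails loudly instead of producing a node that silently matches nothing.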

Scope is treated as a list of regulatory domains the body has authority over, because “regulator” is not a single concept. The same body can be authoritative over pharmaceuticals and silent on medical devices, or the reverse. The substrate carries the domain coverage for each regulator explicitly, so a downstream consumer asking “who covers what” is reading a structured answer instead of inferring one.
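Explicit domain coverage can be illustrated with a small sketch. The two regulators below are placeholders, not real bodies, and the `Domain` values and helper names are assumptions made for the example.

```python
from enum import Enum


class Domain(Enum):
    PHARMACEUTICALS = "pharmaceuticals"
    MEDICAL_DEVICES = "medical_devices"
    IVD = "in_vitro_diagnostics"


# Placeholder regulators: one covers pharmaceuticals only, the other devices and IVDs.
SCOPE = {
    "example:reg-a": frozenset({Domain.PHARMACEUTICALS}),
    "example:reg-b": frozenset({Domain.MEDICAL_DEVICES, Domain.IVD}),
}


def covering(domain, scope_by_node):
    """Node ids whose declared scope includes `domain`, in a stable order."""
    return sorted(node for node, domains in scope_by_node.items() if domain in domains)


def is_silent_on(node_id, domain, scope_by_node):
    """True if the body is curated and its scope excludes `domain`.

    Raises KeyError for a node with no curated scope, so that 'silent'
    is never confused with 'not yet curated'.
    """
    return domain not in scope_by_node[node_id]


print(covering(Domain.IVD, SCOPE))
print(is_silent_on("example:reg-a", Domain.MEDICAL_DEVICES, SCOPE))
```

The `KeyError` on an unknown node is a design choice in this sketch: a body whose scope has not been curated should surface as a gap, not as a confident "does not cover".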

Relationships between regulators are first-class. A regulator can have a parent body it sits underneath, and siblings at the same conceptual layer in the same country. In the UAE example, MoHAP, DHA, and DoH AD are sibling nodes in the substrate - distinct, with their own scopes and levels, and explicitly related to one another through the sibling relationship. The substrate carries the relationship; downstream queries traverse it.
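One way to picture first-class relationships is a small store of typed edges that queries can traverse. The `Relations` class and the node identifiers are hypothetical; the only facts taken from the text are that MoHAP, DHA, and DoH AD are siblings and that CHMP sits under EMA.

```python
from dataclasses import dataclass, field


@dataclass
class Relations:
    """Typed edges between regulator nodes (illustrative only)."""
    parent_of: dict = field(default_factory=dict)   # child id -> parent id
    sibling_of: dict = field(default_factory=dict)  # node id -> set of sibling ids

    def set_parent(self, child, parent):
        self.parent_of[child] = parent

    def add_siblings(self, *node_ids):
        """Record a symmetric sibling relation among all given nodes."""
        group = set(node_ids)
        for node_id in group:
            self.sibling_of.setdefault(node_id, set()).update(group - {node_id})

    def siblings(self, node_id):
        return sorted(self.sibling_of.get(node_id, set()))

    def with_descendants(self, node_id):
        """The node plus every body underneath it, transitively."""
        found, stack = [], [node_id]
        while stack:
            current = stack.pop()
            if current in found:
                continue  # guard against cycles in curated data
            found.append(current)
            stack.extend(sorted(c for c, p in self.parent_of.items() if p == current))
        return found


relations = Relations()
relations.add_siblings("ae:mohap", "ae:dha", "ae:doh-ad")
relations.set_parent("eu:chmp", "eu:ema")

print(relations.siblings("ae:dha"))
print(relations.with_descendants("eu:ema"))
```

With the parent edge in place, a question about EMA's authority can be answered over `with_descendants("eu:ema")`, which is how CHMP's mandate gets included without anyone matching on names.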

Everything attached to the node - the labels, the jurisdiction, the scope, the relationships - carries provenance: the source the information came from, the source version, the import timestamp, and the date of the last curator review.
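The four provenance fields named above could be attached to any value with a thin wrapper, sketched here. The type names, the `example-register` source, its version string, and all dates are invented for illustration, and the 365-day review window is an arbitrary assumption rather than a stated policy.

```python
from dataclasses import dataclass
from datetime import date, datetime, timezone
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Provenance:
    source: str                    # where the information came from
    source_version: str            # version of that source
    imported_at: datetime          # import timestamp
    last_reviewed: Optional[date]  # last curator review, None if never reviewed


@dataclass(frozen=True)
class Sourced(Generic[T]):
    """Any field value paired with its provenance."""
    value: T
    provenance: Provenance


def needs_review(item, today, max_age_days=365):
    """True if the item was never reviewed or the review is older than the window."""
    reviewed = item.provenance.last_reviewed
    return reviewed is None or (today - reviewed).days > max_age_days


# Entirely hypothetical provenance values.
scope_claim = Sourced(
    value="in_vitro_diagnostics",
    provenance=Provenance(
        source="example-register",
        source_version="2024.1",
        imported_at=datetime(2024, 5, 20, 9, 0, tzinfo=timezone.utc),
        last_reviewed=date(2024, 6, 1),
    ),
)

print(needs_review(scope_claim, today=date(2024, 12, 1)))
print(needs_review(scope_claim, today=date(2025, 12, 1)))
```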

03 / How a query lands

The pay-off of this shape is that the question a regulatory affairs lead actually has - “which regulator approves what I’m shipping, in this jurisdiction, for this product class?” - resolves through the substrate without a string-matching step. A query for “which regulator covers IVDs in Dubai” walks the jurisdiction and scope relationships directly, and returns two rows: MoHAP at the federal level, DHA at the emirate level. The consumer can then choose which authority is actually competent for the product in question, based on the specific submission pathway, which is a separate axis of the substrate.
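The Dubai query can be sketched as a filter over structured fields. Everything here is illustrative: the `Regulator` type, the identifiers, the subdivision codes, and the scope sets are assumptions, chosen only so that the result matches the two rows described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Regulator:
    node_id: str
    name: str
    country: str          # two-letter country code
    level: str            # "federal" or "sub_national"
    coverage: frozenset   # subdivision codes; empty means whole country
    domains: frozenset    # regulatory domains the body has authority over


# Illustrative data only; the scope sets are not curated facts.
REGULATORS = [
    Regulator("ae:mohap", "MoHAP", "AE", "federal", frozenset(), frozenset({"ivd"})),
    Regulator("ae:dha", "DHA", "AE", "sub_national", frozenset({"AE-DU"}), frozenset({"ivd"})),
    Regulator("ae:doh-ad", "DoH AD", "AE", "sub_national", frozenset({"AE-AZ"}), frozenset({"ivd"})),
]


def regulators_for(domain, country, subdivision, regulators):
    """Bodies whose jurisdiction and scope both overlap the scenario.

    No string matching on names: the filter reads country, coverage,
    and domains only. Federal bodies are listed first.
    """
    hits = [
        r for r in regulators
        if r.country == country
        and domain in r.domains
        and (not r.coverage or subdivision in r.coverage)
    ]
    return sorted(hits, key=lambda r: (r.level != "federal", r.node_id))


for r in regulators_for("ivd", "AE", "AE-DU", REGULATORS):
    print(r.name, r.level)
```

DoH AD drops out on geography, not on name. The function returns both remaining bodies and makes no attempt to pick between them.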

The behaviour worth noticing is what the substrate does not do. It does not over-collapse. It returns both genuinely relevant authorities and leaves the choice between them as a substrate-supported analytical step, not a guessing game. Its job ends at “here are the bodies whose authority overlaps your scenario, with the relevant relationships exposed.” The compliance choice is the analyst’s. But the analyst’s choice now stands on a graph she can defend, not a spreadsheet she had to merge by hand.

04 / What this discipline is

The regulator graph is the operational form of jurisdiction-aware reasoning - one of the open research questions the programme is built around. “Jurisdiction-aware” does not just mean “tagged with a country”. It is the commitment that the same regulator across documents resolves to the same node, that different regulators with similar names stay distinct, that hierarchy is queryable, and that the name history is tracked.

It is also a specific application of Aletheia: each alternate label is sourced and dated, each scope claim has a provenance trail, each renaming carries the year it happened. Where the substrate is uncertain - a niche regulator whose mandate has shifted but whose published scope hasn’t caught up - the curator-of-record is on the record, and the uncertainty is marked. The substrate does not flatter its own coverage of the regulatory world by hiding the cases it has not yet curated.

The next note in this thread will be on the conformance shapes the substrate ships with each release - the SHACL definitions that say what a Regulator node has to have, what a CrossMap has to carry, what a _provenance block has to contain, and what a downstream consumer can validate against. The shapes are how the substrate’s claims are testable, not just declared.

Maps to

  • Open question: Jurisdiction-aware reasoning
  • Aletheia at the entity-resolution layer