Ontology vs taxonomy: why the difference is load-bearing in healthcare

The distinction between an ontology and a taxonomy is one of the most casually confused in healthcare knowledge engineering. Teams use the words interchangeably. Vendors use “knowledge graph” as if it settled the matter. It does not. The difference is real, it is load-bearing for downstream reasoning, and pretending it does not exist is a quiet source of many of the failures of healthcare knowledge graphs.

This note is the long version of the short glossary entry. It walks through why the difference matters, what specifically a taxonomy cannot express, and why the substrate underneath a healthcare AI has to commit to being an ontology and not a taxonomy.

01 / The textbook definitions, sharpened

A taxonomy is a hierarchical classification. It tells you what category something belongs to, and what categories sit above and below that category. Its core relationship is “is a kind of”. ICD-10 is the canonical healthcare example: in the WHO edition, E11.9 (Type 2 diabetes mellitus without complications) is a kind of E10-E14 (Diabetes mellitus), which is a kind of E00-E90 (Endocrine, nutritional and metabolic diseases). The hierarchy is the structure.

An ontology is a classification plus typed relationships. It tells you what category something belongs to, and what the formal relationships between categories are, with the relationship types themselves named and constrained. SNOMED CT is the canonical healthcare example: it carries the same kind-of hierarchy a taxonomy does, but it also carries attribute relationships like finding-site, associated-with, causative-agent, interprets, and many more. A concept like “Diabetic nephropathy” sits in the kind-of tree under “Disorder of kidney”, and also carries an explicit finding-site relationship to “Kidney structure” and an associated-with relationship to “Diabetes mellitus”. The hierarchy plus the relationships are the structure.

Stated as a one-liner: a taxonomy classifies; an ontology classifies and relates. The difference looks small in a definition. It is not small in practice.
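A minimal sketch of that one-liner in Python, assuming plain dictionaries and tuples rather than any real terminology server. The concept labels come from the examples above; the data structures and the function names (ancestors, related) are illustrative assumptions, not how ICD-10 or SNOMED CT are actually distributed.

```python
from typing import NamedTuple

# Taxonomy: each concept points at one parent; the only verb is "is a kind of".
TAXONOMY_PARENT = {
    "E11.9": "E10-E14",
    "E10-E14": "Endocrine, nutritional and metabolic diseases",
}


def ancestors(concept: str) -> list[str]:
    """Walk the kind-of chain upward. This is all a taxonomy can answer."""
    chain = []
    while concept in TAXONOMY_PARENT:
        concept = TAXONOMY_PARENT[concept]
        chain.append(concept)
    return chain


class Triple(NamedTuple):
    subject: str
    relation: str  # a named relationship type, not a free-text label
    obj: str


# Ontology: the same kind-of edge, plus typed attribute relationships.
ONTOLOGY = [
    Triple("Diabetic nephropathy", "is-a", "Disorder of kidney"),
    Triple("Diabetic nephropathy", "finding-site", "Kidney structure"),
    Triple("Diabetic nephropathy", "associated-with", "Diabetes mellitus"),
]


def related(subject: str, relation: str) -> list[str]:
    """Return every object linked to `subject` by the given relationship type."""
    return [t.obj for t in ONTOLOGY if t.subject == subject and t.relation == relation]


print(ancestors("E11.9"))
print(related("Diabetic nephropathy", "finding-site"))
```

The taxonomy can only be walked upward; the ontology can be asked a question that names the relationship it is interested in.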

02 / The statement that breaks a taxonomy

Consider a clinically obvious claim: ACE inhibitors are contraindicated in bilateral renal artery stenosis.

To put this in a substrate, the substrate has to support, as first-class entities:

A concept node for ACE inhibitor (or each ACE inhibitor)
A concept node for bilateral renal artery stenosis
A relationship type contraindicated-with
A specific instance of that relationship type connecting the two nodes
A rationale, version, and provenance attached to the relationship itself

In a taxonomy, you can classify ACE inhibitors under antihypertensive agents. You can classify bilateral renal artery stenosis under renal vascular disease. You can do this cleanly. What you cannot do, with the tools a taxonomy provides, is express the relationship between these two nodes. Taxonomies have one verb, and that verb is “is a kind of”. They do not have contraindicated-with. They do not have interacts-with. They do not have causes or requires or is-indicated-for. The structure is hierarchical, and the only thing it can say about two nodes is that one of them is a parent or an ancestor of the other - or, more often, that they share an ancestor at some level up the tree.

Sharing an ancestor is not a contraindication. It is not a drug-drug interaction. It is not a care-pathway prerequisite. It is not any clinically meaningful relationship at all. The taxonomy is silent on the thing that matters.
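The contrast can be sketched in a few lines of Python. This is an illustration under stated assumptions: the Assertion class, its field names, the version string, the provenance placeholder, and the "Clinical concept" root node are all hypothetical, chosen only to mirror the five requirements listed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Assertion:
    """A relationship instance that is itself a first-class record."""
    subject: str
    relation: str
    obj: str
    rationale: str
    version: str
    source: str


CONTRAINDICATION = Assertion(
    subject="ACE inhibitor",
    relation="contraindicated-with",
    obj="Bilateral renal artery stenosis",
    rationale="Risk of acute kidney injury",  # short illustrative rationale
    version="example-0.1",                    # hypothetical substrate version
    source="placeholder-citation",            # hypothetical provenance pointer
)

# The taxonomy side: kind-of edges only. The shared root is a hypothetical node.
PARENT = {
    "ACE inhibitor": "Antihypertensive agent",
    "Antihypertensive agent": "Clinical concept",
    "Bilateral renal artery stenosis": "Renal vascular disease",
    "Renal vascular disease": "Clinical concept",
}


def chain(concept: str) -> list[str]:
    """The concept followed by all of its ancestors, nearest first."""
    out = [concept]
    while concept in PARENT:
        concept = PARENT[concept]
        out.append(concept)
    return out


def nearest_shared_ancestor(a: str, b: str) -> Optional[str]:
    """The most a taxonomy can say about two nodes."""
    b_chain = set(chain(b))
    for node in chain(a):
        if node in b_chain:
            return node
    return None


print(nearest_shared_ancestor("ACE inhibitor", "Bilateral renal artery stenosis"))
print(CONTRAINDICATION.relation, "|", CONTRAINDICATION.rationale)
```

The taxonomy query returns a common ancestor and nothing else; the typed assertion carries the verb, the reason, and where it came from.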

03 / The “knowledge graph” sleight of hand

This is where the conversation usually gets fuzzy. A team builds a graph database. They load their taxonomy into it. They draw lines between concepts that seem related. They call the result a knowledge graph. The vendor deck says: “structured medical knowledge, queryable as a graph”. The diagram has nodes and edges. The thing looks like an ontology.

It is not. A graph database is a data structure, not a semantic commitment. You can put a taxonomy into Neo4j or any property graph and call it a knowledge graph. It is still a taxonomy. The lines between the nodes are edges in a graph, but they often carry no relationship type at all - or carry a string label like “related to” that means whatever the writer felt that day and nothing reliable to a downstream consumer. The schema is implicit. The constraints are absent. The validation is whatever the loader chose to enforce.

The test is not “is it in a graph database?” The test is: for any two connected nodes, can a downstream consumer ask the substrate what the formal relationship between them is, and get an answer the substrate is willing to defend? If the answer is “they are connected because someone drew a line”, the substrate is a taxonomy in graph clothing. If the answer is “they are connected by contraindicated-with, with this rationale, at this confidence, in this substrate version, with this curator-of-record”, the substrate is an ontology.
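That test can be written down as a function. The sketch below assumes edges are plain dictionaries with from, to, and type keys plus metadata; the key names, the list of placeholder labels, and the sample edges are assumptions made for illustration, and the lookup is directional (from a to b only).

```python
from typing import Optional

# Labels that name no relationship at all (an assumed, non-exhaustive list).
PLACEHOLDER_TYPES = {"", "related to", "see also"}
REQUIRED_METADATA = ("rationale", "confidence", "version", "curator")


def formal_relationship(edges: list[dict], a: str, b: str) -> Optional[dict]:
    """Return the typed, documented edge from a to b, or None when the
    substrate has nothing it could defend."""
    for edge in edges:
        if (edge["from"], edge["to"]) != (a, b):
            continue
        edge_type = (edge.get("type") or "").strip().lower()
        has_metadata = all(edge.get(key) not in (None, "") for key in REQUIRED_METADATA)
        if edge_type not in PLACEHOLDER_TYPES and has_metadata:
            return edge
    return None


EDGES = [
    # Someone drew a line: connected, but nothing to defend.
    {"from": "Concept A", "to": "Concept B", "type": "related to"},
    # A typed relationship with the metadata to back it (placeholder values).
    {
        "from": "ACE inhibitor",
        "to": "Bilateral renal artery stenosis",
        "type": "contraindicated-with",
        "rationale": "placeholder rationale",
        "confidence": 0.9,
        "version": "example-0.1",
        "curator": "curator-placeholder",
    },
]

print(formal_relationship(EDGES, "Concept A", "Concept B"))
print(formal_relationship(EDGES, "ACE inhibitor", "Bilateral renal artery stenosis")["type"])
```

Both pairs are connected in the graph. Only one of them has an answer.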

The visual is identical. The semantic commitment is not.

04 / What it costs downstream

The cost of pretending shows up wherever a downstream consumer has to reason about the relationships, not just retrieve nodes.

Clinical decision support that cannot compose constraints. A patient has condition X and condition Y. A decision-support tool needs to list drugs that treat Y but are safe in X. In a taxonomy with arrows, the tool can list drugs associated with X and drugs associated with Y separately. It cannot compose the safety constraint, because there is no machine-actionable relationship for it to compose. The composition collapses to a human reading the result and applying the constraint by hand. At that point the substrate has done nothing the doctor was not already doing.
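Once the relationships are typed, the composition is a set operation. A minimal sketch, using placeholder drug and condition names that mirror the X and Y above; the relation tables and the function name safe_treatments are hypothetical.

```python
# Typed relationships stored as (subject, object) pairs per relationship type.
TREATS = {
    ("Drug A", "Condition Y"),
    ("Drug B", "Condition Y"),
    ("Drug C", "Condition X"),
}
CONTRAINDICATED_WITH = {
    ("Drug A", "Condition X"),
}


def safe_treatments(target: str, coexisting: set[str]) -> list[str]:
    """Drugs that treat `target` and are not contraindicated with any
    condition in `coexisting`."""
    candidates = {drug for drug, condition in TREATS if condition == target}
    excluded = {drug for drug, condition in CONTRAINDICATED_WITH if condition in coexisting}
    return sorted(candidates - excluded)


print(safe_treatments("Condition Y", {"Condition X"}))
```

With untyped edges there is no CONTRAINDICATED_WITH set to subtract, which is exactly the step that falls back to the clinician.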

Cross-mapping that loses rationale. A team building an ICD-10 to SNOMED CT cross-map needs to say not only that an ICD-10 code maps to a SNOMED concept but how and why and with what confidence. If the relationship itself is just an unlabelled edge, the mapping is unverifiable. The cross-map collapses to a list of pairs - which is what you get when you treat both sides as taxonomies and draw lines.

AI grounding that retrieves but cannot reason. An AI grounded in a substrate that is a taxonomy with arrows can pull back nodes related by edge-existence. It cannot pull back nodes related by contraindicated-with if the substrate does not formally type the contraindication. The model can guess. It frequently does. The guess does not survive an audit.

Auditor questions that have no substrate answer. An inspector asks: what is the formal relationship between these two concepts in your substrate? If the answer is “they are both in the graph”, the substrate is not an inspectable artefact. It is a search index in graph format.

05 / What ontological commitment actually looks like

Committing to ontology, in the engineering sense, means a small number of properties have to be true of the substrate, all at once.

Every concept has a stable, dereferenceable identifier - an IRI. Not a row number; not a string label; an identifier that does not change when the label changes or the concept moves in the hierarchy.

Every relationship between concepts has a named type. Contraindicated-with is a relationship type. Indicated-for is a relationship type. Causes, interacts-with, requires, is-equivalent-to, is-disjoint-from - each is named, defined, and constrained. The substrate ships a schema that enumerates the relationship types it supports and what each one means.
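A schema of this kind can be sketched as a small registry. The example below is an assumption-laden illustration: the RelationType class, the definitions, the Drug and Condition categories, and the domain and range choices are hypothetical and far simpler than a production schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationType:
    name: str
    definition: str
    domain: str  # category allowed as the subject
    range: str   # category allowed as the object


SCHEMA = {
    rt.name: rt
    for rt in [
        RelationType("contraindicated-with", "Subject should not be used when object is present", "Drug", "Condition"),
        RelationType("indicated-for", "Subject is an accepted treatment for object", "Drug", "Condition"),
        RelationType("interacts-with", "Subject and object affect each other when combined", "Drug", "Drug"),
    ]
}

# Hypothetical category assignments for the concepts used in this note.
CATEGORY = {
    "ACE inhibitor": "Drug",
    "Bilateral renal artery stenosis": "Condition",
}


def check_edge(subject: str, relation: str, obj: str) -> list[str]:
    """Return the schema violations for one proposed edge (empty means valid)."""
    rel_type = SCHEMA.get(relation)
    if rel_type is None:
        return [f"unknown relationship type: {relation}"]
    errors = []
    if CATEGORY.get(subject) != rel_type.domain:
        errors.append(f"subject must be a {rel_type.domain}")
    if CATEGORY.get(obj) != rel_type.range:
        errors.append(f"object must be a {rel_type.range}")
    return errors


print(check_edge("ACE inhibitor", "contraindicated-with", "Bilateral renal artery stenosis"))
print(check_edge("ACE inhibitor", "related to", "Bilateral renal artery stenosis"))
```

An edge labelled “related to” is rejected at the door, because the schema has no such type to look up.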

Every relationship has provenance and a rationale. The fact that ACE inhibitors are contraindicated in bilateral renal artery stenosis is not floating; it is sourced, versioned, and explained. When the rationale is clinical convention, the substrate says so. When the rationale is regulator-mandated, the substrate says so. When the rationale is curator judgement held at low confidence, the substrate says that too.

Conformance shapes - shipped as SHACL in our case - encode the constraints the substrate’s ontology imposes: cardinalities, required fields, value-domain restrictions, mandatory provenance. A downstream consumer can validate that the data they produce against the substrate conforms to the same shapes the substrate enforces internally.
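To make the idea concrete without depending on a SHACL engine, here is a plain-Python stand-in. It is not SHACL and not the shapes the substrate ships: the SHAPE dictionary, its field names, and the allowed confidence values are assumptions, chosen to show cardinality, value-domain, and mandatory-provenance checks in one place.

```python
# A declarative "shape": per-field cardinality and allowed values.
SHAPE = {
    "rationale": {"min": 1, "max": 1},
    "source": {"min": 1},  # mandatory provenance, one or more entries
    "confidence": {"min": 1, "max": 1, "in": {"high", "medium", "low"}},
}


def validate(record: dict, shape: dict = SHAPE) -> list[str]:
    """Return a list of violations; an empty list means the record conforms."""
    violations = []
    for field, rule in shape.items():
        values = record.get(field, [])
        if not isinstance(values, list):
            values = [values]  # treat a scalar as a single value
        if len(values) < rule.get("min", 0):
            violations.append(f"{field}: expected at least {rule['min']}, found {len(values)}")
        if "max" in rule and len(values) > rule["max"]:
            violations.append(f"{field}: expected at most {rule['max']}, found {len(values)}")
        if "in" in rule:
            for value in values:
                if value not in rule["in"]:
                    violations.append(f"{field}: value {value!r} not allowed")
    return violations


good = {"rationale": "placeholder rationale", "source": ["placeholder-citation"], "confidence": "low"}
bad = {"rationale": "placeholder rationale", "confidence": "maybe"}

print(validate(good))
print(validate(bad))
```

The same shape definition can be run by the producer before submitting data and by the substrate on ingest, which is the property the paragraph above describes.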

This is not a maximalist position. It is what ontology actually requires, as opposed to what the word is often used to mean.

06 / Where the Healthattica fabric makes this commitment

We treat the ontology/taxonomy distinction as a hard line in our engineering, not a stylistic preference. Some of the standards we ingest are ontologies (SNOMED CT, FHIR R5 with its resource graph); some are taxonomies (most of ICD-10, CPV, NAICS, EMDN); some are mixed. Where we ingest a taxonomy, we ingest it as a taxonomy, and we are explicit that the relationships available are kind-of relationships and nothing more. We do not retrofit typed semantics onto a taxonomy and pretend the result is ontological.

Where we cross-map between a taxonomy and an ontology - ICD-10 to SNOMED CT being the canonical case - the cross-map itself is the place where the typed semantics live. The relationship is a typed link with rationale, confidence, version, and curator-of-record attached. The cross-map is the substrate’s commitment to a specific, defensible, inspectable bridge between the two regimes.
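A typed cross-map link might be modelled along the following lines. This is a sketch, not the fabric's actual schema: the CrossMapLink class, the narrower-than relation name, the confidence scale, the version string, and the curator identifier are hypothetical, and the sample link is an illustration rather than a published mapping. The target uses the SNOMED CT URI form http://snomed.info/id/{concept id}.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrossMapLink:
    source_code: str   # ICD-10 code
    target_iri: str    # SNOMED CT concept IRI
    relation: str      # typed mapping relation, never a bare edge
    rationale: str
    confidence: float  # assumed scale: 0.0 to 1.0
    version: str
    curator: str

    def __post_init__(self):
        # Refuse to construct a link that could not be defended later.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")
        for name in ("relation", "rationale", "version", "curator"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} is required")


link = CrossMapLink(
    source_code="E11.9",
    target_iri="http://snomed.info/id/44054006",
    relation="narrower-than",  # hypothetical relation name
    rationale="The ICD-10 code adds 'without complications'; the target concept does not",
    confidence=0.9,
    version="example-0.1",
    curator="curator-placeholder",
)

print(link.source_code, link.relation, link.target_iri)
```

A list of bare (code, concept) pairs cannot be constructed with this class at all; the rationale and the curator are part of what a link is.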

This is the Logos commitment of the Attic Standard applied at the relationship layer. Every typed relationship in our substrate carries a reasoned account. The reasoned account is what makes the relationship survive an audit. Without it, the substrate is a graph of edges with no story.

07 / Why this is load-bearing

The distinction matters because everything downstream of the substrate - decision support, cross-mapping, AI grounding, contraindication checks, pathway reasoning, regulatory landscape composition - requires that the relationships between concepts are first-class and machine-actionable. The relationships are where clinical and regulatory meaning lives. A taxonomy carries some meaning; a graph of taxonomies with arrows carries some appearance of meaning; only an ontology carries the kind of meaning a downstream system can reason about, and an auditor can inspect.

For the short version, see the glossary entry. For how the substrate’s relationship layer is surfaced - across graph traversal, linked-data response, and AI agent capability - see the engineering page.

The next note in this thread will be on cross-mapping ICD-10 and SNOMED CT specifically - on the roughly 17% of mappings that need a rationale, not a rule, and why that 17% is where the engineering ends and the curation begins.