On Limitations of Systems Consuming Taxonomies

As taxonomists (and ontologists and semantic modelers, etc.) we want to, and often do, build robust, standards-compliant semantic structures with lots of attributes, identifiers, labels, and other information – to say nothing of advanced semantic relationships.

Enterprise information environments in large organizations with lots of content (that is: all large organizations) tend to have a variety of content and asset management systems. In well-developed information environments, these systems consume taxonomies from a central taxonomy management system that acts as the Single Source of Truth (SSoT) for taxonomy and metadata tags. In these situations, specific consuming systems (CMS, DAM, CRM, and other three-letter-acronym systems) take all or part of the taxonomies from the central repository (usually, and hopefully, a dedicated, enterprise-class taxonomy management tool) and use them to populate tagging systems, picklists, and other interfaces in which metadata is applied.

Unfortunately, most consuming systems are…very bad at taxonomy, by which I mean “unequipped to ingest modern taxonomy structures”. Specifically, many such systems:

  • Cannot ingest taxonomies via API, so they have to be manually updated from spreadsheet exports from the taxonomy tool;
  • Cannot store or display attributes, at all;
  • Cannot store any kind of unique ID;
  • Cannot store or display alternative labels;
  • Cannot handle polyhierarchy; and
  • Cannot even ingest a hierarchy, so taxonomies get flattened into mere lists.

This is a hassle, and not just because it doesn’t take advantage of the beautiful semantic models we build; it also causes workflow (and therefore productivity) issues.

  • Ingesting taxonomies via manual processes instead of API integrations is inefficient and introduces the potential for errors and de-synchronization (lack of alignment) with the enterprise taxonomy system.

One thing computers are really good at is passing information back and forth between systems; they can even–bear with me here–transform that data if necessary! This issue is caused either by the consuming system's inability to ingest taxonomies via API or, more frequently, by a lack of IT resources (or institutional will) to prioritize these integrations.
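To make the "transform that data" point a little more concrete, here is a minimal sketch of the mapping step such an integration might perform. It assumes, purely for illustration, that the taxonomy tool exports concepts as JSON with SKOS-like keys (`uri`, `prefLabel`, `altLabels`, `broader`) and that the consuming system wants hypothetical `id`/`label`/`parent_id`/`synonyms` rows. Neither format belongs to any particular product, and the actual API calls on both ends are left out.

```python
import json
from typing import Any, Dict, List

# Hypothetical JSON export from a central taxonomy tool, using SKOS-like keys.
EXPORT = """[
  {"uri": "https://example.org/tax/c1", "prefLabel": "Cardiovascular diseases",
   "altLabels": [], "broader": []},
  {"uri": "https://example.org/tax/c2", "prefLabel": "Myocardial infarction",
   "altLabels": ["Heart attack", "MI"], "broader": ["https://example.org/tax/c1"]}
]"""


def to_picklist_rows(concepts: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Transform concept records into a hypothetical consuming-system row format."""
    rows = []
    for c in concepts:
        rows.append({
            "id": c["uri"],  # carry the stable identifier through
            "label": c["prefLabel"],
            # Assumed target limitation: one parent per row, so keep the first broader.
            "parent_id": c["broader"][0] if c["broader"] else None,
            # Assumed target limitation: synonyms stored as one delimited string.
            "synonyms": "; ".join(c.get("altLabels", [])),
        })
    return rows


if __name__ == "__main__":
    for row in to_picklist_rows(json.loads(EXPORT)):
        print(row)
```

The point is less the specific mapping than the fact that it runs the same way every time, which a person copying cells out of a spreadsheet export does not.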

  • The inability to display any attributes is a major issue, as one of the primary use cases for taxonomies in consuming systems (like CMS and DAM software) is for taggers to tag assets.

Fields like Definitions and Scope Notes exist to assist taggers in selecting the correct terms to apply, so the lack of these fields in tagging interfaces means that people performing this work must refer to some external source (either the taxonomy system itself or, more likely, some kind of exported spreadsheet) to access these helpful fields. I am not an expert in Six Sigma, but this seems like bad process design.

  • Unique IDs, whether GUIDs, URIs, or system-assigned accession numbers, are extremely helpful as unique strings that refer to a concept regardless of label (or labels).

Lack of unique identifiers causes massive problems when, say, the label for a term changes or different systems use different labels for the same concept (which is extremely common in large enterprises). Tying concepts to identifiers simplifies re-tagging efforts (which can be massive and, accordingly, expensive) and allows data fabrics/lakes/integration layers to aggregate tagging information across systems.
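A toy illustration of the label-change problem: below, the same asset is tagged once by label (as a system that can't store IDs would do it) and once by a concept ID, and then the preferred label changes. The ID scheme (`tax:0042`) and the dictionary standing in for the taxonomy are hypothetical; real systems would use GUIDs or URIs and an actual store.

```python
from typing import Dict, Optional

# Hypothetical concept store: stable ID -> current preferred label.
concepts: Dict[str, str] = {"tax:0042": "Heart attack", "tax:0043": "Stroke"}

# The same asset, tagged two ways by two different consuming systems.
tag_as_string = "Heart attack"  # system that can only store labels
tag_as_id = "tax:0042"          # system that can store the concept ID


def resolve_by_label(label: str, store: Dict[str, str]) -> Optional[str]:
    """Find a concept ID by exact label match; breaks as soon as the label changes."""
    for concept_id, current in store.items():
        if current == label:
            return concept_id
    return None


def current_label(concept_id: str, store: Dict[str, str]) -> str:
    """Look up whatever the concept is called today."""
    return store[concept_id]


# The taxonomy team renames the concept to its preferred clinical term.
concepts["tax:0042"] = "Myocardial infarction"

print(resolve_by_label(tag_as_string, concepts))  # None: the string tag is orphaned
print(current_label(tag_as_id, concepts))         # Myocardial infarction
```

Every string-tagged asset now needs re-tagging; every ID-tagged asset simply picks up the new label.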

As we try to move away from strings to things (“Things not Strings”) the lack of unique identifiers is regressive, as our rich semantic structures are basically reduced to pushing strings around.

  • Alternative labels are great for enhancing search and many other purposes, but they are also helpful for taggers to find the concepts they need.

The primary problem, as always, is that language is ambiguous. For example, if a tagger is looking for the term “heart attack” and our taxonomy expresses that concept (as the primary label) as “myocardial infarction”, they are never going to find the correct tag. Allowing taggers to see (and search, crucially) alternative labels further increases the chance that your assets are tagged with high-quality, consistent, and accurate metadata.
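Here is the “heart attack” example as a minimal sketch: a search box that matches only preferred labels versus one that also matches alternative labels. The `Concept` structure, IDs, and simple case-insensitive substring matching are all assumptions for illustration; real tagging interfaces will have their own (often better) matching.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Concept:
    concept_id: str
    pref_label: str
    alt_labels: List[str] = field(default_factory=list)


def search(query: str, concepts: List[Concept], include_alt: bool = True) -> List[Concept]:
    """Case-insensitive substring match on preferred and, optionally, alternative labels."""
    q = query.casefold()
    hits = []
    for c in concepts:
        labels = [c.pref_label] + (c.alt_labels if include_alt else [])
        if any(q in label.casefold() for label in labels):
            hits.append(c)
    return hits


# Hypothetical taxonomy fragment.
TAXONOMY = [
    Concept("tax:0042", "Myocardial infarction", ["Heart attack", "MI"]),
    Concept("tax:0043", "Stroke", ["Cerebrovascular accident", "CVA"]),
]

print([c.pref_label for c in search("heart attack", TAXONOMY, include_alt=False)])  # []
print([c.pref_label for c in search("heart attack", TAXONOMY)])  # ['Myocardial infarction']
```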

  • Lack of support for polyhierarchy only causes problems if, well, you have polyhierarchies in your enterprise taxonomies.

I will include here the lack of support for any kind of semantic relationships beyond the simple hierarchical parent-child (or BT-NT) relationships. Basically, any kind of semantic modeling is discarded; this can be fine for many purposes but restricts the ability of the modelers to serve business requirements that could benefit from treating semantics as more than pushing lists of words around.
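As a small illustration of what gets lost when a polyhierarchy is squeezed into a single-parent system: aspirin is both an analgesic and an antiplatelet agent, but a consumer that keeps only one broader concept makes it disappear from one branch. The data structure and the “keep the first parent” rule are assumptions; which parent a real system drops is often arbitrary.

```python
from typing import Dict, List

# Hypothetical taxonomy fragment: concept label -> list of broader concepts.
BROADER: Dict[str, List[str]] = {
    "Drugs": [],
    "Analgesics": ["Drugs"],
    "Antiplatelet agents": ["Drugs"],
    "Aspirin": ["Analgesics", "Antiplatelet agents"],  # two parents: polyhierarchy
}


def narrower(parent: str, broader_map: Dict[str, List[str]]) -> List[str]:
    """Concepts a tagger would see when browsing under `parent`."""
    return sorted(c for c, parents in broader_map.items() if parent in parents)


def force_single_parent(broader_map: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Keep only the first broader concept, as a mono-hierarchical system might."""
    return {c: parents[:1] for c, parents in broader_map.items()}


mono = force_single_parent(BROADER)
print(narrower("Antiplatelet agents", BROADER))  # ['Aspirin']
print(narrower("Antiplatelet agents", mono))     # []
```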

  • Lack of support for even hierarchical relationships reduces all taxonomies to flat lists.

Lastly, in addition to exacerbating the issues in the previous bullet, the inability to even ingest or display hierarchies at all is breathtakingly common. I can’t even imagine[1] what leads to products like this. Needless to say, this is…not great for people who need to apply tags.
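One concrete way flattening hurts taggers: homonyms that the hierarchy kept apart become indistinguishable. In this sketch (hypothetical rows and IDs), “Mercury” the planet and “Mercury” the element both survive as bare labels in the flat list, while the hierarchy-aware version can still show which is which. Labeling by full path is just one illustrative way of surfacing that context.

```python
from typing import List, Optional, Tuple

# Hypothetical hierarchy rows: (concept ID, label, parent ID or None).
Row = Tuple[str, str, Optional[str]]
HIERARCHY: List[Row] = [
    ("t1", "Planets", None),
    ("t2", "Mercury", "t1"),
    ("t3", "Chemical elements", None),
    ("t4", "Mercury", "t3"),
]


def flatten(rows: List[Row]) -> List[str]:
    """What a list-only system ends up with: bare labels, no parents, no IDs."""
    return sorted(label for _, label, _ in rows)


def full_paths(rows: List[Row]) -> List[str]:
    """The context a hierarchy-aware interface could show instead."""
    by_id = {cid: (label, parent) for cid, label, parent in rows}

    def path(cid: str) -> str:
        label, parent = by_id[cid]
        return f"{path(parent)} > {label}" if parent else label

    return sorted(path(cid) for cid in by_id)


print(flatten(HIERARCHY))
# ['Chemical elements', 'Mercury', 'Mercury', 'Planets']: which Mercury is which?
print(full_paths(HIERARCHY))
# ['Chemical elements', 'Chemical elements > Mercury', 'Planets', 'Planets > Mercury']
```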


[1] N.B.: This is not true; I actually can imagine why, and it’s the same as it ever was: complete disregard for user research during product and interface design.

I’m interested to hear what other issues people have had with systems consuming taxonomies; is my list exhaustive? (I suspect not.) Are you working with any platforms that are good at taxonomy? (I don’t want to get into vendor/product evaluation but you are more than welcome to do so in comments.)
