On Taxonomy Governance

I believe most taxonomists would agree that building taxonomies is fun.

Most taxonomists also do not spend most of their time building taxonomies.

Rather, after a taxonomy program is up and running, most of a taxonomist’s effort is spent on taxonomy governance.

Many words have been spilled on taxonomy governance; a Google search of the exact string “taxonomy governance” yields some 3,000 results. Most of these cover why governance is important (it is), suggested practices and strategies for governance, or both.

We also need to think about taxonomy governance for a large number of taxonomies distributed across large enterprises. In some large organizations, taxonomy changes are so numerous that organizing the required work—and dealing with the downstream effects of changes¹—is at a different scale.

Enterprise Taxonomy Governance at Scale

It’s no exaggeration to say that some large taxonomy programs include hundreds (no, really) of taxonomies housed in and distributed across dozens of systems, such as:

Taxonomy/ontology management systems
Product Information Management (PIM) systems
Content Management Systems (CMS)
Digital Asset Management (DAM) systems
Marketing platforms
Sales and Customer Engagement platforms
Event planning and management platforms
Data analytics platforms
…and any number of websites and other applications

Some systems will store and manage vocabularies and some will ingest them to power navigation, tagging, analytics, and other capabilities requiring vocabulary control.

***Figure 1*** Diagram of a complex system of taxonomies and systems with integration arrows connecting the boxes and databases.

Managing requests, changes, and downstream effects requires, in addition to dedicated staff, some kind of ticketing or request-tracking system, one or (extremely likely) more environments in a dedicated vocabulary management system, and various workflows prioritizing and sorting work to be done.

Requests for changes may conflict or be problematic for other reasons. Adding new terms and removing old ones requires retagging assets and making other downstream changes, which causes synchronization issues (assuming you even managed to get all of the systems integrated via API and passing data instead of periodically downloading update files). And sometimes people ask for stuff that’s already there, or doesn’t make sense, or otherwise requires investigation and research and talking to users.

This can all seem quite daunting and hard to organize. Fortunately, organizing stuff is sort of our bag. Using categorization! And diagrams!


There are really only a few basic shapes of taxonomy changes. Let us therefore categorize the types of changes to determine the level of effort (and specific workflow) required for each.

A Taxonomy of Changes to your Taxonomy

Remove a term
Add a term
Move a term
Rename (or update) a term
Split a term
Merge terms

Remove a term

Example: A retailer is no longer selling products under the category “Beijing Olympics”

Removing a term from a taxonomy is a complex operation. Most systems have an option to “deprecate” a term: to remove it from a taxonomy for all intents and purposes but leave it in the system as an object that can be restored. This is different from a “hard” delete which removes the term entirely. Which is correct depends on the governance protocols.

More problematically, what happens to content or objects that have been tagged with this term? Can you easily mass delete those tags, or do you need to put in a ticket to IT to write a script? What consuming systems of this taxonomy need to be updated to reflect the change? Is this change reflected on the corresponding websites? Are there longitudinal analytics that depend on this term, even if it is not being applied anymore? These complexities are often leading reasons to deprecate a term instead of deleting it.
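The difference between deprecating and hard-deleting a term can be sketched in a few lines of Python. This is a minimal illustration, not any particular vendor's data model: the `Term` and `Taxonomy` classes, the `deprecated` flag, and the `tags_affected` helper are all hypothetical names, and tags are assumed to be stored as sets of term IDs per asset.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    term_id: str
    label: str
    deprecated: bool = False  # hypothetical flag; real systems name this differently


@dataclass
class Taxonomy:
    terms: dict[str, Term] = field(default_factory=dict)

    def add(self, term: Term) -> None:
        self.terms[term.term_id] = term

    def deprecate(self, term_id: str) -> None:
        # Soft removal: the term stays in the store and can be restored later.
        self.terms[term_id].deprecated = True

    def restore(self, term_id: str) -> None:
        self.terms[term_id].deprecated = False

    def hard_delete(self, term_id: str) -> None:
        # Irreversible: the term object itself is gone.
        del self.terms[term_id]

    def active_labels(self) -> list[str]:
        # What a tagging interface would offer; deprecated terms are hidden.
        return sorted(t.label for t in self.terms.values() if not t.deprecated)


def tags_affected(tags: dict[str, set[str]], term_id: str) -> list[str]:
    # Asset IDs still carrying the term; these need a decision before removal.
    return sorted(asset for asset, ids in tags.items() if term_id in ids)
```

Running something like `tags_affected` before any removal gives a rough count of how much retagging (and how many downstream conversations) the change implies. A deprecated term can still be resolved by old analytics reports, which is one reason to prefer it over a hard delete.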

Add a term

Example: To classify new content, a new term “Large Language Models” is required

Adding a term is probably the simplest operation, as downstream systems will simply receive the new term upon the next synchronization with the taxonomy tool.

Assuming you don’t need to secure the permission of the taxonomy owner, most of the effort in adding a term comes from clarifying the requestor’s need (what is this for?), settling on a preferred label (what shall we call it?), and filling out the rest of the attributes, fields, and relationships for the term. The latter can be quite time-consuming and requires attention to detail.

Although previously published content (or products, whatever) should periodically be reindexed to account for new terms, it is better to schedule this as a periodic mass update (unless some pressing business need dictates otherwise).

Move a term

Example: Pluto is no longer a “Planet”, it is now considered a “Dwarf Planet”

Less common than adding and removing terms, moving terms requires some understanding of the downstream systems consuming and using (all or part of) the taxonomy. Many systems only store tags as individual terms, either in a simple hierarchy or flattened into a single list, in which case a simple update or refresh sync of the taxonomy is unproblematic. Some systems, including some methods of document tagging, also store and utilize the “full path” or “path to root” of each term and its parent and parent’s parent and so on up to the top term:

Biology → Molecular biology → Microarrays

In these cases, Move is similar to Remove in that it might trigger a cascade of downstream effects on tagging.
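To see why stored full paths make Move riskier, here is a small sketch using the Pluto example. It assumes the taxonomy is a simple child-to-parent mapping and that a consuming system has stored each tag as a list of labels from the top term down; the “Astronomy” top term, the “Mars” sibling, and the function names are made up for illustration.

```python
from typing import Optional


def path_to_root(parents: dict[str, Optional[str]], term: str) -> list[str]:
    # Walk parent links from the term up to the top term, then reverse
    # so the path reads top-down.
    path = []
    current: Optional[str] = term
    while current is not None:
        path.append(current)
        current = parents[current]
    return list(reversed(path))


def stale_path_tags(stored: dict[str, list[str]],
                    parents: dict[str, Optional[str]]) -> list[str]:
    # Asset IDs whose stored full path no longer matches the live taxonomy.
    return sorted(
        asset_id
        for asset_id, path in stored.items()
        if path != path_to_root(parents, path[-1])
    )


# Hypothetical taxonomy: each term maps to its parent (None marks the top term).
parents = {
    "Astronomy": None,
    "Planet": "Astronomy",
    "Dwarf Planet": "Astronomy",
    "Pluto": "Planet",
    "Mars": "Planet",
}

# Hypothetical downstream store that kept the full path at tagging time.
stored = {
    "doc-1": ["Astronomy", "Planet", "Pluto"],
    "doc-2": ["Astronomy", "Planet", "Mars"],
}

parents["Pluto"] = "Dwarf Planet"  # the Move
needs_retagging = stale_path_tags(stored, parents)
```

A system that stored only the term “Pluto” would have nothing to fix here; the one that stored the path now has a stale tag on `doc-1`.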

Rename a term

Example: “Facebook” is being renamed to “Meta”

“Rename” here refers to a term name (preferred label) change. As with Move and Remove, the primary concern with Rename is the effect on previously tagged objects, as other systems should be able to refresh with the new term name without other issues. Note, though, that changes to a preferred label might also require updating other fields, like Definition or Scope Note.

It may also be the case that the preferred label stays the same but other properties of the term need to be updated; this may or may not cause disruptions (or trigger updates) in downstream consuming systems (depending whether those fields are also consumed).
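How much a Rename hurts depends largely on whether consuming systems store a stable term identifier or the label text itself. The sketch below contrasts the two, under the assumption that each term has an ID separate from its preferred label; the IDs, the post names, and the helper functions are hypothetical.

```python
def rename_term(labels: dict[str, str], term_id: str, new_label: str) -> str:
    # Change the preferred label in place and return the old one.
    old_label = labels[term_id]
    labels[term_id] = new_label
    return old_label


def resolve(tag_ids: set[str], labels: dict[str, str]) -> list[str]:
    # A system that stores IDs looks up the current label at display time.
    return sorted(labels[t] for t in tag_ids)


def stale_label_tags(label_tags: dict[str, set[str]], old_label: str) -> list[str]:
    # A system that stored the label text still holds the old string.
    return sorted(a for a, labs in label_tags.items() if old_label in labs)


labels = {"t1": "Facebook"}
id_tags = {"post-1": {"t1"}}            # tags stored by term ID
label_tags = {"post-1": {"Facebook"}}   # tags stored by label text

old = rename_term(labels, "t1", "Meta")
current = resolve(id_tags["post-1"], labels)
to_fix = stale_label_tags(label_tags, old)
```

The ID-based store picks up the new label on its next refresh, while the label-based store needs an explicit find-and-replace.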

Split a term

Example: The category “Education” is being split out to “K-12” and “Higher Ed”

Split is essentially a Remove and two Adds, unless for some reason the old term is being retained and renamed (in which case it is Update and Add), with all of the accompanying issues of both.

Splitting a term is the most complex action described here. For all the other actions it is generally possible to programmatically apply the change to the systems using the taxonomy. However, for Splits every piece of content that had been tagged with the term “Education” would need to be reviewed to determine the correct new tag (“K-12” or “Higher Ed”).
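Since a Split cannot be resolved by a script alone, the most a script can usefully do is build the review queue and apply each human decision afterward. A rough sketch, assuming tags are stored as a set of labels per asset (the asset IDs and function names are invented for this example):

```python
def build_split_review_queue(tags: dict[str, set[str]],
                             old_term: str,
                             new_terms: list[str]) -> list[dict]:
    # One queue entry per asset carrying the old term. Nothing is retagged
    # here; a person has to pick the right replacement(s).
    return [
        {"asset": asset_id, "old": old_term, "choices": list(new_terms)}
        for asset_id in sorted(tags)
        if old_term in tags[asset_id]
    ]


def apply_review_decision(tags: dict[str, set[str]],
                          asset_id: str,
                          old_term: str,
                          chosen: set[str]) -> None:
    # Record the reviewer's choice: drop the old term, add the chosen ones.
    tags[asset_id].discard(old_term)
    tags[asset_id].update(chosen)


tags = {
    "course-101": {"Education", "Math"},
    "course-202": {"Education"},
    "article-7": {"Sports"},
}
queue = build_split_review_queue(tags, "Education", ["K-12", "Higher Ed"])
apply_review_decision(tags, "course-101", "Education", {"K-12"})
```

The length of the queue is a decent first estimate of the review effort a Split request will cost.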

Merge terms

Example: Going forward, “Beanies” and “Stocking Caps” will be considered “Hats”

Perhaps the least common of the six changes, Merge comprises two Removes and an Add. And, again, hopefully your tagging or classification system will allow you to make these changes easily.
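Unlike a Split, a Merge can usually be applied to existing tags mechanically, because every old term maps to exactly one new term. A minimal sketch, again assuming tags are stored as a set of labels per asset (the product IDs and the `merge_terms` name are hypothetical):

```python
def merge_terms(tags: dict[str, set[str]],
                old_terms: set[str],
                new_term: str) -> int:
    # Replace any of the old terms with the single new term.
    # Returns the number of assets that were changed.
    changed = 0
    for terms in tags.values():
        if terms & old_terms:
            terms.difference_update(old_terms)
            terms.add(new_term)
            changed += 1
    return changed


tags = {
    "sku-1": {"Beanies", "Wool"},
    "sku-2": {"Stocking Caps"},
    "sku-3": {"Scarves"},
}
changed = merge_terms(tags, {"Beanies", "Stocking Caps"}, "Hats")
```

Note that the merge loses information: once applied, nothing in the tags says which hats used to be beanies, so it is worth keeping a record of the old assignments if analytics depend on them.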

Sorting Things Out

Classifying the types of work to be done can help you see a terrifying, undifferentiated pile of work to do as a concrete set of tasks and responsibilities, which is much more manageable.

Each of the tasks outlined above requires different workflows and considerations of downstream effects. Some are simpler than others, and the harder ones often require coordination with other groups inside an organization. Understanding the level of effort required for each allows for prioritization and a kind of triage of change requests.
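As a rough illustration of that triage, the six change types can be encoded along with the decompositions described above and an effort weight per type. The weights here are placeholders: only the two ends of the scale (Add as the simplest change, Split as the most complex) come from the discussion above, and the middle of the ranking, the ticket IDs, and the names are assumptions to be replaced with your own program's experience.

```python
from enum import Enum


class ChangeType(Enum):
    REMOVE = "remove"
    ADD = "add"
    MOVE = "move"
    RENAME = "rename"
    SPLIT = "split"
    MERGE = "merge"


# Compound changes expressed as the simpler operations they comprise.
DECOMPOSITION = {
    ChangeType.SPLIT: [ChangeType.REMOVE, ChangeType.ADD, ChangeType.ADD],
    ChangeType.MERGE: [ChangeType.REMOVE, ChangeType.REMOVE, ChangeType.ADD],
}

# Illustrative effort weights (assumed, not measured).
ASSUMED_EFFORT = {
    ChangeType.ADD: 1,
    ChangeType.RENAME: 2,
    ChangeType.MOVE: 3,
    ChangeType.REMOVE: 3,
    ChangeType.MERGE: 4,
    ChangeType.SPLIT: 5,
}


def triage(requests: list[tuple[str, ChangeType]]) -> list[tuple[str, ChangeType]]:
    # Sort change requests from lowest to highest assumed effort,
    # breaking ties by ticket ID so the order is stable and predictable.
    return sorted(requests, key=lambda r: (ASSUMED_EFFORT[r[1]], r[0]))


requests = [
    ("REQ-3", ChangeType.SPLIT),
    ("REQ-1", ChangeType.ADD),
    ("REQ-4", ChangeType.REMOVE),
    ("REQ-2", ChangeType.MOVE),
]
ordered = triage(requests)
```

Sorting cheapest-first is only one policy; a team might just as reasonably sort by business urgency and use the effort weight for capacity planning instead.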

¹ This is another entire post; it will also have diagrams.
