Meta and metadata

In which Our Heroine meditates, or perhaps rambles, on such subjects as taxonomy, proper names, and content management systems.

OATW is the first blog where I’ve bothered about tags.* Previously, I was primarily concerned with categories, and that taxonomy (plus date) was sufficient, especially when a keyword search was also available. But in this case, I thought tags made sense (mainly because I wanted to have some way of sorting by APS collection and figured I’d have enough of those that I’d end up with a stupidly long and/or indented Category list). I’m still not sure how I feel about tag clouds (rather, I still think they’re kinda silly and gimmicky, more infographic than navigation…but if one wishes to not have long and/or indented lists, then I suppose they’re not a bad navigational supplement), but I figured I’d give that a try, too. But more interesting is tagging within the WordPress universe.

Interesting is not necessarily powerful. I mean, it is, in theory. It’s a way of aggregating conversations on the same topic. Except.

The obvious limitation is that it’s within the WordPress universe only. I like it as a blogging platform (and have, for that matter, used it in a more comprehensive CMSy sort of way) but there are plenty of others. This limitation is not quite so broken as, say, LiveJournal’s approach to RSS. But I consider it more a curiosity than a search tool. (Okay, part of that is probably because, generally speaking, I don’t really care what random bloggers are saying. I keep up with some non-random bloggers, and that’s enough of a time sink.) The WordPress universe is a subset of the sprawling interwebs. And while there’s no way to truly plumb the depths of everything out there, it’s certain that there are many algorithms that give it a pretty good try…and they don’t begin by restricting results to WordPress.

My other reservation about the usefulness of tags is the purely human factor. Tagging is, theoretically, a powerful tool for categorizing and aggregating content. But people don’t tag their posts. Or they use tags so generic (or specific) as to be useless. Or they’re Neil Gaiman and have fun making long jokey tags. (And, really, what does Neil Gaiman care? Everybody’s reading him anyway.) In short, people will take a perfectly well-constructed system and muck it up.

And sometimes it’s not even clear who is mucking it up, or how. When curiously clicking around, I established that I am, thus far, the sole person in the WordPress universe to tag posts with the names of various nineteenth-century scientists. I confess to being less than surprised, as I suspect most people manage to have long and fulfilling lives without ever encountering Sir James Paget. But Lyell? Friend to Darwin? Should there not have been references on more than two blogs?

And in fact, looking for material tagged “Charles Lyell” rather than “Sir Charles Lyell,” one gets more hits. That is a compelling argument in favor of changing my posts’ tags. If there is a standard, one should conform to it. But…I have become quite the fan of the Library of Congress Authorities. They are most useful when doing item-level description. And if they say the “Sir” is included in the authoritative name, then by God I’m going to use it in my tag.

Of course, “Sir Charles Lyell” is not the authoritative format. That would be “Lyell, Charles, Sir, 1797-1875” (and let’s not even get started on en-dashes). Quite the mouthful. Obviously inappropriate for a tag; very unwieldy, possessed of inconvenient commas, and while it has the benefit of enabling alphabetical sorting, that’s just not the way people tag their blogs. And so I have come full circle, citing conventional usage to rationalize aspects of a decision that itself goes against conventional usage.
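For the programmatically inclined, here is a minimal Python sketch of that round trip: take an authority-style heading and derive the tag forms people actually type. The function names and the short honorifics list are my own inventions for illustration, not anything the Library of Congress or WordPress provides, and the parsing assumes the simple "Surname, Forename, Title, dates" shape of the Lyell heading above.

```python
import re

# Assumption: a tiny, illustrative list of honorifics, nowhere near complete.
HONORIFICS = {"Sir", "Dame", "Lord", "Lady"}

def parse_heading(heading):
    """Split a 'Surname, Forename, Title, dates' heading into labelled parts."""
    parts = [p.strip() for p in heading.split(",")]
    surname, forename = parts[0], parts[1]
    rest = parts[2:]
    # The title and the dates are optional; pick out whichever are present.
    title = next((p for p in rest if p in HONORIFICS), None)
    dates = next((p for p in rest if re.fullmatch(r"\d{4}-(\d{4})?", p)), None)
    return {"surname": surname, "forename": forename,
            "title": title, "dates": dates}

def tag_forms(heading):
    """Return the natural-order tags one might reasonably use, titled form first."""
    h = parse_heading(heading)
    plain = f"{h['forename']} {h['surname']}"
    forms = [plain]
    if h["title"]:
        forms.insert(0, f"{h['title']} {plain}")
    return forms

print(tag_forms("Lyell, Charles, Sir, 1797-1875"))
# ['Sir Charles Lyell', 'Charles Lyell']
```

Keeping the heading as the record of truth and generating the friendly forms from it is one way to have both: the authority file stays authoritative, and the tag stays something a human would type.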

I am left with conflicting desires. On the one hand, I want to convert all the data that one can reasonably divine applies to Lyell, Charles, Sir, 1797-1875 and impose conformity. On the other hand, I want to leave the messy data wholly untouched. Trust anyone who knows/cares enough to look for Sir Charles Lyell as well as Charles Lyell, and trust writers to know/care enough to at least spell his name right. Let Neil Gaiman keep his cutesy tags, because tags can be entertainment as well as tools, and they also speak to the creator’s original order. And that is information, too. Potentially information that we care about. (At least where “we” is defined as “archivists.”) And while we want to be able to impose order, the very concept implies that there’s at least a bit of a mess to begin with. Sorting it out should, in theory, be the fun part. Or the intellectually stimulating part. Or the part that makes us want to strangle somebody, but at least we end up with a good story.

But oh, how I want to go into the back end of AT and clean it up. Despite the availability of the LoC site, discrepancies and duplicates and typos have crept into the data, not to mention the plain old ambiguities because the original creators inconsiderately skimped on using proper names or had execrable handwriting. I want to make global changes, scrub the data until it gleams…at least, until the next time humans screw it up.

* This is not the first blog I’ve had with an option to tag, but my blogging did predate tags. And I didn’t even jump on the blogging bandwagon early. Sigh. I remember when we had to walk uphill both ways to hand-code our nested HTML tables.