

Folksonomy Folktales 2010

Let’s take a closer look at this list of taxonomy characteristics:

1 – “Items do not always fit exactly inside one and only one category.”  Yes, this is true, but taxonomists and ontologists have been dealing with this one for a long time, and there are lots of ways to handle various situations, including polyhierarchy and other standard techniques.

2 – “Hierarchies are rigid, conservative, and centralized.  In a word, inflexible.”  No they aren’t.  Flat out wrong.  Some are, many are not.  Virtually every taxonomy that I or a number of friendly competitors have developed is designed to be flexible (offering alternatives) and progressive (with built-in maintenance plans to reflect change in users and/or corpus), and is a hybrid model that includes both a central team and constant input from users.  Also, a number of studies, including ones I’ve done, show that folksonomies at sites like Del.icio.us are surprisingly conservative, with little change in the most popular tags and far fewer new terms than you might expect.

3 – “Hierarchical classifications are influenced by the cataloguer’s view of the world, and, as a consequence, are affected by subjectivity and cultural bias.”  True, but overstated.  Hierarchical classifications are influenced by the cataloguer’s view of the world, and, if well done, by the views of a variety of other cataloguers, current theory of good classification, and the world views of dozens to hundreds of potential users.  Plus, we have developed a variety of methods to combat the influence of personal and cultural bias.

Finally, just how are folksonomies free of bias?  Communities by their very nature harbor prejudices and bias.  Del.icio.us, for example, is dominated by one culture – high-tech computer people – and that bias produces things like the tag “Design” pointing almost exclusively to bookmarks about software design.

4 – “Rigid hierarchical classification schemes cannot easily keep up with an increasing and evolving corpus of items.”  False or overstated – depending heavily on the framing words “rigid” and “easily”.  How about flexible hierarchical systems with built-in procedures for adding new terms?    True, these procedures require a level of effort but what is “easy” varies depending on the context.  They are certainly a lot easier than most folksonomy advocates might imagine.

5 – “Hierarchical classifications are costly, complex systems requiring expert cataloguers to guess the users’ way of thinking and vocabulary (mind reading).”  Overstated.  First, note again the emotional frame words – “guess” and “mind reading”.  Sorry, but that’s bogus.  It is not guessing or mind reading when you follow well-researched and tested methods for obtaining input into the users’ way of thinking and their vocabulary.  It is true that hierarchical classifications require expert cataloguers, but they needn’t be complex.  As for being costly, while it is true that taxonomies cost more to develop than a folksonomy by traditional ways of measuring cost, there are two caveats.  First, in corporate environments within the context of enterprise search, the cost of developing a taxonomy is a very minor cost.  Second, the low cost of a folksonomy is based on a myth:  that users’ time is worthless and therefore free.  Try adding up all the time that users spend tagging bookmarks and add that to the cost.

6 – “Hierarchies require predictions on the future to be stable over time (fortune telling).”  False and misleading.  Again, note the derogatory “fortune telling”.  Taxonomies only need to reflect the present (and often the past) and include a mechanism to handle change and novelty.  This is something that all good taxonomists do, but it seems not to have occurred to a good many folksonomy enthusiasts.

7 – “Hierarchies tend to establish only one consistent authoritative structured vision.  This implies a loss of precision, erases difference of expression, and does not take into account the variety of user needs and views.” Mostly false.  Hierarchies do tend to establish a single consistent authoritative structured vision, but they can and usually do include variations.  The supposed drawbacks are in the second sentence and they are essentially wrong and seem to be based on a really odd view of the role of taxonomists. 

We’ve already discussed how good taxonomists always do an enormous amount of research into user needs and views.  In fact, given the tyranny-of-the-majority effect you often see on folksonomy sites, I’d say taxonomists probably do a better job of handling the needs and views of the users who are not part of the majority community.  Next, “erases difference of expression” – no, they don’t.  Hierarchies preserve those differences in a variety of ways, from representing minority views directly in specially designed parts of the taxonomy to the simple mechanism of variant terms, all of which can be exposed.

8 – “Hierarchies need expert or trained users to be applied consistently.”  Overstated.  Yes, there are known issues with consistency of tagging with a taxonomy, but those problems are massively multiplied when using no taxonomy or, what amounts to the same thing, a folksonomy.  Also, consistency is a sliding scale – the more complex a taxonomy, the more difficulty with consistency, and as we have seen, taxonomies are not all at a Dewey Decimal System level of complexity.  Finally, there are software options that can reduce the need for expert users.
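
Two of the standard techniques named in the points above, polyhierarchy (point 1) and variant terms (point 7), can be illustrated with a few lines of Python.  This is only a minimal sketch under stated assumptions: the Term and Taxonomy classes and the sample terms are hypothetical, invented for this illustration, and are not taken from any particular taxonomy tool.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    label: str
    # More than one parent is what makes this a polyhierarchy.
    parents: set = field(default_factory=set)
    # Alternate wordings that all resolve to this preferred term.
    variants: set = field(default_factory=set)


class Taxonomy:
    """Hypothetical toy taxonomy, for illustration only."""

    def __init__(self):
        self.terms = {}
        self.lookup = {}  # lowercased label or variant -> preferred label

    def add(self, label, parents=(), variants=()):
        self.terms[label] = Term(label, set(parents), set(variants))
        for name in (label, *variants):
            self.lookup[name.lower()] = label

    def resolve(self, text):
        """Map a user's wording to the preferred term, or None if unknown."""
        return self.lookup.get(text.strip().lower())

    def ancestors(self, label):
        """Collect every broader term reachable through any parent."""
        found = set()
        stack = list(self.terms[label].parents)
        while stack:
            current = stack.pop()
            if current not in found:
                found.add(current)
                stack.extend(self.terms[current].parents)
        return found


# Sample data (assumed): Neuroscience sits under two parents at once.
taxonomy = Taxonomy()
taxonomy.add("Science")
taxonomy.add("Biology", parents=["Science"])
taxonomy.add("Psychology", parents=["Science"])
taxonomy.add(
    "Neuroscience",
    parents=["Biology", "Psychology"],
    variants=["neurosciences", "brain science"],
)

print(taxonomy.resolve("Neurosciences"))
print(sorted(taxonomy.ancestors("Neuroscience")))
```

In this sketch an item that does not fit in exactly one category simply gets more than one parent, and a user's own wording is kept as a variant rather than erased.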

So in summary, the 8 anti-taxonomy myths are:

1 – True, but with established and effective countermeasures

2 – False

3 – Overstated, with established and effective countermeasures

4 – False

5 – Overstated

6 – False

7 – Mostly false

8 – Overstated

Benefits of Folksonomies

So if the majority of anti-taxonomy myths are false or overstated, what about the other side of the story, the perceived benefits of folksonomies in general and in comparison with taxonomies?

1 – Folksonomies are easy to use. 

The trouble with this one is, as indicated above, it depends on what you are looking at.  Here we have to sharply distinguish between the act of tagging and the use of tags to find information.

It certainly seems on the face of it that just picking a word to apply to a bookmark or document is a cognitively easier task than reading through a complex taxonomy and deciding which term to apply.  However, I believe that the ease of tagging with top-of-the-head terms is overstated and, as a corollary, that the cognitive task of picking a term from a taxonomy is easier than claimed.  And, of course, selecting from a simple taxonomy (something that folksonomy advocates don’t seem to believe in or know about) is even simpler.  When we move from general internet sites to either targeted vertical internet sites or enterprise sites with some control over the content, the balance shifts even more, especially when we add in categorization software that can make suggestions from within the taxonomy.  In that case, the cognitive task of agreeing with the suggestion or not is much easier than trying to think up a term.

And finally, there is the issue of quality:  is it easier to generate good and useful tags off the top of your head than to select from a taxonomy?  While that gets us into difficult waters in deciding what a good tag is, it doesn’t seem to me that you can avoid the question.

Which brings us to the second sense of easy to use – are folksonomy tags easier to use to find information?  Having no internal structure and no relationships between terms makes it much more difficult to use tags to find information in a variety of ways.  First of all, let’s take an example in which I’m looking for information about a topic such as neuroscience.  I did a study of LibraryThing, a social bookmarking site for librarians, and discovered that, because it lacked even such simple variant handling as plurals, I would have had to click on about 15 different terms to get fairly complete coverage of just the high-level overview of the field of neuroscience.

Some of the issues were plurals (neuroscience and neurosciences) and related terms (cognitive neuroscience, cognitive psychology, etc.).  Even more fundamental was the relationship between the general term neuroscience and the sub-topics within neuroscience.  In a folksonomy there is no relationship, and so each sub-topic is independent and has to be selected separately.  The most general terms in a folksonomy are simply that – general; in other words, they don’t include specific sub-topics.  One answer might be that that’s OK:  the tag is just used to refer to bookmarks that are only about general neuroscience.  Unfortunately, that quickly falls apart when we take a closer look, in part because people who don’t know a subject very well will always choose the most general term, whether the bookmark is for general neuroscience or on synaptic strengths in learning.
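
The difference between clicking on each tag separately and searching through a structure can be sketched in Python.  Everything in this example is assumed for illustration:  the bookmark titles, the tags, and the small narrower/variants tables are hypothetical and are not data from the LibraryThing study.

```python
# Hypothetical bookmarks and the free-form tags users gave them.
bookmarks = {
    "Intro to the brain": {"neuroscience"},
    "Annual review of the field": {"neurosciences"},
    "Memory and attention": {"cognitive neuroscience"},
    "Synaptic strength in learning": {"synaptic plasticity"},
}

# Hypothetical taxonomy fragment: narrower terms and variant spellings.
narrower = {"neuroscience": ["cognitive neuroscience", "synaptic plasticity"]}
variants = {"neuroscience": ["neurosciences"]}


def flat_search(tag):
    """Folksonomy-style lookup: only bookmarks carrying this exact tag."""
    return sorted(title for title, tags in bookmarks.items() if tag in tags)


def expand(term):
    """Return the term plus its variants and all narrower terms."""
    expanded = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in expanded:
            continue
        expanded.add(current)
        stack.extend(variants.get(current, []))
        stack.extend(narrower.get(current, []))
    return expanded


def taxonomy_search(term):
    """Taxonomy-style lookup: match any tag in the expanded term set."""
    wanted = expand(term)
    return sorted(title for title, tags in bookmarks.items() if tags & wanted)


print(flat_search("neuroscience"))      # one bookmark
print(taxonomy_search("neuroscience"))  # all four bookmarks
```

With flat tags, a search on “neuroscience” finds only the single bookmark that carries that exact string; with even this tiny structure, one selection also covers the plural and both sub-topics.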

2 – Tags more accurately reflect the population’s conceptual model.

I see two immediate problems with this.  First, it’s not clear that most users have a coherent conceptual model, certainly nothing like the coherence of a taxonomy.  In one sense of the word, yes, everyone has a conceptual model – or rather they have multiple conceptual models, some of which are quite deep and detailed and structurally as rich as any taxonomy, but some of which are small, skimpy, and fragmented.  For example, early research on experts found that they typically had a rich collection of about 50,000 elements (a combination of facts, concepts, and relationships) and that these elements are very well structured, in ways that allow experts to chunk large aggregates that they can bring to bear on a problem as a single entity.  These structures are probably richer and more useful than any formal taxonomy.

A problem, however, arises when we start talking about non-experts.  And even experts are non-experts in some field.  Here the picture is radically different.  Non-experts tend to have far fewer facts and relationships, and the structures are very fragmented.  Which fragment gets applied to a given situation is often largely determined by outside influences.  For example, someone might tag a bookmark with the tag X the first time they see it, but a virtually identical bookmark might be tagged with Y because, in the meantime, they have been exposed to other ideas that influence their categorization.  (The influence of one categorization schema on learning another schema is called “Intertwingledness” in category theory.)  This is particularly a problem in that there is no practical mechanism for normalizing tags over time.

So the second point is that reflecting an incoherent conceptual model is not really the best way to put a metadata tag front end on a content collection.  This is true even for someone using their earlier tags to find their own information and even more true for other people trying to use that tag to find new information.  This is not to suggest that people are stupid and librarians should rule, but rather that categorizing and tagging is a specific skill that most people are not trained in. 

A related problem is that experts and non-experts categorize differently.  To explain, let’s look at an intriguing idea from category theory called a natural level or basic level category.  An example of a basic level category is the word “dog,” which in common taxonomies appears below mammal and above a number of types of dogs like Golden Retriever or Boxer.

Basic level categories are intermediate levels within a hierarchy that have a number of important characteristics.  First, they are categories that children learning a language tend to learn first, and they are categories that tend to be used more often.  Second, they are categories whose members have a particularly powerful combination of expressiveness and distinctiveness.  Distinctiveness refers to how strongly members can be distinguished from other members on the same level.  So in our example, dogs are very different from cats – far more different than, say, a Golden Retriever is from a Golden Lab.

Now, aside from a fascinating (for some of us) digression into category theory, what does this have to do with folksonomies and reflecting user conceptual models?  Well, the answer is that experts tend to have a preferred level of categorization that is lower, or more specific, than a non-expert’s.  So, for example, a dog show judge would not use the word “dog” for a picture of their favorite toy poodle; they would use the lower-level term, “toy poodle.”  On the other hand, someone who knows very little about a subject tends to tag a level higher than average.  They might choose the word “philosophy” for a web site that discussed a number of philosophical issues about epistemology, while a more experienced reader might choose the middle tag of “epistemology” and an expert might choose a lower-level tag like “mind-body problem.”

If all you want to do with folksonomies is use them for community building in which experts would likely self-select and the differences in categorization wouldn’t matter as much, then that is fine.  It is an interesting way for like-minded and like-experienced people to find each other.  The problem is when you start comparing folksonomies and taxonomies in terms of usefulness in finding information.  They are not comparable in any significant way.  Discovering people with the same interests and same level of expertise is great, but it is a very small part of information behaviors like searching for content.
