Do hierarchical classification systems always suck?
Clay Shirky argued recently that "classification schemes are going to be largely displaced by tagging". He points to Amazon and Wikipedia as two examples of how classification systems suck, and it would be hard to disagree. Shirky is a smart guy, and since we have just implemented our own user-driven classification system in Reef, his essay made me stop and wonder whether ours was destined to suck, too.
More to the point, it seemed like a good time to stop and think about how our classification system will integrate with our tagging system (which we have, too). Right next to Shirky's post is an interesting post by Tom Coates on how tags behave when the things being tagged also inhabit a hierarchical system.
There's a basic point that Shirky is muddying: he conflates classification systems with rigid, top-down, professionally applied metadata. He shouldn't mention Amazon and Wikipedia in the same breath, because the former (rigid, top-down, etc.) deserves to be junked, while the latter (flexible, user-driven) is simply an experiment that needs to be improved.
The reason classification in Wikipedia is lousy is not that it's too expensive or too hard; it's that most people don't care about it. It's not terribly useful, because you don't usually browse encyclopedias. I use Wikipedia all the time, but I go there with specific questions, and search, not browsing, is exactly what I need. This is precisely why I rarely (if ever) bookmark Wikipedia pages, and I never tag them. I don't need to.
The situation in Reef is quite different, because all sorts of information types (discussions, events, articles, etc.) flow past you in this system. You need to be able to flag and organize anything and everything in whatever way you want, so Reef works like del.icio.us: you bookmark something by tagging it.
At the same time, articles (and for now, only articles) live in a hierarchical page space. If you want to add a new article, you have to add it as the child of some other page, which means that every article has a place in the page hierarchy. This is different from a traditional wiki because we treat the link as a parent/child relationship and let you explicitly view and edit a page's "paths". That's what turns this into a classification system rather than a link network. The hierarchy is user-created, user-modifiable, and more flexible than your file system, because a page can have as many parents and children as you like.
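To make the "many parents" idea concrete, here is a minimal sketch of such a page space, assuming pages are identified by title strings and every page hangs, directly or indirectly, off a single root. The class and method names (`PageHierarchy`, `add_page`, `add_link`, `paths`) are hypothetical, not Reef's actual code. A page's "paths" are simply all the routes from the root down to it.

```python
class PageHierarchy:
    """Hypothetical page space in which a page may have many parents."""

    def __init__(self, root: str) -> None:
        self.root = root
        self.parents: dict[str, set[str]] = {}
        self.children: dict[str, set[str]] = {}

    def add_page(self, page: str, parent: str) -> None:
        # A new page must be created as the child of an existing page.
        if parent != self.root and parent not in self.parents:
            raise KeyError(f"unknown parent page: {parent}")
        self.add_link(parent, page)

    def add_link(self, parent: str, child: str) -> None:
        # A link is a parent/child edge; a page can pick up extra parents later.
        self.parents.setdefault(child, set()).add(parent)
        self.children.setdefault(parent, set()).add(child)

    def paths(self, page: str, _seen: frozenset = frozenset()) -> list[list[str]]:
        """Return every root-to-page path (the page's "paths")."""
        if page == self.root:
            return [[self.root]]
        if page in _seen:  # guard against cycles in user-edited links
            return []
        result = []
        for parent in sorted(self.parents.get(page, ())):
            for prefix in self.paths(parent, _seen | {page}):
                result.append(prefix + [page])
        return result


h = PageHierarchy("home")
h.add_page("energy efficiency", "home")
h.add_page("green homes", "home")
h.add_page("insulation", "energy efficiency")
h.add_link("green homes", "insulation")  # a second parent for the same article
print(h.paths("insulation"))
```

Because a page can be reached along several routes, this is a directed acyclic graph (assuming users don't create cycles) rather than a tree, which is exactly what makes it more flexible than a file system.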
The reason I think it's imperative that Reef support classification is that for 2People, browsing is essential. One of our prime use cases is the person who doesn't know what action to take next. You need to be able to, say, go to the section on "green homes" and get an overview of your options.
The interesting question is: what's the relationship between the page hierarchy and the tagging system? In Tom Coates' example, tags applied to songs are used to generate information about albums and artists. This implies a sort of "summation operator" for tags that lets you derive a tagset for the "thing" (say, an album) that represents the collection of tagged items. I don't think this model really applies in Reef. Say you take all the articles that are descendants of the "energy efficiency" article and look at their tags. I don't see much benefit in "summing" those tagsets. The hierarchical relationship is of a different kind, and so are the tags: people tag songs with tags like "groovy" and "techno", but in Reef the tags will be more content-driven.
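For concreteness, this is roughly what a "summation operator" could look like. It is a sketch under an assumption: that summing simply means counting how many items carry each tag and keeping the tags that show up on enough of them. Coates' post doesn't necessarily work this way, and the function name `sum_tagsets` and the `min_count` parameter are hypothetical.

```python
from collections import Counter


def sum_tagsets(tagsets, min_count=1):
    """Combine per-item tagsets into one weighted tagset for the collection.

    Assumption: "summation" means counting how many items carry each tag,
    then keeping the tags that appear on at least `min_count` items.
    """
    totals = Counter()
    for tags in tagsets:
        totals.update(set(tags))  # count each tag at most once per item
    return {tag: n for tag, n in totals.items() if n >= min_count}


# Three songs on one album, each with its own tags.
songs = [{"groovy", "techno"}, {"techno"}, {"techno", "ambient"}]
album_tags = sum_tagsets(songs, min_count=2)
```

For an album, a result like `{"techno": 3}` is a reasonable description of the whole. For the descendants of an article like "energy efficiency", the same arithmetic would mostly surface the broad topic words the parent page already stands for, which is why the operator seems less useful in Reef.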
But there's another way to look at it. Tags form an implicit, hierarchical classification system that can be derived from tag co-occurrences. (Ask me about this, if you're interested in the algorithm.) So, in principle, we could generate alternate views of the page hierarchy based on tagsets. But for now, that's too complicated for us. In the meantime, it makes sense to integrate tag information into the page hierarchy, perhaps by using tags to generate lists of "related" pages that appear alongside each page's list of "child" pages.
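A "related pages" list could be as simple as ranking other pages by how much their tags overlap with the current page's tags. The sketch below uses Jaccard similarity (shared tags divided by total distinct tags) as the overlap measure. That choice is my assumption, not something Reef has settled on, and the names `related_pages`, `page_tags`, `children`, `limit`, and `threshold` are all hypothetical. It skips the page's own children, since those already appear in the "child" list.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Tag overlap: size of intersection over size of union (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def related_pages(page, page_tags, children, limit=5, threshold=0.2):
    """Rank other pages by tag overlap with `page`, skipping its own children."""
    own = page_tags.get(page, set())
    skip = {page} | set(children.get(page, ()))
    scored = []
    for other, tags in page_tags.items():
        if other in skip:
            continue
        score = jaccard(own, tags)
        if score >= threshold:
            scored.append((score, other))
    # Highest overlap first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [other for _, other in scored[:limit]]


page_tags = {
    "energy efficiency": {"energy", "home", "savings"},
    "insulation": {"energy", "home", "walls"},
    "solar panels": {"energy", "solar", "savings"},
    "community gardens": {"food", "local"},
}
children = {"energy efficiency": ["insulation"]}
related = related_pages("energy efficiency", page_tags, children)
```

Here "solar panels" shares two of four distinct tags with "energy efficiency" (score 0.5) and would show up as related, while "insulation" is left out because it is already a child. The `threshold` keeps pages with only a single incidental tag in common from cluttering the list.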