
Sifry's State of the Blogosphere May 1st edition: the multilingual, tagged blogosphere
by Tris Hussey on May 1, 2006 09:51PM (PDT)
David Sifry's State of the Blogosphere has another installment. The last one focused on the growth of the blogosphere (it is still growing, which raises the question of how much longer that can last). This one looks at two very important facets of the "modern" blogosphere: the languages used and the growth of tagging.
Let's look at language first. English might be the de facto language of commerce, and even of the Internet, but it is becoming less important as the language of the blogosphere. English's share of blog posts is steadily declining, while Japanese and Chinese are growing quickly. Sifry admits that some languages, like Korean, might be underrepresented, but it is clear that the nature of the online world is changing. With that change, all of us, from software developers to advertisers to ad networks, need to face facts ... we're going to need to support Asian languages better and cater to that huge (and growing) audience.
Now let's look at tagging. Technorati pioneered tagging as a way for people to go beyond broad categories (say, blogging or blog editors) to smaller, more granular descriptors (like Qumana). To say that tagging has taken off is like saying Canadians like hockey. The adoption of tagging has been nothing short of stellar. The chart below shows that almost half of all blog posts are tagged, and that share keeps climbing. Unlike the growth of blogs, the growth of tagging can continue for longer, because as new bloggers join they have to "catch up" and start tagging.
Tagging isn't perfect. Since it is a user-driven system, different tags can be used for the same concept, and those tags aren't necessarily linked. Take "blog editors" and "offline blog editors". To me both of these tags mean the same thing, and Qumana is one of those tools, but are they linked? If you search for "blog editors", will you find the same content as for "offline blog editors"? Probably not, unless the post is tagged with both.

So this is a clear flaw, but one that those of us who are thinking and writing about tags recognize. I think it's only a matter of time before someone develops a way, probably based on search engine algorithms, to start linking tags together into larger groups ... even across languages.
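To make the idea of linking tags a little more concrete, here is a minimal Python sketch of one simple approach, offered as an illustration rather than how Technorati or any search engine actually does it. It assumes two tags are related when the sets of posts carrying them overlap heavily (Jaccard similarity at or above a threshold), and merges related tags into groups with a union-find structure. The function name `group_related_tags`, the 0.6 threshold, and the sample posts are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def group_related_tags(posts, threshold=0.6):
    """Group tags whose post sets overlap (Jaccard similarity >= threshold).

    posts: list of tag lists, one per blog post (hypothetical sample data).
    Returns a sorted list of sorted tag groups.
    """
    # Map each (lowercased) tag to the set of post indices that carry it.
    posts_by_tag = defaultdict(set)
    for i, tags in enumerate(posts):
        for tag in tags:
            posts_by_tag[tag.lower()].add(i)

    # Union-find: parent[t] points toward the representative of t's group.
    parent = {tag: tag for tag in posts_by_tag}

    def find(tag):
        while parent[tag] != tag:
            parent[tag] = parent[parent[tag]]  # path halving
            tag = parent[tag]
        return tag

    # Compare every pair of tags and merge those that co-occur heavily.
    for a, b in combinations(posts_by_tag, 2):
        shared = posts_by_tag[a] & posts_by_tag[b]
        union = posts_by_tag[a] | posts_by_tag[b]
        if len(shared) / len(union) >= threshold:
            parent[find(a)] = find(b)

    # Collect tags under their group representative.
    groups = defaultdict(list)
    for tag in posts_by_tag:
        groups[find(tag)].append(tag)
    return sorted(sorted(group) for group in groups.values())


posts = [
    ["blog editors", "offline blog editors", "Qumana"],
    ["blog editors", "offline blog editors"],
    ["hockey", "Canada"],
    ["hockey"],
]
print(group_related_tags(posts))
# [['blog editors', 'offline blog editors'], ['canada'], ['hockey'], ['qumana']]
```

Comparing every pair of tags is quadratic in the number of tags, so this only suits small collections. Co-occurrence alone also won't connect tags written in different languages unless some posts carry both.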
In addition to the lack of connection among synonymous tags, there is the looming threat of tag-spam. David doesn't think this is going to be a major problem:
Of course, one of the remaining open questions is whether or not that will lead to massive gaming of the system, but current trends seem to present evidence that large-scale gaming is not occurring. In fact, my belief is that because tags are built as hyperlinks inside the document, and thus visible to the reader, that a strong social pressure to use appropriate tags (or at least to not use inappropriate tags) manifests itself, especially with bloggers who want to cultivate influence and readers.
I don't share his opinion that tag-spam will be prevented by a self-policing system, but since it is on Technorati's radar, and on many of ours as well, I expect (and hope) that solutions can be found before it becomes a problem.

Qumana is, of course, big on tags and tagging. We put one-click, easy tagging into our editor early on (first, we believe). We chat with the folks working on tags and tagging whenever we can. Once connections are built between synonymous tags, tags can be used to build larger and larger groups of interest. All fun stuff to come down the road.
Where does this leave us? This is David's summary of his State of the Blogosphere installment:
- The blogosphere is multilingual, and deeply international
- English, while being the language of the majority of early bloggers, has fallen to less than a third of all blog posts in April 2006.
- Japanese and Chinese language blogging has grown significantly.
- Chinese language blogging, while continuing to grow on an absolute basis, has begun to decline as an overall percentage of the posts that Technorati tracks over the last 6 months
- Japanese, Chinese, English, Spanish, Italian, Russian, French, Portuguese, Dutch, and German are the languages with the greatest number of posts tracked by Technorati.
- The Korean language is underrepresented in this analysis
- Language breakdown does not necessarily imply a particular country or regional breakdown.
- Technorati now tracks more than 100 Million author-created tags and categories on blog posts.
- The rel-tag microformat has been adopted by a number of the large tool makers, making it easy for people to tag their posts. About 47% of all blog posts have non-default tags or categories associated with them.
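For readers curious about what the rel-tag microformat looks like in practice: a tag is an ordinary hyperlink carrying rel="tag", and the tag itself is the last segment of the link's URL path, not the link text. The Python sketch below pulls those tags out of a post's HTML using only the standard library. The names `RelTagParser` and `extract_rel_tags` are hypothetical, and decoding "+" as a space and ignoring a trailing slash are assumptions of this sketch.

```python
from html.parser import HTMLParser
from urllib.parse import unquote_plus, urlparse


class RelTagParser(HTMLParser):
    """Collects tags from <a rel="tag" href="..."> links (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values, e.g. "nofollow tag".
        rel_values = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "tag" not in rel_values or not href:
            return
        # The tag is the last URL path segment (trailing slash ignored, an assumption).
        path = urlparse(href).path.rstrip("/")
        last_segment = path.rsplit("/", 1)[-1]
        if last_segment:
            # Assumption: "+" and percent-escapes encode spaces and other characters.
            self.tags.append(unquote_plus(last_segment))


def extract_rel_tags(html):
    parser = RelTagParser()
    parser.feed(html)
    parser.close()
    return parser.tags


post = '''<p>Posted with <a href="http://technorati.com/tag/Qumana" rel="tag">Qumana</a>
and <a href="http://technorati.com/tag/blog+editors" rel="tag">editors</a>
<a href="http://example.com/about">About</a></p>'''
print(extract_rel_tags(post))  # ['Qumana', 'blog editors']
```

Because the tag lives in the URL rather than the visible link text, the link text can read naturally while the tag stays machine-readable, which is part of what makes it easy for tool makers to adopt.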
Hat tips: Bloggers Blog, WebProNews
Tags: Technorati, state of the blogosphere, tags, tagging, tag-spam, international blogosphere