Saturday, December 26, 2009

Blog Datasets for Researchers...

For any academics looking to perform research on blogs, a series of new tools has recently been developed to assist you in the endeavor.

Thanks to a listserv that I subscribe to from the American Political Science Association (APSA), I learned that two datasets, along with a software tool for analyzing blog data, have been made freely available. One dataset consists of nearly 4,000 entries from the Huffington Post (.zip), and the other focuses on the Obama HQ Blog (.zip). The third resource, the Blog Analysis Toolkit, is a web application that makes it easy to organize and sift through blog data of a researcher's choosing.

To better understand...

The Blog Analysis Toolkit (BAT) is a free, Web-based system for capturing, archiving and sharing blog posts. Blog posts are acquired via RSS feeds, and stored in a database where they can be accessed and shared by other researchers. So far, 403 users have set up BAT accounts since April 2008, and collectively they have archived 101962 posts from 435 blogs. Users can add individual blogs to the repository or do research using samples from existing collections created by other users.
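
For readers curious what "acquired via RSS feeds, and stored in a database" might look like in practice, here is a minimal Python sketch. It is not BAT's actual code, which I have not seen. The sample feed, the `posts` table layout, and the function names `parse_rss` and `archive` are all my own assumptions for illustration. It parses an RSS 2.0 document with the standard library and stores each post in SQLite, skipping any link it has already archived.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real run would download the XML from a blog's RSS URL.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>First post</title>
      <link>http://example.com/first</link>
      <pubDate>Sat, 26 Dec 2009 10:00:00 GMT</pubDate>
      <description>Hello, world.</description>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/second</link>
      <pubDate>Sun, 27 Dec 2009 10:00:00 GMT</pubDate>
      <description>More text.</description>
    </item>
  </channel>
</rss>"""


def parse_rss(xml_text):
    """Return (blog_title, list of post dicts) from an RSS 2.0 document."""
    channel = ET.fromstring(xml_text).find("channel")
    blog = channel.findtext("title", default="")
    posts = [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": item.findtext("pubDate", default=""),
            "body": item.findtext("description", default=""),
        }
        for item in channel.findall("item")
    ]
    return blog, posts


def count_posts(conn):
    """Number of rows currently in the (assumed) posts table."""
    return conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]


def archive(conn, blog, posts):
    """Insert posts, skipping links already stored; return how many rows were added."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts ("
        "link TEXT PRIMARY KEY, blog TEXT, title TEXT, published TEXT, body TEXT)"
    )
    before = count_posts(conn)
    # The primary key on link plus INSERT OR IGNORE makes re-polling a feed harmless.
    conn.executemany(
        "INSERT OR IGNORE INTO posts VALUES (:link, :blog, :title, :published, :body)",
        [dict(p, blog=blog) for p in posts],
    )
    conn.commit()
    return count_posts(conn) - before


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    blog_title, blog_posts = parse_rss(SAMPLE_RSS)
    print(archive(connection, blog_title, blog_posts), "new posts archived")
```

Keying the table on the post's link is one simple way to avoid duplicates when the same feed is polled repeatedly; a real archive would likely also need to handle Atom feeds and posts whose links change.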

The significance of these datasets lies not so much in the raw data itself (which is still at an embryonic stage), but in the potential of a movement toward collecting blog data scientifically. There is a wealth of information out there waiting to teach us lessons about Americans' political behavior, and standardizing and supporting the science behind it will bring those lessons closer to being realized.

