Dispatches From Blogistan


the book
Dispatches From Blogistan
by suzanne stefanac
peachpit/new riders
voices that matter series

google’s new blog search critiqued

09.14.05 @ 01:46:48 pacific

Searching blogs is still a hit-or-miss affair. General web search engines like Google, Yahoo, and MSN depend on spider bots that crawl websites at intervals too far apart to suit most bloggers. Search engines designed specifically for blogs and web feeds, such as Technorati, Feedster, Ice Rocket, and PubSub, solve the time-lag problem by registering pings from blogs each time one is updated or by indexing RSS and Atom web feeds, but the sheer number of blogs and blog posts makes scaling a challenge for these newer search companies. Both types of search engines favor blogs with the greatest number of inbound links, a point of contention among bloggers who write for smaller, more niche audiences. More robust blog search capabilities have been on most bloggers’ wish lists for some time now.
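For readers curious about what a "ping" actually is, here is a minimal Python sketch. It assumes the blog software uses the widely adopted weblogUpdates.ping XML-RPC call, which sends just two values: the blog's name and its URL. The function names, the blog details, and the ping endpoint are hypothetical placeholders, and individual search services may expect different or additional parameters.

```python
import xmlrpc.client


def build_ping_payload(blog_name: str, blog_url: str) -> str:
    """Return the XML-RPC request body for a weblogUpdates.ping call.

    Handy for inspecting exactly what a blog sends to a ping server.
    """
    return xmlrpc.client.dumps((blog_name, blog_url), methodname="weblogUpdates.ping")


def send_ping(endpoint: str, blog_name: str, blog_url: str) -> dict:
    """Notify a ping server (hypothetical endpoint) that the blog was updated.

    Ping servers conventionally reply with a struct such as
    {"flerror": False, "message": "..."}; treat that shape as an assumption.
    """
    server = xmlrpc.client.ServerProxy(endpoint)
    return server.weblogUpdates.ping(blog_name, blog_url)


if __name__ == "__main__":
    # Hypothetical blog details, used only for illustration; no network call is made here.
    print(build_ping_payload("Example Blog", "http://blog.example.com/"))
```

Because a basic ping carries only the blog's name and address, the search service still has to fetch the blog or its feed to find out what changed. The ping only tells it when to look.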

The release today of Google Blog Search (GBS) raises many bloggers’ hopes that they will find relevant blog entries more easily and that others will more readily discover their own blogs. Initial response to the new service is mixed. The new beta service appears to gather its data by indexing the RSS or Atom feeds that many bloggers publish. How well this solves the latency problem of traditional Google web search remains to be seen, but a good number of bloggers are already testing and critiquing the service.

For a bit of context, the Wall Street Journal Online’s Vauhini Vara compiles a useful list of the major blog-specific search engines, calling out the pluses and minuses of each.

Today, SearchEngineWatch’s Gary Price lists Google Blog Search features he would like to see in the future, including the ability to screen out blogs that merely scrape headlines, to cluster “related blog” posts, and to search by location for entries that carry a geotag. Price also asks for a clearer explanation of how Google Blog Search differs from Google News search.

The Blog Herald posts its initial thoughts on Google’s new search. The Herald particularly liked the ability to turn any search into a web feed and found it useful that results are split between “related blogs” and a general index, although it echoes the common complaint that the latter seems merely to mimic Google News results. The biggest problem is the overall size of the index (fewer than nine million blogs searched, compared with more than 17 million for Technorati), but this may improve over time.

Microsoft’s Robert Scoble reports that search speed is excellent and that the results contain less spam and fewer duplicates than those of other engines. Technorati still has more up-to-date results, but for now Scoble is inclined to favor Google over the other engines.

We’ll have to watch and see how Google measures up and which of the other search giants decide to enter the ring. It was heartening that the new service indexed this blog, which is still relatively small and recent. Stay tuned and we’ll report back on how well Google’s new search suits the overall needs of an ever-growing blog population.




trackback

The trackback address for this entry is:
http://www.dispatchesfromblogistan.com/googles-new-blog-search-critiqued/trackback/


    comments

    1. Paul says (10.6.05 @ 08:56:08 pacific):

      Scoble and I clearly tested with different sets of terms. I found the index to be full of spam blogs of the “hyphen-happy-keyword-stuffer.com” variety. For example, the first hit for “food” and the second hits for “vacation” and “cars” are all spam blogs. Not so on other engines, such as Technorati.

      It seems cleaner today than it was on launch, but I’m not sure if that’s my fickle memory or because they’re really cleaning it.

    2. suzanne says (10.6.05 @ 10:09:03 pacific):

      Thanks for the report, Paul.

      One thing I discovered is that Google added a handful of useful BlogSearch operators. For instance, if you want to search for blog entries about quarks written by anyone named Sparky, you could open the Advanced Search options and fill in the appropriate fields, or simply type [quarks inpostauthor: Sparky] into the search field. Other blessed operators include:

      * link:
      * site:
      * intitle:
      * inblogtitle:
      * inposttitle:
      * blogurl:

      More tips on using Google’s BlogSearch can be found here.

      The fact that Google’s BlogSearch, like PubSub, Feedster, and a number of other engines, gathers its data from feeds rather than from the blogs themselves reminds us that it pays to publish entries in their entirety rather than just as headlines or excerpts.


    leave a reply

Thanks for responding. To protect against spam and malicious postings under false names, I request an email address as identification. I will never post your email address publicly or use it for any purpose without your express permission. If you'd like to include a URL with your response, it will appear with your comment.