A couple weeks ago I covered Google’s new feature that allows you to monitor pages even when they don’t have RSS feeds. A few days ago reader LP e-mailed me and asked about the new feature, “Did it work?” And I realized I had completely forgotten to write a follow-up post. So yeah, about Google Reader’s new page-monitoring feature….
The first great thing about this feature is that it taught me how many Web pages do in fact have RSS feeds. I went to several pages meaning to set up monitoring, only to discover that RSS feeds were already available. Yay!
I did find some places that did not have RSS feeds, though; the best example is probably the page of Twitter lists that include ResearchBuzz. The URL for the list is http://twitter.com/ResearchBuzz/lists/memberships, but I didn’t know of any way to track when new lists were added to this page. So that was my test case for Google Reader.
Every change to the page becomes a new entry in Google Reader. The screenshot above shows an example of an entry. There’s no context on the page, and if I weren’t familiar with the page content to start with, the entry wouldn’t be useful (in other words, I wouldn’t share it).
I also tried Google Reader with http://www.ted.com/pages/view?id=348, which is a list of upcoming TEDx events all over the world. Again, I didn’t get any context, just the line that changed.
One Google Reader monitor I set up did fail. I was trying to monitor a particular business in Google Maps because I wanted to see what kind of reviews it got. I think this might be my fault, however. I looked up the business in Google, and then used the extremely long and awkward URL supplied by Google as my monitoring URL. Google Reader never got an update for that page, and complained that the page didn’t exist. I’m going to try it again using the link supplied by Google on the business’s page.
For me, the gold standard for page monitoring remains WebSite-Watcher, a client-side application available at http://www.aignes.com/. However, it’s Windows-only. Until it’s available for my operating system, I think I’ll keep using this new feature of Google Reader.
It’s not as nifty as a cell phone, or as amazing as street views of businesses all over the world, but to me it is big news — really big news. Google announced yesterday
that Google Reader can now be used to monitor pages for Web changes — whether they have RSS feeds or not.
Ten years after I started using RSS, it’s pretty prevalent but not universal. Google’s announcement means it’s going to be a lot easier to follow those random pages that don’t have RSS feeds for update information.
Are you already using Google Reader for RSS feeds? Adding non-RSS content is easy. Just click on the “Add a Subscription” button and you’ll get a form into which you can paste an RSS feed URL or a regular HTML page URL. Google will ask you to confirm that you would like to create a feed to monitor based on that page.
Now, HTML pages are not RSS feeds. Their information is harder to isolate and delineate. So while the idea is that Google is going to “provide short snippets of page changes,” it’s not clear what those snippets are going to look like. Is it going to get hung up on a date change or a counter change? (This has been a problem in the past with software like WebSite-Watcher.) Are the snippets going to be meaningful?
I’ve added some pages to Google Reader and will revisit them in a week or so to see what kind of snippets I’m getting as results.
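If you’re curious what’s happening under the hood, this kind of monitoring boils down to fetching a page on a schedule and diffing it against the last saved copy. Here’s a minimal sketch in Python of how a “changed lines” snippet can be produced; this is my own illustration of the general technique, not Google’s actual method:

```python
import difflib

def changed_lines(old_html: str, new_html: str) -> list[str]:
    """Return the lines that were added or changed between two snapshots."""
    diff = difflib.unified_diff(old_html.splitlines(),
                                new_html.splitlines(), lineterm="")
    # Keep only the added lines, skipping the "+++" file header
    return [line[1:] for line in diff
            if line.startswith("+") and not line.startswith("+++")]

# Two hypothetical snapshots of a page listing TEDx events
old = "<li>TEDxAmsterdam</li>\n<li>TEDxTokyo</li>"
new = "<li>TEDxAmsterdam</li>\n<li>TEDxTokyo</li>\n<li>TEDxBoston</li>"
print(changed_lines(old, new))  # ['<li>TEDxBoston</li>']
```

Note how the diff hands back the changed line by itself, with no surrounding context; that’s exactly the limitation I ran into above.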
I have actually been using ListiMonkey for a few weeks, ever since I heard about it from Steve Rubel. First I loved it, then I hated it. After several e-mail conversations with the developers and some tweaks they’ve made to the tool I love it again. If you’re at all interested in trapping information via Twitter, I think you’ll love it too.
Have you ever tried to monitor Twitter via its search-results-as-RSS-feeds? It’s tough. For the kinds of keywords I’ve tried to use, I got a lot of spam. It got so I couldn’t use the feeds; they were too spammy.
Enter ListiMonkey at http://listimonkey.com/. ListiMonkey allows you to specify a Twitter list, enter the keywords for which you want to monitor that list, specify an e-mail address to which you want the results sent, and choose how often you want to get them (hourly or daily). (It’s possible to follow a list and get all the tweets it generates by not specifying any keywords, but I don’t recommend that; you’ll get lots of e-mail with lots of tweets unless you choose your lists very, very carefully.) That’s it. There’s no registration involved. You WILL have to confirm your e-mail address for each alert, of course.
Now, if you monitor a Twitter list, you’re obviously not getting as much as you’d get if you were monitoring the entire Twitter stream. On the other hand, if someone gets added to a Twitter list it’s because someone ELSE thinks they post stuff that’s worth reading. And you’ll cut down the spam level to almost nothing. You’re getting useful results, in other words.
ListiMonkey does have about 250 Twitter lists available, but I think you’ll have more luck finding lists using the TweetDeck Directory at http://tweetdeck.com/#directory. Once you’ve found a list you want to follow, the obvious next question is what kind of keywords do you want to monitor?
This is what was tough for me in figuring out how to use ListiMonkey, and it’s one thing that’s changed a lot thanks to the developers. I found a couple of lists where I just wanted to find out what kind of links people were putting out there. I didn’t necessarily want tweets without links. So my first keyword monitor on ListiMonkey was just http.
Naturally this found all tweets that had a URL in them, and none without. But it also found retweets, checkins using FourSquare/Gowalla, pictures people were posting, etc. I didn’t want any of that. (And making sure I didn’t get it was important for two reasons: one, I didn’t want to get drowned in e-mail alerts, and two, ListiMonkey limits its monitors to 100 tweets per e-mail. If I didn’t filter as closely as I could, I would miss stuff.)
Initially ListiMonkey did not allow me to do complex queries like that, where I specified one keyword that I was looking for and a bunch of keywords that I wanted to exclude. But that has since been added. So I did a lot of experiments where I looked for links to resources but not to anything extraneous, and ended up with a ListiMonkey query with several keywords:
http -4sq -gowal -rt -twitpic
That gets me e-mails from ListiMonkey that are full of resource-y link goodness.
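For the curious, the filtering logic behind a query like that is easy to sketch: one required substring plus a set of exclusions. This is my own naive illustration with made-up sample tweets; ListiMonkey’s actual matching rules may differ, which is why I use “rt @” rather than a bare “rt” here:

```python
def matches(tweet: str, include: str, exclude: list[str]) -> bool:
    """Naive, case-insensitive substring matching: keep a tweet only if it
    contains the include term and none of the excluded terms."""
    text = tweet.lower()
    return include in text and not any(term in text for term in exclude)

# Rough stand-ins for my ListiMonkey exclusions
exclude = ["4sq", "gowal", "rt @", "twitpic"]

tweets = [
    "New database of historical maps http://bit.ly/example",
    "RT @someone: great post http://bit.ly/example2",
    "I'm at Coffee Shop http://4sq.com/example3",
]
kept = [t for t in tweets if matches(t, "http", exclude)]
print(kept)  # only the first tweet survives
```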
When you get an e-mail from ListiMonkey, it’ll look like this:
You’ll get the tweet, of course, with the author and avatar, timestamp, and option to retweet or reply to the tweet (of course you’ll have to be logged in to your Twitter account to do that.) The e-mail also has links to edit your alert or delete your alert if it’s not working out for you.
One thing you should know: ListiMonkey is tracking the clicks on the links in its e-mail. You might think you’re clicking on a bit.ly link when actually you’re clicking on http://listimonkey.com/link/track?alert_id=1378&url=http://bit.ly/5gGhqm . Just a heads-up if you’re concerned about link tracking (I’m not.) If it really bothers you, you can always highlight the link in the tweet and then copy/paste it to your browser.
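If you ever want to recover the real destination programmatically instead of copying and pasting, the wrapper is just a query-string parameter. A quick sketch in Python:

```python
from urllib.parse import parse_qs, urlparse

def unwrap(tracking_url: str) -> str:
    """Pull the real destination out of a click-tracker's url= parameter;
    fall back to the original URL if there's no such parameter."""
    params = parse_qs(urlparse(tracking_url).query)
    return params.get("url", [tracking_url])[0]

print(unwrap("http://listimonkey.com/link/track?alert_id=1378&url=http://bit.ly/5gGhqm"))
# http://bit.ly/5gGhqm
```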
You can learn more about ListiMonkey via its FAQ. ListiMonkey is a small-shop project, and while there’s no charge for the service the developers are accepting donations. I think they’ve put together a great tool here; if you agree with me, how about slipping them a few bucks via the Donate tool on the FAQ page?
Do you need to do some serious information trapping on Twitter? Got some keywords you want to monitor and you don’t want to miss a thing? Check out this nifty application I heard about from Ed — RowFeeder. (Thanks Ed!) RowFeeder’s not cheap, but if you want to quickly gather materials from a Twitter flow and get them in a format that you can easily manipulate, it looks like a heck of a tool.
RowFeeder’s at http://rowfeeder.com/. Here’s how it works: you specify a term or hashtag you want to track. Then you pay. Well, you’re supposed to pay, but the pay button didn’t work when I tried it; instead I got an e-mail address to contact for making payments. This is where the “ain’t cheap” part comes in: it’s $2.49 to monitor a term/tag for up to 48 hours. (This would very quickly make me very broke.)
RowFeeder monitors the tweetstream and fetches tweets that match your tags, using them to populate a Google Spreadsheet like the one you see in the screenshot. Information is broken out into columns including username, Tweet, homepage, location, and date.
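To illustrate the shape of that data, here’s a sketch of dumping matched tweets to CSV with the same columns. The sample tweet and field values are made up; RowFeeder’s actual export may differ:

```python
import csv
import io

# Hypothetical matched tweets; field names mirror the columns described above
tweets = [
    {"username": "ResearchBuzz", "tweet": "New genealogy database online",
     "homepage": "http://www.researchbuzz.me", "location": "North Carolina",
     "date": "2010-03-01"},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["username", "tweet", "homepage",
                                         "location", "date"])
writer.writeheader()
writer.writerows(tweets)
print(buf.getvalue())
```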
This is too expensive for me to use on a regular basis, but I can easily see how a PR firm or a company that wants to track comments about a release could get a lot of use out of RowFeeder. I’d have a little concern about monitoring the entire tweetstream; there’s a lot of spam out there. It’d be nice to be able to monitor just specified lists.
This is a little far afield of search engines but humor me for a minute. Cornell University had an interesting story about the life cycle of a news story on blogs and on “traditional” media.
Three researchers tracked 1.6 million online news sites, both traditional media and blogs, over a three-month period leading up to the last presidential election; some 90 million articles ended up in the analysis. According to their research, traditional media has a pattern of stories rising to prominence slowly and then dying quickly, while on blogs stories become popular quickly and then hang around longer. Of course, in both cases stories eventually “cycle out” and new news comes in.
I care about this because it gives me information that might allow me to refine my strategy as an information trapper. There are of course ongoing topics that I pay attention to all the time. On the other hand there are topics that are current-event-based or based on research that I’m doing that I only want to follow temporarily.
Knowing that mainstream news sources ramp up stories slowly and then drop them off over a few days might make me decide that I only want to run temporary searches for a week or so before discarding the search. Or I might decide that I’ll only give a temporary search a week before I reevaluate my search results and decide to use a different set of keywords, or a different focus. On the other hand, I may decide that the life cycle of a story in a blog might mean that I should extend my search for far longer than I would otherwise.
You can get additional information on the research at http://memetracker.org/supp/. I don’t do a lot of temporary trapping, but I do some, and I’m looking at this research as a first step to come up with some standard approaches. At the very least it’s given me something to think about.
Many many moons ago, I set up a news alert on Google News to send me a ping whenever a news story mentioned ResearchBuzz. I always liked to see where news and mentions ended up. As you might imagine, that Google Alert went very silent last year when ResearchBuzz lapsed into a semi-coma.
So imagine my surprise when I started getting a fair number of Google Alerts within the last few weeks for ResearchBuzz. I’ve been posting more, but not enough that would warrant an avalanche of news mentions. And the summaries were mostly business-related stories. Where were all these new mentions coming from?
I finally went over to Google News (http://news.google.com) and ran a few searches. It didn’t take long to figure out that Google News was being a little more flexible with my searches than I intended. Fortunately there’s an easy remedy.
At the moment (a while after I first noticed this problem) there are three results for the query ResearchBuzz on Google News. Only one of them, a mention by the Search Engine Journal, is about this site. (Thanks, Search Engine Journal!) The other two are about business. Looking at them, I see they both mention publicly-traded companies with a set of research links that includes this: Research, Stock Buzz.
So Google News is taking my ResearchBuzz query and turning it into something like “Research * Buzz”. And while I appreciate some flexibility in my searches, especially as the data pool for news is much smaller than that for the Web, it doesn’t find me the information I need.
I took a look at the Google News preferences, but while you can change whether or not you get suggestions, you can’t really change whether or not Google alters your search term. You CAN make sure that Google News searches for your query exactly as you enter it by putting a + in front of your search term. Enclosing it in quotes will also do the trick.
This particular Google search quirk isn’t NEARLY as irritating as searching for words, looking in a page cache, and then discovering they’re not there. But it is leading me to think about “best practices” for setting information traps with Google Alerts. Quotes all the time? Plus marks? More test searches so I can make sure ahead of time that I’m getting only the results I want? I haven’t decided yet.
In an effort to get my information traps back up to snuff, I’m spending some more time messing around with Google News to get the best searches possible.
I keep information traps for both ResearchBuzz material and for my Day Job. For my Day Job, I like to keep track of how universities are using both textbooks and ebooks. While I do monitor the local universities’ newspapers directly, I want to be able to track trends on this usage across the country. This is where Google News comes in really handy.
As you probably know, Google Web search uses the “site:” syntax to restrict your searches to results that come from either one domain (example.com) or a set of “top level” domains (.com). The same syntax works for Google News. So if I want to restrict my news search results to just those that come from university sources, I can add site:edu to my search.
I want to monitor for news about electronic textbooks. My information trap would simply be something like:

electronic textbooks site:edu

Of course, I might want to add variants, but that would be my start.
Does site: work with other domains? Sure; for government information you could use site:gov as a search modifier. Want military news? Try site:mil. You can also try those more unusual top-level domains like .biz, .info, and .tv, though it would be harder to narrow those down to one topic or one type of news.
Why not try country codes? I can hear the cool kids in the back saying that if those unusual top-level domains work, why not use country codes like .uk, .au, and .ca?
I don’t use country codes when I site search on Google News for two reasons. The first reason is that not all of a country’s media is on a domain that has a country code. For example, I really like The Irish Times. What’s the URL? http://www.irishtimes.com/. Not an .ie to be found. If I did a search for Irish news using site:ie then I would miss The Irish Times. And that would be bad.
The second reason I don’t use country codes with the “site:” syntax is that Google already has a syntax to let you search news by location. That syntax, surprisingly enough, is called “location:”!
“location:”, which is a syntax specific to Google News, allows you to restrict your search results to a particular country (or a particular state; more about that in a minute). Use the name of the country with the syntax, along with any keywords you want to use to restrict your search. For example:

today location:malta

That will find you stories containing the word “today” from the country of Malta. (And though Malta is a small country, there are over 500 stories with the keyword “today”.)
I can’t guarantee that all countries will be represented in a Google News search because I haven’t tried all of them. But I found results from countries as small as Monaco:
So if you’ve got a trap and you want to aim it at a certain country, give that location syntax a try.
And once you’ve tried location: for countries, give it a whirl for US states! Yup, you can use location: with the two-letter USPS abbreviation for an American state, and you’ll get news from sources originating in that state. Let’s try this for Rhode Island:

location:ri
As you can see, that search gets you news from sources like Pawtucket Times, Providence Journal, and Woonsocket Call. You can use this syntax with the 50 states, with the District of Columbia (location:dc) and even with at least one US-affiliated place, Guam (location:gu). (Strangely, Google News did not recognize either Puerto Rico or the US Virgin Islands as valid search locations.)
Knowing that we can search individual state news with the location: syntax, let’s go back to the site: syntax. Can you combine site: and location:? Absolutely! Say I want to find college and university news in North Carolina. Nothing easier:

site:edu location:nc
You don’t even have to use any search terms with that, though you will get a LOT of results. If I wanted to restrict my search for textbook or ebook information to one state, this would be the way to do it. Or if I wanted to focus my traps on one topic for one state, I could do it this way too:
basketball site:edu location:ca
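If you build a lot of traps like these, the queries are easy to compose programmatically. Here’s a small sketch; note that the news.google.com search path is my assumption, so check it against the URL your own searches produce:

```python
from urllib.parse import urlencode

def news_query(keywords: str = "", site: str = None, location: str = None) -> str:
    """Compose a Google News search URL from keywords plus optional
    site: and location: operators."""
    parts = [keywords] if keywords else []
    if site:
        parts.append("site:" + site)
    if location:
        parts.append("location:" + location)
    return "http://news.google.com/news/search?" + urlencode({"q": " ".join(parts)})

print(news_query("basketball", site="edu", location="ca"))
# http://news.google.com/news/search?q=basketball+site%3Aedu+location%3Aca
```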