Temporary Information Trapping: The Hopscotch Case

This weekend the first annual Hopscotch Music Festival is taking place in Raleigh. Over a hundred bands, parties, and general downtown rocking out for four days.

I work for one of the sponsors, so I wanted to keep track of the festival, pictures of the events, news coverage, and so on. To do that I set up several information traps that I’ll use just through this weekend and then let expire. It was an interesting exercise, so I wanted to share with you what I did.

News
There’s already been plenty of news coverage, so I knew there would be more during the festival. Hopscotch is an unusual enough word that I was able to use the query hopscotch location:nc at Google News and get almost all relevant results. (Remember, the location: syntax restricts results to media within a specific state. I might miss a few items, but on the other hand I’ll get really targeted results.) I picked up the RSS feed and put it in my reader.
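If you want to generate that kind of feed URL programmatically, here’s a minimal sketch. The base URL and parameter names reflect the Google News search interface as it worked at the time of writing; treat them as assumptions rather than a stable API.

```python
from urllib.parse import urlencode

def news_trap_url(query, location=None):
    """Build a Google News search-results RSS URL for an information trap.
    The endpoint and parameters are assumptions based on the interface
    at the time of writing."""
    if location:
        query = f"{query} location:{location}"
    return "http://news.google.com/news?" + urlencode({"q": query, "output": "rss"})

url = news_trap_url("hopscotch", "nc")
```

The `location:` operator travels inside the `q` parameter, so the whole query gets URL-encoded in one piece.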

Blogs
Finding relevant blog posts was a bit tougher. I tried Technorati but a search for hopscotch got only six results total and most of them were not relevant. Searching Google Blogs for hopscotch found a lot more content and a little spam; still, the results were clean enough that the result feed went in my reader.

I couldn’t find a search interface for Bloglines, and Blogdigger had no results at all. IceRocket had plenty of results but there was a serious relevance problem. Narrowing my search to “Hopscotch Music” (I didn’t want to add another word, as I wasn’t sure whether people would refer to it as “Fest” or “Festival”) brought me a good set of results, and I added that RSS feed to my reader.

Twitter
Hopscotch has hashtags devoted to it, and of course you can do some geographic searching with Twitter. But even though I can narrow my Twitter search, I did not want to trap for just the hashtag #hopscotch. If I did that, I would be flooded with a lot of less-useful content, like tweets from people arriving at venues, leaving venues, and so forth. So I decided to trap for multimedia content instead.

Searching for #hopscotch yfrog and #hopscotch plixi and #hopscotch twitpic will clue me in to pictures taken at the festival and quickly put online. Those queries went into my RSS feed reader after I saw they were already producing good results even though the festival started just this afternoon. (I’m writing this Thursday night.) I’m testing another query, hopscotch http -yfrog -twitpic, to pick up all the tweets with links that aren’t pictures, but at the moment those are mostly Foursquare checkins.
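Turning those searches into feeds is just a matter of URL-encoding each query. A sketch, assuming the Atom feed endpoint search.twitter.com offered at the time (that interface has since changed):

```python
from urllib.parse import quote

# The multimedia-focused queries from the text, plus the catch-all
# for links that aren't pictures.
queries = [
    "#hopscotch yfrog",
    "#hopscotch plixi",
    "#hopscotch twitpic",
    "hopscotch http -yfrog -twitpic",
]

# search.twitter.com offered an Atom feed of any search's results;
# this endpoint is the historical interface, not a current one.
feeds = ["http://search.twitter.com/search.atom?q=" + quote(q) for q in queries]
```

Each resulting URL can go straight into a feed reader like any other RSS/Atom subscription.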

(These traps are only going to be active for three days, so I probably won’t need to abandon any of them before the end of the festival. But if I were building these traps to keep for a long period of time, I’d pay careful attention to what my RSS feeds were producing and quickly dump any that were providing spammy or useless results. I only have so much time to review what I’m picking up.)

Flickr
Looking at the pictures on Twitter reminded me that Flickr might be getting images of Hopscotch as well. A test revealed that hopscotch was currently working okay as a search term, with lots of band pictures and only one irrelevant result. So that feed went into my reader too. (For information on how to make keyword-based RSS feeds for Flickr, check out my article.)

That was Quick

Setting up this set of traps only took about twenty minutes. I skipped a lot — didn’t get into discussion forums, for example, didn’t try to trap Facebook, and didn’t expand my news story search beyond what Google offers. But I feel this’ll give me a good overview of what’s going on and feedback from a wide variety of attendees. I’ll try to do a followup article next week about what I found and what I’d do differently next time.

TwitterQ: E-Mail Alerts for Twitter Keyword Matches?

I got a question from @doctorwallin on Twitter. She asks:

“Thx for all your info. Do you know if there’s a way to get keyword news alerts via twitter, similar to google alerts by email?”


There’s TweetAlarm, which asks for a name, password, e-mail, and then the keywords you want to monitor. This one bothers me a bit though because I can’t find a privacy policy. (When someone wants your e-mail address, don’t you want a privacy policy?)

Then there’s the much slicker Twilert, which lets you sign in using your Google or Twitter account, provides an advanced search option, and keeps count of how many alerts it’s sent you. And it also doesn’t have a privacy policy that I can find. Wha?

PEK Interactive notes another option in this article, while Mashable notes a slightly expanded way to monitor various types of replies and mentions on Twitter.

I’m not just old, I’m old school. So after looking at this question for a while and reviewing the options, I would probably go with “None of the Above” (though I might use Twilert if it had a privacy policy). Instead I’d take advantage of Twitter’s own search tools and another service.

You can do a Twitter search at http://search.twitter.com/advanced (that’s the advanced page that gives you all the search options.) Once you’ve run a search, you’ll notice on the results page that you can get an RSS feed of your search results.

I would do all my searches using Twitter, and then use an RSS to e-mail service to get the actual alerts. I like the offerings at http://www.feedmyinbox.com/ — it has a free option that’s somewhat limited, but premium services all the way up to e-mailing an unlimited number of RSS feeds per month. (And it has a privacy policy!)

In January 2009 I wrote an article on effective information trapping with Twitter. Maybe it’ll be useful.

Thanks for the question! I hope this helps.

I have no idea if I’m going to do this regularly but it was easier to answer the question in an article than a 140-character tweet.

Google Alerts Slowing Down? Stay Informed With These Seven Information Trapping Tips

I read a recent blog post at Search Engine Roundtable noting that Google Alerts had had its algorithm tweaked and because of that fewer alerts had been going out. (There was also a pointer explaining what to do to loosen things a bit so more alerts go out.) I had noticed that I wasn’t getting as many alerts as I had been, but I was comfortable that I wasn’t missing too many stories.

As I thought about it, though, I realized that if you are using Google Alerts — and only Google Alerts — for keeping track of new stories and new resources, this change to the algorithm might have thrown you for a loop. So to make sure that doesn’t happen again, and potentially to give you some new ideas, here are seven tips for making the most of alerts for information trapping purposes.

1. Allow for Overlap — When I wrote Information Trapping a few years ago, there was plenty on Google Alerts and Web search, but nothing on Facebook or Twitter. One thing I did mention, though, holds true for both of them: build in some overlap. If you’re using a set of tools that provide alerts for the same kind of resource — like, say, Google Alerts, Tracerlock, and Yahoo Alerts — you might be tempted to create unique, non-overlapping queries for each one.

Don’t do it. Instead, duplicate your queries with the idea that you’re going to have a 10-20% overlap in the results you get back, and that you will be doing some duplicate reading. That way if you do lose a resource, or it gets tweaked, you’re not going to miss much. Evaluating your volume of overlap might tip you to when a story or resource is getting an extra-large amount of play.
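One way to keep an eye on that overlap is to compare the links each service returns over a day. A rough sketch in Python; the feed contents, variable names, and the 10-20% target are all illustrative:

```python
def overlap_ratio(items_a, items_b):
    """Fraction of shared items between two alert feeds, relative to
    the smaller feed. Feeds are represented as lists of result links."""
    a, b = set(items_a), set(items_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))

# Hypothetical day's results from two deliberately overlapping traps:
google_alerts = ["http://example.com/story1", "http://example.com/story2",
                 "http://example.com/story3"]
yahoo_alerts = ["http://example.com/story2", "http://example.com/story3",
                "http://example.com/story4"]
```

If the ratio creeps well above your usual baseline, that’s the “extra-large amount of play” signal; if it drops to zero, one of your traps may have quietly broken.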

2. The Right Tool for the Right Job — Google Alerts provides alerts by e-mail, but you don’t have to get all your alerts by e-mail. In fact, you may feel a little overwhelmed if you do. Instead, you might want to take the most critical alerts you’re following — the ones where you want to know immediately — and get them by e-mail or even text message. Items of secondary importance, e-mail or RSS feed. And items of lesser importance, perhaps just RSS feed. I monitor at least a hundred queries via Google Alerts — but I also have several hundred RSS feeds in my feed reader. The tools are complementary. (I have never understood the “e-mail alerts vs. RSS” controversy. Isn’t “both” a legitimate answer?)

3. Constantly Reevaluate — It’s easy to imagine that Google has always been the biggest search engine, that RSS feeds have always been around, etc. But that’s wrong. AltaVista was the dominant search engine for a long time, and RSS didn’t become popular until well after 2000 (I’m thinking maybe 2003 or 2004.) Stay aware of the resources you’re using. You may find that over time the alerts you’re getting are becoming less useful, or that the resource is going off in a particular direction, or even that it goes defunct. There’s always going to be a certain amount of churn in Web sites and alert services; be prepared for it and don’t be afraid to switch, or at least to try new resources. (Otherwise you might find yourself still using Feedster and DayPop!)

4. Expand Your Horizons — I am a confirmed text-crawler, but now more than ever the Web is about multimedia. So don’t confine yourself to Google Alerts, Twitter, or other text-based alert systems when you want to keep up! You can get alerts or keyword-based RSS feeds from YouTube, Flickr, Slideshare, and other non-text Web sites. I have an RSS feed for Flickr photos tagged flowchart and it gets me some crazy stuff, but not in overwhelming amounts. If I tried to monitor Google Alerts for the word flowchart I’d be buried in results.
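For Flickr specifically, a keyword feed is just a URL. This sketch uses Flickr’s public-photo feed endpoint as I understand it; double-check the parameter names against Flickr’s own feed documentation before relying on them:

```python
from urllib.parse import urlencode

def flickr_tag_feed(*tags, fmt="rss2"):
    """URL for Flickr's public photo feed restricted to the given tags.
    Endpoint and parameters are assumptions based on Flickr's public
    feeds; verify against their documentation."""
    params = {"tags": ",".join(tags), "format": fmt}
    return "https://api.flickr.com/services/feeds/photos_public.gne?" + urlencode(params)
```

So the flowchart trap mentioned above would be `flickr_tag_feed("flowchart")`, dropped into a feed reader like any other subscription.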

5. Remember there are People on the Other Side of the Screen — You’re not going to get alerted to everything in your sphere of interest. You’re just not; you can’t keep up. Don’t feel bad; this has been true since about 1994. But you might get a little more information if people know what you’re interested in. I have plenty of people who send me terrific sites that I had not heard of, and I try to return the favor when I know what someone’s interested in. So if you find a great resource that you think someone you know would like, pass it on. And maybe you’ll get that back in good karma when someone sends you a link to a great new search engine, database, or whatever you’re interested in.

6. Don’t Be Afraid to Get General — If you have a very specific interest, say forensic accounting, it may be that you can articulate all the topics you want to follow via several well-crafted queries. And if you can that’s terrific and you’re doing a lot better than me. But it may be that your interests are pretty far ranging and you can’t put them all in a query. In that case, don’t try. Instead, monitor resources that are focused on your topic but which aren’t narrowed as far as specific keywords. Twitter lists (which you can find by the thousands at Listorious) are one example. Another would be Facebook “Like” pages — did you know they have RSS feeds? So do Facebook groups.

7. Enough is Enough — It would be easy to follow all these tips and create for yourself a huge firehose of information useful, relevant, and interesting to you. There’s only one problem: IT’S STILL A FIREHOSE. Having all that information flowing to you does you no good if you can’t ingest and use it. So don’t feel compelled to create alerts for every last site out there. Instead, focus on generating keyword-based RSS feeds and e-mail alerts that are as specific as possible, and if you feel yourself unable to keep up with alerts, cut back. It’s better to have 100 alerts and be able to fully read and use them, than to have 1000 that you barely look at because you’re constantly overwhelmed.

Happy trapping!

How To Scan Thousands of Tweets Without Tears

Over the weekend I wrote a post about Twitter and cruft. While I love Twitter, I hated the cruft that I had to wade through when trying to follow Twitter lists for information.

After I wrote the post I found a JavaScript solution and included that in my writeup. I also heard from Hanan Cohen, who’s put together his own PHP solution. But the problem was really bothering me, so I spent this past weekend trying to figure out a way to get a single overview of the several Twitter lists I follow, with as much cruft removed as possible, so that I can easily scan through it and find the good stuff. And I think I’ve made a good start on a solution. Here’s what I did:

1) First, I ListiMonkey’d. ListiMonkey is a service that will e-mail you the contents of any Twitter list you specify. You can specify which lists you want to follow and how often you want to receive the e-mails; you’ll get up to 100 tweets per e-mail. You can do some preliminary filtering through ListiMonkey, though I found there was a limit to how many terms I could filter. Every e-mail I got (maybe 300-400 a day) went into a text file.

2) Next, I TextPipe’d. I took my one day’s worth of tweets from Twitter lists (a big text file) and fed it to a software program called TextPipe, which describes itself as an “industrial strength text transformation, conversion, cleansing and extraction workbench.” Using TextPipe I stripped out all the HTML, removed all duplicate lines (every tweet is on its own line), removed all lines that had cruft I didn’t want (filtering out two or three dozen keywords) and then output it to a nice, clean, much smaller text file.

3) Then, I TEA’d. Using the TEA Text Editor, I scanned through the list of remaining Tweets, deleting the tweet-lines I didn’t want to review further. After I was done with that I used TEA’s HTML tools to convert the list of leftover, “to be looked at further” tweets into an HTML document.

4) At this Point, I Converted. TEA can turn the list into an HTML file, but the problem remains that the links are unclickable. So my last step was to go to David Weinberger’s Convert URL’s to Hyperlinks utility and turn my basic HTML file into a basic HTML file with clickable URLs.

5) Finally, I Firefox’d. I opened this HTML file in Firefox and quickly opened and scanned through the tweets I had put aside for further review.
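Steps 2 and 4 above can be sketched in plain Python. This isn’t TextPipe or Weinberger’s utility, just a rough equivalent of what they do; the cruft keywords are placeholders for the real filter list:

```python
import re
from html import unescape

# Sample cruft keywords; the real list would be the two or three
# dozen terms mentioned above.
CRUFT = ("goodnight", "good morning", "at the airport")

def clean_tweets(raw_lines):
    """TextPipe-style pass: strip HTML tags, drop blank and duplicate
    lines, and drop lines containing cruft keywords."""
    seen, kept = set(), []
    for line in raw_lines:
        text = unescape(re.sub(r"<[^>]+>", "", line)).strip()
        if not text or text in seen:
            continue
        if any(word in text.lower() for word in CRUFT):
            continue
        seen.add(text)
        kept.append(text)
    return kept

def linkify(text):
    """Convert-URLs-to-hyperlinks step: wrap bare URLs in anchor tags
    so they're clickable once the file is opened in a browser."""
    return re.sub(r"(https?://\S+)", r'<a href="\1">\1</a>', text)
```

The hand-pruning in step 3 stays manual, which is the point: the scripting only exists to shrink the pile before human judgment takes over.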

Going through these steps is going to let me review a lot of content from a lot of lists and save me a tremendous amount of time.

A few additional thoughts:

a) I can probably do this in Perl. I know, but I can experiment with and implement filters in TextPipe way faster than I can do it in Perl.

b) It’s not perfect. TextPipe doesn’t truly remove all the duplicates, as the same tweet can be posted three times with three different bit.ly URLs. To eliminate those I’ll have to do some spadework with regular expressions.

c) TextPipe is expensive. TextPipe Standard is $199. For the amount of time this will save me in trying to keep up with all the tweetstreams that capture my interest, it’ll pay for itself.

d) This solution will tempt me to subscribe to even more Twitter lists. THIS is the problem I’m going to have to watch out for….
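The duplicate problem in (b) — the same tweet posted with three different bit.ly URLs — is one regular expression away: compare tweets with their links stripped out. A sketch, with a hypothetical function name:

```python
import re

def dedupe_ignoring_links(tweets):
    """Collapse tweets that are identical except for their shortened
    URLs, e.g. the same headline posted with three different bit.ly
    links. Keeps the first copy seen."""
    seen, kept = set(), []
    for tweet in tweets:
        key = re.sub(r"https?://\S+", "", tweet).strip()
        if key not in seen:
            seen.add(key)
            kept.append(tweet)
    return kept
```

This treats tweets whose non-URL text matches exactly as duplicates; near-duplicates with slightly reworded text would need fuzzier matching.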

Google Reader’s New Page-Monitoring Feature — How’s it Working?

A couple weeks ago I covered Google’s new feature that allows you to monitor pages even when they don’t have RSS feeds. A few days ago reader LP e-mailed me and asked about the new feature, “Did it work?” And I realized I had completely forgotten to write a follow-up post. So yeah, about Google Reader’s new page-monitoring feature….

The first great thing about this feature is that it taught me how many Web pages do in fact have RSS feeds. I went to several pages meaning to set up monitoring, only to discover that RSS feeds were already available. Yay!

I did find some places that did not have RSS feeds, though; the best example is probably the page of Twitter lists that include ResearchBuzz. The URL for the list is http://twitter.com/ResearchBuzz/lists/memberships, but I didn’t know of any way to track when new lists were added to this page. So that was my test case for Google Reader.

Every change to the page is a new entry in Google Reader. The screenshot above shows an example of an entry. There’s no context on the page, and if I wasn’t familiar with the page content to start with, the entry wouldn’t be useful (in other words, I wouldn’t share it.)

I also tried Google Reader with http://www.ted.com/pages/view?id=348, which is a list of upcoming TEDx events all over the world. Again, I didn’t get any context, just the line that changed.

One Google Reader page monitor I set up did fail. I was trying to monitor a particular business in Google Maps because I wanted to see what kind of reviews it got. I think this might be my fault, however. I looked up the business in Google, and then used the extremely-long-and-awkward URL supplied by Google as my monitoring URL. Google Reader never got an update for that page, and complained that the page didn’t exist. I’m going to try again using the link supplied by Google on the business’s page.

For me, the gold standard for page monitoring remains WebSite-Watcher, a client-side application available at http://www.aignes.com/. However, it is for Windows only. Until it’s available for my operating system, I think I’ll keep using this new feature of Google Reader.