Using Google’s New Features to Get Twitter Information

I like Twitter a lot. I use it every day. Thanks to Twitterdoodle, I am even using Twitter to restart doing the “LittleBuzz” feature that I used to do on ResearchBuzz. But one thing about Twitter really drives me wild, and that, as you might guess, is the search engine.

There are so many brilliant things about Twitter’s advanced search. You can search near places. You can filter your searches by whether a tweet has links. You can filter by a particular date (though with the speed at which Twitter adds material, I seriously wish you could search by a decimal Julian date like 2454966.30038. Wow, that’s off-the-hook nerdy, isn’t it?)

But what you CAN’T do is search by what twittering folks have in their bios, or what they’ve listed as their Web site. I can do a search for twitters within 50 miles of New York City, sure, but I can’t search for people who have “gov” in their Web page address.

At least I can’t on TWITTER. However, with a little shaking and baking I can run that search, and many other interesting Twitter searches, on GOOGLE. Furthermore, I can use Google’s new “sort by date” option for Web results and get the newest results at the top. Let the goofing around commence!

(Before we get started, a disclaimer: I always assume that an external search engine will never index as much as an internal search engine. Therefore I assume that these numbers are inaccurate, probably incomplete. Furthermore, they shifted as I tested the searches. Don’t take them as gospel, just as Google’s count. Thank you.)

In order to focus Google just on Twitter we’ll have to use Google’s special syntax. Special syntaxes are just a way to focus Google’s searching attention on a particular site or part of a Web page. The way I use them here should be fairly self-explanatory but if you need additional guidance, check out this page. Okay, let’s start with the basics. Do you want to get an idea of how many Twitter profiles have been indexed by Google? Start with this search:

intitle:"on Twitter" site:twitter.com

The search for “On Twitter” in your page result titles just means you’re looking for home pages, not status pages or things like that. Looking for this search in “recent” results gives you 15.3 million results, though a search for results of the past 24 hours “only” finds you 2.4 million results. Yikes.

Well, what about individual Tweets? Pie-simple. Just search for this:

(inurl:status | inurl:statuses) site:twitter.com

With this query you’re telling Google to search the Twitter site for URLs that have “status” or “statuses” in them. (I thought all Tweets were stored under a “status” directory but I saw several with “statuses” instead, so I’m searching for either in the query above. That’s what the pipe ( | ) symbol means.) The count this time? The recent results number is 53.2 million. (Boggle.) The 24-hour results number is 10.1 million.
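If you find yourself typing variations of these queries a lot, you could assemble them with a few lines of Python. This is only a sketch: the function names `site_query` and `google_url` are made up for illustration, and I am assuming the usual `https://www.google.com/search?q=` address for running a search.

```python
from urllib.parse import urlencode


def site_query(site, url_words=(), keywords=""):
    """Build a Google query limited to one site.

    url_words: words that may appear in the URL, joined with the pipe (OR).
    keywords:  optional plain search terms added at the end.
    """
    parts = []
    if url_words:
        alternatives = " | ".join("inurl:" + word for word in url_words)
        parts.append("(" + alternatives + ")")
    parts.append("site:" + site)
    if keywords:
        parts.append(keywords)
    return " ".join(parts)


def google_url(query):
    # Assumption: the standard Google search address with a "q" parameter.
    return "https://www.google.com/search?" + urlencode({"q": query})


query = site_query("twitter.com", ["status", "statuses"])
print(query)
print(google_url(query))
```

Passing a third argument adds keywords to the end of the query, so the same helper works for searches that need one.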

Thus far I’ve shown you queries without specific keywords, though you could add keywords to those queries if you wanted to (how many Twitter people are named Fred, etc.). But this query works best with a keyword:

(inurl:favorites | inurl:favourites) site:twitter.com keyword

This query searches just those Twitter tweets that have been marked as “favorites” by users — in this case the query is searching for the keyword “keyword”. However you can change that to any word or phrase. Try a username:

(inurl:favorites | inurl:favourites) site:twitter.com mattcutts

That search will find Twitterers who have favorited Tweets by or about the famous Googler Matt Cutts.

At the beginning of this article I mentioned that I didn’t like Twitter’s inability to let me search by information in a twitterer’s Web address. You can add that Twitter search to Google by using Google’s wildcard. Google has a wildcard, *, that used to stand for one word. That is, if you used * in a phrase, it would find the phrase with any one word substituted for the wildcard. (Searching for “I am * man” would find “I am Iron Man”, “I am modern man”, “I am old man”, etc.) Now the * is more flexible: it just stands for some space between the words around it. If you use it with Google now, you’re kind of doing a proximity search. (A search for “I am * man” now will find “I am Iron Man” but also “I am an old man” or “I am the real Spider-Man”.)
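To make the difference between the old and new wildcard concrete, here is a toy model in Python. It is emphatically not how Google works internally: the function `wildcard_match` is hypothetical, and the `max_gap` limit of three words is a number I picked just so the examples above behave as described.

```python
import re


def tokens(text):
    # Lowercase and split on anything that is not a letter or digit,
    # so "Spider-Man" becomes the two words "spider" and "man".
    return re.findall(r"[a-z0-9]+", text.lower())


def wildcard_match(pattern, text, max_gap):
    """Return True if text contains the phrase, where each * in the
    pattern stands for between 1 and max_gap words (an assumed limit)."""
    parts = []
    for tok in pattern.lower().split():
        if tok == "*":
            parts.append(r"(?:\S+ ){1,%d}" % max_gap)
        else:
            parts.append(re.escape(tok) + " ")
    regex = re.compile(r"(?:^| )" + "".join(parts))
    return regex.search(" ".join(tokens(text)) + " ") is not None


for phrase in ["I am Iron Man", "I am an old man", "I am the real Spider-Man"]:
    old_style = wildcard_match("I am * man", phrase, max_gap=1)
    new_style = wildcard_match("I am * man", phrase, max_gap=3)
    print(phrase, "| one-word wildcard:", old_style, "| flexible wildcard:", new_style)
```

With `max_gap=1` only “I am Iron Man” matches, which mirrors the old one-word behavior; with the looser setting all three phrases match.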

You can use this wildcard in conjunction with the “Web:” part of Twitter’s profile like this:

intitle:"on Twitter" site:twitter.com "Web * gov * bio"

You’re using the Web: part of the profile and the Bio: part of the profile to set boundaries around the area where you want to search. The wildcards account for any other words that might be there, and the string “gov” is what you’re looking for in the Web address, so ideally you’re finding government employees with Twitter accounts. (There were 252 results when I ran this search, which seemed low.) You can also search for entire domain names. Let’s look for CNN people:

intitle:"on Twitter" site:twitter.com "Web * cnn.com * bio"
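Since only the middle of that quoted phrase changes from search to search, a tiny helper can fill it in. This is a sketch with a made-up function name, `profile_web_query`; it just reproduces the query pattern used in this article.

```python
def profile_web_query(web_fragment):
    """Build the profile search, looking for web_fragment between the
    Web: and Bio: labels of a Twitter profile page."""
    return f'intitle:"on Twitter" site:twitter.com "Web * {web_fragment} * bio"'


for fragment in ["gov", "cnn.com"]:
    print(profile_web_query(fragment))
```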

You might have to do some filtering of search results like this if a company domain name is used for multiple things. For example, if you wanted to search for Yahoo people you might also find people who had http://my.yahoo.com as their Web URL. So don’t assume every result you get will automatically be affiliated with the domain name for which you searched.
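If you have collected the Web addresses from a batch of profiles, that filtering could be roughed out like this. It is a sketch under my own assumptions: `is_company_site` is a hypothetical helper, and the list of hosts to exclude is something you would have to decide for each company yourself.

```python
from urllib.parse import urlparse


def is_company_site(web_url, domain, excluded_hosts=()):
    """Return True if web_url's host is the domain or one of its
    subdomains, and is not on the caller's exclusion list."""
    # Addresses without "http://" in front have no parsed hostname
    # and are treated as non-matches.
    host = (urlparse(web_url).hostname or "").lower()
    if host in excluded_hosts:
        return False
    return host == domain or host.endswith("." + domain)


candidates = [
    "http://www.yahoo.com/",
    "http://my.yahoo.com",
    "http://notyahoo.com",
]
for url in candidates:
    print(url, is_company_site(url, "yahoo.com", excluded_hosts={"my.yahoo.com"}))
```

Checking the host name rather than the whole address keeps lookalike domains from slipping through, though a matching address still doesn’t prove the person works there.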

Google’s new features allow you to slice results into meaningful, timely chunks. Twitter is generating enough data that even searching for narrowly defined time periods generates a LOT of current information. It’s a match made in Heaven. Stay tuned as I make some more attempts to put these two resources together.