Category Archives: News
Quit Trying to Be the Next Google Dammit, Pt. 2: The Goal Should Be An Internet That Makes Us Better Humans
We have a houseguest, my husband and I. She is staying with us while she receives medical treatment, and will be here for a while.
I am on all my manners. I have almost stopped singing out loud to myself, and talking twee to the cat, and blurting out observations which make sense to me but no one else. I am cooking dinner and keeping the kitchen clean and checking twice a day to make sure there are plenty of clean towels in the linen closet. I do not feel much faith in my powers as a hostess: I am too big and rumpled and introverted and strange and I’m always convinced something will go wrong. I cooked pierogies and the house smelled like fried onions even hours later, and I went in the bathroom and cried because the house would smell like fried onions forever and I was the worst person in the world.
Through all this I go back to the Internet over and over again to try to be better. To find good recipes to cook. To do medical research. To figure out how to make our ancient bathroom sparkle. To get rid of the fried onion smell, dammit. To be a more productive person and a more effective hostess for this family member with her blue cane who is so, so patient with me and makes me feel ridiculous for crying over food.
I don’t say to myself that I am using Google because it indexes so many Web pages so quickly and thus and such. I don’t say to myself that I’m searching PubMed because it has so much information organized in such a way. I say to myself that I want to use THIS resource or THAT resource because it’s helping me in doing a job at which I feel completely rubbish. It’s making me better.
Wouldn’t it be wonderful if instead of headlines touting “the next Google” (a phrase which has 2,660,000 matches on Google itself, by the way), stories and Web pages encouraged aspiring companymakers to build the things that make us more capable and stronger? To encourage people to, instead of merely reflecting the status quo, build tools that will expand horizons and give us new ways of being and lead us to becoming better humans?
… I suppose that now that I have admitted in front of God and everybody to crying over fried onion stink that I should also tell you my secret dream. My secret dream is to have a place to send every bit of information I look at. I read literally hundreds of RSS feeds. I am subscribed to dozens of Google Alerts. And on my perfect day I would be able to match every bit of information to someone who would be delighted to have it.
That’s my particular itch. To direct information to people who could use it. That’s why I spend so much time reading those feeds and alert services — because there are so many great resources out there, and more coming every day, and y’all don’t know them, and that drives me nuts.
If I were building an Internet company, that would be what I would build. A delivery system to tell you about all the beautiful stuff I find. A system that’s so simple and easy to use that I could spend 99% of my time finding and reporting the beautiful stuff and only 1% of the time doing bullshit, which is anything that’s not finding and reporting beautiful stuff.
Well-meaning people would ask me, “Is it going to be like Google? Or Facebook?” And I would say “No no, if either of those worked for me I would be using them now.” And I would make something that worked perfectly for me, no matter what it ended up looking like. And then I would invite other people to play. And if they liked it, away we go! And if they didn’t, well, at least I had solved one of my own problems, yes?
Technology is for the purpose of us. We are not for the purpose of technology. When we aspire to merely imitate an existing structure we are doing ourselves a disservice. Even a better Google is still a Google. But to focus on solving a problem and letting people do better those things that make us so uniquely us — when that is your goal, you have moved outside history and technology becomes merely an element of construction and not a force that bends you.
Earlier this month I read an interesting article in ScienceNOW. It was about how people can recognize how they have changed in the past, but are less good at recognizing how they will change in the future. “Gilbert and colleagues call this effect ‘the end of history illusion,’ because it suggests that people believe, consciously or not, that the present marks the point at which they’ve finally stopped changing.”
I thought this was interesting because it’s a huge blind spot in one’s development as a person and may explain why it’s so hard for people to enact radical change on themselves (and it may also give some hints on how it could become easier to do so.) I also think it may explain how people see current companies and technology.
I thought of this study yesterday when I read an article on Mashable called Free Database of the Entire Web May Spawn the Next Google. It was an overview of a new non-profit that’s making a huge bucket of Web data that people can splash around in. This is great, but not new (ODP data was being used for the same purpose by sites like Oingo, and that was over a dozen years ago) and I found the idea that this might bring about “the next Google” to be as galling as it ever is. Only this time I’m going to write about it because I can’t stand it any longer.
Seventeen years ago this spring I wrote my first book on Internet and search engines. I have been reading and writing about search engines and finding things online ever since. And I would like to bring all this experience to bear and disclose something to you:
One day Google is going to suck.
This is not disrespect. It’s history. The more successful Google gets, the bigger it gets. The bigger it gets, the slower it moves. The slower it moves, the more difficulty it has in responding to rapid changes of technology. The more difficulty... you get the idea. The very fact of a company’s existence and the requirements heaped on it from all sides (from the government, shareholders, customers, employees) eventually coat it in layers of bullshit that have nothing to do with mission and innovation and everything to do with placating someone or other. The more success, the more of that there is. Bureaucratic barnacles.
Because we are always in the present, we can’t imagine the Internet without our right-now-essential tools. But eventually they will not be essential. Eventually the Internet will change enough that they will take a more minor role, specialize to the point that they appeal to a much smaller audience, or deprecate entirely.
HotBot? AltaVista? The Open Directory Project? All once hailed as great innovations, hugely useful, where-would-we-be-without-them, tools of the Internet. And now they all pretty much suck. (Though some people involved, like Rich Skrenta (ODP) and his search engine blekko, have moved on to greater things.)
I’m not saying that tomorrow Google is going to start sucking, and I’m not saying it sucks now. It doesn’t. I’m saying that it can’t be what it is indefinitely no matter how unstoppable and monolithic it looks now. And I’m saying that if you start off trying to “be the next Google,” you are setting yourself up for failure.
There are so many problems of discovery and usage on the Internet that have nothing to do with what Google does right, right now. Searching for podcasts is a pointless nightmare. It’s still hard to find and use “deep Web” resources like those which are found within library catalogs and online exhibits. Natural language searching has gone from being difficult and odd (but somewhat useful) to, in my experience, misunderstanding what I actually want. Special character searching is still a niche for engines like SymbolHound. Translation tools, while better, are still pretty bad. The only Twitter viewing/monitoring tool I can find that doesn’t make me want to punch a wall in frustration is Undrip.
Here’s my point: no matter how pervasive Google is, no matter how unshakable it looks, there are still issues with the way the Internet and the Web work. There are still structures to be invented and innovations to be made. And that will be true forever.
For your success, scratch what makes you itch. Look at the Web/Internet/whatever, see what pisses you off, and address that. Take Common Crawl’s excellent offerings and make your job easier. (Now I’m wondering what Wikia is doing with Grub.) What you do may overlap Google’s endeavors or it may not. But it seems to me you will be much more successful with that approach than by trying to replicate the success of what came before.
The University at Buffalo announced earlier this week that it had digitized the entire run of the Buffalo Jazz Report and made it available in the UB Institutional Repository.
The Buffalo Jazz Report was a freebie newspaper distributed between March 1974 and December 1978. You can browse the entire 58-issue run in all its 1970s glory at http://digital.lib.buffalo.edu/cdm/landingpage/collection/BuffJazz.
You can browse issues or do a search. (You can also browse by author or subject, but there’s only one author and only five subjects.) The search is full-text but it’s pretty basic; a search for Monk found 37 results but the results simply pulled PDFs of full issues and did not direct me to excerpts or articles. Issues appear to be available only as PDFs; download them and read them in your favorite viewer.
The newspapers themselves include obituaries of musicians, occasional articles on musicians, reviews of recordings, event listings, and relentlessly hip ads which could only be more 1970s if they were actually dipped in fondue. My favorite one was for a haircutter, “Crazy Ron,” who advertised with and without “Nanci.” And don’t forget Eskil’s Clog Shop (“When Your Feet Need a Friend.”)
The newspaper evolves from a fairly brief affair with some drawings early on to a much larger newspaper with lots of articles, photographs, and concert reviews. I can’t find any indication that the last issue was the last issue; it seems to have just … ended.
Even if you don’t have a predilection for jazz you’ll enjoy the energy in the collection; editor and publisher Bill Wahl clearly loved what he was doing. (And he’s apparently still doing it! Check out Jazz-Blues.com for a database of over 8000 reviews of jazz recordings.) I recommend browsing, as the search doesn’t get you very far and there’s not enough detail in the subject trees to try to browse that way.
Three cheers to Anne F, who let me know about the new Chicago Foreign Language Press Survey from the Newberry Library. It’s available at
The Chicago Foreign Language Press Survey was actually published over 70 years ago; the Newberry Library has brought it into the 21st century. Here’s how the site describes it: “The Chicago Foreign Language Press Survey was published in 1942 by the Chicago Public Library Omnibus Project of the Works Projects Administration of Illinois. The purpose of the project was to translate and classify selected news articles that appeared in the foreign language press from 1855 to 1938. The project consists of 120,000 typewritten pages translated from newspapers of 22 different foreign language communities of Chicago.”
There are over 48,000 articles in the collection. They can be searched by keyword, browsed by groups (groups include Albanian, Filipino, Lithuanian, Croatian, and Slovak), browsed by year (1855-1940), browsed by “Codes” (this is a tree of subject headings, and a huge one), or browsed by source (there are over 400, from the 1933 World’s Fair Weekly to Zwei Jahrhunderte Chicago).
The subject matter spans a great deal, but there’s a lot to be found on the topics of immigration laws, assimilation, education, economics, and social mores. I found many interesting articles just searching for the names of figures of the time. A Russian newspaper wrote a very kind eulogy to Will Rogers in 1935, while in a Lithuanian newspaper I found a reference to a letter from Upton Sinclair (though, sadly, not the letter itself.)
I did a search for computer and got 45 results, mostly because the search engine was matching on things like compute. Attempts to alleviate this by searching for “computer” and +computer didn’t work; in fact, they made the results a lot worse. So be sure to use very precise or, ideally, multiple keywords when you search this resource.
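The site doesn’t say how its search works, so this is only a guess, but results like these are what you would expect from an engine that trims word endings before matching (often called stemming). Here is a minimal Python sketch of that idea. The suffix list and the crude_stem function are made up for illustration and are not the Newberry’s actual code.

```python
# A deliberately crude suffix-stripper, for illustration only.
# Real search engines use more careful stemmers; this list is an assumption.
SUFFIXES = ("ation", "ing", "ers", "er", "ed", "es", "e", "s")


def crude_stem(word):
    """Lowercase the word and strip the first matching suffix,
    keeping at least three characters of the word."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def matches(query, text):
    """True if any word in the text shares a stem with the query."""
    target = crude_stem(query)
    return any(crude_stem(w) == target for w in text.split())


# "computer" and "compute" collapse to the same stem, so a search for
# one finds the other.
print(crude_stem("computer"), crude_stem("compute"))
print(matches("computer", "They compute the tax by hand"))
```

If the engine does something along these lines, adding a second keyword narrows things down because both words have to match.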
That aside, I love the elegance of the results page. A permanent link to the search results is available at the top of the page. After that there are summaries of matching articles along with information about the original language, source, and date. Click on a summary for the full article, and, beneath the full article, images of the cards from which the article came. Clicking on the headline of the article took me to a direct link to the article with a little additional information, including the article and its information in raw XML.
Though the articles were translations, I did not find them awkward or difficult to read. I did find myself at times interested in a particular source, but didn’t find any additional information at Newberry. Going to the LOC’s historical US Newspaper Directory got me more data about titles. One time it didn’t have the title I was looking for (Cesky Odd Fellow), but it did have a similar title (Cesky republikan) which was also in Chicago.
With the wide matching that the keyword search does, you might have to do some experimental searching before you get the best results, but even a casual browse here turned up fascinating historical material.
Just wanted you to know that there’s some offline stuff that has taken my attention and posts might be sparse for a while. It’s not like I don’t love you because I totally do; I’m just a bit distracted at the moment.
On the upside, this means when I DO get time to do ResearchBuzz my filter will be turned off! Yay! That means MOAR USES OF “STUPIDNESS” but hopefully less uses of the quote “Robert Scoble is full of bean dip.”
Love love love,
Good morning peoples, there is no Morning Buzz today. Instead of going through my information traps, I spent the evening building a resource/information list for coverage of Hurricane Sandy, with a focus on New York City (though there is plenty of coverage that goes beyond as well.)
It is available at
Have you heard of IFTTT? It’s available at http://ifttt.com. Pronounced “ift” (like “lift” without the l), IFTTT is a free Web tool that uses channels to easily automate Web tasks. You can get a basic overview at https://ifttt.com/wtf but the premise is really simple: you choose a trigger (like a new item on an RSS feed, someone tagging you on Facebook, someone following you on Twitter, etc.) and in response to that trigger you can choose an action (automatically following a new Twitter follower back, sending Facebook-tagged photos of you to Dropbox, storing your Tweets in an Evernote account, etc.)
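If it helps to see the trigger/action idea written down, here is a small Python sketch. It illustrates the concept only and has nothing to do with how IFTTT is actually built; the Recipe class, run_recipes function, and sample events are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Event = Dict[str, str]


@dataclass
class Recipe:
    """A hypothetical 'if this, then that' pairing (not IFTTT's real API)."""
    trigger: Callable[[Event], bool]  # "this": does the event match?
    action: Callable[[Event], str]    # "that": what to do when it matches


def run_recipes(recipes: List[Recipe], events: List[Event]) -> List[str]:
    """Check every event against every recipe and collect the action results."""
    results = []
    for event in events:
        for recipe in recipes:
            if recipe.trigger(event):
                results.append(recipe.action(event))
    return results


# Example recipe: when a new RSS item shows up, "save" its title.
save_new_rss_items = Recipe(
    trigger=lambda e: e.get("kind") == "rss_item",
    action=lambda e: "saved: " + e["title"],
)

events = [
    {"kind": "rss_item", "title": "New photo from a Commons member"},
    {"kind": "twitter_follow", "user": "someone"},
]

print(run_recipes([save_new_rss_items], events))
```

Only the RSS event fires the recipe; the Twitter follow is ignored because no recipe has a trigger for it.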
At first glance it looks simple and somewhat limited, because there are only so many triggers and actions. But as I spent a lot of time playing with it (I’m using it to automate a bunch of stuff at work) I realized that it could help me solve one of those annoyances that’s been bugging me for a long time, and that is keeping up with The Flickr Commons.
The Flickr Commons is a group of about five dozen institutions and repositories from all over the world that have come together to make some of their collections’ visual content available online with no known copyright restrictions. Group members include the New York Public Library, NASA, the National Archives of Norway, and the National Library of Scotland. So you can imagine there’s tons of great material there.
Unfortunately I couldn’t find a way to look at the latest Commons photographs in toto. I could look at individual institutions and follow them through an RSS feed; I could search Commons content; I could not find a way to look at the latest Commons stuff. I did not want to have to monitor 60-odd feeds. I wanted all the latest Commons content in one place.
IFTTT to the rescue!
IFTTT and RSS Feeds
IFTTT lets you pull content from RSS feeds as one of its triggers, which is probably what I do the most with it, as there are countless RSS feeds out there. Each institution participating in Flickr Commons has an RSS feed of the latest photographs added to its content.
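What I was really after is a merge: many feeds in, one newest-first list out. As a rough sketch of that idea in Python (this is not what IFTTT does internally, and the two sample feeds below are invented stand-ins, not real Flickr output), it could look like this:

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Two tiny, made-up RSS 2.0 feeds standing in for institutional feeds.
FEED_A = """<rss version="2.0"><channel><title>Library A</title>
<item><title>Harbor, 1910</title><link>http://example.org/a/1</link>
<pubDate>Mon, 07 Jan 2013 10:00:00 +0000</pubDate></item>
<item><title>Market day</title><link>http://example.org/a/2</link>
<pubDate>Wed, 09 Jan 2013 08:30:00 +0000</pubDate></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>Archive B</title>
<item><title>Lighthouse</title><link>http://example.org/b/1</link>
<pubDate>Tue, 08 Jan 2013 12:00:00 +0000</pubDate></item>
</channel></rss>"""


def parse_feed(xml_text):
    """Return one dict per <item>, labeled with the feed's channel title."""
    channel = ET.fromstring(xml_text).find("channel")
    source = channel.findtext("title", default="")
    entries = []
    for item in channel.findall("item"):
        entries.append({
            "source": source,
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": parsedate_to_datetime(item.findtext("pubDate")),
        })
    return entries


def merge_feeds(feeds):
    """Combine entries from all feeds into one list, newest first."""
    merged = [entry for xml_text in feeds for entry in parse_feed(xml_text)]
    return sorted(merged, key=lambda e: e["published"], reverse=True)


for entry in merge_feeds([FEED_A, FEED_B]):
    print(entry["published"].date(), entry["source"], "-", entry["title"])
```

A real version would fetch each institution’s feed URL instead of using inline strings, and would need to cope with feeds that leave out a date.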
I grabbed an RSS feed from one of the Flickr Commons members and started messing with it. Since an image thumbnail shows up in the feed, I tried grabbing the image and sending it any number of places, like Picasa and Dropbox. I wanted to make the photographs available publicly and I wanted to have an easy way to go to the original image if I saw something I liked and wanted to look at more closely (remember, the RSS feed has only a small image and not the full-sized photograph.) Picasa didn’t allow me to append enough information and Dropbox didn’t allow me to delineate the images enough.
So finally I ended up using Flickr itself — specifically, my own photostream.
Setting Up IFTTT
The IFTTT trigger/response sets are called recipes. So my recipe trigger was new content in one of the Flickr Commons institutional feeds. (I had to set up about 60 recipes, which was the most tedious part of this whole business.) If you want to play along at home and have an IFTTT account, I shared my recipe at https://ifttt.com/recipes/52593.
The action was to take the content from the institution’s feed and put it in my own Flickr photostream. But that wouldn’t be enough because there’s only so much good I’d get from a random image – I’d also want to know where it came from and where I could go to see larger versions of the image. So in addition to just moving the image over, the recipe also puts the source of the image and a link back to the original image in the description. There’s also an option to create new tags for each image as well — remember that because I’m going to come back to it later.
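In code terms, the action boils down to building a little record for each feed item. Here is a Python sketch of that step. The field names and the build_photo_post function are my own inventions for illustration; they are not IFTTT’s or Flickr’s actual fields.

```python
def build_photo_post(entry, extra_tags=()):
    """Turn one feed entry into the fields a photo post would need.

    The description carries the attribution: where the image came from
    and a link back to the original, larger version.
    """
    description = "From: {}\nOriginal image: {}".format(
        entry["source"], entry["link"]
    )
    return {
        "title": entry["title"],
        "description": description,
        "tags": list(extra_tags),  # optional tags added to every image
    }


# A made-up feed entry for demonstration.
entry = {
    "source": "Library A",
    "title": "Harbor, 1910",
    "link": "http://example.org/a/1",
}

post = build_photo_post(entry, extra_tags=["commons"])
print(post["description"])
print(post["tags"])
```

Keeping the source and link in the description is what makes the aggregated stream useful later, since the thumbnail alone tells you very little.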
The Harvest on My Photostream
So I set up umpty-zillion recipes based on RSS feeds from Flickr Commons institutions, let them run, and within a day I started having images automatically post to my Flickr photostream at http://www.flickr.com/photos/taracal/.
The URL in the description is not clickable from the gallery page, but it is clickable on the individual picture’s page.
So what do I have now? Now I have a constantly-growing group of photos from the Flickr Commons as my very own photostream, but in addition I have an RSS feed of all the latest content posted to Flickr Commons (via my account’s RSS feed on Flickr.) And with IFTTT, I can take that feed and do something else with it. In this case, I set up IFTTT to send me an alert via the iOS notification app Pushover whenever the RSS feed updated. This came in handy when a picture of Queen Elizabeth came through on my iPhone and I was able to immediately text it to my anglophile friend Dee.
I had no hesitation in setting up these RSS feeds of visual content to aggregate on my own photostream because the Flickr Commons is just that — a Commons — and violating copyright was not a concern. Besides, I made sure that each description sourced the original image and linked back to it, trying to ensure that nobody thinks I’m the creator/keeper of these images.
If the aggregation of thumbnails, with clear attribution and links back to original content, could be considered fair use, I would really like to go further with this. There are so many institutions using Flickr. If you do just a simple people search for State Library you’ll find all kinds of goodies.
With IFTTT you could take the RSS feeds of the institutions in which you’re most interested and start a flow of thumbnails to your own Flickr stream, but more than that, you could give all pictures from that group of institutions the same tag and start creating your very own repository.
For example, I could go through Flickr’s people search and find North Carolina organizations — the NC State Archives, the Museum of Natural Sciences, the North Carolina State Library for the Blind, etc. I could set each of these up with an IFTTT recipe to send new content to my photostream, and tag each item as it’s added with not only the photo’s description but also with a unique tag of my own — maybe NCGROUPRB (something that probably isn’t replicated elsewhere on Flickr.) Then I just let it run. What I’m doing here is creating my very own Flickr subset from lots of different sources, in this case photographs from North Carolina organizations and institutions. (You could do this with any other topic you can imagine that can be found in the people search — state fairs, national museums, or even cooking schools!) When searching this collection, I could use incredibly general search queries (school, food, etc.) along with my unique tag and have success in finding images relevant to my context because I had narrowed down the searched pool of images in advance via the IFTTT image aggregation.
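To show why the unique tag matters, here is a toy Python version of that kind of search. The photo list and the search_collection function are invented for the example and do not use Flickr’s real search; the point is only that requiring the tag first lets a very general keyword do useful work.

```python
# Made-up photos: two carry the unique collection tag, one does not.
PHOTOS = [
    {"title": "Elementary school picnic", "tags": {"NCGROUPRB", "school"}},
    {"title": "School of fish", "tags": {"aquarium"}},
    {"title": "State fair food stand", "tags": {"NCGROUPRB", "fair"}},
]


def search_collection(photos, query, collection_tag):
    """Return titles containing the query, limited to photos that
    carry the unique collection tag."""
    query = query.lower()
    return [
        photo["title"]
        for photo in photos
        if collection_tag in photo["tags"] and query in photo["title"].lower()
    ]


# "school" alone would also match the aquarium photo; the tag rules it out.
print(search_collection(PHOTOS, "school", "NCGROUPRB"))
print(search_collection(PHOTOS, "food", "NCGROUPRB"))
```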
This setup isn’t perfect — IFTTT limits how much you can extract from a given RSS feed — but I’m having a lot of fun with my newly aggregated feed of Commons content and looking at a lot more pictures. If you find this useful and end up doing your own Flickr mini-content-curation project, let me know in the comments!