Learning Search

In Praise of Inurl:

Writing last week’s article took me back to the early days of the Web. (I had to walk ten miles through the snow uphill to get online, you know.) Web searching was mostly basic keywords, pages took ages to get indexed, and there were many institutions and agencies that weren’t even online. So while the early days of search engines were exciting and filled with promise, they were also restrictive in how you could search!

I was reflecting on that and realized that while some special search syntax (like site) gets a lot of love, there is other syntax that people don’t seem to care as much about. So today I’m going to tell you three ways I like to use one of my favorite pieces of search syntax, inurl.

What Is Inurl?

Many searchers are familiar with the site special syntax. Site allows you to limit your search to either a specific domain or top level domain. For example, you could add either site:bbc.com or site:com to your search and they would work.

Inurl allows you to similarly limit your search, but you can use it to filter on any part of a URL. inurl:com would certainly work, but it’ll find more than just sites ending in .com. If we searched for cows inurl:com, we might find these URLs in our search results:

https://www.olx.com.pk/animals/q-cows/

https://southaussiewithcosi.com.au/cosis-corner/cows-for-cambodia

https://rf.com.br/language/en/cows-cellular-on-wheels/

These are all URLs from different countries that end in a country code, not .com. They wouldn’t appear in a search for site:com.
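If it helps to see the difference spelled out, here’s a rough Python sketch of the two filters applied to the three URLs above. It’s only an approximation: the function names are made up for illustration, and Google doesn’t publish its exact matching rules. The sketch assumes site: compares against the end of the hostname, while inurl: looks for the term as a whole word anywhere in the URL.

```python
import re
from urllib.parse import urlparse


def matches_site(url, domain):
    """Approximate site:domain - the hostname is the domain or ends with .domain."""
    host = (urlparse(url).hostname or "").lower()
    domain = domain.lower()
    return host == domain or host.endswith("." + domain)


def matches_inurl(url, term):
    """Approximate inurl:term - the term is a whole word somewhere in the URL."""
    words = re.split(r"[^a-z0-9]+", url.lower())
    return term.lower() in words


urls = [
    "https://www.olx.com.pk/animals/q-cows/",
    "https://southaussiewithcosi.com.au/cosis-corner/cows-for-cambodia",
    "https://rf.com.br/language/en/cows-cellular-on-wheels/",
]

for url in urls:
    # First column: would site:com keep it? Second: would inurl:com keep it?
    print(matches_site(url, "com"), matches_inurl(url, "com"), url)
```

Every line prints False for the site:com check and True for the inurl:com check, which is the whole point: the word com is in each URL, but none of the hostnames ends in .com.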

Let’s do a search that’s a little more complex. Say I’m looking for historical press release mentions of Mae Jemison, the first African-American woman in space. I can do searches like:

“Press release” “Mae Jemison”

But that’s not going to find me just historical, archived press releases. Thanks to URL patterns (more about those later) I can do a search like this:

inurl:archive “press release” “Mae Jemison”

And get these kinds of results:

[Screenshot: search results for inurl:archive “press release” “Mae Jemison”]

Yes, you can get results from the Internet Archive, as we did in this case. But look at the rest of the results. We’ve got a press release from 2003 from Intel, and other releases and news stories from 2014, 2016, and what looks like 2010.

PLEASE NOTE!: I know when searching we don’t tend to worry about whether we’re capitalizing our search and syntax terms or not. I discovered by accident while I was writing this that if you capitalize inurl: as Inurl:, it doesn’t seem to work. To be safe, please make sure that inurl is not capitalized when you use it.

You can also use inurl in combination with site. Perhaps you want to exclude .com sites from the Mae Jemison search results to see what’s left. This query works fine:

inurl:archive “press release” “Mae Jemison” -site:com

[Screenshot: search results for inurl:archive “press release” “Mae Jemison” -site:com]

As you can see, the results are a lot more academically oriented. You do have one holdover from the previous search (the Internet Archive result), but the rest of the results would have been harder to surface had you been including .com sites.
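If you build queries like this often (for alerts, say), a tiny helper can keep the pieces straight. This is a minimal sketch, not anything official: build_query and its parameters are names I’ve invented for illustration. It writes the operator names in lowercase, in keeping with the capitalization note above.

```python
def build_query(phrases=(), inurl=(), exclude_sites=()):
    """Assemble a search query string from quoted phrases and operators."""
    parts = []
    # Operator names are deliberately lowercase: Inurl: did not seem to work.
    parts += [f"inurl:{term}" for term in inurl]
    parts += [f'"{phrase}"' for phrase in phrases]
    # A leading minus sign excludes results from that domain.
    parts += [f"-site:{domain}" for domain in exclude_sites]
    return " ".join(parts)


query = build_query(
    phrases=["press release", "Mae Jemison"],
    inurl=["archive"],
    exclude_sites=["com"],
)
print(query)
# inurl:archive "press release" "Mae Jemison" -site:com
```

Paste the printed string into the search box and you have the same query as above.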

This leads us to the first way I use inurl – when using site just won’t do.

Inurl: When Site: Just Won’t Do

The best example of using inurl instead of site builds on the first example in this article, where inurl caught sites under country-code domains: it’s when you’re trying to find government sites that aren’t in the US.

The top level domain gov by itself is used in the United States. In other parts of the world, it’s gov and the country code: gov.au, gov.uk, etc. Consider a research topic like wastewater management. A query like “wastewater management” site:gov will get you US-based results. What if you searched for “wastewater management” inurl:gov -site:gov ?

[Screenshot: search results for “wastewater management” inurl:gov -site:gov]

I count results from Jamaica, Australia, Canada, and Jordan. Government sites not ending in gov.

Using inurl with site also helps when searching local government sites in the US. A lot of county and state Web sites have a URL like this: state.xx.us, where xx is the state’s abbreviation. www.state.nd.us , for example, redirects to the state of North Dakota’s Web site.

If you search for site:us you’re going to end up with a lot of non-government junk. If you add inurl:state to it you’re going to keep the focus on mostly local US government and all but eliminate the junk. Do a search for “financial management” site:us and then try a search for “financial management” inurl:state site:us . See the difference?
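The three hostname patterns in this section can be told apart with a few lines of Python. Treat this as a sketch under assumptions: classify_host and its labels are hypothetical names, and it only looks at the hostname, whereas a real inurl:gov search would also match gov appearing elsewhere in the URL.

```python
import re
from urllib.parse import urlparse

# Assumed pattern for hostnames such as www.state.nd.us.
STATE_HOST = re.compile(r"(^|\.)state\.[a-z]{2}\.us$")


def classify_host(url):
    """Label a URL by the government-style hostname pattern it follows, if any."""
    host = (urlparse(url).hostname or "").lower()
    labels = host.split(".")
    if labels[-1] == "gov":
        return "us-gov"        # roughly what site:gov finds
    if STATE_HOST.search(host):
        return "us-state"      # roughly what inurl:state site:us aims for
    if "gov" in labels:
        return "non-us-gov"    # roughly what inurl:gov -site:gov aims for
    return "other"


for url in [
    "https://www.epa.gov/",
    "http://www.state.nd.us/",
    "https://www.gov.uk/",
    "https://www.bbc.com/",
]:
    print(classify_host(url), url)
```

The order of the checks matters: a hostname ending in gov is treated as US government before the broader "gov appears somewhere" test gets a chance to run.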

Finding (or Avoiding!) Patterns

In last week’s article on WordPress I mentioned that WordPress powers 30% of the Web as a content management system. It’s not the only one – there are services like Wix and Squarespace which also help people build Web sites without knowing a lot of coding.

That’s a big advantage for us, since that means the URLs are going to be similar. When a WordPress Web site creates a blog post, it creates the URL in the same way every time. It doesn’t randomize the words or the directories into which the page goes. We can take advantage of that.

Here’s a URL from ResearchBuzz:

https://researchbuzz.me/2018/07/06/using-the-wordpress-search-tool/

That’s for an article. With separate directories for the year, month, and day, you can infer from this URL that a search for site:researchbuzz.me inurl:2018 will get you only content from the year 2018. With the article title in the URL you could infer (though not as confidently) that you can use inurl to specify words that might be in an article title. Thus you could try a search like this:

site:researchbuzz.me inurl:2016 inurl:google

… and what you would get back would be articles with Google in the title that I wrote in 2016.
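To see why those inferences hold, here’s a short Python sketch that pulls the ResearchBuzz URL apart. The regular expression assumes the year/month/day/title layout shown above; other sites (and other WordPress permalink settings) will lay their URLs out differently, and parse_permalink is just an illustrative name.

```python
import re
from urllib.parse import urlparse

# Assumed permalink layout: /YYYY/MM/DD/post-title-words/
PERMALINK = re.compile(r"^/(\d{4})/(\d{2})/(\d{2})/([^/]+)/?$")


def parse_permalink(url):
    """Return (year, month, day, title_words), or None if the URL doesn't fit."""
    match = PERMALINK.match(urlparse(url).path)
    if match is None:
        return None
    year, month, day, slug = match.groups()
    return int(year), int(month), int(day), slug.split("-")


print(parse_permalink(
    "https://researchbuzz.me/2018/07/06/using-the-wordpress-search-tool/"
))
# (2018, 7, 6, ['using', 'the', 'wordpress', 'search', 'tool'])
```

The year and each word of the title are separate pieces of the URL, which is exactly what inurl:2018 or inurl:wordpress latches onto.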

You can expand this technique way past one site. Say you wanted to get information on digital archives, but you wanted to be sure your focus was recent. Google’s date-based searching isn’t quite doing it for you. This search is perfectly valid:

“Digital archives” inurl:2018

Often libraries, when they’re part of universities or other institutions, will have library as part of their URLs. Try this search:

“Digital archives” inurl:2018 inurl:library

[Screenshot: search results for “Digital archives” inurl:2018 inurl:library]

Now, not all Web sites are going to use a CMS and consistent URL patterns. And the ones that do might not use them in a way that matches your search. Will you miss search results using inurl like this? Yes, you will. I don’t think you’ll miss enough for it to be problematic. If you’re worried, use inurl searches as “cleanup” after you’ve done your main searches.

The date examples I’ve used here are also a primary way I use inurl searches.

Narrowing In on a Date

Google’s ability to find items based on dates is … problematic. But by taking advantage of inurl’s ability to find years in URLs, you can do your own date searching that’s more useful.

Say you’re writing something about Marissa Mayer, former CEO of Yahoo. Before that she was at Google, but you’re interested in her Yahoo tenure. She joined Yahoo in 2012. What kind of results will you get if you search for “Marissa Mayer” inurl:2012?

[Screenshot: search results for “Marissa Mayer” inurl:2012]

Now you’ve got a set of really focused results just from one use of inurl. But you can stack them too! Marissa Mayer resigned from Yahoo in 2017. This search works just fine: Marissa Mayer (inurl:2012 | inurl:2013 | inurl:2014 | inurl:2015 | inurl:2016 | inurl:2017)

[Screenshot: search results for the stacked Marissa Mayer search, inurl:2012 through inurl:2017]
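Typing out every year gets tedious for longer spans, so here’s a minimal Python sketch that generates the stacked group for any range. The helper name years_in_url is my own invention, and I’ve put the name in quotes as in the single-year search.

```python
def years_in_url(first_year, last_year):
    """Build an OR group of inurl: terms, one per year, both ends included."""
    terms = [f"inurl:{year}" for year in range(first_year, last_year + 1)]
    return "(" + " | ".join(terms) + ")"


query = '"Marissa Mayer" ' + years_in_url(2012, 2017)
print(query)
# "Marissa Mayer" (inurl:2012 | inurl:2013 | inurl:2014 | inurl:2015 | inurl:2016 | inurl:2017)
```

Very long OR groups may run into a search engine’s query-length limits, so it’s probably wise to keep the range modest.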

Again, it won’t get you everything. Not all sites organize their content by date, and even some which do don’t put the date in their URLs. But you will get a lot, and you will find that it filters out a lot of irrelevancy that Google’s date search doesn’t handle as well.

I find special syntax to be really useful when building both searches and alerts. The next time you’re about to use site in a search, think for a minute about how you might use inurl instead. You’ll be surprised how well it can focus and filter your results.
