Wikipedia can be invaluable for searching, but sometimes it’s hard to extract Wikipedia’s Wikidata in a way that makes it immediately useful for ongoing research. If you wanted to get all the official Web sites in a category, for example, you might go trawling around in the Wikidata Web site, or you might use Wikidata Quick Dip and generate a list that way. But you’d still have to do a lot of copying and pasting.
So I made Sheet-Shaped Wikipedia, at https://searchgizmos.com/ssw/ . SSW lets you provide a list of categories and get a caret-delimited text file (or files) of all the pages in those categories which share a Wikidata property. It’s very easy to use. Let’s step through it.
How to Use Sheet-Shaped Wikipedia
First, enter a list of Wikipedia categories from which you want to extract Wikidata properties. You can paste the entire URL as you can see in the screenshot, or you can use the name of the category without the Category: string (RuPaul’s Drag Race All Stars contestants) or with it (Category:People with narcolepsy). Sometimes special characters will break a URL and SSW won’t give you any results. If that happens, try again with just the category name.
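If you keep your category lists in a script or a text file, you can tidy them up before pasting them in. The following is only a rough sketch of how the three accepted forms (full URL, with Category:, and without) could be reduced to a bare category name. The function name normalize_category is made up for this example, and this is not SSW’s own code.

```python
from urllib.parse import urlparse, unquote


def normalize_category(raw: str) -> str:
    """Return a bare category name from a URL, 'Category:Name', or plain name.

    Hypothetical helper for illustration; not taken from SSW itself.
    """
    text = raw.strip()
    if text.lower().startswith(("http://", "https://")):
        # Keep the part of the path after /wiki/ and undo percent-encoding,
        # which is where special characters usually cause trouble.
        path = urlparse(text).path
        text = unquote(path.split("/wiki/", 1)[-1])
    # Wikipedia URLs use underscores where page titles have spaces.
    text = text.replace("_", " ")
    if text.lower().startswith("category:"):
        text = text[len("category:"):]
    return text.strip()


if __name__ == "__main__":
    for item in (
        "https://en.wikipedia.org/wiki/Category:People_with_narcolepsy",
        "Category:People with narcolepsy",
        "People with narcolepsy",
    ):
        print(normalize_category(item))
```

All three inputs in the example reduce to the same plain name, which is the form to fall back on when a pasted URL returns nothing.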
After you’ve entered the categories you want to review, use the drop-down menu to choose a Wikidata property you want to search for. I couldn’t list all the properties as there are over 10,000 of them, so this focuses mostly on social media. Wikidata properties to search for include official website, Facebook ID, Twitter ID, LinkedIn Personal/Company ID, YouTube Video/Channel ID, GitHub ID, and Google+ ID. (I left the last one in as a joke.)
Finally, you can choose to get the output as one text file per category, or you can merge them into a single file.
Let’s say we wanted to find official Web sites for the pages in these categories and we want all the results in one file:
Companies in the Dow Jones Transportation Average
Here’s what SSW looks like for that search:
All I have to do after I’ve made my choices is click Process Categories.
A spinning circle will appear to indicate Things Are Happening and a few seconds after that I’ll get a clickable link to a caret (^)-delimited text file.
Click on that and it’ll download to your computer.
Opening a Sheet-Shaped Wikipedia Text File
Here’s how to import the SSW-generated text file in Google Sheets. First, open a new Google Sheet and choose File -> Import.
Google Sheets will ask for the imported data. Upload your just-downloaded merged_data.txt file.
You’ll get an Import file box. You have several Import Location options but I recommend Replace current sheet (the empty one you opened) because it’s simplest. For Separator Type choose Custom, and for Custom separator choose ^ , which is Shift-6 on the keyboards I’m familiar with. (Don’t rely on Google to try to detect the separator automatically.) There’s also a box for automatically converting text to numbers. Make sure that’s unchecked.
When you’ve made your choices click Import data and bam! Instant spreadsheet. (There’s a column B which has a link back to the original Wikipedia article, but I’ve hidden it to make the screenshot easier to understand.)
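If you’d rather work with the file outside of Google Sheets, the same import choices (a custom ^ separator and no automatic number conversion) can be approximated in Python with the standard csv module. Treat this as a sketch under assumptions: the function name read_ssw_file is invented for the example, and I’m assuming the file is UTF-8 text with one record per line.

```python
import csv


def read_ssw_file(path: str) -> list[list[str]]:
    """Read a caret-delimited text file into rows of plain strings.

    Hypothetical helper; assumes UTF-8 text with one record per line.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        # delimiter="^" mirrors the Custom separator setting in Sheets.
        # QUOTE_NONE treats quote characters as ordinary text, and csv never
        # converts values to numbers, so every field stays a string.
        reader = csv.reader(handle, delimiter="^", quoting=csv.QUOTE_NONE)
        # Skip blank lines, which csv.reader returns as empty lists.
        return [row for row in reader if row]


if __name__ == "__main__":
    # merged_data.txt is the file name used in the walkthrough above.
    for row in read_ssw_file("merged_data.txt"):
        print(row)
```

Each row comes back as a list of strings, with the Wikipedia link in the second position (the column B mentioned above).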
It’s important to note that not every page will have an official web site property associated with it. It all depends on what information has been provided to Wikidata. (On the other hand, the example query did yield a text sheet with 47 items.)
Sometimes the property Wikidata provides isn’t complete in itself. For example, if you get LinkedIn company IDs, you might get a string like via-rail-canada. You can use Google Sheets’ CONCATENATE function and a LinkedIn stem to turn that into a whole URL: https://www.linkedin.com/company/via-rail-canada/ .
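The same stem-plus-ID idea works outside of a spreadsheet too. Here is a minimal Python sketch of it, using the LinkedIn company stem from the example URL above. The constant and function names are my own inventions for illustration.

```python
# Stem taken from the example URL above; the names are hypothetical.
LINKEDIN_COMPANY_STEM = "https://www.linkedin.com/company/"


def linkedin_company_url(company_id: str) -> str:
    """Join the LinkedIn company stem and a company ID into a full URL."""
    # Trim stray spaces and slashes so the result has exactly one
    # trailing slash, matching the example URL.
    cleaned = company_id.strip().strip("/")
    return f"{LINKEDIN_COMPANY_STEM}{cleaned}/"


if __name__ == "__main__":
    print(linkedin_company_url("via-rail-canada"))
```

Other properties that store only an ID should work the same way, as long as you know the right stem for that site.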
I didn’t do a lot of deep thinking about the Wikidata properties to include in the dropdown menu, so if you have other items you’d like to see there, let me know. I don’t want it to end up at 100 properties, though!