Big Bad NLP Database, Northwest Kansas Library System Newsletters, Library of Congress, More: Saturday Afternoon ResearchBuzz, February 29, 2020


From last month, but I just learned about it today. Quantum Stat: 100s of datasets for machine learning developers (and counting). “With the advent of deep learning and the necessity for more and diverse data, researchers are constantly hunting for the most up-to-date datasets that can help train their ML model. Currently, NLP data seems to be scattered across several 3rd party libraries, Reddit, or in the research arms of big tech. And while these mediums are useful, there doesn’t seem to be a central hub for housing NLP data that can be easily reached and searched by the ML engineer. As a result, we’ve created the ‘Big Bad NLP Database,’ the world’s largest data library in natural language processing:”

Kansas State Library: Northwest Kansas Library System Newsletters, 1977-2017. “Recently, with the approval of NWKLS director, George Seamon Jr., the State Library of Kansas added its collection of Northwest Kansas Library System newsletters to our KGI Online Library collection. As stated in previous KGI Blog articles these regional library system newsletters are a treasure trove of historical information on the development and activities of library systems, libraries, library folks and library issues in Kansas. Our NWKLS newsletter collection spans about 40 years from 1977 to 2017.”


Library of Congress: New Collaboration between LC Labs, British Library, and the Zooniverse. “The project is titled ‘From crowdsourcing to digitally-enabled participation: the state of the art in collaboration, access, and inclusion for cultural heritage institutions,’ resulting from this call. The project will convene experts in several ways over the next 12 months. Together, these groups will describe and document practical approaches and future paths in crowdsourcing through a book sprint, and open comment period, and a follow up workshop.”

Search Engine Journal: Google Appoints First-Ever ‘Creator Liaison’ for YouTube. “Google has created a new position within its company; a ‘creator liaison’ for communication between YouTube and its video publishers. A former YouTuber named Matt Koval has been appointed to the position, so you can now think of him as the Danny Sullivan (Google’s Search Liaison) of YouTube.”

The Verge: The World Health Organization has joined TikTok to fight coronavirus misinformation. “The World Health Organization launched a TikTok account on Friday as part of its efforts to cut through coronavirus misinformation online. A specialized public health agency of the United Nations, WHO is one of the leading organizations working to contain the spread of the virus.”


Bloomberg: When Big Tech Goes Green, Taxpayers Help Foot the Bill. “Google is hardly the only company to shield its identity as it negotiates with local governments and utilities. The practice is becoming standard among the technology industry’s biggest companies. As it builds out its data center footprint, Facebook consistently conditions potential deals on the utmost secrecy.”

Global Investigative Journalism Network: GIJN’s Data Journalism Top 10: Weird Maps, ‘Out of Control’ Airbnb, Augmented Reality Graphics, Russian Doctors, Brazilian Data. “What’s the global data journalism community tweeting about this week? Our NodeXL #ddj mapping from February 17 to 23 finds geographer Tim Wallace collecting some amusingly unusual maps, The Guardian analyzing the effect of Airbnb on home ownership in Great Britain, and former Ogilvy & Mather chief creative officer Tham Khai Meng sharing how a Japanese newspaper utilized augmented reality to animate graphics.”


New Indian Express: Police meet Google officials on rising cyber frauds in Hyderabad. “Owing to an increase in cyber frauds through Google-based services such as customer care frauds on Google search, frauds through Google View Form, Google Pay and Google Ad services, Cyberabad police held a meeting with Google representatives. The cops discussed issues faced by investigating officers and vulnerabilities in Google services. They also discussed remedial steps taken by Google so far and further steps to be taken.”

BuzzFeed News: Clearview’s Facial Recognition App Has Been Used By The Justice Department, ICE, Macy’s, Walmart, And The NBA. “The United States’ main immigration enforcement agency, the Department of Justice, retailers including Best Buy and Macy’s, and a sovereign wealth fund in the United Arab Emirates are among the thousands of government entities and private businesses around the world listed as clients of the controversial facial recognition startup with a database of billions of photos scraped from social media and the web. The startup, Clearview AI, is facing legal threats from Facebook, Google, and Twitter, as well as calls for regulation and scrutiny in the US. But new documents reviewed by BuzzFeed News reveal that it has already shared or sold its technology to thousands of organizations around the world.”


Medium: How Much Data Is Too Much Data? — Federating Data in the Age of Connectivity. “It is projected that every day humans produce approximately 2.5 quintillion bytes of data. With this insane amount of new data, surely some of it must be redundant, right? For data science, analytics, and machine learning, this increase in the amount of data available leads to previously unthinkable new avenues for research. But while more and more data is being harvested for a variety of reasons, could better curation of the data we have already collected lead to better outcomes for research?”

Good afternoon, Internet…

