In 2012, artificial intelligence researchers took a big step forward in computer vision thanks, in part, to an unusually large set of images: thousands of everyday objects, people, and scenes in photos that were retrieved from the web and labeled by hand. This dataset, called ImageNet, is still used in thousands of AI research projects and experiments today.
But last week every human face included in ImageNet suddenly vanished, after the researchers managing the dataset decided to blur them.
Just as ImageNet helped usher in a new era of AI, efforts to fix it reflect the challenges that plague countless AI programs, datasets, and products.
“We were concerned about the issue of privacy,” says Olga Russakovsky, an assistant professor at Princeton University and one of the managers of ImageNet.
ImageNet was created as part of a challenge that invited computer scientists to develop algorithms capable of identifying objects in images. In 2012, this was a very difficult task. Then a technique called deep learning, which consists of “teaching” a neural network by giving it labeled examples, proved to be more adept at the task than previous approaches.
Since then, deep learning has led to an AI renaissance that has also exposed the shortcomings of the field. For example, facial recognition has proven a particularly popular and lucrative use of deep learning, but it is also controversial. A number of US cities have banned government use of the technology over concerns that it invades citizens’ privacy or is biased, because the programs are less accurate on non-white faces.
Today, ImageNet contains 1.5 million images with approximately 1,000 labels. It is widely used to evaluate the performance of machine learning algorithms, or to train algorithms that perform specialized computer vision tasks. The face blurring affected 243,198 images.
Russakovsky says the ImageNet team wanted to determine whether it was possible to blur faces in the dataset without changing how well algorithms trained on it recognize objects. “People were incidental in the data since they appeared in web photos depicting these objects,” she says. In other words, in an image that shows a bottle of beer, even if the face of the person drinking it is a pink smudge, the bottle itself remains intact.
In a research paper posted with the update to ImageNet, the team behind the dataset says it blurred faces using Amazon’s AI service Rekognition; the researchers then paid Mechanical Turk workers to confirm and adjust the selections.
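For readers who want a concrete picture of the final step, here is a minimal Python sketch of blurring boxed regions of an image. It is an illustration, not the ImageNet team's code: the function name `blur_boxes`, the box format, and the simple averaging blur are all assumptions, and it presumes the face boxes have already been found by a detector and checked by people.

```python
def blur_boxes(image, boxes, radius=2):
    """Return a copy of `image` with every box replaced by a local average.

    `image` is a grayscale picture stored as a list of rows of 0-255 ints.
    Each box is (top, left, bottom, right), with bottom and right exclusive.
    This is a plain box blur used as a stand-in; the actual blur used for
    ImageNet may differ.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    out = [row[:] for row in image]  # leave the input untouched

    for top, left, bottom, right in boxes:
        # Clamp the box to the image so out-of-range boxes are harmless.
        top, left = max(0, top), max(0, left)
        bottom, right = min(height, bottom), min(width, right)

        for y in range(top, bottom):
            for x in range(left, right):
                # Average the original pixels near (x, y), staying inside
                # the box so nothing outside it influences the result.
                y0, y1 = max(top, y - radius), min(bottom, y + radius + 1)
                x0, x1 = max(left, x - radius), min(right, x + radius + 1)
                window = [image[j][i]
                          for j in range(y0, y1)
                          for i in range(x0, x1)]
                out[y][x] = round(sum(window) / len(window))
    return out


# Hypothetical 6x6 checkerboard with one "face" box in the middle.
image = [[255 * ((x + y) % 2) for x in range(6)] for y in range(6)]
blurred = blur_boxes(image, [(1, 1, 5, 5)], radius=1)
```

Pixels outside the box are copied unchanged, which mirrors the point of the exercise: the face is smeared while the rest of the photo, including the object being labeled, stays as it was.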
Face blurring did not affect the performance of several object recognition algorithms trained on ImageNet, the researchers say. They also show that other algorithms built on top of those object recognition algorithms are similarly unaffected. “We hope this proof of concept will pave the way for more privacy-friendly visual data collection practices in the field,” says Russakovsky.
This is not the first effort to adjust the famous image library. In December 2019, the ImageNet team removed biased and derogatory terms introduced by human labelers, after a project called Excavating AI drew attention to the problem.
In July 2020, Vinay Prabhu, a machine learning scientist at UnifyID, and Abeba Birhane, a doctoral student at University College Dublin in Ireland, published research showing that they could identify individuals, including computer science researchers, in the dataset. They also found pornographic images there.
Prabhu says blurring faces is a good thing, but he’s disappointed that the ImageNet team didn’t acknowledge the work he and Birhane did.
Face blurring could still have unintended consequences for algorithms trained on ImageNet data. Algorithms might, for example, learn to look for blurred faces when searching for particular objects.
“An important issue to consider is what happens when you deploy a model that has been trained on a face-blurred dataset,” says Russakovsky. For example, a robot trained on the dataset could be thrown off by faces in the real world.