Many details of the exact sequence of events that led up to Gebru’s departure are not yet clear; both she and Google have declined to comment beyond their posts on social media. But MIT Technology Review obtained a copy of the research paper from one of the co-authors, Emily M. Bender, a professor of computational linguistics at the University of Washington. Though Bender asked us not to publish the paper itself because the authors didn’t want such an early draft circulating online, it gives some insight into the questions Gebru and her colleagues were raising about AI that might be worrying Google.
Titled “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?,” the paper lays out the risks of large language models: AIs trained on staggering amounts of text data. These have grown increasingly popular, and increasingly large, over the last three years. Under the right conditions, they are now extraordinarily good at producing what looks like convincing, meaningful new text, and sometimes at estimating meaning from language. But, says the paper’s introduction, “we ask whether sufficient thought has been given to the potential risks associated with their development and strategies to mitigate those risks.”
The paper, which builds on the work of other researchers, presents the history of natural language processing, an overview of four main risks of large language models, and suggestions for further research. Since the conflict with Google seems to be over the risks, we’ve focused on summarizing them here.
Environmental and financial costs
Training large AI models consumes a lot of computer processing power, and hence a lot of electricity. Gebru and her co-authors refer to a 2019 paper by Emma Strubell and her collaborators on the carbon emissions and financial costs of large language models. It found that their energy consumption and carbon footprint have been exploding since 2017, as models have been fed more and more data.
Strubell’s study found that training a language model using a particular type of “neural architecture search” (NAS) method would have produced the equivalent of 626,155 pounds (284 metric tons) of carbon dioxide, about the lifetime output of five average American cars. A version of Google’s language model BERT, which underpins the company’s search engine, produced 1,438 pounds of CO2 equivalent by Strubell’s estimate, nearly the same as a round-trip flight between New York City and San Francisco.
Gebru’s draft paper points out that the sheer resources required to build and sustain such large AI models mean they tend to benefit wealthy organizations, while climate change hits marginalized communities hardest. “It is past time for researchers to prioritize energy efficiency and cost to reduce negative environmental impact and inequitable access to resources,” they write.
Big data, unfathomable models
Large language models are also trained on exponentially increasing amounts of text. This means researchers have sought to collect all the data they can from the internet, so there’s a risk that racist, sexist, and otherwise abusive language ends up in the training data.
An AI model taught to view racist language as normal is obviously bad. The researchers, though, point out a couple of more subtle problems. One is that shifts in language play an important role in social change; the MeToo and Black Lives Matter movements, for example, have tried to establish a new anti-sexist and anti-racist vocabulary. An AI model trained on vast swaths of the internet won’t be attuned to the nuances of this vocabulary and won’t produce or interpret language in line with these new cultural norms.
It will also fail to capture the language and the norms of countries and peoples that have less access to the internet and thus a smaller linguistic footprint online. The result is that AI-generated language will be homogenized, reflecting the practices of the richest countries and communities.
Moreover, because the training data sets are so large, it’s hard to audit them to check for these embedded biases. “A methodology that relies on datasets too large to document is therefore inherently risky,” the researchers conclude. “While documentation allows for potential accountability, […] undocumented training data perpetuates harm without recourse.”
Research opportunity costs
The researchers summarize the third challenge as the risk of “misdirected research effort.” Though most AI researchers acknowledge that large language models don’t actually understand language and are merely excellent at manipulating it, Big Tech can make money from models that manipulate language more accurately, so it keeps investing in them. “This research effort brings with it an opportunity cost,” Gebru and her colleagues write. Not as much effort goes into working on AI models that might achieve understanding, or that achieve good results with smaller, more carefully curated (and therefore less energy-consuming) data sets.
Illusions of meaning
The final problem with large language models, the researchers say, is that because they’re so good at mimicking real human language, it’s easy to use them to fool people. There have been a few high-profile cases, such as the college student who churned out AI-generated self-help and productivity advice on a blog, which went viral.
The dangers are obvious: AI models could be used to generate misinformation about an election or the covid-19 pandemic, for instance. They can also go wrong inadvertently when used for machine translation. The researchers bring up an example: in 2017, Facebook mistranslated a Palestinian man’s post, which said “good morning” in Arabic, as “attack them” in Hebrew, leading to his arrest.
Why it matters
Gebru and Bender’s paper has six co-authors, four of whom are Google researchers. Bender asked to avoid disclosing their names for fear of repercussions. (Bender herself, by contrast, is a tenured professor: “I think that underscores the value of academic freedom,” she says.)
The goal of the paper, Bender says, was to take stock of the current research landscape in natural language processing. “We’re working at a scale where the people building these things can’t actually get their arms around the data,” she said. “And because the upsides are so obvious, it’s particularly important to step back and ask ourselves: what are the possible downsides? … How do we get the benefits of this while mitigating the risks?”
In his internal email, Dean, the head of Google AI, said one reason the paper “didn’t meet our bar” was that it “ignored too much relevant research.” Specifically, he said it didn’t mention more recent work on how to make large language models more energy-efficient and mitigate problems of bias.
However, the six collaborators drew on a wide breadth of scholarship. The paper’s citation list, with 128 references, is notably long. “It’s the sort of work that no individual or even pair of authors can pull off,” Bender said. “It really required this collaboration.”
The version of the paper we saw does also refer to several research efforts on reducing the size and computational costs of large language models, and on measuring the embedded bias of models. It argues, however, that these efforts have not been enough. “I’m very open to seeing what other references we ought to be including,” Bender said.
Nicolas Le Roux, a Google AI researcher in the Montreal office, later noted on Twitter that the reasoning in Dean’s email was unusual. “My submissions were always checked for disclosure of sensitive material, never for the quality of the literature review,” he said.