Last June, Antonio Radić, the host of a YouTube chess channel with over a million subscribers, was streaming a live interview with the grandmaster Hikaru Nakamura when the broadcast suddenly cut off.
Instead of a heated discussion of chess openings, famous games, and iconic players, viewers were told that the video had been removed for “harmful and dangerous” content. Radić saw a message stating that the video, which contained nothing more outrageous than a discussion of the King's Indian Defense, had violated YouTube's community guidelines. It stayed offline for 24 hours.
What happened is still unclear. YouTube declined to comment beyond saying that removing Radić's video was a mistake. But a new study suggests the incident reflects shortcomings in the artificial intelligence programs designed to automatically detect hate speech, abuse, and misinformation online.
Ashique KhudaBukhsh, an AI project scientist at Carnegie Mellon University and a serious chess player himself, wondered whether YouTube's algorithm could have been confused by discussions involving black and white pieces, attacks, and defenses.
So he and Rupak Sarkar, an engineer at CMU, designed an experiment. They trained two versions of a language model called BERT, one using posts from the racist far-right website Stormfront and the other using data from Twitter. They then tested the algorithms on transcripts and comments from 8,818 chess videos and found them far from perfect. The algorithms flagged around 1 percent of transcripts or comments as hate speech. But over 80 percent of those flagged were false positives; read in context, the language was not racist. “Without a human in the loop,” the pair write in their paper, “trusting the predictions of standard classifiers on chess discussions can be misleading.”
The experiment exposed a core problem for AI language programs. Detecting hate speech or abuse is not just a matter of catching the wrong words and phrases. The same words can have very different meanings in different contexts, so an algorithm must infer meaning from a string of words.
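The pitfall is easy to reproduce in miniature. The sketch below is a hypothetical context-free keyword matcher, far cruder than the fine-tuned BERT models in the study, and its term list is invented for illustration; it flags an innocuous chess comment because it matches words without regard for their context:

```python
# Hypothetical term list for illustration only -- these words are common
# in racist text online, but also in ordinary chess commentary.
FLAGGED_TERMS = {"black", "white", "attack", "threat"}

def naive_flag(text: str) -> bool:
    """Flag text if any term appears, ignoring all surrounding context."""
    words = {w.strip(".,;!?'").lower() for w in text.split()}
    return bool(words & FLAGGED_TERMS)

chess_comment = "White attacks the Black king, and the defense is under threat."
print(naive_flag(chess_comment))  # True: harmless chess talk gets flagged
```

Models like BERT look at whole sequences rather than isolated keywords, but the study's high false-positive rate suggests that, when trained on hateful text, they can still latch onto the same surface cues.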
“Fundamentally, language is still a very subtle thing,” says Tom Mitchell, a CMU professor who has previously worked with KhudaBukhsh. “These kinds of trained classifiers are not going to be 100 percent accurate any time soon.”
Yejin Choi, an associate professor at the University of Washington who specializes in AI and language, says she is “not at all” surprised by the YouTube takedown, given the limits of language understanding today. Choi says further progress in detecting hate speech will require big investments and new approaches. She says algorithms work better when they analyze more than just a piece of text in isolation, incorporating, for example, a user's comment history or the nature of the channel where the comments are posted.
But Choi's research also shows how hate-speech detection can perpetuate stigma. In a 2019 study, she and others found that human annotators were more likely to label Twitter posts from users who self-identify as African American as abusive, and that algorithms trained to identify abuse using those annotations will reproduce the same biases.
Companies have spent many millions of dollars collecting and annotating training data for self-driving cars, but Choi says the same effort has not gone into annotating language. So far, no one has collected and annotated a high-quality dataset of hate speech or abuse that includes many “edge cases” with ambiguous language. “If we made that level of investment in data collection, or even a small fraction of it, I'm sure AI can do much better,” she says.
Mitchell, the CMU professor, says YouTube and other platforms likely have more sophisticated AI algorithms than the ones KhudaBukhsh built; but even those are still limited.