Social media platforms like Facebook, Twitter, and YouTube have been investing heavily in artificial intelligence to moderate content and automate the removal of harmful posts. These decision-making technologies typically rely on machine learning and are tailored to specific types of content, such as images, videos, audio, and written text. Some of these systems, built to measure the “toxicity” of text-based content, use natural language processing and sentiment analysis to detect harmful text.
Dennys Antonialli is a privacy law scholar and the executive director of InternetLab, an internet law and policy think tank based in São Paulo, Brazil.
While these technologies may appear to represent a turning point in the debate around hate speech on the internet, recent research has shown that they are still far from being able to account for context or intent. If such AI tools are entrusted with the power to police content online, they risk suppressing legitimate speech and censoring the use of specific words, particularly by vulnerable groups.
At InternetLab, we recently conducted a study focused on Perspective, an AI technology developed by Jigsaw (owned by Google’s parent company, Alphabet) that measures the perceived “toxicity” of text-based content. Perspective defines a “toxic” comment as “a rude, disrespectful, or unreasonable comment that is likely to make you leave a discussion.” Accordingly, the model was trained by asking people to rate internet comments on a scale from “very healthy” to “very toxic.” The resulting toxicity score indicates how likely it is that readers will perceive a given comment as toxic.
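Perspective exposes these scores through a public API. As an illustration, here is a minimal Python sketch of the kind of request involved; it targets the public Comment Analyzer endpoint, and the API key and sample comment are placeholders:

```python
import requests

API_KEY = "YOUR_API_KEY"  # placeholder; issued through the Google Cloud console
URL = ("https://commentanalyzer.googleapis.com/v1alpha1/"
       f"comments:analyze?key={API_KEY}")

def score_toxicity(text: str) -> float:
    """Return Perspective's TOXICITY score for `text`, between 0 and 1."""
    payload = {
        "comment": {"text": text},
        "languages": ["en"],
        "requestedAttributes": {"TOXICITY": {}},
    }
    response = requests.post(URL, json=payload)
    response.raise_for_status()
    # summaryScore.value is the model's estimate of how likely readers
    # are to perceive the comment as toxic.
    return response.json()["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

print(score_toxicity("What a lovely discussion!"))
```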
We used Perspective’s API to compare the perceived toxicity of tweets by well-known drag queens and far-right political figures. The study compared the Twitter accounts of all former participants of RuPaul’s Drag Race with those of far-right leaders such as David Duke, Richard Spencer, Stefan Molyneux, and Faith Goldy. We also included prominent non-LGBTQ Twitter users, including Donald Trump and Michelle Obama. Using the most recent version of Perspective, we analyzed more than 114,000 English-language tweets.
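The aggregation step of such a study is conceptually simple: score each tweet, then average the scores per account. The following is a rough sketch rather than our actual pipeline; it reuses the score_toxicity helper above, the handles and tweets are hypothetical, and the one-second pause reflects the API’s typical default quota of about one request per second:

```python
import time
import statistics

def account_toxicity(tweets: list[str]) -> float:
    """Average Perspective toxicity over an account's tweets, as a percentage."""
    scores = []
    for tweet in tweets:
        scores.append(score_toxicity(tweet))
        time.sleep(1)  # stay within the default quota of ~1 request per second
    return 100 * statistics.mean(scores)

# Hypothetical input: tweets already collected per account.
accounts = {
    "account_a": ["first example tweet", "second example tweet"],
    "account_b": ["another example tweet"],
}
for handle, tweets in accounts.items():
    print(f"@{handle}: {account_toxicity(tweets):.2f} percent")
```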
Our results indicate that a significant number of drag queen Twitter accounts scored as more toxic than those of white nationalist leaders. On average, the drag queens’ accounts had toxicity levels ranging from 16.68 percent to 37.81 percent, while the white nationalists’ averages spanned 21.30 percent to 28.87 percent. President Trump’s Twitter account scored 21.84 percent.
We also ran tests measuring the toxicity of words commonly found in the drag queens’ tweets. These words received notably high toxicity scores: gay (76.10 percent), lesbian (60.79 percent), queer (51.03 percent), and transvestite (44.48 percent). In other words, regardless of context, neutral words such as “gay,” “lesbian,” and “queer” were rated as significantly “toxic” by Perspective’s AI. This points to an important bias in Perspective’s model.
Additionally, words such as “fag” (91.94 percent), “sissy” (83.20 percent), and “bitch” (98.18 percent) registered high levels of toxicity. Though those words might be commonly perceived as harmful, their use by members of the LGBTQ community most often serves a different purpose.
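Anyone with an API key can check these word-level scores by submitting each term as a standalone comment, for instance with the helper sketched earlier. (Exact scores may drift over time as the model is retrained; the loop below uses the neutral identity terms discussed above.)

```python
# Score each word in isolation, with no surrounding context.
for word in ["gay", "lesbian", "queer", "transvestite"]:
    print(f"{word}: {score_toxicity(word) * 100:.2f} percent")
```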
Drag queens can be sharp-tongued. From “reads”—a specific form of insult that acerbically exposes someone’s flaws—to harsh jokes and comebacks, drag queens often reclaim words traditionally used as slurs to build a distinctive communication style.
In person, it is easier to grasp the context and see this as a form of self-expression. But when reading such missives online, it is significantly more challenging to distinguish between harmful and legitimate speech, especially when that assessment is made by machines. We found these in-group uses throughout the tweets we analyzed, yet in many of those cases Perspective still deemed the post extremely toxic:
[Two embedded tweets, scored at 95.98 percent and 91.16 percent toxicity, respectively.]