AI models believe racist stereotypes about African Americans that predate the Civil Rights movement — and they 'try to hide it when confronted'
When exposed to terms common in different racial dialects, large language models make racist assumptions about people from particular racial groups, even without explicitly knowing their race.
Scientists have discovered that common AI models express a covert form of racism based on dialect, manifesting chiefly against speakers of African American English (AAE).
In a new study published Aug. 28 in the journal Nature, scientists found evidence for the first time that common large language models including OpenAI's GPT-3.5 and GPT-4, as well as Meta's RoBERTa, express hidden racial biases.
Replicating previous experiments designed to examine hidden racial biases in humans, the scientists tested 12 AI models by asking them to judge a "speaker" based on their speech pattern, which the scientists drew up based on AAE and reference texts. Three of the adjectives most strongly associated with AAE were "ignorant," "lazy" and "stupid," while other descriptors included "dirty," "rude" and "aggressive." The AI models were not told the racial group of the speaker.
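The comparison described above can be sketched roughly in code. This is a minimal illustration, not the study's actual implementation: the prompt wording, the `score` interface and the function names are all assumptions made for the example. The idea is to present the same content in two dialects, ask how strongly a model associates each adjective with the speaker, and average the difference.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interface (an assumption, not a real library call):
# score(prompt, adjective) returns a log-probability-like number for the
# model continuing `prompt` with `adjective`.
Scorer = Callable[[str, str], float]

# Assumed prompt wording, for illustration only.
PROMPT_TEMPLATE = 'A person who says "{text}" is'


def association_gaps(
    score: Scorer,
    text_pairs: List[Tuple[str, str]],
    adjectives: List[str],
) -> Dict[str, float]:
    """For each adjective, average score(AAE text) minus score(reference text).

    A positive gap means the scorer ties the adjective more strongly to the
    AAE version of the same content. Race is never mentioned in the prompt.
    """
    if not text_pairs:
        raise ValueError("text_pairs must contain at least one pair")
    gaps: Dict[str, float] = {}
    for adjective in adjectives:
        diffs = []
        for aae_text, reference_text in text_pairs:
            aae_score = score(PROMPT_TEMPLATE.format(text=aae_text), adjective)
            ref_score = score(PROMPT_TEMPLATE.format(text=reference_text), adjective)
            diffs.append(aae_score - ref_score)
        gaps[adjective] = sum(diffs) / len(diffs)
    return gaps


def placeholder_score(prompt: str, adjective: str) -> float:
    """Stand-in scorer that ignores its input; replace with a real model call."""
    return 0.0


if __name__ == "__main__":
    # Placeholder texts: real use needs matched pairs with the same meaning.
    pairs = [("<text written in AAE>", "<same content in the reference dialect>")]
    print(association_gaps(placeholder_score, pairs, ["lazy", "brilliant"]))
```

With the placeholder scorer every gap is zero by construction. Any real measurement depends entirely on the model behind `score` and on the quality of the matched text pairs.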
The AI models tested, especially GPT-3.5 and GPT-4, even obscured this covert racism by describing African Americans with positive attributes such as "brilliant" when asked directly about their views on this group.
While the more overt assumptions about African Americans that emerge from AI training data aren't racist, more covert racism still manifests in large language models (LLMs). Superficially obscuring the racism that language models maintain on a deeper level actually exacerbates the discrepancy between covert and overt stereotypes, the scientists said.
The findings also show there is a fundamental difference between overt and covert racism in LLMs, and that mitigating overt stereotypes does not translate to mitigating the covert stereotypes. Effectively, attempts to train against explicit bias are masking the hidden biases that remain baked in.
"As the stakes of the decisions entrusted to language models rise, so does the concern that they mirror or even amplify human biases encoded in the data they were trained on, thereby perpetuating discrimination against racialized, gendered and other minoritized social groups," the scientists said in the paper.
Prejudice baked into AI training data is a longstanding concern, especially as the technologies become more widely used. Previous research into AI bias has concentrated on overt instances of racism. One common test method is to name a racial group, discern connections to stereotypes about them in training data and analyze the stereotype for any prejudiced views on the respective group.
But the scientists argued in the paper that social scientists contend there's a "new racism" in the present-day United States that is more subtle — and it's now finding its way into AI. One can claim not to see color but still hold negative beliefs about racial groups — which maintains racial inequalities through covert racial discourses and practices, they said.
As the paper found, those belief frameworks are finding their way into the data used to train LLMs in the form of bias against AAE speakers.
The effect comes largely because, in human-trained chatbot models like ChatGPT, the race of the speaker isn't necessarily revealed or brought up in the discussion. However, subtle differences in people's regional or cultural dialects aren't lost on the chatbot because of similar features in the data it was trained on. When the AI determines that it's talking to an AAE speaker, it manifests the more covert racist assumptions from its training data.
"As well as the representational harms, by which we mean the pernicious representation of AAE speakers, we also found evidence for substantial allocational harms. This refers to the inequitable allocation of resources to AAE speakers, and adds to known cases of language technology putting speakers of AAE at a disadvantage by performing worse on AAE, misclassifying AAE as hate speech or treating AAE as incorrect English," the scientists added. "All the language models are more likely to assign low-prestige jobs to speakers of AAE than to speakers of SAE, and are more likely to convict speakers of AAE of a crime, and to sentence speakers of AAE to death.
These findings should push companies to work harder to reduce bias in their LLMs and should also push policymakers to consider banning LLMs in contexts where biases may show. These contexts include academic assessments, hiring and legal decision-making, the scientists said in a statement. AI engineers should also better understand how racial bias manifests in AI models.
Drew is a freelance science and technology journalist with 20 years of experience. After growing up knowing he wanted to change the world, he realized it was easier to write about other people changing it instead. As an expert in science and technology for decades, he’s written everything from reviews of the latest smartphones to deep dives into data centers, cloud computing, security, AI, mixed reality and everything in between.