
Does AI Think in English — With an American Accent?


AI doesn’t just speak “English” — it speaks a very specific kind. A Berkeley study reveals a 60%+ bias toward Standard American English, and a Nature paper shows that dialect discrimination extends to decisions about jobs, credibility, and punishment. Then: does AI know the difference between Kuwaiti and Yemeni Arabic?

A Scene You Won’t Forget

In Guy Ritchie’s Snatch (2000), Brad Pitt plays Mickey, a bare-knuckle fighter from the Irish Traveller community in Britain, speaking a dialect so thick and so fast that even the British characters in the scene exchange glances of complete bafflement. American audiences understood almost nothing. British audiences didn’t fare much better. And the real joke, which Ritchie built entire scenes around, is that Brad Pitt, born in Shawnee, Oklahoma, so fully inhabited Mickey’s Traveller speech that the other characters’ incomprehension became a plot device.

Years earlier, in Thelma & Louise (1991), the same Brad Pitt had his breakout role with the easy drawl of a naive Southern drifter: a completely different voice, a different world. Then in Inglourious Basterds (2009), he speaks in a deliberately exaggerated Tennessee accent that Quentin Tarantino has publicly confirmed was intentional; the terrible Italian he attempts mid-film was designed to portray an arrogant American trying on a European identity that doesn’t belong to him.[1]

The question that matters here: if an AI model were given a text passage written in the voice of Mickey from Snatch and another in the voice of Lt. Aldo Raine from Inglourious Basterds, would it recognize them as the same actor? Would it distinguish between the dialects? And would its response change if we described the characters in writing rather than showing the film? That seemingly playful question leads somewhere more serious than it first appears.

Standard American English — Whose Standard?

In the first article, we established that AI “thinks in English.” But when you press the question — which English, exactly? — a gap opens that wasn’t visible from a distance.

In 2024, the Berkeley Artificial Intelligence Research (BAIR) Lab published a study that generated substantial discussion across research and journalism. Led by computer scientist Eve Fleisig, the team examined how GPT models responded to text written in ten varieties of English: Standard American English, Standard British English, and eight other widely spoken non-“standard” varieties including Indian, Nigerian, Irish, Jamaican, and African American English. The finding that struck the researchers themselves: model responses retained features of Standard American English far more than those of any non-“standard” variety, by a margin of more than 60%.[2]

A detail that seems small but carries large implications: when users write with British spelling — “colour” instead of “color,” “organise” instead of “organize” — the model almost universally reverts to American spelling in its responses.[2] British English, which is the standard in the vast majority of non-American English-speaking countries, gets quietly corrected to the American version — as if the model is telling you: “you’ve been writing this wrong, let me fix it.”
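The spelling reversion described above is simple enough to check by hand. The sketch below is a minimal illustration, not the Berkeley team's method: the word list `BRITISH_TO_AMERICAN` and both function names are hypothetical, and a real check would need a far larger lexicon.

```python
import re

# Hypothetical, deliberately tiny lookup of British -> American spellings.
# Illustration only; a real audit would need a much larger lexicon.
BRITISH_TO_AMERICAN = {
    "colour": "color",
    "organise": "organize",
    "favourite": "favorite",
    "centre": "center",
}
AMERICAN_FORMS = set(BRITISH_TO_AMERICAN.values())


def spelling_counts(text: str) -> dict:
    """Count how many listed British and American spellings occur in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return {
        "british": sum(w in BRITISH_TO_AMERICAN for w in words),
        "american": sum(w in AMERICAN_FORMS for w in words),
    }


def reverted_to_american(prompt: str, response: str) -> bool:
    """True when the prompt uses only British forms and the response only American ones."""
    p = spelling_counts(prompt)
    r = spelling_counts(response)
    return (
        p["british"] > 0 and p["american"] == 0
        and r["american"] > 0 and r["british"] == 0
    )
```

Running `reverted_to_american` over many prompt and response pairs, then dividing the number of `True` results by the number of pairs, would give a rough reversion rate for whichever model produced the responses.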

The bias wasn’t only toward English at the expense of other languages — it was toward one specific variety of English at the expense of all the others. That sharpens the accusation considerably, and makes it both more precise and more uncomfortable.

When Dialect Follows Its Speaker Into Judgment

The story doesn’t stop at gentle preferences for American spelling and vocabulary. In the same year, Nature, one of the most prestigious scientific journals in the world, published a study led by Valentin Hofmann, a researcher affiliated with the Allen Institute for AI and the University of Oxford. The findings shook the field: language models don’t merely distinguish between English dialects; they issue implicit judgments about the people who speak them.[3]

When models were asked to make hypothetical decisions about people based only on how they wrote (job recommendations, credibility assessments, legal judgments), the results were alarming. Speakers of African American English (AAE) were more likely to be assigned less prestigious jobs, more likely to be judged guilty in fictional criminal scenarios, and more likely to be recommended harsher sentences in hypothetical legal cases. The more disturbing paradox: when the same models were asked directly about their views on African Americans, the responses were positive and measured, because the models had learned to conceal overt bias. But when they acted implicitly, they revealed stereotypes more negative than any human bias toward this group ever recorded in experimental research.[3]
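The core idea of that experiment, giving a model the same content in two varieties and comparing the decisions, can be sketched in a few lines. This is an illustration of a paired comparison under stated assumptions, not the paper's implementation: `decision_gap` is a hypothetical helper, and `decide` stands in for whatever model call returns an unfavourable decision (`True`) or not (`False`).

```python
from typing import Callable, Iterable, Tuple


def decision_gap(
    pairs: Iterable[Tuple[str, str]],
    decide: Callable[[str], bool],
) -> float:
    """Difference in unfavourable-decision rate between two varieties.

    pairs:  (standard_text, dialect_text) tuples carrying the same meaning.
    decide: hypothetical stand-in for a model call; returns True when the
            decision about the writer is unfavourable.
    A positive result means the dialect versions fared worse.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one text pair")
    standard_rate = sum(decide(s) for s, _ in pairs) / len(pairs)
    dialect_rate = sum(decide(d) for _, d in pairs) / len(pairs)
    return dialect_rate - standard_rate


# Toy stand-in for a model: deliberately biased against one surface feature,
# so that the gap is visible. A real test would call an actual model here.
def toy_decide(text: str) -> bool:
    return "ain't" in text.lower()


example_pairs = [
    ("I am not going.", "I ain't going."),
    ("We are not ready.", "We ain't ready."),
]
gap = decision_gap(example_pairs, toy_decide)
```

Because the meaning inside each pair is held constant, any gap above zero can only come from how the text was written, which is the point the study makes.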

Genevieve Smith of BAIR puts it plainly: “Language carries power.” The numbers give that statement weight. On average, model responses to non-standard varieties were rated 22% more demeaning and 16% more stereotyping than responses to Standard American or British English.[2]

Does It Know the Difference Between Oxford and New Orleans?

Let’s sharpen the question further. If you wrote to a model in the dialect of New Orleans — that Southern American tongue mixing French Creole heritage, jazz rhythms, and a vocabulary unlike anything spoken in the rest of the country — and then wrote the same content in formal Oxford English, would the responses be equivalent in quality and tone?

What the research suggests: no. But the interesting part is not that the model prefers Oxford over New Orleans. The more accurate finding is that it prefers neither — it prefers a third variety nobody asked for: the neutral Standard American English of a national television broadcaster. A voice that belongs everywhere and nowhere in particular. That is what it produces by default, even when that isn’t what the context requires.

This is where the Brad Pitt example earns its weight. Mickey in Snatch represents the furthest edge of English dialect — a voice so specific that native English speakers couldn’t follow it. If you described a character like Mickey in writing and asked a model to continue his dialogue, the model would almost certainly produce something smoother and more legible than Mickey would ever say. The bias requires no intention. It only requires more data of one kind than another.

And What About Kuwaiti vs. Yemeni?

When we move to Arabic, the question becomes sharper and more personally felt. Take a clear example: Kuwaiti Arabic and Yemeni Arabic. Both are sometimes grouped under the broad label of “Gulf Arabic” in general linguistic classifications — but any Arabic speaker knows the distance between them is enormous. Kuwaiti Arabic carries Persian vocabulary absorbed through centuries of Gulf trade. Yemeni Arabic preserves linguistic structures among the oldest documented in the language, with a rhythm and cadence entirely its own. They are not the same thing wearing the same label.

If you write to a model in authentic Yemeni dialect and ask it something, does it accurately recognize that dialect and respond from within its cultural context? Or does it flatten everything to a generic “Gulf Arabic” — and produce a response that implicitly frames the question through a Kuwaiti or Saudi lens, simply because that content is better represented in its training data? (See our article: Dialect Prompting — How to Force AI to Respect Levantine, Egyptian, Gulf and Moroccan Arabic with Precision)

The honest answer is complicated. Recent models have improved at distinguishing Arabic dialects in clear textual cases. But the subtler layer — the rhythm, the unstated cultural reference, the locally-specific historical frame — remains fragile territory. This isn’t a subjective impression. A 2025 feature in Nature acknowledges directly that AI models remain primarily designed for English-speaking users in high-income countries, and that even speakers of non-standard English varieties — Indian, Jamaican, Nigerian — report feeling unrepresented by these tools.[4] The Arabic situation is several degrees more complex than that.

Chomsky, Then and Now

Noam Chomsky — who appeared in our first article as the architect of Universal Grammar — returns here in a different register. Chomsky is not only the linguist who proposed that all languages share a common deep structure. He is also one of the most consistent critics of American cultural hegemony writing today, and one of the clearest voices about how that hegemony operates not through declaration but through normalization: by making one version of the world appear “standard” and “neutral” when it is in fact a very specific cultural position.[5]

The irony is that Chomsky the political thinker and Chomsky the linguist are in tension here. The linguistic Chomsky gives AI a potential defense: if all languages share a deep structure, a model that “thinks in Standard American English” might be reaching that shared layer through the most heavily documented gateway available — not by intention, but by the path of least resistance. But the political Chomsky would answer immediately: the path of least resistance is always the path the powerful found easiest to build. The “neutral” standard exists because someone invested in it becoming the standard.

Standard American English appears “neutral” because it dominates the training data — not because it actually is neutral. Claimed neutrality is itself a form of bias, one that’s harder to see precisely because it doesn’t announce itself.

But Perhaps the Problem Is Already Retreating

Before settling into a final judgment, there’s an honest counterargument that deserves its place. The Berkeley study that demonstrated the bias also showed something else: models imitate non-standard dialects more when those dialects have greater data representation. Nigerian and Indian English are imitated more than Jamaican English — because the data volume is higher.[2] This means the problem isn’t architecturally permanent — it’s a data problem, and data problems can be addressed.

In 2024, the company Arcee AI launched a model called Arcee-Meraj specifically designed for greater accuracy across Arabic dialects.[4] Chinese models like DeepSeek and Qwen are beginning to show performance parity between Arabic and English, which suggests that the American monopoly on how these models “think” is a product of a particular historical moment in AI development, not an immovable feature of the technology.

What I Notice in My Own Work

I return to the personal observation I described in the first article. When I start a new conversation in formal Arabic, no model asks: where are you from? What’s your dialect? What’s your cultural reference frame? It gives me the default setting. And the default setting for Arabic — across most major models — is a blend that skews toward formal Gulf-inflected vocabulary, just as the English default skews toward Standard American.

What genuinely interests me, though, isn’t the existence of this default. It’s how quickly the model abandons it. Two or three sentences in a particular style, a keyword from a specific dialect, a local cultural reference — and the model shifts. Does this mean the bias is shallow? Possibly. Or does it mean the adaptation is fast because the model is perpetually searching for more context to narrow the gap? That question — about memory, adaptation, and what it would mean for a model to truly remember you — is what the fourth article in this series will try to answer. (See our article: How AI Learns From You — and What It Actually Knows About You)

For now, the hanging question is this: if Standard American English is the most present dialect in English training data, and Gulf Arabic is the most documented variety in Arabic — where does that leave Arabic with all its twenty dialects? And is the bias toward Gulf Arabic within Arabic similar in nature to the bias toward American English within English, or is it deeper and more structurally entangled? That is exactly what the next article will examine.

Article 3: Arabic Is Twenty Languages Inside One — At Least


References

  1. Tarantino, Q., quoted in multiple interviews regarding Brad Pitt’s intentional accent choices in Inglourious Basterds (2009). See also: Singer, E. (dialect coach), “Movie Accent Expert Breaks Down 32 Famous Roles,” WIRED, 2016. wired.com
  2. Fleisig, E., Smith, G., Bossi, M., Rustagi, I., Yin, X., & Klein, D. (2024). Linguistic Bias in ChatGPT: Language Models Reinforce Dialect Discrimination. Berkeley Artificial Intelligence Research (BAIR) Lab, September 2024. bair.berkeley.edu/blog/2024/09/20/linguistic-bias
  3. Hofmann, V., Kalluri, P. R., Jurafsky, D., & King, S. (2024). Dialect prejudice predicts AI decisions about people’s character, employability, and criminality. Nature, 633(8028), 147–154. nature.com/articles/s41586-024-07856-5
  4. Nature News Feature. “Large language models are biased — local initiatives are fighting for change.” November 2025. nature.com/articles/d41586-025-03891-y
  5. Chomsky, N. (1999). Profit Over People: Neoliberalism and Global Order. Seven Stories Press. Cited here for Chomsky’s analysis of how cultural dominance operates through normalization rather than declaration.
