A Question That Started With a Very Ordinary Conversation

I was writing in Arabic — formal, deliberate, unhurried — and waiting for a response. The conversation itself was unremarkable, nothing that should have provoked much reflection. But when the answer arrived, carrying something that resembled genuine understanding, a question surfaced that I haven’t been able to shake since that exact moment: when I wrote to it in Arabic, where did my words go?

I mean that literally. Did my words remain Arabic through every stage of processing — right up to the final moment before the response was shaped? Or does something happen out of sight, in those layers we never see, that converts what I wrote into another language, processes it there, and then hands it back to me re-translated?

This isn’t a technical question in the narrow sense. Or it is, on its surface — but underneath, it’s a question about something more fundamental: does the tool I work with every day genuinely understand me, or does it first translate me into another language, process the translation, and then return me to myself, reassembled? And does the difference between these two possibilities even matter enough to worry about?

I wasn’t looking for a philosophical answer when I started thinking about this. I was looking for a simple, practical understanding. But honest practical inquiry has a way of leading you, sometimes, toward questions you never intended to ask.

What the Research Says — and What It Doesn’t

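In 2024, researchers at EPFL (École Polytechnique Fédérale de Lausanne) published a paper that stirred genuine debate in AI research circles. Its title was deliberately provocative: Do Llamas Work in English?, a reference to Meta’s Llama model family. Their central finding: when Llama-2 models are prompted in a non-English language, the representations in their middle layers decode to English tokens before the final layers shift to the language of the prompt. The authors read this as evidence of an internal concept space that lies closer to English than to other languages, which is a more careful claim than saying the model simply translates everything into English.^[1]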

What this suggests in practice is striking. When you write to a model in Arabic, or Chinese, or Swahili, the model may not process your input in that language all the way through. On the strongest reading, there is a silent shift toward English first, comprehension and response formation happen there, and then comes a shift back into the language you used. English isn’t just the language the model speaks. According to this finding, it’s the language the model thinks in, beneath the surface, before it says anything at all.

In February 2025, a paper appeared with a more direct title that left little room for misreading: Do Multilingual LLMs Think In English? The researchers found that the internal concept space of models trained mostly on English is fundamentally English-centric: English tokens surface in the intermediate processing layers before any conversion toward the target output language takes place.^[2] Around the same time, MIT Technology Review reported a parallel conclusion: when such a model processes input in another language, it passes through an intermediate stage where the processing appears to happen in English, before the final response is generated.^[3]
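
The EPFL paper reads out intermediate layers with a technique commonly called the logit lens: each layer’s hidden state is projected through the model’s output vocabulary matrix to see which token it currently favors. The sketch below is a toy illustration of that readout only. The four-word vocabulary, the identity unembedding matrix, and the hidden-state values are all invented for the example and are not taken from any paper, and a real model would also apply its final normalization layer before the projection.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logit_lens(hidden_states, unembedding, vocab):
    """For each layer, project the hidden state onto the vocabulary
    and return the most probable token with its probability."""
    readout = []
    for state in hidden_states:
        # One logit per vocabulary entry: dot product of its row with the state.
        logits = [sum(w * h for w, h in zip(row, state)) for row in unembedding]
        probs = softmax(logits)
        best = max(range(len(vocab)), key=lambda i: probs[i])
        readout.append((vocab[best], probs[best]))
    return readout


# Hypothetical toy setup: a French prompt whose correct continuation is Chinese.
vocab = ["the", "flower", "fleur", "花"]
unembedding = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
hidden_states = [
    [0.9, 0.1, 0.2, 0.1],  # early layer: nothing meaningful yet
    [0.2, 1.5, 0.4, 0.6],  # middle layer: the English word leads
    [0.1, 1.1, 0.2, 1.9],  # final layer: the target-language word leads
]

top_tokens = [token for token, _ in logit_lens(hidden_states, unembedding, vocab)]
print(top_tokens)  # ['the', 'flower', '花']
```

The pattern to notice is the middle row: the right concept is already present, but its English form scores highest until the last layer. Whether that counts as “thinking in English” is exactly what the papers argue about.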

English may not only be a language the model speaks — it may be the language the model thinks in before it speaks yours at all. And the difference between thinking and speaking is precisely what makes this worth considering carefully.

The training data numbers make this finding feel less surprising and more structural. Common Crawl — one of the largest datasets used in training major language models worldwide — contains over 2.7 trillion English tokens.^[4] Against that number, languages spoken by tens of millions of people across millennia sit hundreds of times lower on the same scale. English isn’t merely the most represented language in these datasets — it’s a presence large enough to reshape how everything else is learned, because the model doesn’t acquire languages in isolation. It acquires them against the background of all that English that came before.

But this is precisely where we need to pause: does this mean the model is “biased” in the sense we usually mean? Or is the situation more complicated than a straightforward accusation?

The Internet Itself Has English Preferences

To understand the scale of English dominance in AI training data, it helps to step back and see a wider picture. The internet, the primary source of most of this training material, is not an even reflection of the world’s languages. Statistics from W3Techs indicate that English accounts for approximately 49-50% of website content worldwide, while English speakers, native and non-native combined, represent only around 17-18% of the global population.^[5] English occupies nearly three times as much internet space as its proportion of human speakers would suggest.
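
The “three times” figure follows directly from the two percentage ranges quoted above. As a quick check, here is the arithmetic as a minimal sketch; the function name and variable names are my own, and the inputs are simply the figures from this paragraph.

```python
def overrepresentation(content_share, population_share):
    """How many times larger a language's share of web content is
    than its share of the world's population."""
    return content_share / population_share


# Percentages quoted in the paragraph above, as (low, high) ranges.
web_share = (49.0, 50.0)
speaker_share = (17.0, 18.0)

# Most conservative and most generous pairings of the two ranges.
low = overrepresentation(web_share[0], speaker_share[1])   # 49 / 18
high = overrepresentation(web_share[1], speaker_share[0])  # 50 / 17

print(f"{low:.1f}x to {high:.1f}x")  # 2.7x to 2.9x
```

So “roughly three times” is a fair rounding, slightly on the generous side.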

And the issue isn’t only quantity. The English content available online is also different in kind: academic papers, peer-reviewed scientific studies, journalistic archives spanning decades, philosophical discourse, literary fiction, and a Wikipedia with over six million English articles versus a little over one million in Arabic. The model doesn’t only train on more English. It trains on English content that is more varied, more densely interconnected, and more epistemically rich than what exists in most other languages online.

This matters practically. When you ask a model about a philosophical concept, a historical event, or a scientific question, it draws on knowledge built primarily from English sources. The answer you receive — even when it arrives in Arabic — is, in some portion, a translation of how English-language texts approached that subject.

But Perhaps the Question Is Wrong to Begin With

Before walking too far in one direction, there’s another voice that deserves genuine attention — not as a counterargument to be dismissed, but as a real possibility that complicates the picture and makes it more honest.

Noam Chomsky built his most influential theory on the concept of Universal Grammar: the idea that all human languages, beneath their vast surface differences, derive from a single shared mental structure present in every person from birth — not as a specific language, but as an innate capacity for language itself.^[6] Steven Pinker extended this argument in The Language Instinct (1994), framing language as a biological instinct rather than a purely cultural acquisition, and suggesting that all humans think in a “language of thought” before translating it into their spoken tongue.^[7]

If all languages share a common deep structure, then perhaps when an AI model “thinks in English,” it isn’t privileging one culture over others so much as accessing that deeper shared layer — the substrate beneath language that no particular culture owns. The real problem, on this reading, may not be the language the model thinks in, but the cultural assumptions embedded in the English texts it trained on: the values, framings, silences, and blind spots that arrived as passengers alongside the grammar and vocabulary.

This splits the question into two tracks that can’t easily be merged: are “language” and “culture” the same thing in this context? Can a model think in English without inheriting its cultural dispositions? Or are language and culture so entangled that separating them is itself a kind of fiction?

What I See With My Own Eyes Every Day

I want to share a personal observation honestly, because it doesn’t support any single argument here — it sits in the gray zone between all of them, which is perhaps the most accurate place to be.

When I open a fresh conversation with any major language model — Claude, Gemini, ChatGPT — there’s an initial strangeness. The model defaults to Gulf Arabic vocabulary, partly because I registered with an email where I listed my location as the UAE — so it reads me as a Gulf user before I’ve said a word. It chooses expressions more characteristic of that region, even when I write in standard formal Arabic from Damascus. This isn’t a criticism — it’s simply what the data produces, and perhaps what the location signal reinforces: Gulf Arabic content is more abundant and better documented online than most other Arabic varieties, so the model gives you what it trained on most. (See our article: Dialect Prompting — How to Force AI to Respect Levantine, Egyptian, Gulf and Moroccan Arabic with Precision)

Then the conversation continues and lengthens. And gradually, without announcement, the model begins to adjust. The unfamiliar vocabulary decreases, the style moves closer to mine, the model starts tracking my local references and unstated assumptions. By the end of an extended conversation — or in the middle of a topic that’s given it enough context — I sometimes find text that genuinely resembles the way I think, in ways that feel almost uncanny.

What exactly happened? Did the model shed its English-and-Gulf bias and come to actually understand me? Or did it recalibrate its statistical defaults based on the conversation at hand, producing something that looks like understanding without being understanding in any deeper sense? The difference between these two interpretations isn’t merely technical. It’s the philosophically significant difference between a tool that genuinely learns and one that adapts so effectively that its adaptation becomes indistinguishable from learning.

Statistical adaptation produces results that resemble understanding. And that resemblance is convincing enough, from the inside, to feel like the real thing. But resemblance alone is not evidence that they’re the same.
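
One way to picture “recalibrating statistical defaults” is as a belief update: a starting guess about the user’s variety of Arabic, revised each time the conversation supplies evidence. The sketch below is an analogy only, not a description of what happens inside a language model. The three dialect labels, the transliterated marker words for “now,” and every number are hypothetical values chosen for illustration.

```python
def update_dialect_belief(prior, weights, observed_words):
    """Revise a belief over dialects after each observed word,
    using Bayes' rule with per-dialect word weights."""
    belief = dict(prior)
    for word in observed_words:
        for dialect in belief:
            # Words with no listed weight are treated as neutral evidence.
            belief[dialect] *= weights[dialect].get(word, 1.0)
        total = sum(belief.values())
        belief = {d: p / total for d, p in belief.items()}
    return belief


# Hypothetical starting point: a location signal makes Gulf the default guess.
prior = {"gulf": 0.6, "levantine": 0.2, "egyptian": 0.2}

# Hypothetical weights: how strongly each word for "now" points to each dialect.
weights = {
    "gulf": {"alheen": 0.8, "hallaq": 0.1, "dilwaqti": 0.1},
    "levantine": {"alheen": 0.1, "hallaq": 0.8, "dilwaqti": 0.1},
    "egyptian": {"alheen": 0.1, "hallaq": 0.1, "dilwaqti": 0.8},
}

before = update_dialect_belief(prior, weights, [])
after = update_dialect_belief(prior, weights, ["hallaq", "hallaq"])

print(max(before, key=before.get))  # gulf
print(max(after, key=after.get))    # levantine
```

Nothing in this loop understands anyone. Two Levantine words are enough to flip the default, and the belief resets to the prior whenever the conversation starts over, which is the same contraction and expansion described above.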

The Stochastic Parrot and the Limits of the Accusation

In 2021, Emily M. Bender and her colleagues published a paper that became one of the most cited and debated in AI ethics literature: On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? Their argument, stated plainly: language models don’t understand in any meaningful sense — they aggregate vast statistical patterns from text and produce linguistic output without any genuine grasp of meaning or connection to the world it supposedly describes.^[8]

If this argument holds — and it remains a subject of serious, unresolved debate — it redraws the entire boundary of the problem we’ve been discussing. An AI model doesn’t carry cultural bias the way a human being does. It doesn’t prefer English because it grew up in it, built its identity through it, or loves it. It reflects what it was trained on, the way a mirror reflects what stands before it — without preference, without nostalgia, without any felt attachment. The problem then isn’t “AI bias” in any deep anthropomorphic sense. The problem is that we handed it a mirror in which one language appeared vastly more than the others, and we’re now surprised by the reflection.

This distinction, if we take it seriously, transforms the original question completely — from “does it think in your language?” to “did you even see yourself in it at all?” And from “is it biased?” to “biased toward whom, in what sense, and is this really different from any tool humans have ever made?”

That last question is what this entire series will try to work through — not with a ready-made answer, but by following the question honestly to wherever it leads. (See our article: The Lost Meaning — From Bathroom Soup to the Atomic Bomb)

What Are We Actually Looking For?

I don’t want to close this question with an answer that arrives too early. What I can say now — after thinking about this from several different angles — is that reality doesn’t match either the alarm or the reassurance I started with. The AI doesn’t “think in English” the way a person raised in the language does. It carries none of English’s memories, its emotional associations, its living cultural inheritance with all its contradictions and beauties and anxieties. But it also doesn’t think in your language in the way you might hope: that you’d find something in it that sees the world from your exact window, understands your unstated assumptions, grasps your local context without needing it explained.

There is a distance between those two things. And that distance isn’t fixed — it contracts when you extend the conversation and give it context, and expands when you close the window and start again from nothing. Understanding the nature of that distance, and what it means for the work we do with these tools, is what the rest of this series is about.

In the next article, we’ll find that English itself — the language the model supposedly “thinks in” — is not a single language so much as a family of dialects, registers, and cultural orientations with real, measurable differences between them. What we call “bias toward English” may in fact be something more specific: a bias toward one very particular variety of it, spoken in one place, shaped by one cultural moment. That turns out to matter quite a bit.

→ Article 2: Does AI Think in English — With an American Accent?

References

[1] Wendler, C. et al. (2024). Do Llamas Work in English? On the Latent Language of Multilingual Transformers. EPFL / ETH Zurich. Presented at ACL 2024. aclanthology.org/2024.acl-long.820
[2] Schut, L., Gal, Y., & Farquhar, S. (2025). Do Multilingual LLMs Think In English? arXiv preprint, February 2025. arxiv.org/abs/2502.05260
[3] MIT Technology Review. “AI models process other languages by ‘thinking’ in English first.” February 2025. technologyreview.com
[4] Common Crawl Foundation. Dataset Statistics. commoncrawl.org
[5] W3Techs. “Usage statistics of content languages for websites.” 2024. w3techs.com
[6] Chomsky, N. (1965). Aspects of the Theory of Syntax. MIT Press.
[7] Pinker, S. (1994). The Language Instinct: How the Mind Creates Language. William Morrow.
[8] Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? FAccT ’21. dl.acm.org/doi/10.1145/3442188.3445922
