Can language make us see?

Imagine an elephant. One of those African bush elephants. Large ears covering its shoulders; a powerful trunk falling to the ground; a pair of magnificent ivory tusks.

Were you able to see it? If so, it is tempting to jump to a conclusion: Yes, language makes us see things we are not currently perceiving through our eyes. But does your mental picture of an elephant have anything to do with real vision? Here’s the answer.

Fiction writers constantly conjure new images through language – that is their job. Take Borges’s Book of Imaginary Beings (published when Borges had already gone blind and so couldn’t actually see anymore). There we learn that “The Amphisbaena is a serpent having two heads, the one in its proper place and the other in its tail; and it can bite with both, and run with agility, and its eyes glare like candles.” Language triggers images in our minds that we have never seen before.

The ability to create images from language seems to be present early in life, as testified by the interest children take when listening to stories. Kids are (seemingly) able to visualize the characters and scenes that make up the plot. I recently had some first-hand experience of this ability when I gave my seven-year-old daughter the following playful task:

I printed out cards with the names of thirty animal species (gorilla, cheetah, hippo, mosquito, …) and asked my daughter to sort them in different ways: by size, from the smallest to the largest animal; by shape; by colour; etc. Her outward behaviour was very similar each time. She silently looked at the cards, then took one and moved it into a pile, stared at the next, moved it either to the same or a different pile, and so on. The piles were of course different for each sorting criterion. But each time her groupings were very similar to the ones I would have made.


We take our ability to solve this type of task for granted. However, from the point of view of a naïve observer (the proverbial alien from a different galaxy who peacefully studies the human species), it is quite a remarkable behaviour. Here we have a set of pieces of paper with some strange scribbles on them. When given a certain cue (“shape”), the human groups the pieces of paper in a certain way. When given a different cue (“colour”), the groupings change. Notably the groupings are systematic: give the same task to a different human and they will come up with almost identical solutions. There is nothing in the strange symbols themselves that could predict how they are grouped. What are these humans doing?

The trick is not only that these symbols refer to things in the world, such that we can group the symbols as we would group the things, like with like. We can go a step further and recreate the things in our mind, flexibly inspect them for the relevant feature (say, colour), and finally group those things according to that feature.

Now here is the critical question: How do we do this? How did my daughter solve the task? I’ve been talking about language making us see and evoking mental images. Have I just been poetic (or sloppy), using image and see in a loosely metaphorical sense? Or was your experience when you imagined an elephant in any meaningful way like seeing an actual elephant?

It is not just a metaphor. Visual imagery is surprisingly similar to actual seeing. This is the upshot of a decades-long debate that has recently been resolved. The debate pitted two camps against each other. One side argued that visual imagery operates on symbolic representations. The other side argued that it instead relies on depictive representations. The latter side won.[1]

Let me clarify the two contrasting hypotheses of the imagery debate with an analogy. First the symbolic hypothesis. Your computer is able to store information that allows it to reproduce an image. If you google “elephant”, for instance, the image of an elephant will appear on the screen. The relevant information to build the image has to be stored somewhere on your computer. Now imagine you take a microscope and open up your computer (its hard drive, graphics card, whatever) in search of this image, however small. No matter how hard you search for the image of an elephant, you won’t find it. The image of the elephant you are seeing on the screen is encoded in a purely symbolic fashion in the computer. It is made up of 0s and 1s that bear no depictive resemblance to the elephant picture. For the computer, then, the answer is clear: it operates on symbolic representations.

Note that the screen is not part of the inner workings of your computer, it is merely useful to us humans when we want to interact with the computer. Your computer can perform all kinds of operations on the information that encodes the image of an elephant without ever outputting this image on the screen. It could for instance compute its similarity to another image (say, of a giraffe) by merely comparing long vectors of 0s and 1s, completely oblivious to what this information represents – indeed for the computer it does not represent anything.
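For the curious, here is a minimal sketch of what such a blind comparison might look like. It assumes the two images are stored as equal-length strings of bytes and measures similarity as the fraction of matching bits; the function name `hamming_similarity` and the two tiny "images" are made up for illustration, and real image-comparison software uses far more sophisticated measures.

```python
def hamming_similarity(a: bytes, b: bytes) -> float:
    """Return the fraction of bit positions at which two byte strings agree."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    total_bits = 8 * len(a)
    if total_bits == 0:
        return 1.0  # two empty inputs are trivially identical
    # XOR leaves a 1 wherever the two inputs differ; count those 1s.
    differing = sum(bin(x ^ y).count("1") for x, y in zip(a, b))
    return 1.0 - differing / total_bits


# Hypothetical stand-ins for two (absurdly small) image files.
elephant = bytes([0b11110000, 0b00001111])
giraffe = bytes([0b11110000, 0b11111111])

print(hamming_similarity(elephant, giraffe))  # 0.75
```

Nothing in this code knows what an elephant or a giraffe is. The variable names are there for our benefit only; the computation would run just the same on any two strings of 0s and 1s.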

So the symbolic camp lost the debate. When you imagine an elephant and I ask you to describe its shape, your brain does not operate in a symbolic fashion. It operates with depictive representations. Does this mean that you would find an image of an elephant if you opened up your skull while you are imagining one? Well, even this is true to some extent (more on this in a second), but proving that visual imagery does not operate on symbolic representations doesn’t quite require this. What we need to show is that imagining an elephant relies on the same mechanisms we use to see an elephant in the real world. And that is what has been shown to be the case.

Upon being summoned to imagine an elephant, the part of your brain that recognizes the animal name sends an order to brain areas involved in vision that they should activate an image of an elephant. This engages the very same part of your brain (the visual cortex, in the back of your head) that is activated when you see an actual elephant. In a sense, you may indeed find an image of an elephant, due to the retinotopic organization of early visual cortex. Retinotopy means that the spatial arrangement of active neurons mirrors the light that enters through the retina, so that you can read off the shape of an object you are looking at from the associated neural activity in the visual cortex. A crucial finding in the imagery debate was that the same type of shape information can be decoded when someone is merely imagining an object.[2]

We should not take the image-in-your-brain idea too far, however: For instance, you will not find that parts of your brain turn yellow when you think of a giraffe. To avoid any comical misunderstanding, the depictive camp won the debate because they showed that visual imagery engages the same processes in the brain as vision, not because they found small grey elephants in people’s heads.

The ability to literally conjure up images makes us very flexible. Think of my daughter solving the task. Once the image was there, she could inspect it and extract whatever information was relevant to the task. If it was about size, she would mentally measure the size of the animals side by side and see that a cheetah is smaller than a cow. If the task was about colour, she would instead notice the yellowish fur covered with spots and would therefore group it together with a giraffe.

We engage in mental imagery all the time. Coupled with language, this ability goes a long way towards giving us superpowers. It allows us to replay the past. Critical to our survival as a species, we can also imagine – and thus prepare ourselves for – the future, the unknown, the impossible. (Even though this ability seems to be largely failing us in the current climate crisis.)

So yes, we may say that language makes us see. And if all you care about is a definite answer, this is the right place to stop reading and carry on with your day.


What drives science forward are conundrums. Here is one. People who are blind from birth but otherwise perfectly normal make animal groupings that are remarkably similar to those of sighted people.[3] Now if we commit to the story above as the only way of solving the animal grouping task, we would have to reach a somewhat miraculous conclusion: “Language gives sight to the blind” (paraphrasing Psalm 146:8). Taken in a literal sense (which is how I’ve argued we should understand the idea of mental images), this just seems impossible. The behaviour of the blind poses a problem for the story above. Let us simply conclude that there is more to language and visual imagery than meets the eye. Considering the intriguing case of people born blind will have to wait for another post.

Featured image: “Surpris !” by French painter Henri Rousseau (1844–1910). Rousseau painted many jungle scenes without ever having seen the jungle (he never left France). Much of his imagination may have been nourished by descriptions in natural history books.

[1] Pearson, J., & Kosslyn, S. M. (2015). The heterogeneity of mental representation: Ending the imagery debate. Proceedings of the National Academy of Sciences, 112(33), 10089–10092.

[2] Naselaris, T., Olman, C. A., Stansbury, D. E., Ugurbil, K., & Gallant, J. L. (2015). A voxel-wise encoding model for early visual areas decodes mental images of remembered scenes. NeuroImage, 105, 215–228.

[3] Kim, J. S., Elli, G. V., & Bedny, M. (2019). Knowledge of animal appearance among sighted and blind adults. Proceedings of the National Academy of Sciences, 201900952.