An A-Z of the Languages and Loanwords of the English Lexicon – from Arabic to Zulu!

Among language lovers and loathers alike, it’s well known that the modern lexicon of English is drawn from different sources. While a solid chunk of English vocabulary has a Germanic origin, much also comes from French, in large part due to Duke William of Normandy and some battle that happened in 1066 AD. For instance, in the first sentence, we have among, lovers, and, loathers, alike, well, known and drawn from Germanic, and language, modern, different and sources from French, all sitting happily side by side.

Yet the sources of English words don’t stop at two. You might know that Old Norse had quite an influence on the languages of Britain starting in the 8th century, while classical languages like Latin and Ancient Greek have also contributed an abundance of vocabulary and a plethora of phrases, especially in technical spheres like medicine and engineering.

Nor do these exhaust the sources either! On close inspection, we can find words from numerous other word-springs: from Semitic languages like Arabic and Hebrew, from South Asian languages like Hindi-Urdu and Tamil, from indigenous South American languages like Quechua, et cetera, et cetera. Such loanwords have become part of everyday English gradually; both the number of loanwords and the number of source languages have greatly increased since Old English times.

What is necessary is contact with those languages, which has occurred in various ways; exploration, trade, diplomacy, scholarship, military conquest and colonial expansion have each brought English into contact with the speakers (or writers, or signers) of different languages. Often a word has been introduced into English in tandem with the introduction into Anglophone cultures of the thing that the word denotes, such as the humble potato in the 16th century.

An illustration in a 1633 edition of John Gerard’s Herball or Generall Historie of Plantes of “Virginian Potatoes”. From here.

To illustrate the great number of languages that have helped to build the English lexicon of today, I’ve been set a challenge. More truthfully, I came up with the vague idea for the challenge, and then a friend readily challenged me to actually undertake it. The task is as follows:

To identify twenty-six languages, one for each letter of the English alphabet, that have been a source (directly or indirectly) of at least one ordinary English word!

Now this will be a list of languages and loanwords entirely of my own choosing. It’s subjective and personal; I only hope it is also accurate and interesting!

A given language does not need to be the ultimate known origin of a particular loanword to make the list, only a necessary link in the chain of its etymology. There is also no clear line for telling whether or not a word is a part of English, or of any language for that matter, so I’m working only off a crude threshold and a vague sense of everyday-ness. What’s more, for many letters of the alphabet, there are several languages that start with that letter to pick from; my choice for G could have been Greek, German, Guarani or Gaulish. In such cases I’ve opted for one that I reckon will be more unexpected. Why not make it harder for myself, and avoid the obvious choices? It’s not going to surprise many people that English gets a word from Spanish; it may be more surprising to learn that English gets a word from Sami too. (It’s tundra.)

For each language, I’ve also chosen one loanword to highlight. Think of these as my personally recommended factlets – curated etymologies for you to bring out during your next dinner party/academic discussion/night at the pub/haircut.

So, here we go!

A is for: Arabic

To start off the list, I’ve chosen Arabic, a language of massive, international influence. Originating in Arabia, the language spread far and wide with the expansion of Islam, and today Arabic is the official and majority language of many countries – although the simple name ‘Arabic’ conceals a huge amount of linguistic diversity.

Many of the words that English gets from Arabic relate to the sciences, such as zenith, nadir and azimuth in astronomy, alcohol in chemistry, and zero, algorithm and algebra in maths. Others unsurprisingly concern Islam, like mosque, halal or Muslim. As three of these examples demonstrate, some words come into English with the Arabic definite article al– still attached, although it may be obscured (as in elixir) or even partial (as in lute). Arabic-speaking places may even give their name to products associated with them; this is the case with muslin cloth, named after Mosul, and tabby cats, derived from ʿAttābiy, a part of Baghdad renowned for its striped cloths and silks.

One word that I’d like to highlight though? I’m going with:


magazine

From French magasin, from Italian magazzino, from Arabic maḵzan ‘storeroom, storehouse’.

B is for: Basque

Basque, mostly spoken in northern Spain and south-west France, is fascinating. For one thing, Basque is a language isolate, alone and adrift in a sea of unrelated Indo-European languages, although it has both affected and been affected by nearby languages, including Spanish. It may perhaps be the origin of the word anchovy, as well as the Spanish word vega ‘fertile lowland, plain’ behind the name Las Vegas. Likewise, jingo and jingoism have found a potential derivation in Basque jainko ‘god/God’, perhaps picked up by means of the Basque exclamation ala Jainkoa!, meaning ‘by God!’. I think though that we’re on firmer footing with the following:


silhouette

From the name of Étienne de Silhouette, a French politician, itself likely from the Basque place-name Zilhoeta.
The location of the wider region of the Basque country, located in both Spain and France. From here.

C is for: Czech

If you know me, you’ll know I really had no option but to pick Czech for the letter C. This delightful Slavic language, of official and majority status in the Czech Republic, is the source of not only headaches for those who try to learn it, but also the English word pistol, the explosive Semtex and the Pilsner beer. These make it sound like a pretty wild country. My pick for one key loanword is even better known though:


robot

From Czech robota ‘unpaid labour, serfdom’, from the Slavic root for ‘work’ (compare Russian rabotat’), created and popularised by the Čapek brothers through the 1920 play Rossum’s Universal Robots.
A poster for the first run of Rossum’s Universal Robots at the National Theatre in Prague back in January 1921. Although it’s a Czech poster, it attests to robot in English! From here.

D is for: Dutch

Dutch may be another Germanic language like English, but the divergence of the two over time means that we can spot words that have later been borrowed from the former into the latter. Some still reflect the linguistic processes of the original language; for instance, both cookie and mannequin go back to Dutch words with a diminutive suffix added on (literally, a cookie is a little baked thing, and a mannequin is a little man), while decoy may derive from the Dutch word kooi ‘cage’ and the definite article de. As the Netherlands was a maritime power, many Dutch loanwords relate to ships and seafaring, such as yacht, dock, skipper and the cry avast!. In the same theme, there’s this:


cruise

From Dutch kruisen ‘to cross’, from kruis ‘cross’, ultimately from Latin crux.

E is for: Etruscan

Now humour me here. Etruscan is a non-Indo-European, now-defunct language of Italy, spoken in classical times until its eventual replacement by Latin. Etruscan culture was very prestigious and influential during the era of the Roman kingdom and republic, and hence Latin seems to have borrowed much from it – it’s the usual suspect suggested for Latin words that defy obvious derivation. Some of what Etruscan gave to Latin, Latin has in turn given to English, for which reason it makes my list.

For example, Latin harēna ‘sand’ and populus ‘people’ are two words for which Etruscan ancestry has been proposed, and from which English gets arena and people. The Etruscans are also behind the name of the beautiful region of Tuscany. Moreover, some Latin words appear to come from Greek, yet differ too much to be straightforward borrowings; hence, it may be that Etruscan was an intermediary language for some Greek terms, with Etruscan phonology and morphology putting their own spin on things. Lantern and triumph are two possible examples of such Etruscan middle-manning. One further potential case is this:


person

From Latin persōna ‘mask, character’, most likely from Etruscan phersu, perhaps itself from Greek.
A map showing the approximate locations of the native speakers of Etruscan prior to Roman expansion, along with Lemnian and Rhaetic, two apparently related languages! From here.

F is for: Finnish

Finnish belongs to the Finnic family of languages, and beyond that to the macro-family of Uralic. This means that it is unrelated (as far as we know) to Indo-European languages like English, but this hasn’t stopped one Finnish word from becoming globally popular:


sauna

From Finnish sauna ‘sauna’, from Proto-Finnic *sakna, for which a Germanic origin has been proposed.

G is for: Greenlandic

The letter G could be my chance to point out some German words, or to try to acknowledge the huge Hellenic heritage of Greek, but no, I’ve opted for Greenlandic. As the name suggests, this language is spoken across Greenland, with West Greenlandic as the taught standard. Known as Kalaallisut to its speakers, it’s a member of the Inuit language family. It has also, to my knowledge, given English one single word:


anorak

From Greenlandic annoraaq ‘clothing, coat’.

H is for: Hungarian

Although estranged by geography and language change, Hungarian is a cousin of Finnish and is another Uralic language. From its Central European epicentre, various Hungarian words have passed into neighbouring Slavic and Germanic languages. Some of the few in English are indeed known for their Hungarian history, like goulash (from gulyás ‘cowherd’) and biro (named after László Bíró), but others not so much, such as the sabre kind of sword, which makes for a cool Hungarian influence in Star Wars. An arguably more common word though is this one:


coach

From French coche, from Hungarian kocsi, an adjective derived from Kocs, a village today in the Komárom-Esztergom county of Hungary.
A very early depiction (from 1568 apparently) of a Hungarian coach by Jeremias Schemel. From here.

I is for: Irish

Irish, a Celtic language, has also contributed to English vocabulary, sometimes via Scottish Gaelic. Naturally, such words are more numerous in Hiberno-English, but appear in other varieties like British English too; slogan, whisky, smithereens and banshee are but four great examples. Trousers (that’s pants to our American listeners) comes from Irish triús or its Scottish Gaelic counterpart triubhas. A Tory, typically someone who supports the British Conservative and Unionist Party, does not come from the clipping of conservative or some such word, but instead from Irish tóraí, meaning ‘outlaw’. Make of this what you will.

My choice for one Irish loanword to highlight not only reflects the form of the Irish original, but also something of its grammar, since it still follows in English the thing that it quantifies:


galore

From the Irish phrase go leor ‘until ample, enough’.

J is for: Japanese

Many aspects of Japanese culture have found renown and popularity in Anglophone societies. Japanese of course gives us sushi, karaoke, bonsai, origami and karate. Many of these words are originally compounds; Japanese jū, meaning ‘gentle’, is the first element of both of the martial arts judo and jujutsu. Japanese too has loanwords that were themselves once compounds in another language, one of which has given English this:


tycoon

From Japanese taikun ‘great lord, prince’, from Chinese dàjūn ‘great lord, emperor’, from dà ‘big’ and jūn ‘ruler, lord’.

K is for: Kannada

Kannada is a major language of south-western India. It belongs to the Dravidian family, making it a relative of Telugu, Tamil and Malayalam. A great many English words can be traced back to Dravidian, such as candy and orange, but sadly I don’t (yet!) have the know-how to discern between them and to pinpoint which language of this large family may be a specific source. One word though for which the OED explicitly mentions Kannada as a possible origin is this:


bamboo

From Portuguese or Dutch, from Malay, possibly in turn a loanword from Kannada.

L is for: Latin

Latin is an obvious choice for L, as the Italic language has flooded English with its lexicon. Despite my mission to avoid the obvious, it does remain the only option that I can think of; I did have a potential loanword from Lakota, a language of the Siouan family, but I know too little about Native American languages to say anything confidently!

It’s nigh impossible to express in one paragraph the debt that English owes Latin. Its words have entered English both through direct borrowing and via French and other Romance languages; in fact, English vocabulary even includes a wealth of Latinate doublets – pairs of words from the same Latin ancestor that have come into English by different routes and thus have become different words. To name but a few, we have: poison and potion, count and compute, piety and pity, tradition and treason, camera and chamber, fragile and frail, senior and sir, rabies and rage, and abridge and abbreviate.

With such an influence, choosing one Latin loanword over all the others is very tricky! I’ve decided to go with an extremely early borrowing, one of the earliest in fact, which may have even happened before English (or what became it) first arrived in Britain.


cheese

From Old English, from a Germanic borrowing from Latin cāseus ‘cheese’.
A fragment of a Roman cheese press, found in Kent. From here.

M is for: Mandarin

Mandarin, also known as Mandarin Chinese, is today a global language, and is the official language and a lingua franca of China. In terms of its genealogy, it belongs to the Sinitic family, itself a branch of Sino-Tibetan. Along with Cantonese and Hokkien, Mandarin is a source of various Sinitic loanwords in English, such as the pinyin system of spelling, kung fu, soy and tofu (from Mandarin dòu ‘bean’ and fǔ ‘rot, spoil’, via Japanese). Chai is well known for being a doublet of tea, both words having a Sinitic origin, but with one coming via Mandarin and the other via Hokkien. The phrase long time no see may even have Chinese heritage, since the Mandarin expression hǎo jiǔ bù jiàn has the same meaning and composition, and would explain the unusual grammar of the English expression.

My pick for one notable borrowing, which I believe comes specifically from Mandarin, serves to illustrate how much the meaning of a word can change between languages, and how little it can end up having in common with the meaning of the source material!

gung ho

From Mandarin gōng hé, apparently meaning ‘work together’ but really the shortened form of gōngyè hézuòshè – Chinese industrial cooperatives!

N is for: Nahuatl

My choice for N is an easy one: Nahuatl, an Uto-Aztecan language, today predominantly spoken in central and southern Mexico. Thanks to this Mesoamerican language, English has culinary words like tomato, avocado, guacamole and chilli. Nahuatl loanwords such as these tend to show the influence of the intermediary language, Spanish. One further example that has taken the world by storm is:


chocolate

From Spanish chocolate, from Nahuatl xocoatl, from xoco ‘bitter’ and atl ‘water’.

O is for: Occitan

The expansion of French within France has been at the expense of various other languages and of greater linguistic diversity. The older language of much of the south of France is Occitan, although really this is an umbrella term for numerous distinct varieties, such as Gascon and Provençal. The most famous common feature of Occitan is that its speakers traditionally say oc for ‘yes’ – hence the name!

Some Occitan words have travelled north and into Standard French, and a few from there into English. For instance, Occitan has played a part in the derivations of the north-westerly mistral winds, garlicky aioli and accolade, which comes from the unattested Latin verb *accollāre, originally meaning ‘to put something on/around (ad) the neck (collum)’. It’s also the supposed origin of the following:


charade

From French charade, from Occitan charrada ‘chatter’, from charrar ‘to chat’.
Two maps of Occitan, one with a unified linguistic area, the other showing the different divisions within Occitan. From here.

P is for: Persian

Persian is yet another name that refers to a lot of language. It’s widely known as Fârsi by native speakers, and those varieties of it spoken in Afghanistan and Tajikistan may also be called Dari and Tajik. Due to the power and prestige of Persia, Persian has exerted an influence on other languages, including Hindi-Urdu and Arabic, through which its words have then reached English; items of clothing like the cummerbund, khakis and the shawl have travelled this way.

The ruler of Persia, the Shah, is famously the origin of chess, check and checkmate, while Persian also passed on the name of the rook chess-piece. Various commodities were also first acquired from or associated with Persia; lâjvard, a blue colour and stone, has led to both azure and lapis lazuli, just as the yâsaman flower is the source of jasmine, and peach comes from Latin mālum Persicum, the Persian apple. Yet there’s one loanword that charms me even more than these:


pyjamas

From Urdu, from Persian pâjâma ‘trousers’, from pâ ‘leg’ and jâma ‘garment’.

Q is for: Quechua

Quechua, spoken in South America throughout the Andes mountain range, is really a language family, with Quechua I and Quechua II as the two major subgroups. It was the principal language of the Incan Empire, and its varieties considered altogether today constitute the largest minority language of Peru. Quechua is the source of animal names like condor and llama, of quinoa seeds, and of the fancy name for bird poop, guano. For me at least, the most surprising of English’s Quechua crew is:


jerky

From Spanish charqui, from Quechua ch’arki ‘dried meat’.
A map showing the rough distribution of Quechua (I and II) in South America. From here.

R is for: Romani

Yet again, Romani is one name that really embraces multiple languages. It is the language of the Roma, who live in communities across Europe, North and South America, and beyond. Consequently, Romani, which traces its genealogy back to the Indo-Aryan family and India, has been in contact with and influenced by a vast array of different languages. The relationship went the other way too; Romani words have also entered those languages. Romani in Britain has been suggested as the possible source of posh, as well as one very friendly word:


pal

From Romani phal ‘brother, friend’, related to Sanskrit bhrātṛ ‘brother’.

S is for: Swedish

While Sanskrit is tempting, I did share some good Sanskrit loanwords two blogposts back, so my choice for S is Swedish. Some of English’s Swedish borrowings are commonly known to have come from that North Germanic language, such as smorgasbord or ombudsman, a word considerably more fun in its pronunciation than its meaning. There are also dahlia plants, named after the Swedish botanist Anders Dahl.

My personal favourite is a chemical element. I like it in particular because it’s a technical term that is made up of two very ordinary Swedish words, and one that the Swedes themselves don’t actually use (preferring volfram for the element instead).


tungsten

From Swedish tung ‘heavy’ and sten ‘stone’.

T is for: Tamil

As mentioned already, Tamil is another major Dravidian language, spoken in south-east India and Sri Lanka. The language has well over two thousand years of attestation, and has been in contact with Indo-Aryan languages for even longer. Its status has meant that a few words of Tamil origin have entered the English lexicon, such as pariah, catamaran and corundum minerals like sapphires and rubies, as well as a word much loved in Britain today:


curry

From Tamil kari ‘sauce’.
A map to show the approximate location of Tamil first-language speakers in India and Sri Lanka. From here.

U is for: Umbrian

A language for U has turned out to be a challenge. The two U-languages that first come to mind are Urdu and Ukrainian, but identifying distinctly Ukrainian loanwords is beyond my skills and sources (perhaps balaclava?), as is finding Urdu words that don’t also have a connection to either Persian or Hindi. So, I have chosen Umbrian. Classicists, bear with me.

Umbrian was a language of Italy spoken up until its ousting by Latin towards the end of the first millennium BC. It’s a member of the Sabellic language family, and beyond that the larger Italic family, making it a cousin of Latin. The two had much in common, which must have facilitated bilingualism as the Roman state grew in influence and came to dominate the Italian peninsula. Consequently, some words in Latin may have an Umbrian origin, a fact that I am relying on to find a candidate for U.

According to the sound laws and theories of Indo-European languages, the Latin word for ‘cow’ should not begin with a B. However, it does: Latin has bōs, from which we get bovine. We do though expect it to begin with a B in Sabellic languages, such as Umbrian, and sure enough, our sources for Umbrian include words like bum for ‘cow’ (stop giggling). Perhaps therefore Latin’s bōs could be specifically an Umbrian influence? If so, this would in turn result in:


beef

From Old French boef, from Latin bōs, a partial or full borrowing from Sabellic.

V is for: Vietnamese

Now this letter is difficult, perhaps the most difficult. I’m aware of only a few languages that start with V; only Vietnamese, Venda (a Bantu language) and Volapük (a constructed language) come to mind. From these, alas, the loanwords I have to offer are all still associated with Vietnam and Vietnamese culture, such as the dong currency or banh mi. My word to highlight is simply thus:


pho

From Vietnamese phở, a noodle and meat soup.

W is for: Welsh

To the immediate west of England is the beautiful country of Wales. Such geographical proximity between nations and languages has meant that English and Welsh, a Celtic language, have influenced each other in various ways over the centuries. This includes some vocabulary; Welsh may be the source of flannel and, somewhat surprisingly, penguin. The latter seems to be a compound of Welsh pen ‘head’ and gwyn ‘white’, but how it came to be applied to famously southern-hemispherical birds is less clear. We are on surer ground though with another kind of animal:


corgi

From Welsh corgi, from cor ‘dwarf’ and ci ‘dog’.
A corgi puppy, because you deserve this for reading this far. From here.

X is for: Xhosa

Xhosa is the only language to my knowledge that’s spelled with an initial X in English; the XH digraph is Xhosa’s way of representing the aspirated click sound /kǁʰ/ in Latin letters. It’s a Bantu language and a cousin of Zulu within the smaller Nguni family, and it has official status in South Africa and Zimbabwe. No doubt Xhosa has given words to the Englishes of southern Africa, but loanwords used further afield are not particularly forthcoming. With my limited knowledge, it’s difficult to locate words that could not also be borrowings from Zulu.

For one, there’s ubuntu, an abstract noun referring to humanity, or rather the essence, interconnectedness or proper virtues of humanity, which derives at least from a Nguni language, such as Xhosa. Likewise, Bantu, which is used in the Anglophone world to refer to the aforementioned huge family of languages, may have come from the Xhosa word for ‘people’. With the same hesitancy, here’s my pick for one loanword to highlight:


mamba

From either Xhosa or Zulu imamba ‘black/green mamba snake’.

Y is for: Yiddish

Yiddish is, like English, a Germanic language by origin, though it shows influences of Hebrew and Slavic languages. These various ingredients of Yiddish are reflected in its loanwords; it’s the source of chutzpah for example, itself a word of Semitic origin, while glitzy and schmaltz are Germanic words that probably came into English via Yiddish. Perhaps its most famous contribution to English is the following:


bagel

From Yiddish beygl, itself from Old High German.

Z is for: Zulu

Representing Z, Zulu ends my humble list. It’s another Bantu language, whose speakers are concentrated in the east of what is now South Africa. As with Xhosa, I lack the knowledge and resources to pinpoint specifically Zulu loanwords in English. Impala, the species of antelope, is listed by the OED as derived from Zulu, as is my final loanword of note, which rose to global renown during the 2010 World Cup:


vuvuzela

From Zulu -vuvuzela, from an imitative element and the frequentative suffix -zela.
A South Africa fan with a vuvuzela. From here.

So, that’s your lot! If you’ve kept reading this far, I thank you greatly for your attention and interest. I hope that you’ve enjoyed this month’s rather copious etymological content, and that you think I’ve successfully completed the challenge I was set, and have also demonstrated the enormous debt that the modern English lexicon owes to other peoples and their languages. Truly, to steal from John Donne, no language is an island.

