(Hopefully) Halfway to Doctorhood

Reading time: 15-20 minutes

Passing a milestone on a long journey, be it a metaphorical or a literal one, is always a moment for reflection. Recently, after fifteen months, I crawled over the finish line of a large part of my PhD project, and promptly ran off to the pub.

The beer of victory.

The morning after, I awoke to an awful feeling – not a hangover, but rather the awful lightness that comes with being untethered from one’s commitments. Every day for the past year, the same task has been lodged at the back of my brain, a programme constantly computing, but now… what?

The way forward is not my decision to make alone. This week I’ve been discussing with my supervisors how to proceed and how to make use of my completed work. Until a plan is in place, I have time to look back as well as forward, to consider fully how far I have travelled. To that end, I thought I might put down in blog-form an overview of what I’ve been up to so far. The topic of my doctoral project isn’t something I often discuss publicly, as it needs considered and considerable explaining, but many people have been curious and kind enough to ask about it.

If you’re one of those curious people, then this blog post is for you. Here’s where I’m at and where I’ve been on this PhD journey. This is an introduction to my studies, findings and feelings at this turning point on the long road to doctorhood. It’s just a personal piece for anyone who may be interested; normal service will resume next month.

The Big Idea

The shape of a PhD programme differs from country to country. In the UK, doctorates in a humanities subject tend to be around three or four years in duration, and are pretty much unstructured. You attend only what courses you want to attend, teach as much as you want to teach, and work to the beat of your own drum. One necessary step is to apply to a university with a proposal. This contains an idea for a project that you wish to undertake, which will culminate in a dissertation of original research. Once you’re in, though, you don’t have to stick to the proposal; most PhD-ers I know diverged drastically from what they first set out to do. For my part, I have stuck pretty close to the proposal. That’s neither a good nor a bad thing; the idea has just worked out so far.

In a nutshell, that idea is all to do with very, very old grammar. In a slightly larger nutshell, it is about the question of reconstructing the grammar of a language so old that it predates the written record, and is only presumed to have once existed.

Linguistic reconstruction is an idea that’s been around for a couple of centuries now. It relies on the premise that the languages we know of, modern and ancient, did not emerge out of nothing at the time when our records begin, but rather have a long, undocumented prehistory. For example, our sources for the English language may begin in the fifth century AD, but we shouldn’t believe that English sprang into life in that century. There are generations of its speakers and forms of pre-English English that are now almost lost to time.

Almost lost. This is where reconstruction steps in. It involves the proposing of prehistoric stages of a language’s history and specific features of that earlier language. Reconstruction works through comparison. By thoroughly comparing languages that are so similar in so many ways that they must originate in one common source, we can theorise something of what that common source looked like.

We can do this with people; if you meet a set of siblings and cousins, by comparing common features of their appearance, you can guess reasonably that their parents and grandparents possessed them too, despite not having seen the older generations.

Perhaps the most famous language that has resulted from reconstruction is Proto-Indo-European. This is the hypothetical single origin of a geographically grand spread of languages, including English, Irish, French, Polish, Greek, Persian, Nepali and myriads more. It’s sounds and vocabulary that have led the charge in reconstructing Proto-Indo-European; we’ve been able to formulate a sizeable lexicon for this prehistoric language, relating to family, society and the natural world, while homing in on the system of sounds it must have had.

The spread of the Indo-European language family in Europe and Asia. From here.

But what about its grammar? What about its word order, its syntax, the rules by which its vocabulary was combined into longer strings of words and full sentences were formed? The reconstruction of syntax has lagged behind the reconstruction of other aspects of language, and there’s been a great sense of pessimism around whether it is even possible. Sounds and words have a tangible concreteness to them that the ‘invisible’ operations of word order do not.

It depends ultimately on what you think syntax actually is. Does it exist as rules somewhere in our mind? How do we identify and describe these rules? And once we have done so, how do we compare them across languages? It’s long been said that Proto-Indo-European had a basic word order of Subject-Object-Verb, based on what we see in its oldest daughter languages, like Ancient Greek and Sanskrit. Aside from the tricky issue of whether that assessment of the later languages is even accurate, the fact remains that many unrelated languages seem to have a default SOV order too. What if it’s all just coincidence? What, if anything, can we reasonably say about the syntax of Proto-Indo-European?

A handful of people have dedicated themselves to answering these questions. I have the great luck to be one of them.

The Big Plan

The ultimate goal is to put the idea of syntactic reconstruction to the test, and to propose a plausible reconstruction of parts of Proto-Indo-European grammar. I am studying the word-order patterns of both the clause and the noun phrase. However, studying these two key domains of grammar in their entirety would be a gargantuan task, so my focus is limited to the start of the clause and the noun phrase – what’s conventionally referred to as their left periphery.

My understanding of what syntax and word order actually are is rooted in generative grammar. Yes, I know, this is but one school of thought in the study of syntax today, but I believe that its insights are especially useful for the business of language reconstruction. In this foundation, my work is profoundly indebted to the work and ideas of George Walkden, who quite literally wrote the book on how generative syntax may be used to reconstruct prehistoric grammar.

To attempt that reconstruction, I need to have some ancient languages to compare. The first step therefore was to identify a set of Indo-European languages (the descendants of PIE) that I could study. To make the cut, each language had to be:

  • A member of the Indo-European family.
  • Ancient, or at least very old, working on the assumption that older languages preserve more of an ancestral stage.
  • Not too difficult to study, in terms of understanding the primary data and accessing the previous scholarship written about it.
  • On a different branch of the family tree from the other languages, to try to include a broad spread.

With this in mind, my supervisors and I settled on a set of seven languages. They are, in order of study:

  • Classical Latin
  • Ancient Greek
  • Vedic Sanskrit
  • Old Church Slavonic
  • Old English and Old Norse (Yes, I know, these are two Germanic languages. I have my reasons)
  • Old Irish

These are my seven darlings, and studying them is what I have been up to for the last fifteen months. To people familiar with Indo-European studies, there may be surprises here, some notable absentees. Hittite for example is a very big deal in the field, but my supervisors and I agreed early on that including Hittite would be too complicated. My personal knowledge of Hittite is, safe to say, a bit too basic at present, but it remains a language I want to return to in future. I was also banned, for my own good, from including Classical Armenian.

For each language, I have worked through both the source material and previous scholarship to build a full picture of its word-order patterns. Having grasped these, I have then striven to identify elements of its ‘underlying’ syntax. By this, I mean that all the reading, researching and arguing have permitted me to propose a single, stable abstract structure behind the word orders, one which could be responsible for their arrangement in that language. I have delved into Cicero and Caesar to dig up the structure of their Latin, hounded Herodotus for the grammar of Greek and raided the Ṛgveda for Sanskrit syntax.

Now I am left with fourteen abstract structures, an in-depth knowledge of these seven languages, about two hundred pages of writing, and a permanent pain in my right wrist. The matter now at hand is turning all of this into a PhD thesis.

Photograph of me as of February 2023.

Some Small Thoughts

I am thrilled to say that even by the third of these languages, similarities and a particular pattern had already started to emerge. Investigating this pattern further and strengthening my explanation for it will surely be a major task of the next part of the PhD. Until that’s finished, I’ll be keeping my cards close to my chest and seeing if this nascent theory holds up. In the meantime, I’d like to finish with some opinions that come first to mind at this doctoral milestone.

1: Working with historical data has its own challenges

Like any scientific endeavour, linguistics needs access to data in order to make claims and build theories about language. Obtaining data is fraught with difficulties; the mere fact of investigating people’s language can distort the very thing we aim to study, a problem noted by William Labov in his observer’s paradox. I feel though that the complications with studying living languages are better understood and respected than those specific to historical languages. Naming no names, I have come across work by respected and talented linguists that overlooks crucial qualities and complicating factors in the historical evidence for their language of study. This has in turn led to unfounded and sometimes outrageous claims.

When our linguistic evidence comes via historical texts, not directly from people, we scholars are separated from a language ‘as it really was’ by a barrier of documentation. As any historian will tell you, a historical document can’t be taken at face value. First, you must enquire into the purpose, the date, the genre, the motivation, the context and the author of the text. What were the author’s motivations in creating this thing? In what tradition were they writing, and what other texts did they hope to emulate? Immersion in the world of a text is the key first step to using it as a source for language. I am pleased at least to have developed a healthy sense of caution, a suspicion even, of historical sources, and I am very keen to promote this sense among other linguists.

2: Concluding that little or nothing can be said is still a conclusion

For one of my languages in particular, my analysis of the available evidence culminated in a very limited conclusion. This anti-conclusion was that very little can be confidently claimed about the syntax of Old Church Slavonic. This language is the oldest documented member of the Slavic family, known to us from a handful of medieval texts. Its antiquity means it is highly prized by linguists, and rightly so, but in the matter of its word order, its authors stuck far too close to the Greek that they were translating. The two languages mirror each other closely, often word for word.

To claim anything about the ‘native’ grammar, you therefore have to hunt out and home in on the rare disparities, and make theoretical mountains out of historical molehills. Old Church Slavonic verbs, for instance, cannot match the complex verbal endings of Greek, and so small yet separate words are brought in to make up the difference. The reflexive pronoun (‘oneself’), to take one example, is used to translate Greek passive verbs and, importantly for my purposes, is fussy about word order.

Since Old Church Slavonic offers us so little sure evidence of its grammar directly, teasing out the native Slavic from underneath the Greek influence led me to some brain-aching levels of abstraction. I accept as a basic assumption that Old Church Slavonic translators, while striving to follow the Greek, never went beyond the boundaries of grammatical language. If they had done so, producing Greek-like gibberish, it would have defeated the purpose of translation. It follows therefore that the word-order patterns we see, while determined by Greek, were still permitted by Old Church Slavonic. That is to say, Greek could influence only what Slavic syntax readily allowed it to influence.

This I thought was a mature and solid conclusion to reach. What we can know about a historical language can come to us both directly and indirectly. Other linguists, I’ve been delighted to find, have been interested in my conclusion.

The title page of the Gospel of Mark in the Codex Zographensis, from here. The language is Old Church Slavonic, the script is Glagolitic. I’ve worked with this text so much it feels like an old friend.

3: Different components of language differ in their usefulness from language to language

At the outset, I made the decision to cast the net wide and study a broad array of word-order phenomena. So, in terms of the noun phrase, for each language I have looked at the behaviour of all the things you may find in a noun phrase: adjectives, dependent genitive phrases, prepositional phrases, numerals, determiners, quantifiers and the noun itself. What has surprised me is how one of these components may be uninteresting in one of the seven languages, yet vital in another. For instance, describing dependent genitives in Latin (e.g. femina urbis magnae ‘a woman of the big city’) made for pretty dull work, yet for Sanskrit, I barely wrote about anything else in my review of its noun phrases. Specifically, genitive pronouns in Sanskrit (such as nas ‘of us’) have a rare stability among the word-order chaos that I gratefully latched onto in order to say anything at all about that language’s syntax. This makes me feel justified in adopting a wide scope of enquiry, despite what it has done to the word count.

4: Old Irish is awesome

Old Irish has loomed large on the horizon ever since we first selected the set of seven, as the representative of the Celtic languages. It is a famously fiendish language, and on the surface seems bizarre and unique. Yet appearances are deceiving; so much of the linguistic jungle that we observe arises from reasonable and familiar processes, as I have previously attempted to demonstrate. Its word order is no different. So much becomes clear with a little digging into the history of the language, and such linguistic archaeology is good fun. So I must recant any past denigration of Old Irish. I am a complete convert.

Part of a parchment page of Cod. Sang. 904, p. 176, from here. An Irish scribe has written in Old Irish uit mo chrob (‘alas, my hand!’). With sources like this, it’s hard not to be charmed.

5: Just finding examples in the sources takes up as much time as analysing and writing about them

Oh my God, why does it take so long?

6: This PhD means everything to me

Finally, a simple and personal thought: this PhD is still very much a dream come true. I will forever be indebted to the University of Edinburgh for making this project possible, and for allowing me the resources to immerse myself in what I love and get to know it so well. What will come after, I simply do not know, and I have little confidence in any one post-PhD plan. Yet the morrow shall take thought for the things of itself. Today at least, I have this degree to do and an amazing opportunity. Good God, I just love language so much.


Cover picture: the path up to Ben Vrackie, in Perthshire, Scotland, masterfully edited by me.

4 thoughts on “(Hopefully) Halfway to Doctorhood”

  1. Hi Danny, cool stuff. I’ve been following your work for a while & I’m really in awe of your research. I wanted to ask: if you’re going for a Germanic language, why not go for Gothic?

    1. Hi there! Thank you so much for this! You ask a great question, one I will have to answer fully in the finished thesis too. In a nutshell, it’s to do with language interference and influence. Gothic is another old language for which our sources are nearly all translations of texts in another language, namely Greek, and very close translations at that. So, while Gothic is superb for its antiquity and for what its words can tell us, the spectre of Greek hangs over the ordering of those words. This makes the syntactician’s job very difficult. There is a lively debate over the extent of the Greek influence, and a lot does seem to be genuinely Gothic, but it’s a quagmire of scholarship.

      Meanwhile, Old English and Old Norse offer original works in those languages, largely removing the issue of interference. They’re also extremely interesting in terms of their syntax, with phenomena like the Verb Second ‘rule’ and definite articles. So, I had to weigh up the pros and cons of the early Germanic languages, and I reckoned that studying OE and ON comes out best.

      Hope this makes sense and answers your query? Thank you for the question!


  2. Hi Danny, thanks for writing these blogs! As a linguistic “layperson,” I find them very interesting and edifying. My question is whether, in addition to studying the seven Indo-European languages, you need to have a basic understanding of syntax in several non-Indo-European languages as well, in order to better understand what the Indo-European languages have in common?

    1. Hi Irene! Thank you for the thanks, and for the interesting question. In short, no, I don’t believe that such understanding is strictly necessary to carry out this PhD project. In less short, no but also yes, because my work draws on a huge established toolkit of theories and concepts, which other scholars have come up with and developed on the basis of many languages, not just Indo-European ones. So, while I don’t need to study a few non-Indo-European languages for comparison (which I do regardless, but for fun), everything I might theorise is part of a framework of ideas that has many sources and aims to be language-neutral. I hope this makes sense?

