Week 3: The Starostin Perspective
March 21, 2025
The world of linguistic taxonomy can generally be divided into two opposing camps: the ‘lumpers’ and the ‘splitters.’ Lumpers tend to propose large, expansive language families, while splitters argue for more granular classifications. Within this warzone of academia, however, there exists a man who straddles the line between these conflicting factions, a man by the name of Georgiy Starostin. Starostin’s father, Sergei Starostin, was also a linguist, one who firmly planted his feet among the ranks of the lumpers. He was one of the primary proponents of the Altaic hypothesis, the idea that the Turkic, Mongolic, and Tungusic language families (and, in many versions of the hypothesis, Koreanic and Japonic as well) all belong to a larger Altaic family. This theory is widely considered fringe by modern linguists, though it remained popular for a long time. Georgiy is not unlike his father in his tendency to lump things together. He is himself a supporter of the Proto-World hypothesis, the idea that, since all modern humans descend from a common ancestor, all spoken languages must ultimately derive from a single Proto-World language. Georgiy does not, however, believe that Proto-World can be proven within our present research capabilities. The idea remains speculative: adored by many lumpers for its global scope, but not provable even by those operating on the fringe.
It would seem natural, then, that a lumper like Georgiy Starostin would be attracted to the Nilo-Saharan theory, a concept developed by a fellow lumper which has since rocketed its way into the mainstream. The interesting thing about Georgiy Starostin, though, is that while he is a lumper on the large scale, he can be quite the splitter on the small scale. He may support the concept of a Proto-World language, but he would also like to ensure that it is being constructed correctly.
To analyze the validity of the Nilo-Saharan family, Starostin uses a process known as ‘lexicostatistics.’ This process relies on a linguistic tool known as the ‘Swadesh list,’ a list of 207 basic meanings, compiled by Morris Swadesh, which are common enough to have an equivalent in effectively every spoken language. Words in the Swadesh list tend to be some of the most commonly spoken words in a language, including simple nouns, such as ‘eye,’ simple verbs, such as ‘die,’ words used in questions, such as ‘what’ and ‘who,’ and demonstratives, like ‘this’ and ‘that.’ Linguistic borrowing most often occurs when a language lacks a preexisting word for a given concept. Since the concepts in the Swadesh list are so simple and so commonly used, a language has little reason to borrow words for them, and these words are expected to be inherited rather than borrowed. This makes the Swadesh list very useful for comparing languages: if two languages show similarities in vocabulary on their equivalent Swadesh lists, it suggests that both languages derive from a common ancestor. Occasionally, a language might share a great deal of its general vocabulary with another but show effectively zero similarities within the Swadesh list. Hungarian, for example, has many Latin influences but a base vocabulary which is entirely different from Latin’s. This difference in base vocabulary is strong evidence that the two languages do not share a language family: Hungarian is a Uralic language which has been historically influenced by Latin due to the spread of Catholicism.
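The contrast between base vocabulary and general vocabulary can be sketched in a few lines of Python. This is only a toy illustration, not Starostin's actual procedure: the two languages, the cognate-class labels, and the helper name `shared_fraction` are all invented for the example, and it assumes a linguist has already judged by hand which words count as matches.

```python
# Toy data only: the labels below are invented cognate-class tags, not real words.
# Two languages "match" on a meaning when a linguist has given both words the same tag.
BASIC_MEANINGS = {"eye", "die", "what", "who", "this", "that"}
CULTURAL_MEANINGS = {"church", "school", "angel"}

# Hypothetical Language X: its own basic words, plus three loans (tags L1-L3).
lang_x = {"eye": "X1", "die": "X2", "what": "X3", "who": "X4", "this": "X5",
          "that": "X6", "church": "L1", "school": "L2", "angel": "L3"}

# Hypothetical Language Y: the source of those three loans.
lang_y = {"eye": "Y1", "die": "Y2", "what": "Y3", "who": "Y4", "this": "Y5",
          "that": "Y6", "church": "L1", "school": "L2", "angel": "L3"}


def shared_fraction(a, b, meanings):
    """Fraction of the given meanings on which both languages carry the same tag."""
    comparable = [m for m in meanings if m in a and m in b]
    if not comparable:
        return 0.0
    matches = sum(1 for m in comparable if a[m] == b[m])
    return matches / len(comparable)


basic_score = shared_fraction(lang_x, lang_y, BASIC_MEANINGS)
cultural_score = shared_fraction(lang_x, lang_y, CULTURAL_MEANINGS)

print(f"basic vocabulary shared:    {basic_score:.0%}")
print(f"cultural vocabulary shared: {cultural_score:.0%}")
```

In this invented case the two languages agree on all of the cultural words and none of the basic ones, which is the pattern the Hungarian and Latin example describes: heavy borrowing, but no sign of common descent.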
Lexicostatistics uses the Swadesh list to compare large swaths of possibly related languages. For example, if a linguist notices that a significant number of the words in the Swadesh list of Language A have a similar phonemic structure to their equivalents in the Swadesh list of Language B, they might conclude that both languages evolved from a Proto-Language AB, and that the phonetically similar words are cognates inherited from the vocabulary of Proto-Language AB. Further, even if Language A and Language B lack a significant number of phonetically similar Swadesh words, they may still be related if both share phonetically similar Swadesh words with a Language C. Different languages often change at different rates, so while Languages A and B may have diverged so much over time that they no longer show their similarities to one another, each may still show similarities to the more conservative Language C, which retains many features of the Proto-Language ABC from which all three derive. For this reason, linguists generally avoid comparing just two languages with lexicostatistics and prefer to analyze a larger number, since specific languages, such as the theoretical Language C, can be very useful in connecting otherwise disparate languages.
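The role of the conservative Language C can be illustrated with a small Python sketch. Everything here is an assumption made for the example: the four languages are invented, the tag "proto" stands for a word a linguist has judged to be inherited from the shared ancestor, the 0.3 threshold is arbitrary, and linking languages through any sufficiently similar neighbour is a simplification rather than the method Starostin describes.

```python
from itertools import combinations

MEANINGS = ["eye", "die", "what", "who", "this",
            "that", "ashes", "name", "two", "water"]


def make_list(retained, innovation):
    """Tag each meaning 'proto' if the language kept the ancestral word,
    otherwise with a language-specific innovation tag."""
    return {m: ("proto" if m in retained else innovation) for m in MEANINGS}


# Invented languages: A and B each kept a different handful of ancestral words,
# C kept all of them, and D is unrelated.
languages = {
    "A": make_list({"eye", "die", "what", "who"}, "a"),
    "B": make_list({"who", "this", "that", "ashes"}, "b"),
    "C": make_list(set(MEANINGS), "c"),
    "D": make_list(set(), "d"),
}


def shared_fraction(a, b):
    """Share of meanings on which two languages carry the same tag."""
    return sum(a[m] == b[m] for m in MEANINGS) / len(MEANINGS)


scores = {
    (x, y): shared_fraction(languages[x], languages[y])
    for x, y in combinations(sorted(languages), 2)
}


def group_by_links(names, scores, threshold):
    """Group languages that are connected, directly or through intermediaries,
    by pairwise scores at or above the threshold."""
    neighbours = {n: set() for n in names}
    for (x, y), score in scores.items():
        if score >= threshold:
            neighbours[x].add(y)
            neighbours[y].add(x)
    groups, seen = [], set()
    for start in names:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


THRESHOLD = 0.3  # arbitrary cut-off chosen for this toy example
groups = group_by_links(sorted(languages), scores, THRESHOLD)

for pair, score in scores.items():
    print(pair, score)
print(groups)
```

Compared directly, A and B share only one word in ten and would fall below the cut-off, but each shares four in ten with C, so all three end up in one group while D stays on its own. A two-language comparison of A and B would have missed the relationship entirely.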
In his lexicostatistical analysis, Georgiy Starostin approaches the Nilo-Saharan phylum with a multi-pronged methodology. First, partly as a proof of concept, he uses lexicostatistics to confirm the already well-supported individual subfamilies of Nilo-Saharan. By doing so, he can demonstrate the validity of his strategy, which has also been used in the past to connect the disparate families of Indo-European. He then lays out comparative evidence for the validity of four specific families: Saharan; Koman-Gumuz, which he chooses to classify as a single group, though one with many internal differences; Central Sudanic, which he groups together with the minor Kreish and Sinyar families; and Eastern Sudanic, which he argues is a united family.
However, Starostin has some difficulty in connecting these families to one another through Swadesh-backed lexicostatistical methods. After reconstructing a proto-vocabulary for each family from its connected vocabularies, he finds effectively no solid potential cognates linking Saharan or Koman-Gumuz to Eastern or Central Sudanic, and only a small and tenuous connection between some of the words in Eastern Sudanic and Central Sudanic. The words for ‘ashes,’ ‘name,’ ‘two,’ and ‘who’ all share phonetic similarities across the two families, but Starostin suspects that any relationship between them would require the two families to have diverged at least 12,000 years ago. He therefore asserts that Saharan and Koman-Gumuz should be separated from Nilo-Saharan under the present state of research.
Finally, Starostin tries to see whether he can use lexicostatistics to group any of Nilo-Saharan’s smaller families into the four families he has established. He suspects that Fur, Kadu, and Berta (the last of which is often considered an isolate) could fit into the Eastern Sudanic family, and that Maban and Kunama could be related to one another, with the family they form together grouped into Central Sudanic. However, Songhay, Kuliak, and little Shabo cannot be sorted alongside any of the preexisting families within Starostin’s model.
Ultimately, Starostin proposes a more conservative model of Nilo-Saharan which, at its maximum, contains the established families of Eastern Sudanic, Central Sudanic, Fur, Kadu, Berta, Kreish, Sinyar, Kunama, and Maban, while excluding the families of Koman, Gumuz, Saharan, Songhay, Kuliak, and Shabo.