Speech Surrogates: An Overview

A speech surrogate (or surrogate speech/language) is a system in which the content of a (usually) spoken language is transferred onto some non-linguistic form.

Let’s break that definition down a bit. First, “non-linguistic form”. Human languages come in two varieties: spoken and signed. These are systems of communication that, with sufficient input, we learn as children growing up in our communities without the need for explicit instruction. (Contrary to popular belief, parents do not “teach” their children to speak.) Literate individuals may think of the written form as inherently linguistic as well, but in fact, writing was only independently invented three times in the course of human history, and most languages remain purely oral (i.e. unwritten) to this day.

By this definition, then, writing is a speech surrogate, since there is nothing inherently linguistic about little marks on paper, tablets, or screens, and yet we use these marks to represent our speech. If you are reading this, then congratulations, you are a proficient user of a speech surrogate!

Sign language, on the other hand, is not a speech surrogate. Signing is an inherently linguistic modality, equivalent in every meaningful way to speech, and further, sign languages like ASL are not the signed equivalent of English or another spoken language. That is, they aren’t “encoding” English through sign. Rather, they are fully-fledged linguistic systems in their own right with their own vocabulary, grammar, and conventions of use. Nevertheless, there do exist some “secondary sign systems”, wherein spoken languages are translated into a series of signs or gestures; these will not be our focus here.

Second, in the definition above, the term “content” of language was carefully chosen. This is because speech surrogates differ in what kind of linguistic information they encode. Some systems encode the sounds of the spoken language, while others encode concepts directly. Continuing with the familiar example of writing systems, the English alphabet encodes the sounds of English (imperfectly, in a number of ways), such that you could pronounce a made-up word like “framp” even though you don’t know what it means. A writing system like Chinese characters, by contrast, encodes concepts with little reference to how the words are pronounced: the character 猫, meaning ‘cat’, is pronounced māo in Mandarin Chinese but neko in Japanese. Systems that encode the sounds of the language have been referred to in the literature as “abridging” systems (Stern 1957; alternatively, “emulated speech”, Seifart et al. 2018), while those that encode concepts directly have been referred to as “lexical ideogram” systems.

Finally, our definition refers to “(usually) spoken language”. This is simply because, to our knowledge, surrogate systems are far more common for spoken languages than signed languages. The kinds of surrogate systems we focus on here are usually auditory systems, and sign used by deaf communities is, of course, unlikely to be encoded in this way. There are, however, some writing systems developed for sign, which could be considered “sign surrogates”, but writing will not be our focus here.

This site concentrates primarily on musical surrogate speech, the encoding of speech through musical instruments. On percussive instruments, these systems have often been referred to as “talking drums”, though musical surrogates are found on a diverse range of instruments all around the world. Closely related to musical surrogate speech are whistled surrogates. We will touch on these from time to time as they relate to musical surrogate speech, and we welcome guest contributions on the topic.

The blog will certainly expand on various parts of this definition and on the range of speech surrogates we find around the world, so stay tuned for more!