Laura McPherson
Keywords: Africa, Burkina Faso, Balafon, Linguistics, Phonology, Syllable structure, Phonetics
Much of my work on the Sambla balafon surrogate language—and surrogate languages more generally—has focused on the encoding of tone. It is true that this carries the highest functional load in the surrogate system, and that individual consonants and vowels are not encoded, but the segmental skeleton of syllables does play a role. In this post, I will try to tease apart which aspects of syllable structure are categorical, which are gradient, and which are variable. I’ll then consider why this might be the case and what it can tell us about the organization of phonological structure.
As context for this blog post, it is important to know a little bit about Sambla syllable structure. The maximal syllable shape is as in (1), with optional elements in parentheses:
(1) (Cə)(C)V(V)(ː)(N)
In other words, the only obligatory element in a syllable is a single vowel nucleus V. However, almost all Sambla syllables (outside of a few pronouns) have onsets, making CV far more common. The nucleus can be simplex (a monophthong, as in CV) or complex (a diphthong, as in CVV). Both monophthongs and diphthongs can be either short or long (CV, CVː, CVV, CVVː). Syllables can be “sesquisyllabic”, meaning there is a short half syllable before the full syllable (CəCV, CəCVː, CəCVV, etc.). Finally, the only possible coda is a nasal whose place of articulation is non-contrastive and which is subject to much variability in its realization (McPherson 2020a, 2020b).
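As a rough sketch, the template in (1) can be written as a regular expression over abstract skeleton symbols. The symbol inventory (C, ə, V, ː, N) follows the notation used above, while the function name `is_possible_syllable` is purely illustrative. The check covers only the shape of the skeleton, not which consonants or vowels may fill each slot.

```python
import re

# Abstract skeleton symbols, following the notation in (1):
# C = consonant, ə = vowel of the half syllable, V = vowel,
# ː = length mark, N = nasal coda.
# Every element except the first V is optional.
SYLLABLE = re.compile(r"(Cə)?C?VV?ː?N?")


def is_possible_syllable(skeleton: str) -> bool:
    """Return True if the whole skeleton fits (Cə)(C)V(V)(ː)(N)."""
    return SYLLABLE.fullmatch(skeleton) is not None


# A few shapes mentioned in the text, plus two that the template excludes
# (an oral coda consonant and an onset cluster).
for shape in ["V", "CV", "CVː", "CVV", "CəCVVː", "CVN", "CVC", "CCV"]:
    print(shape, is_possible_syllable(shape))
```

Using `fullmatch` rather than `match` means that trailing material, such as a non-nasal coda, causes the shape to be rejected.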
The following bullet points summarize how these different aspects of syllable structure (and segmental structure more broadly) are encoded, with cases discussed in more depth under the headings Categorical, Variability, and Gradience.
· Vowel quality: Not encoded
· Diphthongs: Not encoded
· Vowel length: Categorical/gradient
· Consonant type: Not encoded
· Sesquisyllables: Categorical
· Coda nasals: Variable
Categorical
As the list above lays out, both vowel length and sesquisyllabicity are categorically encoded. In fact, both are categorically encoded in the same way, resulting in ambiguity. For vowel length, if the vowel is short, it is encoded with a single strike; if it is long, it is encoded with a double strike, otherwise known as a flam. This is a categorical opposition: one strike or two.
The same holds for sesquisyllabicity. If the syllable is a regular CV syllable (i.e. non-sesquisyllabic), it is encoded with a single strike. If it is sesquisyllabic (CəCV), it is encoded with a flam. One strike vs. two.
Note that this is a binary opposition: Long vowels result in two strikes, and so do sesquisyllables, but a syllable type with both complexities like CəCVː still falls into the two-strike category. Likewise, the presence of any one complexity (like sesquisyllabicity) will put the syllable into the two-strike category. That is, even though CəCV has a short vowel, it is encoded with two strikes due to its sesquisyllabic nature.
While it is unsurprising that individual vowel and consonant qualities are not distinguished in the balafon surrogate system, it is somewhat surprising that diphthongs are not encoded differently from monophthongs. After all, this is typically understood to be an aspect of syllable structure (simplex vs. complex nucleus), yet it is the only aspect to have no influence in surrogate encoding. It may be that because vowel length is contrastive even among diphthongs, length takes precedence in categorizing the syllable as involving one strike or two.
A final note here: The categorical contrast between one strike and a flam is only operable with a level-toned syllable, since the flam in that case is on just one note/key of the balafon. If the syllable carries a contour tone, regardless of its length, the two tones of the contour are obligatorily realized as a flam.
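The categorical rule described in this section can be summarized as a small decision function. This is only a minimal sketch of the generalization as stated above, not a model of what musicians actually compute. The function name and its boolean parameters are illustrative assumptions, and nasal codas (listed above as variable) are deliberately left out.

```python
def strikes_for_syllable(long_vowel: bool,
                         sesquisyllabic: bool,
                         contour_tone: bool) -> int:
    """Return 1 for a single strike or 2 for a flam.

    Vowel quality, consonant type, and diphthongs are not inputs,
    since the text describes them as not encoded.
    """
    # A contour tone is always realized as a flam, regardless of length.
    if contour_tone:
        return 2
    # With a level tone, either complexity alone is enough for a flam;
    # having both (e.g. CəCVː) still yields just two strikes.
    if long_vowel or sesquisyllabic:
        return 2
    return 1


print(strikes_for_syllable(False, False, False))  # plain CV, level tone
print(strikes_for_syllable(True, True, False))    # CəCVː, level tone
print(strikes_for_syllable(False, False, True))   # short vowel, contour tone
```

Because the output is binary, a flam on its own does not say which of the three properties triggered it, which is the ambiguity noted above.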
Variability
Interestingly, while the encoding of vowel length and sesquisyllabicity is 100% consistent, nasal codas are subject to variation between the two categorical realizations (one strike or two). That is, any given token is categorically one or the other, but the very same syllables can be played either way, in seeming free variation.
To give an example, consider three back-to-back tokens of a request for 10,000 francs (wɛ̋n dán sóen). In the first, dán ‘10,000’ is played with just a single strike, while wɛ̋n ‘money’ and sóen ‘one’ are both played with flams.

The musician then repeats this amount twice more (given my lack of comprehension the first time!). In both of these repetitions, dán is played with a flam.


What is going on? Could it be just an error that’s been corrected? While certainly possible, I have never seen an “error” of this kind for the encoding of long vowels. And we find similar variation for other words with a nasal coda, such as sóen ‘one’ played on a single strike as opposed to the flams illustrated above.
So why should this aspect of syllable structure be subject to variation? It turns out that nasal codas in Seenku are weak phonological elements, whose realization is variable even in the spoken language. In phrase-final position, they can be so weakly articulated as to be just late nasalization on the preceding vowel (possibly with a weak tongue body gesture, like a nasal approximant). Before the approximants /l/ and /w/, they can be realized on that following consonant, nasalizing them to [n] and [m]. Before nasals, they disappear completely, perhaps being absorbed into the following nasal consonant. It is only before obstruents, and particularly plosives, that they are realized as true nasal stops. See McPherson (2020b) for further discussion and examples.
Given this weak status and variation in the spoken language, it is little surprise that musicians also vary in their encoding. More data are required to see if phonological context affects the surrogate realization; for example, are nasal codas more likely to be encoded as a flam in an environment where they are fully realized, i.e. before obstruents? Preliminary data suggest that context may play a role. For instance, nasal codas are found after both oral and nasal vowels in Seenku; after nasal vowels, the coda is typically only audible when it affects a following consonant (its nasality otherwise blending with that of the vowel). Consequently, on the balafon, these syllables (e.g. sȁ̰n ‘buy’ or dzḭ̏n ‘put’) are almost exclusively played with one strike, the one exception being when a vowel-initial question marker âa was played immediately after.
This raises interesting questions about the level at which segmental material is encoded. As I’ve written elsewhere, tone seems to be encoded at a deeper level, before postlexical processes, but the same does not seem to hold for syllable structure. If we take the phonology to be a single component regardless of the kind of phonological elements (segments or prosody), how can this be the case?
Gradience
So far all of the encodings discussed have been categorical, even if there is variation between the two categories. However, it appears that gradience can also play a role in segmental encoding. Clearly, with fixed keys, there is no gradience in pitch or notes on the instrument, but durations can be gradient. In particular, I’ll be looking here at the inter-strike interval, the amount of time (in milliseconds) between two note strikes.
We saw above that vowel length is a categorical contrast on the balafon—one strike for a short vowel, two strikes for a long vowel—but that this contrast is obscured by contour tones, since these are also encoded with two strikes regardless of vowel length. It turns out that vowel length may still be encoded in this case, but subcategorically.

If we measure the total duration of the flams, i.e. the sum of the inter-strike intervals from the first strike of the flam to the first strike of the following syllable, the mean duration of long vowels is significantly different from that of either short vowels or sesquisyllables (which do not differ significantly from each other).
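To make the measurement concrete, here is a minimal sketch of how total flam duration could be computed from strike onset times. The function names and the onset values in the example are invented for illustration and are not taken from the recordings.

```python
def inter_strike_intervals(onsets_ms):
    """Time in ms between each pair of consecutive strikes."""
    return [later - earlier
            for earlier, later in zip(onsets_ms, onsets_ms[1:])]


def flam_total_duration(flam_onsets_ms, next_syllable_onset_ms):
    """Sum of the inter-strike intervals from the first strike of the
    flam to the first strike of the following syllable."""
    onsets = list(flam_onsets_ms) + [next_syllable_onset_ms]
    return sum(inter_strike_intervals(onsets))


# Hypothetical onsets (ms): two strikes of a flam, then the first
# strike of the next syllable.
flam = [1000, 1080]
next_onset = 1230

print(inter_strike_intervals(flam + [next_onset]))  # [80, 150]
print(flam_total_duration(flam, next_onset))        # 230
```

Since the intervals are consecutive, their sum is simply the time from the first strike of the flam to the first strike of the next syllable. Spelling out the intervals keeps the two parts of the flam available for separate inspection.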
In other words, even though from a categorical perspective, short vowels and long vowels with contours are encoded in the same way (and a musician may tell you that they are ambiguous, or that a certain phrase played on the balafon could represent either a word like kâ or kâa), from a gradient perspective there are still differences; the length contrast is (subconsciously?) preserved.
A quick and dirty comparison of total duration of (short) monophthongs and diphthongs shows that diphthongs are longer (221 ms compared to 199 ms on average), a difference that is not quite statistically significant (p = .11) but is trending that way. It should be noted that the sample size for diphthongs is currently quite small (n = 11).
In both of these cases, we are dealing with a contrastive phonological feature (vowel length or nucleus complexity) encoded gradiently on the balafon. An open question, and the focus of future work, is whether non-contrastive length differences are also encoded in this way. For example, rising tones are cross-linguistically longer than falling tones due to the laryngeal effort involved in raising vs. lowering pitch. Would such a length contrast be encoded on the balafon, even though the mechanism for lengthening (the larynx) is not involved? If so, this suggests that the encoding of duration is tied to (oral) surface phonetics rather than tapping into the phonological representation. Alternatively, it would mean that phonological representations contain phonetic information, along the lines of what might be predicted by an exemplar model.
Closing thoughts
It is one thing to describe the correlations between encoding strategies and linguistic structure, but to me a bigger question is: What are musicians doing while encoding language onto an instrument? What are they tapping into? Which decisions are conscious and which are subconscious?
For the Sambla balafon, syllable structure appears to be a good place to look for evidence in answering these questions. We might expect phonemic contrasts to be consciously encoded in a categorical way, and to a certain extent that is true (for contrastive vowel length, sesquisyllabic structure, or nasal codas, when they are encoded). But it is clear that subtleties of phonetic realization also play a role, with some of those like vowel length clearly tied to a contrastive feature. What interface ties these details of phonetic realization to the hands rather than the vocal apparatus?
Balafon musicians report humming the words along as they play, suggesting that the vocal tract is to some extent activated and that this signal may control the hands. Even without this activation, the hands may simply reflect “inner speech”, and as such, systems like speech surrogacy stand to shed light on how much phonetic detail is part of that inner speech or how much we have internalized durational differences that may arise largely from articulatory properties.