A while ago, this blog had a discussion of short English words that have no rhymes. We’ve all heard of “purple” (which, in fact, rhymes with the esoteric but real word “hirple”) and “orange” in this context, but there are others. This seems a bit odd, which I guess is why some of these words are famous for having no rhyme. Naively, and maybe not so naively, one might expect that at least some new words would be created to take advantage of the implied gaps in the gamut of two-syllable words. Is there something that prevents new coinages from filling the gaps? Why do we have blogs and vegans and wikis and pixels and ipods, but not merkles and rilvers and gurples?
I have a hypothesis, which is more in the line of idle speculation. Perhaps some combinations are automatically disfavored because they interfere with rapid processing of the spoken language. I need to digress for just a moment to mention a fact that supposedly baffled early workers in speech interpretation technology: in spoken language, there are no pauses or gaps between words. If you say a typical sentence — let’s take the previous sentence for example — and then play it back really slowly, or look at the sound waveform on the screen, you will find that there are no gaps between most of the words. It’s not “I — need — to — digress,” it’s “Ineed todigressforjusta momento…” Indeed, unless you make a special effort to enunciate clearly, you may well use the final “t” in “moment” as the “t” in “to”: most people wouldn’t say the t twice. But with all of these words strung together, how is it that our minds are able to separate and interpret them, and in fact to do this unconsciously most of the time, to the extent that we feel like we are hearing separate words?
My thought (and, as I said, it is pure speculation) is that perhaps there is an element of “prefix coding” in spoken language, or at least in spoken English (but presumably others too). “Prefix coding” is the assignment of a code such that no symbol in the code is the start (prefix) of another symbol in the code. Hmm, that sentence only means something if you already know what it means. Try this. Suppose I want to compose a language based on only two syllables, “ba” and “fee”. Using a prefix code, it’s possible to come up with a rule for words in this language, such that I can always tell where one word ends and the next begins, even with no gaps between words. (“Huffman coding” provides the most famous way of doing this.) For instance, suppose I have the words bababa, babafee, feeba, bafee, and feefeefee. None of these words is the start of any other, so no matter how I string them together, there is only one possible breakdown into words: babafeefeebabafeefeefeefeefeebabababa can only be parsed one way (babafee, feeba, bafee, feefeefee, feeba, bababa), so there’s no need for word breaks. In fact, as soon as you reach the end of one word, you know you have done so; no need to “go backwards” from later in the message, to try out alternative parses.
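For anyone who wants to check the ba/fee example by machine, here is a minimal Python sketch. The function names (is_prefix_free, decode) are my own inventions for illustration, and the decoder simply assumes the word list really is prefix-free, which the first function checks.

```python
# The five toy words from the ba/fee example.
WORDS = ["bababa", "babafee", "feeba", "bafee", "feefeefee"]
MESSAGE = "babafeefeebabafeefeefeefeefeebabababa"


def is_prefix_free(words):
    """Return True if no word is the start of a different word."""
    return not any(a != b and b.startswith(a) for a in words for b in words)


def decode(stream, words):
    """Split the stream left to right, emitting a word as soon as one matches.

    With a prefix-free word list, at most one word can match at any
    position, so the decoder never needs to back up.
    """
    out, pos = [], 0
    while pos < len(stream):
        match = next((w for w in words if stream.startswith(w, pos)), None)
        if match is None:
            raise ValueError(f"no word matches at position {pos}")
        out.append(match)
        pos += len(match)
    return out


if __name__ == "__main__":
    print(is_prefix_free(WORDS))
    print(decode(MESSAGE, WORDS))
```

The point of the sketch is the loop in decode: it only ever looks forward from the current position, which is exactly the “no need to go backwards” property.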
English doesn’t quite work like this. For example, the syllable string see-thuh-car-go-on-the-ship can be interpreted as “see the cargo on the ship” or “see the car go on the ship”. But it took me several tries to come up with that example! To a remarkable degree, you don’t need pauses between the words, especially if the sentence also has to make sense.
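To see the ambiguity concretely, here is a rough Python sketch that lists every way of splitting a syllable string into words. The little lexicon and its syllable spellings are hypothetical, made up just for this one sentence; real English would need a real pronunciation dictionary.

```python
# Hypothetical toy lexicon: each word is a tuple of syllables.
LEXICON = {
    ("see",): "see",
    ("thuh",): "the",
    ("the",): "the",
    ("car",): "car",
    ("go",): "go",
    ("car", "go"): "cargo",  # "car" is a prefix of "cargo": not a prefix code
    ("on",): "on",
    ("ship",): "ship",
}


def segmentations(syllables, lexicon):
    """Return every way to split the syllable sequence into lexicon words."""
    if not syllables:
        return [[]]
    results = []
    for n in range(1, len(syllables) + 1):
        head = tuple(syllables[:n])
        if head in lexicon:
            # Try this word, then split whatever is left in every possible way.
            for rest in segmentations(syllables[n:], lexicon):
                results.append([lexicon[head]] + rest)
    return results


if __name__ == "__main__":
    syllables = "see-thuh-car-go-on-the-ship".split("-")
    for parse in segmentations(syllables, LEXICON):
        print(" ".join(parse))
```

Unlike the ba/fee decoder, this one has to try alternatives, because “car” is the start of “cargo”. With this lexicon it finds both readings of the sentence.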
So, maybe words that rhyme with “circle” or “empty” are disfavored because they would interfere with a quasi-“prefix coding” character of the language? Suppose there were a word “turple”, for example. It would start with a “tur” sound, which is one of the more common terminal sounds in English (center, mentor, enter, renter, rater, later…). A string of syllables that contains “blah-blah-en-tur-ple-blah” could be split in more than one place… maybe that’s a problem. Of course, you’ll say, “But there are other words that start with ‘tur’; why don’t those cause a problem? Why just ‘turple’?” But there aren’t all that many other common “tur” words (surprisingly few, actually): turn, term, terminal. “Turple” would be the worst, when it comes to parsing, because its second syllable, “pul”, is a common starting syllable in rapidly spoken English (where many “pl” words, like please and plus and play, start with an approximation of the sound).
So…perhaps I’m proposing nonsense, or perhaps I’m saying something that has been known to linguists forever, but that’s my proposal: some short words tend to evolve out of the language because they interfere with our spoken language interpretation.