A while ago, this blog had a discussion of short English words that have no rhymes. We’ve all heard of “purple” (which, in fact, rhymes with the esoteric but real word hirple) and “orange” in this context, but there are others. This seems a bit odd, which I guess is why some of these words are famous for having no rhyme. Naively, and maybe not so naively, one might expect that at least some new words would be created to take advantage of the implied gaps in the gamut of two-syllable words. Is there something that prevents new coinages from filling the gaps? Why do we have blogs and vegans and wikis and pixels and iPods, but not merkles and rilvers and gurples?
I have a hypothesis, which is more in the line of idle speculation. Perhaps some combinations are automatically disfavored because they interfere with rapid processing of the spoken language. I need to digress for just a moment to mention a fact that supposedly baffled early workers in speech interpretation technology: in spoken language, there are no pauses or gaps between words. If you say a typical sentence — let’s take the previous sentence for example — and then play it back really slowly, or look at the sound waveform on the screen, you will find that there are no gaps between most of the words. It’s not “I — need — to — digress,” it’s “Ineed todigressforjusta momento…” Indeed, unless you make a special effort to enunciate clearly, you may well use the final “t” in “moment” as the “t” in “to”: most people wouldn’t say the t twice. But with all of these words strung together, how is it that our minds are able to separate and interpret them, and in fact to do this unconsciously most of the time, to the extent that we feel like we are hearing separate words?
My thought — and, as I said, it is pure speculation — is that perhaps there is an element of “prefix coding” in spoken language, or at least in spoken English (but presumably others too). “Prefix coding” is the assignment of a code such that no codeword in the code is the start (prefix) of another codeword. Hmm, that sentence only means something if you already know what it means. Try this. Suppose I want to compose a language based on only two syllables, “ba” and “fee”. Using a prefix code, it’s possible to come up with a rule for words in this language such that I can always tell where one word ends and the next begins, even with no gaps between words. (“Huffman coding” provides the most famous way of doing this.) For instance, suppose I have words bababa, babafee, feeba, bafee, and feefeefee. None of these words is the start of any other, and it turns out that no matter how I string them together, there is only one possible breakdown into words: babafeefeebabafeefeefeefeefeebabababa can only be parsed one way (babafee-feeba-bafee-feefeefee-feeba-bababa), so there’s no need for word breaks. In fact, as soon as you reach the end of one word, you know you have done so; no need to “go backwards” from later in the message to try out alternative parses.
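For the programmers in the audience, here is a minimal Python sketch of the ba/fee example. Representing each word as a tuple of syllables, and the helper names `is_prefix_free`, `split_syllables`, and `decode`, are my own illustrative choices. The decoder reads left to right and commits to a word the moment its buffer matches one, which is only safe because the code is prefix-free.

```python
# Toy two-syllable language from the post; each word is a tuple of syllables.
WORDS = {
    ("ba", "ba", "ba"),     # bababa
    ("ba", "ba", "fee"),    # babafee
    ("fee", "ba"),          # feeba
    ("ba", "fee"),          # bafee
    ("fee", "fee", "fee"),  # feefeefee
}


def is_prefix_free(words):
    """True if no word is the start (prefix) of a different word."""
    return not any(
        w != v and v[: len(w)] == w for w in words for v in words
    )


def split_syllables(text, syllables=("ba", "fee")):
    """Split a run-together string into syllables ("ba" and "fee" never clash)."""
    out, i = [], 0
    while i < len(text):
        for s in syllables:
            if text.startswith(s, i):
                out.append(s)
                i += len(s)
                break
        else:
            raise ValueError(f"unrecognized syllable at position {i}")
    return out


def decode(syls, words):
    """Greedy left-to-right decoding with no backtracking.

    A word is emitted the moment the buffer matches one; because the code is
    prefix-free, that first match is the only one that could ever succeed.
    """
    result, buf = [], ()
    for s in syls:
        buf += (s,)
        if buf in words:
            result.append("".join(buf))
            buf = ()
        elif not any(w[: len(buf)] == buf for w in words):
            raise ValueError(f"no word starts with {'-'.join(buf)}")
    if buf:
        raise ValueError("message ends in the middle of a word")
    return result


message = "babafeefeebabafeefeefeefeefeebabababa"
print(is_prefix_free(WORDS))                    # True
print(decode(split_syllables(message), WORDS))
# ['babafee', 'feeba', 'bafee', 'feefeefee', 'feeba', 'bababa']
```

Adding a short word like “ba” to the vocabulary would break the prefix property, and the greedy decoder would then commit too early and fail on messages like the one above.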
English doesn’t quite work like this. For example, the syllable string see-thuh-car-go-on-the-ship can be interpreted as “see the cargo on the ship” or “see the car go on the ship”. But it took me several tries to come up with that example! To a remarkable degree, you don’t need pauses between the words, especially if the sentence also has to make sense.
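To see the cargo ambiguity mechanically, one can enumerate every way of cutting a syllable string into dictionary words. This is a small sketch assuming a hypothetical toy lexicon of rough syllable spellings (my own, not phonetic transcriptions); the memoized recursion is a standard way to list all segmentations, not a claim about how the brain does it.

```python
from functools import lru_cache

# Hypothetical toy lexicon: rough syllable spellings mapped to English words.
LEXICON = {
    ("see",): "see",
    ("thuh",): "the",
    ("the",): "the",
    ("car",): "car",
    ("go",): "go",
    ("car", "go"): "cargo",
    ("on",): "on",
    ("ship",): "ship",
}


def all_parses(syls, lexicon):
    """Return every way to split a syllable sequence into lexicon words."""
    syls = tuple(syls)
    max_len = max(len(key) for key in lexicon)

    @lru_cache(maxsize=None)
    def parses_from(i):
        # All segmentations of syls[i:], each as a tuple of words.
        if i == len(syls):
            return ((),)  # one way to parse nothing: the empty parse
        found = []
        for n in range(1, max_len + 1):
            chunk = syls[i : i + n]
            # The length check stops a short tail from matching a shorter word.
            if len(chunk) == n and chunk in lexicon:
                for rest in parses_from(i + n):
                    found.append((lexicon[chunk],) + rest)
        return tuple(found)

    return [" ".join(p) for p in parses_from(0)]


for parse in all_parses("see-thuh-car-go-on-the-ship".split("-"), LEXICON):
    print(parse)
# see the car go on the ship
# see the cargo on the ship
```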
So, maybe words that would rhyme with “circle” or “empty” are disfavored because they would interfere with a quasi-“prefix coding” character of the language? Suppose there were a word “turple”, for example. It would start with a “tur” sound, which is one of the more common terminal sounds in English (center, mentor, enter, renter, rater, later…). A string of syllables that contains “blah-blah-en-tur-ple-blah” could be split in more than one place…maybe that’s a problem. Of course, you’ll say, “But there are other words that start with ‘tur’; why don’t those cause a problem? Why just ‘turple’?” But there aren’t all that many other common “tur” words (surprisingly few, actually): turn, term, terminal. “Turple” would be the worst when it comes to parsing, because its second syllable — pul — is a common starting syllable in rapidly spoken English (where many pl words, like please and plus and play, start with an approximation of the sound).
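One way to make the “split in more than one place” worry concrete is to count how many segmentations a syllable string has before and after a new word enters the lexicon. The sketch below uses the standard dynamic-programming count; the syllable spellings and the tiny lexicon (including rendering “in” as “en” and “please” as “pul-eez”) are hypothetical assumptions for illustration only, and the result shows how ambiguity is counted, not that the hypothesis is true.

```python
# Hypothetical rough "syllables" for rapid speech; these spellings are
# illustrative assumptions, not phonetic transcriptions.
BASE_LEXICON = {
    ("en",),          # "in"
    ("en", "tur"),    # "enter"
    ("pul", "eez"),   # "please", with a reduced "pl"
    ("eez",),         # "ease"
}
WITH_TURPLE = BASE_LEXICON | {("tur", "pul")}  # the imaginary word "turple"


def count_parses(syls, lexicon):
    """Count the segmentations of syls into lexicon words (dynamic programming).

    ways[k] is the number of ways to segment the first k syllables.
    """
    syls = tuple(syls)
    ways = [0] * (len(syls) + 1)
    ways[0] = 1  # the empty prefix has exactly one (empty) segmentation
    for end in range(1, len(syls) + 1):
        for word in lexicon:
            start = end - len(word)
            if start >= 0 and syls[start:end] == word:
                ways[end] += ways[start]
    return ways[-1]


utterance = ["en", "tur", "pul", "eez"]
print(count_parses(utterance, BASE_LEXICON))  # 1: "enter please"
print(count_parses(utterance, WITH_TURPLE))   # 2: adds "in turple ease"
```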
So…perhaps I’m proposing nonsense, or perhaps I’m saying something that has been known to linguists forever, but that’s my proposal: some short words tend to evolve out of the language because they interfere with our spoken language interpretation.
I'm not sure that argument about "ter-pul" specifically is so good; wouldn't that argue against it occurring in the middle of words (e.g. interpolate)? As I understand it, though, when one looks at the evolution of languages, avoidance-of-confusion does affect how the pronunciations of specific words change. A professional linguist would know.
English is a particularly nasty language to try to answer this kind of question for, given how readily it imports words from other languages.
There is a literature in cognitive psychology on reading that follows a similar logic regarding written language and orthographic codes… see, e.g. (among many others), http://www.psych.unimelb.edu.au/people/staff/McKa…
Your idea of linguistic survival of the fittest has a distinguished pedigree. I think it was Herschel, the astronomer, who thought of applying Darwin to see how language evolves, referring to the "laws of verbal corruption".
I guess you won't allow Terkel (as in Studs) as a rhyme for circle?
There is a fairly big literature on lexical processing, and indeed, prefixes play a prominent role. The same thing happens with spelling and spelling mistakes (typos and brainos), by the way: errors are much more frequent after the first couple of characters than within them.
Evolutionary models that take into account phonemic lexical coding efficiency are relatively new, though it's something people have speculated about for a long time. I hadn't seen any decent work on the topic until Partha Niyogi dove into the problem:
http://people.cs.uchicago.edu/~niyogi/
He wrote an entire MIT Press book on the subject, The Computational Nature of Language Learning and Evolution. It's actually more about statistical dynamics than computation per se.
You can also check out his papers, under the "Language Evolution" section of his publications page:
http://people.cs.uchicago.edu/~niyogi/papers.htm
The "Quantifying the Functional Load" paper directly addresses the point Phil's speculating about.
Cam and Kevin: I'm not surprised this isn't a new idea, but it's one of those things that is a bit hard to Google for. (Which brings me to another subject: people who think Google is totally great and practically unimprovable. I'm a good googler, but I often have trouble finding what I want; this is an example. What would you search for?) Cam, thanks for the reference.
Anne: I don't have a complete theory to offer, but I guess my thought is that the language would "try" harder to avoid problems with common words and phrases than with uncommon ones. Not that it would be totally strict. (And, with your specific example, at least there's no word that starts with "polate", so the syllable string "in-ter-pol-ate" can only be parsed as that word. But I'm sure you could find others.)
Kevin: I, too, thought of Studs Terkel, and Angela Merkel too! I'd call it a rhyme, sure.
Hey, this means one can write a limerick:
A politician called Angela Merkel
Was invited to read Studs Terkel
When asked if she would
Replied "I doubt if I should,
It will send me around in a circle"
Kevin: Haven't you been following the blog? We're doing knock-knock jokes, not limericks!
This is an example of the type of problem that Google isn't intended to solve, at least not directly. You don't know the terminology that would be used to talk about this idea. If you want Google to help you, you have to think about who might know something about this issue. You correctly identified the right group: linguists.
If you google the word "linguist", you'll find a link to "Ask a Linguist". Believe it or not, I didn't know about that page before writing this answer. I was actually hoping that googling "linguist" would turn up Language Log, because that's where I would ask the questions you have. But "Ask a Linguist" sounds even better. I hope you go ask them the question.