Linguistics often involves finding and explaining patterns in languages, even if speakers of languages aren't consciously aware of the patterns. Yesterday while reading Robert Sheckley's short story "Protection", I noticed such a pattern in English. In the story, Sheckley (the same author who wrote "Shall We Have a Little Talk?") makes up a bunch of nonsense words to represent words in an alien language. One of them is feeg, and it immediately struck me as odd-sounding. Is feeg a phonetically possible English word?
English orthography, for a change, makes it clear how feeg ought to be pronounced: [fiːɡ]. That's a single syllable, consisting of the onset /f/, the nucleus vowel /iː/, and the coda /ɡ/. Are there words with similar phonetic content in English? To find out, I did a little scripting to produce a list of candidate words consisting of a legal English syllable onset followed by each of the English vowels and /ɡ/. By examining the resulting list, I came up with the following English words (and a few familiar names), grouped by the vowel they contain. Note that I've transcribed the phonemes using their most common surface phonetic values in my dialect; your pronunciation may vary. I've also included the traditional "short/long" vowel names after the corresponding vowels.
/æ/ (short A): many, including bag, jag, lag...
/ɛ/ (short E): many, including beg, leg...
/ɪ/ (short I): many, including big, jig...
/a/ (short O): many, including bog, jog, log...
/ʌ/ (short U): many, including bug, jug, lug...
/eʲ/ (long A): (The) Hague, plague, vague
/iː/ (long E): klieg (light), blitzkrieg, league, Queeg
/aʲ/ (long I): none
/oʊ/ (long O): brogue, drogue, Hoag, Moog, pogue, rogue, vogue
/u/ (long U): droog
/ʊ/: none
/aʊ/: none
/oʲ/: none
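The post doesn't include the script that produced the candidate list. A minimal Python sketch of the idea, assuming a small, deliberately incomplete onset inventory (a real run would need every legal English onset, including the empty one), pairs each onset with each vowel from the list above and appends /ɡ/. Deciding which outputs are actual English words is still done by hand.

```python
from itertools import product

# Hypothetical, deliberately partial onset inventory for illustration only.
# "" is the empty onset (vowel-initial words such as "egg").
ONSETS = ["", "b", "bl", "d", "dr", "dʒ", "f", "h", "kl", "l", "m", "p", "r", "v"]
# The vowels listed in the post, in the same order.
VOWELS = ["æ", "ɛ", "ɪ", "a", "ʌ", "eʲ", "iː", "aʲ", "oʊ", "u", "ʊ", "aʊ", "oʲ"]

def candidate_syllables(onsets, vowels, coda="ɡ"):
    """Yield every onset + vowel + coda combination as a phonemic string."""
    for onset, vowel in product(onsets, vowels):
        yield f"{onset}{vowel}{coda}"

candidates = list(candidate_syllables(ONSETS, VOWELS))
print(len(candidates))  # 14 onsets x 13 vowels = 182 candidates
```

Each candidate (for instance "fiːɡ" itself) then has to be checked against the lexicon; the script only guarantees that the onset, vowel, and coda are individually legal.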
Do you see the pattern? I've probably given it away by the way I've arranged the data. It seems to be this: word-final /ɡ/ occurs very commonly after a short vowel, but in few or no words does it occur after a long vowel, a diphthong, or /ʊ/. This isn't true in general of either velars or voiced stops; there are plenty of words that end with /k/ (bake, beak, bike, poke, duke) or with /d/ (bade, bead, bide, bode, dude) or /b/ (babe, plebe, jibe, robe, rube).
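One way to check this pattern against a whole lexicon rather than a hand-picked list is to count vowel + final consonant pairs in a pronouncing dictionary. The sketch below assumes input lines in the plain-text format of the CMU Pronouncing Dictionary (the word, then ARPAbet phones with stress digits on the vowels). The sample lines are typed by hand for illustration, and grouping AE, EH, IH, AA, and AH as the post's "short" vowels is my own mapping, with AA standing in for the American short O.

```python
from collections import Counter

# ARPAbet vowel symbols used by the CMU Pronouncing Dictionary (stress digits removed).
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER", "EY",
          "IH", "IY", "OW", "OY", "UH", "UW"}
# Assumed mapping of the post's "short" vowels onto ARPAbet.
SHORT = {"AE", "EH", "IH", "AA", "AH"}

def final_vowel_coda(phones):
    """Return (vowel, consonant) if the word ends in a vowel plus one consonant, else None."""
    bare = [p.rstrip("012") for p in phones]
    if len(bare) >= 2 and bare[-2] in VOWELS and bare[-1] not in VOWELS:
        return bare[-2], bare[-1]
    return None

def tally(lines):
    """Count (vowel, final consonant) pairs over lines like 'BAG  B AE1 G'."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith(";;;"):
            continue  # skip blank lines and CMUdict comment lines
        _word, *phones = line.split()
        pair = final_vowel_coda(phones)
        if pair:
            counts[pair] += 1
    return counts

def by_length(counts, coda):
    """Split the count of words ending in `coda` into (short-vowel, other-vowel) totals."""
    short = sum(n for (v, c), n in counts.items() if c == coda and v in SHORT)
    other = sum(n for (v, c), n in counts.items() if c == coda and v not in SHORT)
    return short, other

# Hand-typed sample lines in CMUdict style, for illustration only.
SAMPLE = [
    "BAG  B AE1 G", "BIG  B IH1 G", "BOG  B AA1 G", "VAGUE  V EY1 G",
    "BAKE  B EY1 K", "BEAK  B IY1 K", "BIKE  B AY1 K",
]
counts = tally(SAMPLE)
print(by_length(counts, "G"))  # (3, 1)
print(by_length(counts, "K"))  # (0, 3)
```

Fed the full dictionary file instead of SAMPLE, the same calls would give real totals for /ɡ/ versus /k/, /d/, and /b/; I haven't run that, so no figures are claimed here.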
What's more, the relatively few words where /ɡ/ follows a long vowel are, almost without exception, loan words. Here are the etymologies of the words above containing long vowels, variously collected from the OED, the American Heritage Dictionary, and other sources like Wikipedia:
(The) Hague from the Dutch Den Haag
plague from Latin via French, with reinforcement from Germanic
vague from French
klieg (light) from Kliegl, a German surname
blitzkrieg, also from German
league from French
Queeg, a surname whose origin I don't know
brogue, possibly from an Irish word meaning 'shoe'
drogue, etymology unclear
Hoag, a surname
Moog, a brand of synthesizers, from a surname I believe is Dutch
pogue from the Irish word for 'kiss'
rogue with the following interesting etymology in the OED: "One of the numerous canting words introduced about the middle of the 16th cent. to designate the various kinds of beggars and vagabonds, and perhaps in some way related to ROGER n. There is no evidence of connexion with F. rogue arrogant."
vogue from French
droog, Burgess's fictional borrowing of the Russian word for 'friend'
With the possible exception of rogue and drogue, then, all the words in English containing a long vowel or diphthong followed by /ɡ/ are borrowed from French, Irish, German, or Dutch. So does that imply the English sound system has a constraint barring such words? I'm not sure. On the one hand, many of these borrowings are old, well-established English words and don't contain an obviously rare sequence like the onset clusters of gnu or sphere, for example. On the other hand, the long-vowel-plus-/ɡ/ words must still be somehow exceptional, at least in my mental grammar, because when I encountered feeg it struck me as odd.
If there is a constraint against final /ɡ/ after long vowels, it's apparently weak enough to allow the borrowing of exceptions, but strong enough to discourage the coining of new words. That surprises me. When children are acquiring language, they of course acquire the current contents of the lexicon, but if the constraint described here exists, they must also infer that constraint subconsciously from the non-existence (or low frequency) of words violating it. If the constraint exists, it strikes me as a challenge to arguments for Universal Grammar from the poverty of the stimulus. It can't be that some innate part of the language faculty disallows /ɡ/ after the English long vowels, since words with that sequence occur in other languages. It must, therefore, be the case that children acquiring language are sensitive to the patterned non-existence of words in the lexicon. That is, the stimulus isn't so poor after all, and they're able to infer the negative evidence for themselves.
See? Language has patterns. Finding and explaining those patterns is my job. I'm a linguist. I carry a badge.
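The inference from gaps that the post attributes to children can be illustrated with a simple contingency calculation: compare how many words fill each vowel-class/coda cell with how many you'd expect if vowel class and coda combined independently. This is only a toy illustration of "noticing a gap", not a model of acquisition, and the counts in the example are invented.

```python
from collections import Counter

def observed_over_expected(pairs):
    """Map each (row, col) cell to observed / expected, where 'expected' is the
    count predicted if rows and columns combined independently."""
    pairs = list(pairs)
    n = len(pairs)
    cells = Counter(pairs)
    rows = Counter(r for r, _ in pairs)
    cols = Counter(c for _, c in pairs)
    return {
        (r, c): cells[(r, c)] / (rows[r] * cols[c] / n)
        for r in rows
        for c in cols
    }

# Invented toy counts (not English data): no long-vowel + /ɡ/ words at all.
toy = [("short", "ɡ")] * 4 + [("short", "k")] * 4 + [("long", "k")] * 4
ratios = observed_over_expected(toy)
for cell, ratio in sorted(ratios.items()):
    print(cell, round(ratio, 2))
```

A ratio near 0 marks a cell that independence predicts should contain some words (here about 1.3) but that is empty, which is the kind of patterned absence a learner would have to notice.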
On the one hand, many of these borrowings are old, well-established English words and don't contain an obviously rare sequence like the onset clusters of gnu or sphere, for example.
Moreover, plague and vague acquired their /ej/ nuclei in English: either they were borrowed before the Great Vowel Shift (which seems to be the case with plague, judging by the dates in the OED), or they were borrowed later but were anglicized to reflect its effects (which may be the case with vague).
The apparent willingness of English to adopt /V:g/ sequences in loanwords, or even to create new ones, inclines me to believe that the paucity of such forms is basically an accidental gap (synchronically, that is; I expect there's a historical explanation for it, although I can't call it to mind off the top of my head). (And the same goes for /V:ʃ/, for that matter.) So that raises the question of whether your sense of the peculiarity of feeg is based solely on unconscious statistical generalization over the lexicon, or whether there's something more structural involved. (Well, for that matter, there are people who would ask the same question about all of phonology....)
The other thing I wonder about is the tendency of at least Ottawa Valley English to raise and diphthongize /æ/ and /ɛ/ before /g/ (and maybe also before /ŋ/, but not before /k/). Like the data you give above, it suggests the neutralization of the tense-lax contrast in this environment, but the preference seems to go the other way.
Posted by: Q. Pheevr | September 01, 2007 at 12:43 PM
I don’t think that generalization from the lack of data is necessarily an argument against UG’s poverty of the stimulus hypothesis. After all, in linguistics the lack of data can be as important as, if not more important than, the data one already has. Lack of data would be pretty easy in some cases for children to infer from, but this wouldn’t invalidate the idea that there are some facts of language that are determined by preexisting P&Ps. I’m sure if we looked enough we’d find situations where children could infer some odd things from data gaps but in fact don’t, which could be explained by some facet of UG.
Not that I’m a great believer in everything about UG, but I think it’s still pretty sound even when inference from lack of data is included.
Posted by: James Crippen | September 01, 2007 at 02:09 PM
On a completely unrelated note, why does Firefox render the length colon <ː> as a double-wide character? It seems to be preferentially selecting it from some CJK font. Safari doesn’t do this, and I imagine IE doesn’t. Does Firefox behave this way on Windows or Linux?
Posted by: James Crippen | September 01, 2007 at 02:12 PM
I don’t think that generalization from the lack of data is necessarily an argument against UG’s poverty of the stimulus hypothesis.
I agree. I think it would be relatively easy for a wee LAD to internalize the fact that there are far fewer words with /V:g/ than with /Vg/, but that doesn't mean that negative evidence in general is sufficient to explain what kids manage to do in acquiring language. Most (convincing) poverty-of-stimulus arguments seem to be based in syntax (perhaps because we have more occasion to combine words into new sentences than to combine phonemes into new words), and to turn on cases in which we produce sentences with structures we haven't heard before (i.e., for which we have no positive evidence), or in which children systematically avoid doing something that would make perfect sense if they were working entirely by analogy. (E.g., children will produce things like *Where does this goes? but not *Was the man who t here has left?; how do they decide which sentences they haven't heard are okay and which ones aren't?)
Posted by: Q. Pheevr | September 01, 2007 at 03:23 PM
/ʊ/: Suge Knight? boog (short for booger?)
/aʊ/: dawg (Southern AE)
/oʲ/: iceboyg (Northeast AE)
/aʲ/ (long I): hmmm...
If pre-velar raising is involved in this pattern, could it be a case of the long/short (alongside lax/tense?) alternation being more easily confused before the voiced segment? The /æ/ to [e] raising follows a tighter regional pattern than the /ɛ/ to [e] raising that causes so many of my students to hear the initial vowel of [ɛksɪt] and [egzɪt] as the same vowel.
Often, when I compare the pronunciations [egzɪt] and [ɛgzɪt], they still confuse them. Of course, those who usually say [ɛgzɪt] are more likely to catch the difference.
Posted by: michael covarrubias | September 01, 2007 at 03:46 PM
William Steig, the cartoonist.
Tige, Buster Brown's dog (presumably short for "Tiger").
"Fugue" is better than "droog."
I used to think "segue" was pronounced "seeg," so I doubt the constraint is that strong.
Posted by: johnshade | September 01, 2007 at 07:07 PM
Q. Pheevr:
The other thing I wonder about is the tendency of at least Ottawa Valley English to raise and diphthongize /æ/ and /ɛ/ before /g/ (and maybe also before /ŋ/, but not before /k/). Like the data you give above, it suggests the neutralization of the tense-lax contrast in this environment, but the preference seems to go the other way.
Ah, I actually deleted a paragraph about that dialect feature because I thought the post was getting overly complex. The discussion above is (intentionally) entirely in terms of phonemes rather than surface phones. Having surface vowels occur before /g/ that sound the same as the long vowel phonemes due to a phonological process clearly complicates the issue, but unless dialects with that process actually neutralize the two phonemes involved (which I believe they don't), then I think the (soft) constraint on the lexicon is best described in terms of phonemes.
James Crippen:
I don’t think that generalization from the lack of data is necessarily an argument against UG’s poverty of the stimulus hypothesis. After all, in linguistics the lack of data can be as important as, if not more important than, the data one already has. Lack of data would be pretty easy in some cases for children to infer from, but this wouldn’t invalidate the idea that there are some facts of language that are determined by preexisting P&Ps.
Let me be a little more clear about what I think. As I understand the poverty of the stimulus argument, it goes like this: The stimulus presented to language learners is not rich enough to determine the languages they learn. In particular, they are not presented with negative evidence that would exclude hypotheses that are consistent with the positive data but linguistically odd. Nonetheless, learners don't acquire such odd languages. The explanation is that there is an innate Universal Grammar that excludes these bad hypotheses.
I think that this argument underestimates the capacity of language learners to notice and reason from the gaps in the stimulus. If a learner has been presented with a set of utterances that conform to some constraint (positive evidence) but is never told that non-conforming utterances are ungrammatical (negative evidence), I don't think there's any need to resort to UG to explain why they learn not to produce non-conforming utterances. I think learners are capable of forming hypotheses not only about patterns in the positive evidence, but also about why they've never heard the utterances in the gaps, as long as those gaps can be characterized by the same sorts of rules as the positive evidence. That should be the case, since the stimulus is produced by speakers with rule-based grammars in their brains. Learners can form these hypotheses because they evaluate their own production using the same mechanism they use to evaluate incoming utterances: that is, they're just as capable of thinking, "Hmm, I've never heard a sentence like that", whether they're hearing the sentence or about to say it.
James Crippen again:
On a completely unrelated note, why does Firefox render the length colon <ː> as a double-wide character? It seems to be preferentially selecting it from some CJK font. Safari doesn’t do this, and I imagine IE doesn’t. Does Firefox behave this way on Windows or Linux?
The stylesheet for this blog uses the font Doulos SIL for ranges marked as IPA. If you don't have that font installed, Firefox finds some other font to get the character from, and is presumably picking one of the CJK fonts. Installing Doulos SIL should fix it. BTW, you imagine incorrectly about IE. Not only does it show the double-wide ː when Doulos SIL isn't installed, it still shows it even if you install the font. (At least, that's how IE 7 on Windows behaves—I seem to recall that IE 6 was even worse, failing to find that character at all if no font containing it was specified.) This is one of the reasons I finally switched to Firefox.
Posted by: The Tensor | September 02, 2007 at 10:11 AM
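A likely reason the fallback glyph looks double-wide: Unicode gives U+02D0 MODIFIER LETTER TRIANGULAR COLON the East Asian Width property "Ambiguous" (A), the class for characters that legacy East Asian character sets treat as wide but Western usage treats as narrow, and fonts built for CJK text typically draw such characters full-width. Python's standard unicodedata module can confirm the character's identity and width class. This is background on the character itself, not a diagnosis of any browser's font-selection logic.

```python
import unicodedata

LENGTH_MARK = "\u02d0"  # ː, the IPA length mark used throughout the post

print(unicodedata.name(LENGTH_MARK))              # MODIFIER LETTER TRIANGULAR COLON
print(unicodedata.east_asian_width(LENGTH_MARK))  # A (Ambiguous)
print(unicodedata.east_asian_width(":"))          # Na (Narrow), for comparison
```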
an obviously rare sequence like the onset clusters of gnu
Huh? The onset of gnu is /n/. You're surely not suggesting that the g- is pronounced in any variety of English?
The Irish name Tadhg is pronounced /taʲg/.
Posted by: language hat | September 03, 2007 at 06:58 AM
Huh? The onset of gnu is /n/. You're surely not suggesting that the g- is pronounced in any variety of English?
Oh. I guess I was thinking of GNU. Is "geek" a dialect of English?
Posted by: The Tensor | September 03, 2007 at 04:24 PM
Oh. I guess I was thinking of GNU. Is "geek" a dialect of English?
Well, I have to do enough conscious code-switching between “geek” and “nongeek” to make me think that it is.
Not only does it show the double-wide ː when Doulos SIL isn't installed, it still shows it even if you install the font.
This is exactly the problem that I have with Firefox, even when [ʔaɪpʰiʲeː] is explicitly marked up and CSS provides an appropriate font. I’ve searched Bugzilla for complaints and found none. Submitting a bug report is probably fruitless since nobody really cares about IPA when there are bigger issues like getting Thai or Kannada displaying correctly.
Posted by: James Crippen | September 03, 2007 at 05:07 PM
The Irish name Tadhg is pronounced /taʲg/.
Tadhg tends to be anglicized as Taig /teʲg/ or Teague /tiːg/ (or identified with Thaddeus, whence Thady, but that's a whole other kettle).
Your data seems only to consider morpheme-final /g/ rather than syllable-coda /g/: are there further examples in mid-morpheme syllable codas?
Posted by: molly mooly | September 04, 2007 at 05:08 AM
Tadhg tends to be anglicized as Taig /teʲg/ or Teague /tiːg/
Maybe, but Tadhg itself is pronounced /taʲg/; at least that's the only pronunciation given in Daniel Jones' English Pronouncing Dictionary.
Posted by: language hat | September 04, 2007 at 08:02 AM
You've left out the THOUGHT vowel, also long and also scarce. A mid-morpheme instance before /g/ is augment. A possible anglicized pronunciation of zugzwang is /'zʊgswæŋ/.
Posted by: molly mooly | September 04, 2007 at 08:11 AM
Tadhg itself is pronounced /taʲg/
That's probably the closest approximation to the Gaelic for most English accents, although in Hiberno-English the initial voiceless plosive T would be dental as in [the H-E pronunciation of] "thin", rather than alveolar as in "tin". How often the word is pronounced in other accents I know not.
Posted by: molly mooly | September 04, 2007 at 09:19 AM
I have noticed the same thing about words ending with the "judge" sound and the "church" sound, the hushing affricates, or whatever they're called. In particular, there are few words with [aj] that end in either of these consonants.
I had a vague theory that something was going on with syllable weight, and that some codas were "heavy" in some sense, and that made them unfriendly to long vowels because the resulting syllables would "weigh" too much. But I don't think any theory of this form can really stand, because there are too many gross counterexamples. [kt], for instance, is surely heavy, but "biked" is a perfectly cromulent word.
Posted by: ACW | September 06, 2007 at 05:29 PM
ACW: I reckon you need to discount [s] [z] [t] [d] due to inflections. And maybe the nonproductive suffix [θ] in fifth, sixth, strength, warmth, etc.
Posted by: molly mooly | September 07, 2007 at 04:16 AM
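molly mooly's suggestion to discount a final [s z t d] could be applied mechanically before checking a coda. The sketch below works on plain IPA strings like those in the post and, as a crude assumption, treats any t/d/s/z that follows a consonant as a regular inflectional suffix. That over-strips monomorphemic words such as act /ækt/, so it is only a first-pass filter; the function name and the vowel-symbol set are illustrative choices, not anything from the thread.

```python
# Vowel-ish symbols in the post's transcriptions, including the length mark
# and the off-glide superscript; any other character is treated as a consonant.
VOWEL_SYMBOLS = set("aæɛɪʌʊɔəeiouːʲ")

def strip_inflection(word, suffixes=("t", "d", "s", "z")):
    """Drop a final t/d/s/z that follows a consonant, assuming it is a regular
    past-tense or plural/3sg suffix (so 'biked' /baʲkt/ -> /baʲk/)."""
    if len(word) >= 2 and word[-1] in suffixes and word[-2] not in VOWEL_SYMBOLS:
        return word[:-1]
    return word

for w in ["baʲkt", "biːd", "bɛdz"]:
    print(w, "->", strip_inflection(w))
```

After stripping, ACW's "biked" counts as ending in /k/ rather than the heavier [kt] cluster, while "bead" keeps its /d/ because that /d/ follows a vowel.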
I think the origin of the gaps is historical.
Old English g (well, "yogh", looks like a curly "z") was fricative after vowels word internally unless doubled.
The gh-fricative became y or w in Middle English, e.g. dæg "day", plural dagas > "dawes", later remodelled to day(e)s on the model of the singular.
So a post-vowel "g" stop in the non-loan vocabulary would go back to a geminate -gg-; Late Old English IIRC shortened originally long vowels in this context.
Similar phenomena affected the Norse-derived vocabulary, but not later loans.
Posted by: David Eddyshaw | September 11, 2007 at 02:50 PM
"Fugue" is fine, but /fug/ seems as bad as /fig/ to me. (Sorry for lack of IPA--I'm at an internet cafe in Bishkek, and so don't have my precious IPA keyboard.)
I have a script (on my computer, not here atm) that produces all possible mono-syllabic words of English. It relies on some data I got on what constitutes a heavy *nucleus* (not coda, mind you), and what codas work with the heavy nucleus. This system, mind you, though active at some level in modern English, isn't based on properties of modern English, and makes more sense if you trace stuff back to earlier forms, as with the /g(g)/ stuff pointed out by David.
Just the same, I'm not sure if nucleus weight is relevant here--as you said, it may have something to do with historical gaps.
Perhaps a similar phenomenon: I pronounce "wash" (and monosyllabic words and words starting with "wash") with a /ʌ/ vowel, and they seem ungrammatical to me when pronounced with /a/. The same is true of "squash" words. What were the /ʃ/ exceptions that Q. had in mind?
Posted by: firespeaker | October 03, 2007 at 12:44 AM
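firespeaker's script isn't shown, and the nucleus-weight data it relies on isn't given in the thread. A rough sketch of the general shape of such a generator follows, assuming a tiny hypothetical inventory and, as the only constraint, a hard ban on /ɡ/ after a heavy nucleus. The post argues the real restriction is soft, so the hard ban is a simplification, and which nuclei count as heavy here is my assumption.

```python
from itertools import product

# Hypothetical inventories for illustration only; not the commenter's actual data.
ONSETS = ["", "f", "b", "l"]
LIGHT_NUCLEI = ["æ", "ɛ", "ɪ", "a", "ʌ"]
HEAVY_NUCLEI = ["eʲ", "iː", "aʲ", "oʊ", "u", "aʊ", "oʲ"]
CODAS = ["ɡ", "k", "d", "b"]
# Assumed constraint, following the pattern discussed in the post:
# codas that may not follow a heavy nucleus.
BANNED_AFTER_HEAVY = {"ɡ"}

def possible_monosyllables():
    """Yield onset + nucleus + coda strings that pass the nucleus-weight filter."""
    for onset, nucleus, coda in product(ONSETS, LIGHT_NUCLEI + HEAVY_NUCLEI, CODAS):
        if nucleus in HEAVY_NUCLEI and coda in BANNED_AFTER_HEAVY:
            continue  # e.g. "fiːɡ" is rejected here
        yield onset + nucleus + coda

words = list(possible_monosyllables())
print(len(words))  # 192 combinations minus 28 heavy-nucleus + /ɡ/ ones = 164
```

Replacing the single banned set with per-nucleus sets of allowed codas is where real phonotactic data would plug in.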