Tuesday March 23, 2004

Phonemic Stress

English words have lexical stress. That is, one (or more) of the syllables will have some combination of higher pitch, greater loudness, or a less reduced vowel, when compared with the other syllables in the word. Example: release is [ɹəˈlis], with a reduced vowel in the unstressed first syllable, rather than *[ˈɹilɪs] (or something like that). In the case of release, there's only one word spelled that way, but there are a few cases in English where stress is phonemic—that is, it distinguishes words.

The case I've seen mentioned before is the two words that we spell subject. With the stress on the first syllable, it's a noun, but with stress on the second, it's a verb. This is a pretty good example, but it's got two problems. First, subject is a member of a closely related class of word pairs that have the same stress change, including reject, project, and object. So maybe this isn't really lexical stress, but the result of a morphological rule for deriving nouns from some verbs (or maybe verbs from nouns; flip a coin) by shifting stress. Second, in all these cases, the vowel qualities also change when the stress changes. This is obvious in reject ([i] to [ə] in the first syllable) and in object and project ([a] to [ə] in the first syllable). Even in subject, though, the vowel quality in the first syllable changes, from a stressed [ʌ] to an unstressed [ə].

Today I stumbled across what I think is a better example: concrete. It's both a noun pronounced [ˈkaŋkɹit] and an adjective pronounced [kaŋˈkɹit]. I don't think the vowel in the first syllable reduces to a schwa (although your mileage may vary). More importantly, the shift in stress doesn't seem to be the result of a productive rule: we don't have similar pairs of pronunciations for accrete, discrete, or secrete, for example.

[Aside: All transcriptions are according to my (Western American) dialect, and I don't have [ɔ], but feel free to mentally insert it above as necessary if you're one of those people.]

Having random revelations about this sort of thing is a result of reading Steven Pinker's Words and Rules for fun. I'll get back to some linguistics-related science fiction soon, I promise: Ted Chiang's "Story of Your Life" is up next.

[Now playing: "Jerry Was a Race Car Driver" by Primus. Go!]

I am The Tensor, and I approve this post.
10:40 PM in Linguistics | Comments (3)

Sunday March 21, 2004

Underdots

Mark Liberman is struggling with fonts, encodings, and browser incompatibilities. I feel his pain: I've been trying to figure out a straightforward way to get IPA and other extended characters to look right in the various browsers, and there doesn't appear to be a perfect solution. But I have made a little progress.

In particular, Mark is trying to get underdots to show up beneath some vowels in Chuvash using combining diacritics, because, he says, a-with-underdot and e-with-underdot aren't in Unicode. In fact, those characters are in Unicode: U+1EA1 and U+1EB9 (decimal 7841 and 7865), respectively. Hẹrẹ is ạn ẹxạmplẹ (in Georgia, assuming you have it), and hẹrẹ is ạnothẹr ẹxạmplẹ (in Times New Roman), so you can see what your browser does with those characters. They're in the Latin Extended Additional range, which I believe is the "tricked out Latin characters for use in Vietnamese" range, and they have glyphs in a reasonable number of fonts. But, to my disappointment, not in Georgia, the font I [used to] use for this blog (I love those old style figures).
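For anyone who wants to double-check those code points, here's a quick sketch using Python's standard unicodedata module (Python is just my choice for illustration, and describe is a made-up helper name). It also shows that each precomposed vowel decomposes canonically into a plain vowel plus U+0323 COMBINING DOT BELOW, presumably the same kind of sequence Mark was building with combining diacritics.

```python
import unicodedata


def describe(ch: str) -> tuple[str, int, str, list[str]]:
    """Return (U+ form, decimal value, Unicode name, NFD code points) for ch."""
    cp = ord(ch)
    # NFD splits a precomposed character into base + combining marks.
    nfd = [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", ch)]
    return f"U+{cp:04X}", cp, unicodedata.name(ch), nfd


# a-with-underdot and e-with-underdot, given by hex code point
for ch in ("\u1ea1", "\u1eb9"):
    print(describe(ch))
```

Because the two forms are canonically equivalent, unicodedata.normalize("NFC", "a\u0323") gives back "\u1ea1", so switching to the precomposed characters changes which glyphs a font needs, not what the text means.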

That might be enough to solve Mark's problem, but there's so much more to talk about. The two browsers I have installed on my (Windows XP) machine are IE 6 and Mozilla Firefox 0.8. IE has the ability to detect the various character ranges, and you can customize (in Tools, Internet Options, General, Fonts...) which font is used to display a character in a particular language range if there is no font applied to the character. That's a big if, though, since most web sites have a font applied (using style sheets, these days) to every character on every page. More troublesome for linguists is the fact that the IPA characters aren't in any language range (since they're not part of any one language), and the IE interface only lets you set up these fonts by language.

Mozilla appears to be a bit more aggressive: even if you have a font applied, Mozilla checks, for each character displayed, whether the current font has a glyph for that character, and if it doesn't, it finds a font that does. This is a nice idea. It prevents you from ever seeing those little squares that mean "no glyph for this character", which no page author has ever intended you to see. It doesn't appear to be possible (at least, I can't find the setting) to tell it which font to fall back on for a given range, so it's falling back on Thryomanes (one of the SIL IPA fonts) on my machine, possibly because it's the alphabetically last font with that character, but who can say?

This feature of Mozilla means that you often don't need to apply a font to a character in an odd range, as long as your audience is guaranteed to be using Mozilla. Of course, as Mark points out, about 55% of people (including me) seem to be using IE, so relying on this feature of Mozilla isn't much of a solution. (In fact, Language Log has had several posts in the past that don't show up right in IE, like this one. I was trying to think of a polite way to point it out, and here's my chance.) One way to get around this is to make an IPA style that applies various fonts that have glyphs for the IPA range, and apply it to the ranges of characters you know some browsers will mess up on. My stylesheet currently contains:

.ipa {
	font-family: "Gentium", "Arial Unicode MS", "Lucida Sans Unicode";
}

...and I wrap this HTML:

	<span class="ipa"></span>

...around text that I want to be in IPA. This solution works with all browsers that support stylesheets as long as one of those fonts is installed. (And I'm always thinking of adding more fonts to the style. Does anyone know what the most widely installed IPA font is on the Mac, btw?) Note, however, that the available glyphs in the various fonts aren't exactly the same, so I have to be careful—some of the combining diacritics are in Gentium, but not in the others, for example.
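If wrapping spans by hand gets tedious, the wrapping can be automated. Here's a minimal Python sketch, assuming the text to be marked up is plain text and that the Unicode blocks worth routing to the .ipa style are IPA Extensions, Spacing Modifier Letters, and Combining Diacritical Marks. That choice of ranges, and the names IPA_RANGES, needs_ipa_font, and wrap_ipa, are my own hypothetical ones, not anything a browser or stylesheet defines. One design choice: a combining mark pulls its base character into the same span, so the vowel and its diacritic get the same font (the font-switch problem in the update below is what happens when they don't).

```python
import html

# Hypothetical choice of Unicode blocks to route to the .ipa style:
IPA_RANGES = [
    (0x0250, 0x02AF),  # IPA Extensions (e.g. U+0279 turned r)
    (0x02B0, 0x02FF),  # Spacing Modifier Letters (e.g. U+02C8 stress mark)
    (0x0300, 0x036F),  # Combining Diacritical Marks (e.g. U+0323 dot below)
]


def needs_ipa_font(ch: str) -> bool:
    """True if ch falls in one of the assumed IPA-related ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in IPA_RANGES)


def wrap_ipa(text: str) -> str:
    """Wrap maximal runs of IPA-range characters in <span class="ipa">.

    Assumes plain-text input; &, < and > are escaped for HTML.
    """
    flags = [needs_ipa_font(c) for c in text]
    # Pull the base character of any combining mark into the same span,
    # so the base and its diacritic are rendered from the same font.
    for i in range(1, len(text)):
        if 0x0300 <= ord(text[i]) <= 0x036F:
            flags[i - 1] = True
    out, i = [], 0
    while i < len(text):
        j = i
        while j < len(text) and flags[j] == flags[i]:
            j += 1
        chunk = html.escape(text[i:j], quote=False)
        out.append(f'<span class="ipa">{chunk}</span>' if flags[i] else chunk)
        i = j
    return "".join(out)


print(wrap_ipa("release is [\u0279\u0259\u02c8lis]"))
print(wrap_ipa("-sa\u0323r"))
```

The first call wraps just the [ɹəˈ] stretch; the second wraps the 'a' together with its combining dot.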

That's the solution I'm using so far. It means that my IPA snippets are readable by people who have a reasonably popular IPA font installed (or who are using Mozilla). I'd love to hear about a solution that makes IE behave like Mozilla (finding a font with a defined glyph for every character), or at least lets me specify a fallback font for the IPA range.

Oh, and Mark, about the problem you were having with a combining diacritic sometimes going on the following character in Mozilla: if you remove Verdana from the fonts in your .blogbody style, it goes away. It only seems to happen, in my tests, when the first two fonts in the list are Georgia and Verdana. This must be a bug in Mozilla's stylesheet code.

[Now playing: "Rush" by Yoko Kanno]

[Update: On further investigation, I take it back: it's not a bug in Mozilla, it's a bug in the Verdana font that Mozilla's fallback behavior exposes. The metrics on the combining-dot-below character are broken, somehow, in Verdana—it won't properly position itself under a character in any font except Verdana. So, if there's a font switch from Georgia to Verdana right between the vowel we're trying to put a dot under and the combining-dot-below, the dot will sit way off to the right. Here's a snippet of HTML that causes the bug in both Mozilla and IE:

<span style="font-family: Georgia">-sa</span><span style="font-family: Verdana">&#803;r</span>

...and here's the result it produces in your browser:

-sạr

(Note that the dot should be under the 'a'.)

On the Language Log page, this font bug surfaces in Mozilla but not IE because the font-family on Language Log is a list: Georgia, Verdana, Arial, Sans-Serif. In CSS, this means: use Georgia if it's installed; if not, use Verdana if it's installed; and so on down the list. IE checks once, finds Georgia, decides to use it for all the characters, and so shows those rectangles because Georgia doesn't have a combining-dot-below. Mozilla tries to be smarter: it uses the first font on the list (Georgia) if it's installed, but when it finds a character missing from that font, it goes down the list checking each succeeding font until it finds one that has the character (Verdana, in this case). So, in the example that started this whole business, Mozilla effectively infers a font change from Georgia to Verdana between the vowel and the dot, and this doesn't render correctly because Verdana's combining dot is broken somehow.
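Here's a toy model of the two strategies as described above. It's a Python sketch of the observed behavior, not either browser's actual code; the coverage tables are made up and heavily simplified, and the function names are hypothetical. "(system fallback)" stands in for whatever font Mozilla picks on its own when nothing in the list has the glyph (Thryomanes, on my machine).

```python
# Toy glyph-coverage tables: hypothetical and heavily simplified stand-ins
# for each font's real character map. Georgia lacks U+0323; Verdana has it.
ASCII = {chr(c) for c in range(0x20, 0x7F)}
COVERAGE = {
    "Georgia": ASCII,
    "Verdana": ASCII | {"\u0323"},
    "Arial": ASCII,
}
FAMILY = ["Georgia", "Verdana", "Arial"]  # font-family list, in priority order
LAST_RESORT = "(system fallback)"  # whatever font the browser picks on its own


def whole_run(text, family, coverage):
    """IE-style: the first installed font in the list renders every character."""
    font = next(f for f in family if f in coverage)
    return [(ch, font) for ch in text]


def per_char(text, family, coverage):
    """Mozilla-style: each character gets the first listed font with a glyph."""
    return [
        (ch, next((f for f in family if ch in coverage.get(f, set())), LAST_RESORT))
        for ch in text
    ]


def runs(assignment):
    """Collapse per-character (char, font) pairs into (font, text) runs."""
    out = []
    for ch, font in assignment:
        if out and out[-1][0] == font:
            out[-1] = (font, out[-1][1] + ch)
        else:
            out.append((font, ch))
    return out


text = "-sa\u0323r"
print(runs(whole_run(text, FAMILY, COVERAGE)))  # Georgia for everything
print(runs(per_char(text, FAMILY, COVERAGE)))   # Georgia, then Verdana, then Georgia
```

The second result is the Georgia-to-Verdana switch between the 'a' and its dot, which is exactly where Verdana's broken dot metrics come into play.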

The only solution that I can see, at this late hour, is not to use Verdana if you expect to be using U+0323 (decimal 803, the &#803; in the snippet above).]

[Another update: Edited this post to reflect the fact that Georgia is no longer the default font for posts on this blog.]

I am The Tensor, and I approve this post.
12:54 PM | Comments (1)