English to American Sign Language Machine Translation of Weather Reports

Angus B. Grieve-Smith

University of New Mexico

Introduction

Machine translation is one of the oldest applications of computing to language. Until now, most attempts at machine translation into a signed language have been relatively word-for-word, producing sign that has none of the syntax of the target language. Not much effort has been put into producing fluent, idiomatic sign, in part because all of the attempts to date have also tried to deal with the task of sign synthesis at the same time.

It is possible to abstract away from the problem of sign synthesis by using a writing system. I have developed a prototype English to American Sign Language (ASL) machine translation application using Don Newkirk’s (1986) Literal Orthography, a system that uses the Roman alphabet for writing signs. The translation program makes use of functionalist principles of lexical and grammatical description to produce fluent translations of National Weather Service forecasts.

I have chosen weather for the same reasons that make it a favorite testing domain of prototype natural-language processing applications: the National Weather Service genre consists primarily of a relatively small set of words and stock phrases. Albuquerque weather is particularly well-suited for this, because there is less variability than in many other cities.

Translating into ASL

What is "text" for a signed language?

Translation, whether human or machine, prototypically involves reading a text in one language and writing it in another. But like !Xóõ, Fulani and a large number of other spoken languages, ASL has no standard written form. Unlike these languages, ASL is so different from the prototypical written language that the possibility of writing it has not occurred to most of its speakers.

Because of this, sign synthesis and machine translation are often confused. Many of the prototype sign synthesizers (e.g. Ohki et al. 1994) have treated synthesis as an integral part of machine translation. This is different from the spoken language situation: we don’t expect a French-English translator to produce synthesized speech. We would not even expect a French-Fulani translator to produce speech, even though very few people write Fulani.

For this project, I have adopted a short-term solution, to use the "Literal Orthography" (Newkirk 1986), one of the nine writing systems that have been developed for signed languages. The system is based on the Roman alphabet, which allows easy integration with ASCII-based Unix systems. It is more complete than ASCII-Stokoe, the other ASCII-based writing system, and can represent close to the full range of lexical and classifier signs of ASL. The following sign can be translated as "wind" or "windy" in English:

(1) so-bles

The prefix "so" refers to the hands making "contrary movement," meaning that while one hand (the dominant hand, in this case) is moving away from its side, the other is moving towards its side. "Bl" indicates that the handshape is the one used for the number 5 in the American Manual Alphabet. "E" means that the hands move towards the non-dominant side of the body, and "s" means that the movement is reduplicated.

The system as Newkirk developed it in 1986 did not represent facial and body gestures or fingerspelling, two integral components of ASL, so I developed ad hoc conventions for representing these. Facial and body gestures are indicated by a short word that follows the word for a lexical or grammatical sign and that can be easily distinguished from a sign by the number and combination of letters used. Examples of this notation are given below:

(2) a. so-bles w
    b. so-bles r

Example (2a) indicates that the word "so-bles" is modified by the diminutive facial adverb "w" (pursing the lips) to translate the English "breezy." The English word "gusty" is translated with "so-bles" modified by the intensifying adverb "r" (squint) in (2b).

Fingerspelling is simply represented by an "@" sign placed before the Roman version of the fingerspelled word, as in the following example for "miles per hour":

(3) @mph
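The three kinds of token described above (signs, nonmanual modifiers and fingerspelled words) can be told apart mechanically. The following is a minimal sketch in Python, not the Perl of the actual application. The names `NONMANUAL`, `classify` and `analyze` are hypothetical, and the modifier table contains only the two markers glossed in the text; the rule based on "the number and combination of letters" is not spelled out here, so the sketch falls back on a simple lookup.

```python
# Nonmanual markers glossed in the text above. The real inventory is larger;
# this two-entry table is an illustrative assumption.
NONMANUAL = {
    "w": "diminutive (pursed lips)",
    "r": "intensifier (squint)",
}


def classify(token):
    """Return a (kind, value) pair for one whitespace-separated token."""
    if token.startswith("@"):
        # "@" marks a fingerspelled word; the rest is plain Roman spelling.
        return ("fingerspelled", token[1:])
    if token in NONMANUAL:
        return ("nonmanual", token)
    if token.isdigit():
        return ("number", token)
    # Anything else is treated as a Literal Orthography sign.
    return ("sign", token)


def analyze(text):
    """Classify every token in a line of written ASL."""
    return [classify(token) for token in text.split()]


print(analyze("so-bles w"))
print(analyze("15 25 @mph"))
```

Any modifier missing from the table is classified as a sign, so a fuller version would need the complete list of nonmanual markers.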

The Literal Orthography, extended with these conventions, is a workable solution for the short term. In the long term there are two possible solutions. One is the adoption of a writing system by the American Deaf Community, which would have other desirable effects such as greater status for ASL and increased literacy. Another possibility is integration with an eventual sign synthesis application.

Corpus Planning

With translation, as with any instance of language contact, issues of power come into play. What happens when a translator wants to write about a topic that is not usually discussed in the target language? Two possibilities exist: that the translator could borrow ad hoc from the source language or another language of prestige, or that someone could invent new words to be used for this topic. The first possibility involves a loss of prestige for the language, since it is deemed incapable of expressing this topic. The second is a more grassroots procedure, and allows for community involvement in the development of the language.

A well-known example of this is that of computer terminology in French. Since most computer equipment was invented either in the United States or in Japan, there were no French words for concepts like "software" and "the Web." In the case of "the Web," the word "web" has been borrowed from English, while in the case of "software," a group of French community leaders decided to invent a word, "logiciel," which has been adopted by French speakers and even clipped to "log" on occasion. A coinage may be rejected by the users of the language; for example, the invented word "hambourgeois" has been ignored in favor of the English borrowing "hamburger."

The situation for weather description in ASL is similar. Deaf people do talk about the weather, so the conversational vocabulary exists already. What is not available, according to informal discussions with native signers and interpreters, is a formulaic register of jargon and expressions comparable to that used by the National Weather Service, as seen in this example from January 20, 1999:

(4) Tonight: Mostly cloudy with a slight chance of rain showers. Breezy and mild. West to southwest winds 10-20 mph.

The short-term solution adopted for this project has been informal consultation with experts who are native speakers of ASL, to determine the conversational vocabulary used to discuss weather and adapt it for the translation of National Weather Service forecasts. A long-term solution would involve corpus design by a group of interested community leaders.

The application

As is common with machine translation systems, the application consists of four components: a lexical analyzer, a parser, a transfer module and a generation module. In addition, there is an initial module that obtains the weather reports from the World Wide Web. Several of the components use freely available Perl modules, packages designed to assist in those particular tasks for spoken or computer languages.

Retrieval and lexical analysis of weather reports in English

The weather reports are published in English by the National Weather Service on the World Wide Web at <http://iwin.nws.noaa.gov>. Geo::WeatherNOAA is a Perl module developed by Mark Solomon to assist in downloading the reports from the National Weather Service site. The report for January 20, 1999, as produced by Geo::WeatherNOAA looks like this:

(5) Today: Partly cloudy. Increasing west to southwest winds 15-25 mph this afternoon.
Tonight: Mostly cloudy with a slight chance of rain showers. Breezy and mild. West to southwest winds 10-20 mph.
Thursday: Windy and slight cooler with a chance of rain showers. West to northwest winds increasing to 20-30 mph with higher gusts.

Parse::Lex is a lexical analysis module developed by Philippe Verdret. It allows the user to define an arbitrary set of lexical categories, and then apply those categories to a given text. For this application, Parse::Lex was configured to tag the text with the lexical category for each word; for example, "today" was tagged with "<Day>."

Pawley and Syder (1983) argue that nativelike fluency in a language can best be explained by postulating that the lexicon is composed of chunks that can be larger than typical words or morphemes. I decided to configure the lexical analyzer based on these principles, so any sequence of words that seemed to be a set phrase in National Weather Service terminology was tagged as a single word; for example, "rain showers" was tagged as a single unit with "<Precip>." For this pilot study, the chunks were chosen based on my intuitive judgments as a native English speaker; in the future, this lexical analysis could be based on actual counts of token frequency within National Weather Service texts.

The following example shows our forecast from January 20, 1999, after being tagged by the lexical analyzer:

(6) today <Day> : <Punc> partly cloudy <Sky> . <Punc> increasing <Change> west <Direct_Locat> to <Preposition> southwest <Direct_Locat> winds <Wind> 15-25 <NumRange> mph <Degree> this <Demonstrative> afternoon <Time> . <Punc>
tonight <Time> : <Punc> mostly cloudy <Sky> with <Preposition> a <Determiner> slight <Degree> chance <Chance> of <Preposition> rain showers <Precip> . <Punc> breezy <Sky> and <Conjunction> mild <Heat> . <Punc> west <Direct_Locat> to <Preposition> southwest <Direct_Locat> winds <Wind> 10-20 <NumRange> mph <Degree> . <Punc>
thursday <Day> : <Punc> windy <Sky> and <Conjunction> slight <Degree> cooler <Heat> with <Preposition> a <Determiner> chance <Chance> of <Preposition> rain showers <Precip> . <Punc> west <Direct_Locat> to <Preposition> northwest <Direct_Locat> winds <Wind> increasing <Change> to <Preposition> 20-30 <NumRange> mph <Degree> with <Preposition> higher <Degree> gusts <Wind> . <Punc>
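The chunk-based tagging shown in (6) can be approximated with a longest-match lexicon lookup. The sketch below is in Python rather than Perl and does not use Parse::Lex. The `LEXICON` entries and tag names are copied from example (6), while the function names and the regular-expression strategy are assumptions made for illustration.

```python
import re

# A small lexicon of words and multi-word chunks; entries and tags are taken
# from example (6). The real lexicon is larger.
LEXICON = {
    "today": "Day",
    "partly cloudy": "Sky",
    "mostly cloudy": "Sky",
    "rain showers": "Precip",
    "increasing": "Change",
    "west": "Direct_Locat",
    "southwest": "Direct_Locat",
    "to": "Preposition",
    "winds": "Wind",
    "mph": "Degree",
    "this": "Demonstrative",
    "afternoon": "Time",
}


def build_pattern(lexicon):
    """Compile one regex that prefers the longest lexicon entry."""
    # Longest entries first, so a chunk like "rain showers" is tried before
    # any shorter entry that could match at the same position.
    phrases = sorted(lexicon, key=len, reverse=True)
    alternatives = "|".join(re.escape(phrase) for phrase in phrases)
    return re.compile(
        r"(?P<range>\d+-\d+)|(?P<punc>[.:])|(?P<lex>\b(?:" + alternatives + r")\b)"
    )


def tag(text, lexicon=LEXICON):
    """Return a list of (chunk, tag) pairs; unknown words are skipped."""
    pattern = build_pattern(lexicon)
    tokens = []
    for match in pattern.finditer(text.lower()):
        if match.group("range"):
            tokens.append((match.group("range"), "NumRange"))
        elif match.group("punc"):
            tokens.append((match.group("punc"), "Punc"))
        else:
            chunk = match.group("lex")
            tokens.append((chunk, lexicon[chunk]))
    return tokens


for chunk, category in tag("Today: Partly cloudy. Increasing west to "
                           "southwest winds 15-25 mph this afternoon."):
    print(chunk, "<" + category + ">")
```

Because words outside the lexicon are silently skipped, a production tagger would need to report them instead, so that gaps in the lexicon become visible.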

Parsing English weather reports

The tagged reports are then parsed using Parse::RecDescent, a Perl module developed by Damian Conway. RecDescent is a recursive descent parser that can be configured to produce a parse tree based on an input text in a particular language. Since the weather reports are formulaic and do not rely on complicated syntactic structures, it was possible to directly create a semantic representation, without the intermediate step of a syntactic tree.

Every National Weather Service forecast can be divided into four semantic components, corresponding to the real-world weather domains of sky, precipitation, wind and heat. This division is often reflected in the syntactic structure of the reports: often each of the components has its own sentence. Within each component, the structures can be as simple as a single lexical item ("partly cloudy"), or more complicated, with phrases specifying probability, wind speed or change of state. These were all reflected in the parser. What follows is an excerpt from the parse tree for the January 20, 1999 forecast, showing a "wind phrase" for the first sub-forecast:

(7) [1][1]{windP}{time}[1]{day}=
[1][1]{windP}{time}[1]{time}=afternoon
[1][1]{windP}{time}[1]{prep}=
[1][1]{windP}{time}[1]{adv}=
[1][1]{windP}{degree}=
[1][1]{windP}{conj}=
[1][1]{windP}{direct1}=west
[1][1]{windP}{direct2}=southwest
[1][1]{windP}{main}=winds
[1][1]{windP}{change}=increasing
[1][1]{windP}{speedP}{total}=15 - 25
[1][1]{windP}{speedP}{lo}=15
[1][1]{windP}{speedP}{hi}=25
[1][1]{windP}{locatP}[1]{main}=
[1][1]{windP}{locatP}[1]{degree}=
[1][1]{windP}{locatP}[1]{prep}=
[1][1]{windP}{locatP}[2]{main}=
[1][1]{windP}{locatP}[2]{degree}=
[1][1]{windP}{locatP}[2]{prep}=
[1][1]{windPlus}{total}=
[1][1]{windPlus}{degree}=
[1][1]{windPlus}{main}=
[1][1]{windPlus}{prep}=
[1][1]{windPlus}{speed}=
[1][1]{windPlus}{locatP}[1]{main}=
[1][1]{windPlus}{locatP}[1]{degree}=
[1][1]{windPlus}{locatP}[1]{prep}=
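To illustrate how a tagged wind sentence can be turned directly into a semantic structure like (7), here is a minimal hand-written top-down parser with one method per grammar rule, which is the strategy that Parse::RecDescent automates. It is written in Python rather than Perl. The class `WindParser`, its tiny grammar and the `flatten` helper are assumptions for illustration: they cover only the simple pattern of the first sub-forecast and use a reduced set of slots.

```python
class WindParser:
    """Top-down parser for a simple wind phrase over (word, tag) pairs."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def accept(self, tag, word=None):
        """Consume the next token if it has the given tag (and word)."""
        if self.pos < len(self.tokens):
            tok_word, tok_tag = self.tokens[self.pos]
            if tok_tag == tag and word in (None, tok_word):
                self.pos += 1
                return tok_word
        return ""

    def wind_phrase(self):
        # windP -> [Change] [Direct ["to" Direct]] Wind speedP timeP
        node = {}
        node["change"] = self.accept("Change")
        node["direct1"] = self.accept("Direct_Locat")
        node["direct2"] = ""
        if self.accept("Preposition", "to"):
            node["direct2"] = self.accept("Direct_Locat")
        node["main"] = self.accept("Wind")
        if not node["main"]:
            return None  # not a wind phrase
        node["speedP"] = self.speed_phrase()
        node["time"] = self.time_phrase()
        return node

    def speed_phrase(self):
        # speedP -> [NumRange] ["mph"]
        total = self.accept("NumRange")
        lo, _, hi = total.partition("-")
        self.accept("Degree", "mph")
        return {"total": total, "lo": lo, "hi": hi}

    def time_phrase(self):
        # timeP -> [Demonstrative] [Time]
        self.accept("Demonstrative")
        return self.accept("Time")


def flatten(node, prefix=""):
    """Render a nested dict as path=value lines in the style of (7)."""
    lines = []
    for key, value in node.items():
        path = prefix + "{" + key + "}"
        if isinstance(value, dict):
            lines.extend(flatten(value, path))
        else:
            lines.append(path + "=" + value)
    return lines


tokens = [("increasing", "Change"), ("west", "Direct_Locat"),
          ("to", "Preposition"), ("southwest", "Direct_Locat"),
          ("winds", "Wind"), ("15-25", "NumRange"), ("mph", "Degree"),
          ("this", "Demonstrative"), ("afternoon", "Time"), (".", "Punc")]
print("\n".join(flatten(WindParser(tokens).wind_phrase(), "[1][1]{windP}")))
```

Slots with no matching token are left empty rather than omitted, which mirrors the empty entries in (7).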

Transfer of the parse tree into ASL

The transfer into an ASL-based tree is accomplished by a simple lookup table. The table is based on the items in the lexical analyzer, each with a translation in idiomatic ASL, written in Newkirk notation. I developed a Perl script to read each line of the English parse tree and look it up in the table. The script then recreates the parse tree, replacing the English with the ASL. The following example shows the transferred version of the excerpt above:

(8) [1][1]{windP}{time}[1]{day}=
[1][1]{windP}{time}[1]{time}=byy:yd
[1][1]{windP}{time}[1]{prep}=
[1][1]{windP}{time}[1]{adv}=
[1][1]{windP}{degree}=
[1][1]{windP}{conj}=
[1][1]{windP}{direct1}=woo:o
[1][1]{windP}{direct2}=sooa woo:o
[1][1]{windP}{main}=so-bles
[1][1]{windP}{change}=so-bray c
[1][1]{windP}{speedP}{total}=15 - 25
[1][1]{windP}{speedP}{lo}=15
[1][1]{windP}{speedP}{hi}=25
[1][1]{windP}{locatP}[1]{main}=
[1][1]{windP}{locatP}[1]{degree}=
[1][1]{windP}{locatP}[1]{prep}=
[1][1]{windP}{locatP}[2]{main}=
[1][1]{windP}{locatP}[2]{degree}=
[1][1]{windP}{locatP}[2]{prep}=
[1][1]{windPlus}{total}=
[1][1]{windPlus}{degree}=
[1][1]{windPlus}{main}=
[1][1]{windPlus}{prep}=
[1][1]{windPlus}{speed}=
[1][1]{windPlus}{locatP}[1]{main}=
[1][1]{windPlus}{locatP}[1]{degree}=
[1][1]{windPlus}{locatP}[1]{prep}=
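The lookup step can be sketched in a few lines. This is a Python illustration, not the original Perl script. The `TRANSFER` table contains only the five English-to-ASL pairs that can be read off examples (7) and (8), and the function names are hypothetical. Values with no table entry, such as numbers and empty slots, are passed through unchanged, as they are in (8).

```python
# English-to-ASL pairs read off examples (7) and (8); the real table covers
# the whole lexicon.
TRANSFER = {
    "afternoon": "byy:yd",
    "west": "woo:o",
    "southwest": "sooa woo:o",
    "winds": "so-bles",
    "increasing": "so-bray c",
}


def transfer_line(line, table=TRANSFER):
    """Translate the value of one path=value line, keeping the path."""
    path, sep, value = line.partition("=")
    # Unknown values (numbers, empty slots) are copied through as they are.
    return path + sep + table.get(value, value)


def transfer_tree(lines, table=TRANSFER):
    """Apply the lookup to every line of a flattened parse tree."""
    return [transfer_line(line, table) for line in lines]


english = [
    "[1][1]{windP}{direct1}=west",
    "[1][1]{windP}{direct2}=southwest",
    "[1][1]{windP}{main}=winds",
    "[1][1]{windP}{speedP}{lo}=15",
    "[1][1]{windP}{degree}=",
]
print("\n".join(transfer_tree(english)))
```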

Generation of fluent ASL

The ASL generation module uses the notion of "sentence stems" proposed by Pawley and Syder (1983) to generate fluent ASL. The Perl script first takes an inventory of the kinds of information present in the semantic representation, and generates a formulaic phrase for each one. These formulas all use ASL grammar, including topic-comment structure and nonmanual grammatical morphemes. The content that is output by the transfer module is then plugged into the formulas, producing fluent ASL. The translated weather report for January 20, 1999 thus looks like this:

(9) s-yya b so-blhihyeuws r. woo:o sooa woo:o so-bles so-bray c 15 25 @mph byy:yd.
byyhayri b so-blhihyeuws rm. si-byays si-seeis s-blhasas w. so-bles w. aeexion m. woo:o sooa woo:o so-bles 10 20 @mph.
nyms b so-bles. s-blix's areeixx' w. si-seeis s-blhasas w. woo:o hooy woo:o so-bles so-bray c 20 30 @mph.
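The slot-filling idea behind sentence stems can be illustrated for the wind phrase alone. The sketch below is in Python rather than Perl. The slot order in `WIND_STEM` is an assumption inferred from the wind sentences in (9), not a description of the actual formulas, which also handle topic-comment structure and nonmanual morphemes. The names `WIND_STEM` and `generate_wind` are hypothetical.

```python
# Slot order inferred from the wind sentences in (9): directions, the sign
# for wind, the change marker, the two speeds, the unit, then the time.
WIND_STEM = ["direct1", "direct2", "main", "change", "lo", "hi", "unit", "time"]


def generate_wind(slots):
    """Fill the wind stem from transferred slot values, skipping empty ones."""
    filled = dict(slots, unit="@mph")  # the unit is always fingerspelled
    words = [filled[name] for name in WIND_STEM if filled.get(name)]
    return " ".join(words) + "."


# Values taken from the transferred tree in (8).
today = {
    "direct1": "woo:o",
    "direct2": "sooa woo:o",
    "main": "so-bles",
    "change": "so-bray c",
    "lo": "15",
    "hi": "25",
    "time": "byy:yd",
}
print(generate_wind(today))
```

With the values from (8), this reproduces the second sentence of the first line of (9). Leaving out the change and time slots gives the shorter pattern seen in the wind sentence for "Tonight."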

Future work

There are several possibilities for the extension of this work. The output needs to be cross-checked with a native signer expert to ensure that it is indeed fluent, idiomatic ASL. A double-blind method could be employed, whereby a signer who knows the Literal Orthography reads the translation of a randomly chosen forecast, and a native-speaker evaluator is then tested on their understanding of the translation.

The lexical analyzer and parser are still not completely adjusted to the full range of weather reports in English; since much of the training was done during a mild winter in Albuquerque, words for snow have only recently been added to the lexicon. A frequency analysis of the corpus would determine whether the "chunks" currently used in the lexicon correspond to chunks used in these reports. Additional corpus planning should eventually be undertaken with community leaders to develop standard ASL phrases corresponding to the set expressions of the National Weather Service.

Conclusion

The production of ASL by this prototype system shows that machine translation from English into ASL is feasible. The process is relatively straightforward, if we abstract away from the problem of the form of output. There are several projects underway to produce both user-friendly writing systems for signed languages, and sign synthesis applications, so machine translation that is usable by most signers seems likely to appear soon.

References

Newkirk, Don E. 1986. Outline of a proposed orthography of American Sign Language. <http://users.home.net/dnewkirk/signfont/orthog.htm>.

Ohki, Masaru, Hirohiko Sagawa, Tomoko Sakiyama, Eiji Oohira, Hisashi Ikeda and Hiromichi Fujisawa. 1994. Pattern recognition and synthesis for sign language translation system. ASSETS 10, 1-8.

Pawley, Andrew, and Frances Hodgetts Syder. 1983. Two puzzles for linguistic theory: Nativelike selection and nativelike fluency. In Jack C. Richards and Richard W. Schmidt, eds. Language and communication. New York: Longman.