   Rehabilitation Engineering Research Center
   on Hearing Enhancement

Dr. Ross on Hearing Loss

Speechreading

by Mark Ross, Ph.D.
This article first appeared in the
IFHOH Journal (1998)

Hearing-impaired people wear hearing aids because they want to hear better. Even with hearing aids, however, many, if not most, of them will still have problems understanding speech, particularly in noisy surroundings. Additional help is available for these people if they are able to use their eyes to supplement the information obtained through the ears, that is, by speechreading. (I'll explain later why I'm using this term rather than the more familiar term, "lipreading".) Most people, both hearing-impaired and normal hearing, and whether or not they know it, are able to speechread to some extent. This is why even normal hearing people appear to "hear" better when they can see the face of the person talking, and why people often feel that they need to put their eyeglasses on in order to "hear" better.

Definition

There's no big secret to speechreading. Unless the person we're talking to is a ventriloquist, it is normal to produce visible movements of one's lips, tongue and jaw (the "articulators" of speech) while talking. Because of the physical constraints imposed by the articulatory apparatus, we generally produce the same speech sounds with a consistent pattern of physical movements. These motions then become associated with specific sounds. But it is important to realize that we move our articulators to produce acoustically distinct sounds and not visually contrastive movements. The degree of visibility of these movements is simply a fortuitous by-product of the speech transmission process. This explains why there is no one-to-one relationship between a specific movement and a specific sound. We cannot see the sounds and words produced in the back of the mouth (e.g. /key/), the vibrations of the vocal cords that differentiate voiced from unvoiced sounds (e.g. /b/ versus /p/), or the opening of the nasal passage during voicing (the difference between the /b/ and /m/ sounds). The point is that many acoustically distinct sounds can be made with superficially very similar visual movements of the articulators. That is, they will look the same but be perceived differently. For these reasons, it is estimated that no more than 30% or 40% of the sounds of speech can be identified using vision alone. Is it possible to fully understand a spoken message if one can only perceive 30% or 40% of the sounds of speech? Yes, it is, if one is a "natural" and has an excellent command of the language. Most of us, however, are not "naturals" - in the same way that most of us are not born with an innate proficiency at playing a musical instrument. Fortunately, most of us can improve our ability to speechread to some extent (or improve our musical skills). Actually, this possibility of improvement is implicit in how we define "speechreading" as opposed to "lipreading".

The term "lipreading" conveyed the idea that one can understand speech by focusing only on the movements of the lips. While some comprehension of speech is possible with just a lip focus, in reality good lipreaders observe much more than just the lip movements. They observe, maybe not consciously, all aspects of non-verbal communication (facial expressions, body-stance, etc.) and they process (again, probably not consciously) all the available linguistic, situational, and auditory cues. There are lots of these cues and we'll be reviewing the basic ones in this paper.

The term "speechreading", on the other hand, recognizes that speech comprehension is a global process, in which a listener, hearing-impaired or not, employs all of this information to comprehend a message. This is the goal, and to reach this goal, it is not necessary for a "listener" to consciously identify any specific lip movement, or indeed any other element present in a communication exchange. In speechreading, one focuses on the meaning of the message and not its details (though, of course, the more "details" that can be rapidly and automatically processed, the greater the likelihood the message will be comprehended).

Over the years, there has been a great deal of research focused on speechreading. While we still don't really understand why some people are "stars", some are just mediocre, and a few are unable to go beyond the most basic identifications, we can make some valid generalizations about the speechreading process that should help us understand both its promise and its limitations. I should begin this discussion with a word of comfort for the real "duds" out there: there has never been any research that has shown a correlation between intelligence and speechreading ability! In other words, in this skill area you're not stupid if you don't quite "get it". (But neither would you be showing much intelligence if you just gave up when you were capable of doing better!)

Seeing the Lips

The first, quite obvious generalization that can be made about speechreading is that one has to see a talker's lips in order to speechread. Obvious as this is, it is surprising how many hearing-impaired people "look you right in the eye" when talking to you (or worse, do not look at you at all!). While this may be a social or cultural norm, it doesn't help speechreading. Visual acuity is best at the point of focus, and you'll need all your visual skills to be able to perceive the rapid, fleeting movements of the articulators. While focusing on the lips, you should be able to perceive broader facial expressions and other non-verbal cues through your peripheral vision. The reverse won't necessarily be true. If you have difficulty looking someone "right in the lips", moving your eyes somewhere around the nose may be an acceptable compromise. Quite obviously, if seeing the lips is a prerequisite to speechreading, you won't do very well in the dark, in dim light, or from the back of a room. You'll have problems if the face of the person you're talking to is in shadow (and the light is in your eyes), or if you can only see the lips at an angle because of your position (this is much worse when the person sports a full mustache and beard!). Your ability to speechread will also be affected if your eyeglasses are no longer suitable for you. In one study, it was found that about 30% of the people who depended upon speechreading to some extent wore eyeglasses with prescriptions that were obsolete. As I've already commented, while all of these problems are perfectly obvious, it is surprising how often they occur without a hearing-impaired person taking "assertive" action (like moving one's position, getting one's eyes checked, or turning on a light).

Linguistic Predictions

The second obvious generalization is that a person must have a "reasonable" command of the language in order to speechread. A person who doesn't know a language at all is not going to be able to speechread it, no matter how superior their natural speechreading skills may be. With a "reasonable" command of the language comes the ability to predict the presence of sounds and words that cannot be seen, either because they are invisible or because other sounds and words are formed with the same motions of the articulators (e.g. /man/, /pan/, and /bad/ all look alike on the lips; the correct one can be identified by the situational or linguistic context - if the "listener" knows the language well enough).

How to define "reasonable command" of the language is not obvious. Undoubtedly, all native speakers of a language possess the necessary linguistic background so that their language facility would not be a factor in their speechreading ability. (This doesn't mean that all native speakers would be good speechreaders, only that their limited skills would have to be attributed to factors other than their linguistic ability). We really don't know how much competence in a language is necessary before speechreading skills become independent of language ability. As a rather broad generalization, we can assume that language skills moderately influence speechreading ability until some, currently unknown, "threshold" of language competency is reached. The contribution that linguistic predictions make in speechreading can be seen in the following examples (for the purpose of these examples, consider only the visual component of the message, and not possible auditory or situational cues):

  1. John and Mary had to walk to school today. They missed the bus. Tomorrow they'll be sure to catch it.

    In this short paragraph, the preposition "to", the pronouns "they" and "it", and the article "the" are all predictable from the context to a certain extent. Since the words "to", "it" and "the" can hardly be seen (try looking in a mirror while saying them), this prediction is the major way that these visual verbal gaps can be filled. While the word "they" is clearly visible, without an instinctive "feel" for pronoun formation, it would be more difficult for a person with a hearing loss to identify the nouns it refers to. This short paragraph also illustrates how later sentences in an utterance become easier to predict as the earlier ones are comprehended; in other words, the linguistic constraints of the earlier sentences limit the possible later alternatives.
  2. That speechreading necessitates a "reasonable" degree of language competency can be seen in the multiple uses and meanings of words like "way" and "run". Consider the following sentences:
    1. (Method). Do you know the way to cook these vegetables?
    2. (Road/path). I hope that Harry knows the way.
    3. (Distance). He's still a long way from home.

    The dictionary I'm consulting gives about 80 ways (!) the word "way" can be used. Similar examples can be given for the word "run", another very visible word whose meaning varies widely with context.
  3. Other kinds of examples of prediction:
    1. Three strikes and you're __________.
    2. The __________ is now in his court.
    3. It's going to rain; don't forget to take your __________.
    4. Would you like a __________ of water?

    I don't want to belabor this point with more examples. Linguistic redundancy ensures that messages can be understood by normal hearing people even in unfavorable acoustic situations. A hearing loss can certainly be considered an unfavorable situation and this same redundancy can be used by people with hearing loss to get at the meaning of spoken messages.

Narrowing the Probabilities

The conditions and location under which conversations take place can also help in the comprehension of spoken messages. One goes into a bank, post office, supermarket, travel agent, etc. with different expectations of the thrust of the conversation. You don't expect the supermarket clerk to talk about your forthcoming cruise (though you may mention it to your banker when trying to finance the trip!). You don't talk the same way (there's that word again) on the same topics with children, co-workers, friends, spouse, neighbors, etc. You don't consciously modify your utterances; it comes perfectly naturally to you. It's the same when people speak to you; the content of the verbalizations will vary depending upon who it is and the specific conditions of the discourse. This does not make what they say perfectly predictable - of course not - but it does narrow the probabilities considerably. In a way (oops!), this is exactly what speechreading is about: narrowing the probabilities. We do it by using our knowledge of the language (grammar and meaning) to fill in the invisible verbal gaps and by predicting (not necessarily consciously) the topic and expressions of different people in different situations. Said differently, we have to know a lot more than we see. Therefore, the more you know (about current events, "hot" topics, anything that's likely to come up in a conversation), the better a speechreader you're likely to be. In my work with people with hearing loss, I use this concept to demonstrate that we can all speechread to some extent. What I do is tell the group that I'm going to say a month of the year, and then I'll say one of the 12 months without voice. If one or more of the people can't identify the month, I switch to a day of the week (seven rather than twelve possibilities). And in an extreme case (this happened only once that I can recall), I'll ask a person to differentiate between the numbers "one" and "five". Success! I don't do this as a parlor trick. It is a serious demonstration to them that their eyes can provide information to supplement what they receive through their ears.
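
The months, days, and numbers demonstration can be read as shrinking the set of candidates the eyes have to choose among. As a rough illustration that is not part of the original article, the sketch below expresses each set size as bits of uncertainty. It assumes every alternative in a set is equally likely, which real conversation does not guarantee, and the function name `uncertainty_bits` is hypothetical.

```python
import math


def uncertainty_bits(num_alternatives: int) -> float:
    """Bits needed to pick one item from equally likely alternatives (assumed)."""
    if num_alternatives < 1:
        raise ValueError("need at least one alternative")
    return math.log2(num_alternatives)


# Candidate sets taken from the demonstration in the text.
candidate_sets = {
    "month of the year": 12,
    "day of the week": 7,
    "'one' versus 'five'": 2,
}

for label, size in candidate_sets.items():
    print(f"{label}: {size} choices, {uncertainty_bits(size):.2f} bits")
```

Each step down the list leaves the speechreader less to resolve by sight alone, which is the sense of "narrowing the probabilities" used here.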

Complement versus Supplement

Up to now I've talked about speechreading supplementing the information provided through audition. It does this indeed, but in a very fortunate way for people with hearing loss. It turns out that many of the sounds that are most difficult for people with hearing loss to hear (most of the voiceless consonants) are the easiest to see; conversely, those sounds which are the most difficult to see (or to distinguish visually) are the easiest to hear. Vision and audition, therefore, are considered complementary channels; each of these separate channels is more efficient in transmitting different verbal information. This is a very important concept that deserves elaboration. Most people with hearing loss show a greater degree of hearing impairment in the higher frequencies than in the lower frequencies. It is, however, just at these high frequencies that most of the energy of the consonants of speech is located, particularly the voiceless consonants (/t/ rather than /d/, /f/ rather than /v/, or /s/ rather than /z/). Now we have long known that it is more important to hear the consonants than the vowels in order to fully understand speech. This is why people with high frequency hearing losses complain that they can "hear" but not "understand" (because they can indeed hear the vowels, but have difficulty perceiving the consonants). But if you consider how most of these consonants are produced, you can see that many of the ones that are quite difficult to hear can be seen (and vice-versa, which I'll get to below).

Consider the visibility of such voiceless consonants as /f/ as in /fast/, /th/ as in /thought/, /sh/ as in /shut/, /p/ as in /pan/, /s/ as in /street/ (the /s/ in a blend with other sounds is easier to identify than without a blend, e.g. /seat/). Try saying these sounds and words in front of a mirror; you'll see that they are perfectly visible.

Now try saying /ban/, /pan/ and /man/. You'll see that these three words look exactly alike. Or say /van/ and /fan/; or /dan/, /tan/ and /nan/. These, too, look exactly alike on the lips. So how are we to distinguish them? Fortunately, hearing-impaired people can use their low frequency residual hearing to distinguish these clusters. For example, the look-alike /b/, /p/ and /m/ sounds are produced differently with respect to the use of voice (the /b/ and /m/ as opposed to the /p/) or the presence of nasality (the /m/ compared to the other two sounds). It is precisely the presence of voicing (and some timing differences between the voiced and voiceless sounds) and nasality that help us distinguish between sound clusters that look exactly alike on the lips. What all this means is that better speech comprehension is possible when people with a hearing loss put together what they hear with what they see, rather than depending on either modality alone. The two modalities serve complementary functions in speech recognition. Now put your "head" into the equation (using the available linguistic and situational cues) and you can see that improvements in speech comprehension may be more feasible than you realized. People with poor hearing can improve their functional communication skills when they engage in speechreading. As in any other skill area, however, there is no substitute for practice and persistence.
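
To make the complementary roles concrete, here is a small sketch (an illustration added here, not a model from the article). The look-alike groups are the ones named in the text; the voicing and nasality labels are standard phonetic descriptions of those consonants. The names `LOOK_ALIKE_GROUPS`, `AUDIBLE_CUES`, `candidates_by_sight` and `identify` are hypothetical, and treating each cue as a clean yes/no value is a simplifying assumption.

```python
# Consonants that look alike on the lips, grouped as in the text.
LOOK_ALIKE_GROUPS = [
    {"b", "p", "m"},
    {"v", "f"},
    {"d", "t", "n"},
]

# Low-frequency audible cues per consonant: (voiced, nasal).
AUDIBLE_CUES = {
    "b": (True, False), "p": (False, False), "m": (True, True),
    "v": (True, False), "f": (False, False),
    "d": (True, False), "t": (False, False), "n": (True, True),
}


def candidates_by_sight(seen):
    """Return every consonant that looks the same as the one seen."""
    for group in LOOK_ALIKE_GROUPS:
        if seen in group:
            return set(group)
    return {seen}


def identify(seen, voiced, nasal):
    """Keep only the look-alike candidates that match what was heard."""
    return {
        consonant
        for consonant in candidates_by_sight(seen)
        if AUDIBLE_CUES.get(consonant) == (voiced, nasal)
    }


# Vision alone leaves three candidates for the first sound of /man/.
print(sorted(candidates_by_sight("m")))
# Adding the heard cues (voiced and nasal) leaves one.
print(sorted(identify("m", voiced=True, nasal=True)))
```

Neither channel settles the question alone in this sketch: sight narrows the choice to a group, and the heard cues pick within it.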

Practice

  1. Practice need not be formal or even conscious. At its most fundamental, you are practicing each time you engage in a conversation - as long as you can see the person's lips and you focus on the meaning of the message (as opposed to trying to identify and analyze its component parts). This is referred to as the "synthetic" approach to speechreading. It takes energy and directed attention, but it is truly speechreading "practice".
  2. Speechreading practice does not require you to discard your hearing aids. Whatever auditory information you can obtain, in quiet or in the noisiest place, can and should be integrated into the total communication exchange. While practicing speechreading without voice may be an interesting exercise, it is not natural. That's not the way people talk and it is not the way you should listen. Because of the complementary roles of audition and vision, you are almost always going to do better when you use both channels than when you use either one alone. This is an area where two plus two may, figuratively, exceed four. For example, someone may obtain a "pure" speechreading score of 30% (no audition) and a hearing score of 20% (no vision). Put both together, and it is likely that the person's combined score will reach 70% or more.
  3. Some people have found it useful to engage in formal speechreading exercises with a cooperative partner (family member or close friend). One easy way to do this is for the partner to read in a "normal fashion" one sentence at a time from the daily paper in the presence of some background noise (the best would be a recording of three or four people talking simultaneously). The noise should be loud enough to make it difficult to comprehend the sentence through listening alone.

The task of the speechreader is to identify the complete sentence. When errors are made (not if; errors are always going to be made), the role of the partner is to modify the missed part of the sentence (repeating, emphasizing, rephrasing, etc.) until the speechreader identifies the entire sentence. When he/she does, the partner then repeats the sentences already identified and goes on to a new sentence. The experience can be made more intellectually challenging for both the speechreader and the partner if they analyze the nature of the errors to determine why they are being made (visibility, poor linguistic or situational context, etc.).
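
One way to read the 30%, 20% and 70% figures in item 2 above is to compare them with a simple baseline. The sketch below assumes, purely for comparison, that each channel gives an independent chance of catching each item; that baseline is an assumption introduced here, not a model from the article, and `independent_baseline` is a hypothetical name.

```python
def independent_baseline(visual: float, auditory: float) -> float:
    """Combined score if each channel gave an independent chance at each item."""
    return 1 - (1 - visual) * (1 - auditory)


# Example scores quoted in the text.
visual_only = 0.30       # speechreading score, no audition
auditory_only = 0.20     # hearing score, no vision
combined_example = 0.70  # combined score suggested in the text

baseline = independent_baseline(visual_only, auditory_only)
print(f"independent baseline: {baseline:.0%}")
print(f"gain beyond baseline: {combined_example - baseline:.0%}")
```

Under that assumption the baseline is 44%, so a combined score of 70% is well above what two unrelated channels would give. This fits the idea that the channels are complementary rather than merely additive.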

Summary

For most of us, speechreading is a natural and unconscious process. People with normal hearing do it all the time when the acoustical conditions become poor enough to interfere with vocal communication. For them, it is icing on the cake. For many people with hearing loss, speechreading is a crucial component of the communicative process. They routinely use their eyes to supplement the verbal information received through their ears. In this paper, I described some of the elements underlying the speechreading process, and tried to make a case that all of us are able to improve our skills - to some extent. And with practice and persistence, we can probably do better.

Acknowledgements

This article was supported, in part, by grant #RH133E30015 from the U.S. Department of Education, NIDRR.


Copyright 2011 by the RERC on Hearing Enhancement -- All Rights Reserved
Last modified: 07/01/2013
