Published by the Students of Johns Hopkins since 1896
April 26, 2024

Voice recognition technology used to record Holocaust victims' stories

By Jonathan Grover | November 1, 2001

In 1994, filmmaker Steven Spielberg established the Shoah Visual History Foundation, a center dedicated to videotaping and preserving the testimonies of Holocaust survivors and witnesses. Since its inception, the Shoah Foundation has amassed more than 116,000 hours of digitized Holocaust testimony from more than 52,000 interviews in 32 languages.

Collecting that material was no small feat, but cataloguing it has proven just as difficult. So far, $8 million has been spent to index less than 10 percent of the video library, with no end in sight.

Now the National Science Foundation has stepped in with a $7.5 million research grant. The grant will jointly fund efforts by IBM, Johns Hopkins University and the University of Maryland to develop speech recognition technologies that will help automate the indexing.

Currently, it can take up to 35 hours to index a two-and-a-half-hour interview, according to Sam Gustman, executive director of technology at the Shoah Foundation. Speeding up that work will be a daunting undertaking for the researchers over the next five years.

The scientists plan to use computerized voice recognition software to index the interviews. Voice recognition software lets a computer digitally analyze a person's voice and convert that sound into information the computer can process. It is mainly used for transcription, allowing people to dictate to their computer and have their speech turned into digital text.

Current voice recognition technologies work best in controlled settings, such as dictation in an office or transcription of a television program.

The technologies required for this project must be far more flexible. They must handle not only emotional Holocaust survivors, whose tone and accent may shift during an interview, but also the 32 languages spoken, which include obscure Romani (Gypsy) dialects.

Voice recognition software has three key stages. The first is modeling the grammar underlying the language being spoken, so the computer knows which sequences of words are likely. The second is taking a sound and measuring how closely it matches each word the computer can recognize. In the final stage, the computer searches for the combination of words that best fits both the sound and the grammar.
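To make the three stages concrete, here is a deliberately tiny Python sketch. The word lists, probabilities and function names are invented for illustration and are not taken from the IBM, Hopkins or Maryland systems. Real recognizers learn these numbers from large amounts of audio and text, and they use far more efficient search methods (such as beam search) instead of trying every combination as this sketch does.

```python
import itertools
import math

# Stage 1 (hypothetical toy "grammar"): bigram probabilities P(word | previous word).
# "<s>" marks the start of the utterance. A real system learns these from text.
GRAMMAR = {
    ("<s>", "recognize"): 0.6,
    ("<s>", "wreck a nice"): 0.4,
    ("recognize", "speech"): 0.9,
    ("recognize", "beach"): 0.1,
    ("wreck a nice", "speech"): 0.2,
    ("wreck a nice", "beach"): 0.8,
}

# Stage 2 (hypothetical acoustic scores): for each stretch of sound, how closely
# it matches each known word. A real system computes these from the audio itself.
ACOUSTIC = [
    {"recognize": 0.4, "wreck a nice": 0.6},
    {"speech": 0.5, "beach": 0.5},
]


def score(words):
    """Log-probability of a word sequence: acoustic match plus grammar."""
    total = 0.0
    previous = "<s>"
    for segment, word in zip(ACOUSTIC, words):
        total += math.log(segment[word])
        # Unseen word pairs get a tiny probability instead of zero.
        total += math.log(GRAMMAR.get((previous, word), 1e-9))
        previous = word
    return total


def decode():
    """Stage 3: try every combination of candidate words and keep the best."""
    candidates = itertools.product(*(segment.keys() for segment in ACOUSTIC))
    return max(candidates, key=score)


if __name__ == "__main__":
    print(" ".join(decode()))  # prints "recognize speech"
```

In this toy example the sound alone slightly favors "wreck a nice," but the grammar makes "recognize speech" the more likely sentence overall, which is why the final stage weighs both sources of evidence together.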

The scientists involved are cautious about how accurate the system can become over the next five years, but perfection may not be necessary.

"The major goal is getting the words as accurate as possible, which may only be 60 percent of the time. That's good enough for retrieval, but not for transcription," said Dagobert Soergel, a University of Maryland professor.
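One rough way to see why 60 percent can be enough for retrieval is a simple probability sketch. Assume, as a simplification that real speech does not strictly satisfy, that each mention of a name or place is recognized independently with probability p = 0.6. A term mentioned k times in an interview is missed entirely only if every mention is misrecognized:

$$P(\text{term found}) = 1 - (1 - p)^k = 1 - 0.4^k$$

For a single mention that is just 0.6, but for three mentions it is 1 - 0.064 = 0.936, and for five mentions it is about 0.99. These numbers illustrate the reasoning only; they are not figures from Soergel or the research team.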

The joint effort will take a three-pronged approach. The University of Maryland researchers will focus on cataloguing and indexing, while IBM researchers will develop algorithms to improve recognition accuracy.

Johns Hopkins researchers, meanwhile, will concentrate on speech recognition for languages other than English, starting with Czech and then moving on to Russian, Hungarian, Polish and Slovak, according to Dr. Bill Byrne, an associate research professor at Johns Hopkins.

Reinventing the wheel, so to speak, is no simple task, but the research should contribute greatly to current speech recognition technology and ultimately benefit end users.
