The Synthesis of Speech: Part 1 The Vocoder and Voder

Who doesn’t remember changing their voice as a kid by talking into a fan? Or sneaking off with balloons at a party or dance to inhale the helium and try to talk like a character from a cartoon? One year for Halloween I got a cheap voice changer toy that had three settings, and I remember playing with it for hours. But voice changers weren’t always so cheap, and the original was room-sized rather than handheld. The initial reason behind its development had nothing to do with keeping kids amused and was not driven by aesthetic concerns. It was only after Ma Bell and the military had finished with the Vocoder that it came to be appreciated for its musical qualities, first by experimental electronic musicians and later by pop, rock and rap artists. The next few editions of the Music of Radio series delve into the story of electronic speech synthesis, from the Vocoder to the Voder and on to the first text-to-speech computer programs written for gargantuan mainframes. The story takes us deep into the stacks of the Bell Laboratories archives and into the belly of WWII crypto communications before emerging in the 1960s and ’70s, when the stage was set for mind-melting explorations in sonic psychedelia. Just as the Vocoder is still used for artistic effects, the ideas behind it, compression and bandwidth reduction, continue to be used in new hardware and software applications for radio and telecommunications.

Homer Dudley, the inventor of the Vocoder, was an electronic and acoustic engineer whose work centered on the idea that human speech is fundamentally a form of radio communication. In his paper “The Carrier Nature of Speech” he wrote that “speech is like a radio wave in that information is transmitted over a suitably chosen carrier.” This realization came to Dudley in October of 1928, when he was otherwise out of commission in a Manhattan hospital bed. Discoveries are often made by playfully messing around with things, whether out of horseplay or boredom, and Dudley was keeping himself entertained just as a kid might, making weird sounds with his voice by changing the shape of his mouth. He had the insight that his vocal cords were acting as a transmitter of a periodic waveform. The nose and throat were the resonating filters, while the mouth and tongue shaped the harmonic content into formants, to use linguistic lingo. He also observed that his voice vibrated at a much faster rate than his mouth itself moved.
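Dudley's picture of the voice as a periodic source passed through resonant filters is what is now usually called the source-filter model. Here is a minimal Python sketch of that idea under stated assumptions: an 8 kHz sample rate, a bare impulse train standing in for the vocal cords, approximate textbook formant frequencies and bandwidths for the vowel in "father", and hypothetical function names. It illustrates the principle only and is not a model of any apparatus Dudley built.

```python
import math

FS = 8000  # sample rate in Hz (assumed for this sketch)


def pulse_train(f0, n, fs=FS):
    """Periodic 'vocal cord' source: one unit impulse per pitch period."""
    period = int(round(fs / f0))
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


def resonator(x, freq, bandwidth, fs=FS):
    """Band-pass resonator (RBJ-cookbook biquad, 0 dB gain at its centre)."""
    w0 = 2 * math.pi * freq / fs
    q = freq / bandwidth
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b0, b2 = alpha / a0, -alpha / a0
    a1, a2 = -2 * math.cos(w0) / a0, (1 - alpha) / a0
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = b0 * xn + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1, y2, y1 = x1, xn, y1, yn
        y.append(yn)
    return y


def vowel(f0, formants, n, fs=FS):
    """Source-filter synthesis: the buzz is shaped by parallel formant resonators."""
    source = pulse_train(f0, n, fs)
    out = [0.0] * n
    for freq, bw in formants:
        for i, v in enumerate(resonator(source, freq, bw, fs)):
            out[i] += v
    return out


def magnitude_at(signal, freq, fs=FS):
    """Magnitude of a single DFT bin, used to inspect the spectrum."""
    re = sum(s * math.cos(2 * math.pi * freq * i / fs) for i, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq * i / fs) for i, s in enumerate(signal))
    return math.hypot(re, im)


# Approximate (formant Hz, bandwidth Hz) pairs for the vowel in "father".
AH = [(730, 90), (1090, 110), (2440, 170)]
signal = vowel(125, AH, FS)  # one second at a 125 Hz pitch
print(magnitude_at(signal, 750) > magnitude_at(signal, 3500))  # prints True
```

Changing `f0` changes the pitch while the formant list, the "mouth shape", stays the same. The harmonic near the first formant comes out far stronger than one well away from any formant, which is the filtering Dudley noticed in his own throat and mouth.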

These insights went on to shape the work he pursued at Bell Laboratories, a true idea factory, where money and resources were thrown at any old project that might bear fruit for the AT&T monopoly or give it further advantage in its already sprawling playground of wires and exchanges. Once he had recovered and was back at work, Homer thought his discovery might have an application in compression, and he made it his ambition to free up some of the phone company’s precious bandwidth, hoping to pack more conversations onto the copper lines. He was given a corner and allowed to go work in it, devoting himself to his obsession.

He put his research to work in the invention of the Vocoder, or VOice CODER, first demonstrated at Harvard in 1936. It works by measuring how the spectral characteristics of speech change over time. Filters divide the signal coming into the mic into a number of frequency bands, and the level of the signal in each band gives a representation of the spectral energy there. This allows the Vocoder to reduce the information needed to store speech to a series of numbers. At the output end, feeding a speaker or headphones, the Vocoder reverses the process to synthesize speech electronically. The filters discard information about the instantaneous frequency of the original voice signal, which gives the end result its unique robotic and dehumanized character. In this scheme the voice acts as the modulator, and a separate carrier signal stands in for the vocal cords: the amplitude of the modulator in each individual analysis band generates a voltage that controls an amplifier in the corresponding carrier band. The frequency components of the modulating signal are thus mapped onto the carrier signal as discrete amplitude changes in each of the frequency bands. Because the Vocoder sends only these slowly changing band levels instead of a point-by-point recreation of the wave, the bandwidth used for transmission can be significantly reduced.
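The analysis and synthesis steps just described amount to what is now called a channel vocoder. A minimal Python sketch follows, assuming an 8 kHz sample rate, ten hypothetical band centres spaced geometrically from 200 to 3200 Hz, RBJ-cookbook band-pass filters and a simple rectify-and-smooth envelope follower. All of these are illustrative choices, not Dudley's circuit values; a sine wave stands in for the voice and a 125 Hz pulse train for the carrier.

```python
import math

FS = 8000  # sample rate in Hz (assumed)
# Hypothetical band centres: ten bands spaced geometrically from 200 to 3200 Hz.
CENTERS = [200 * 16 ** (k / 9) for k in range(10)]


def bandpass(x, freq, q, fs=FS):
    """RBJ-cookbook band-pass biquad with 0 dB gain at the centre frequency."""
    w0 = 2 * math.pi * freq / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b0, b2 = alpha / a0, -alpha / a0
    a1, a2 = -2 * math.cos(w0) / a0, (1 - alpha) / a0
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = b0 * xn + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1, y2, y1 = x1, xn, y1, yn
        y.append(yn)
    return y


def envelope(x, cutoff=30.0, fs=FS):
    """Envelope follower: full-wave rectify, then smooth with a one-pole low-pass."""
    k = 1 - math.exp(-2 * math.pi * cutoff / fs)
    env, level = [], 0.0
    for xn in x:
        level += k * (abs(xn) - level)
        env.append(level)
    return env


def channel_vocoder(modulator, carrier, centers, q=5.0, fs=FS):
    """Impose the per-band loudness of `modulator` (the voice) on an equal-length `carrier`."""
    out = [0.0] * len(carrier)
    for fc in centers:
        # Analysis: how strong is the voice in this band at each moment?
        env = envelope(bandpass(modulator, fc, q, fs), fs=fs)
        # Synthesis: the same band of the carrier, scaled by that level.
        band = bandpass(carrier, fc, q, fs)
        for i in range(len(out)):
            out[i] += band[i] * env[i]
    return out


n = FS  # one second of audio
voice = [math.sin(2 * math.pi * 1000 * i / FS) for i in range(n)]  # stand-in "voice"
buzz = [1.0 if i % 64 == 0 else 0.0 for i in range(n)]  # 125 Hz pulse-train carrier
robot = channel_vocoder(voice, buzz, CENTERS)
```

The output is the buzz, but loud only in the bands where the "voice" had energy. In a transmission system only the ten envelopes, each smoothed to roughly 30 Hz here, would need to travel down the line, and that is where the bandwidth saving comes from.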

A Vocoder usually also has an unvoiced band, or sibilance channel, for frequencies that lie outside the analysis bands used for typical talking but are still important in speech. These frequencies carry the hiss of sounds such as s, f, ch and other sibilants. The sibilance channel is mixed with the carrier output for increased clarity, resulting in recognizable speech that is still roboticized. Some Vocoders have a second system for generating unvoiced sounds, using a noise generator instead of the fundamental frequency.
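To show what "using a noise generator instead of the fundamental frequency" might look like, here is a small Python sketch that picks a buzz or a hiss source frame by frame. The decision rule is an assumption: it counts zero crossings, since hiss flips sign far more often than a voiced sound, and the 0.25 threshold, 200-sample frames and function names are hypothetical. This is not a description of how any historical Vocoder made the voiced/unvoiced decision.

```python
import math
import random

FS = 8000  # sample rate in Hz (assumed)


def zero_crossing_rate(frame):
    """Fraction of neighbouring sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def is_unvoiced(frame, threshold=0.25):
    """Hiss-like sounds (s, f, ch) cross zero far more often than voiced ones."""
    return zero_crossing_rate(frame) > threshold


def excitation(frames, f0=125, fs=FS, seed=0):
    """Per frame, choose a buzz (pitched) or hiss (noise) source for synthesis."""
    rng = random.Random(seed)
    period = int(round(fs / f0))
    out, t = [], 0
    for frame in frames:
        unvoiced = is_unvoiced(frame)
        for _ in frame:
            if unvoiced:
                out.append(rng.uniform(-1.0, 1.0))  # noise generator
            else:
                out.append(1.0 if t % period == 0 else 0.0)  # pitch pulses
            t += 1
    return out


voiced_frame = [math.sin(2 * math.pi * 200 * i / FS) for i in range(200)]
rng = random.Random(1)
hiss_frame = [rng.uniform(-1.0, 1.0) for _ in range(200)]
source = excitation([voiced_frame, hiss_frame])
print(is_unvoiced(voiced_frame), is_unvoiced(hiss_frame))  # prints False True
```

The resulting `source` would then be fed through the synthesis filter bank in place of a fixed carrier, so hissy frames come out as hiss and voiced frames as buzz.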

To better demonstrate the speech synthesis ability of the decoder half of his invention, Dudley created another instrument, the Voder (Voice Operating Demonstrator). It was unveiled at the 1939 New York World’s Fair, where Ray Bradbury was among the attendees who witnessed it firsthand. The Voder synthesized speech by creating the electronic equivalent of a vocal tract. Oscillators and noise generators provided a source of pitched tone and hiss. A 10-band resonator filter controlled by a keyboard converted the tone and hiss into vowels, consonants and inflections. A set of extra keys allowed the operator to make plosive sounds such as “p” and “d”, as well as affricate sounds like the “j” in “jaw” and the “ch” in “cheese”. Only after months of practice with this difficult machine could a trained operator produce something recognizable as speech.
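Stripped to its essentials, the Voder is the synthesis half of a channel vocoder with a person's fingers supplying the band levels. A minimal Python sketch, assuming hypothetical band centres, a 125 Hz pulse train for the pitched tone, uniform random noise for the hiss, and key "pressure" expressed as a number from 0 to 1. The boolean `hiss` argument is a stand-in for however the operator chose between the two sources, and the extra plosive and affricate keys are not modeled.

```python
import math
import random

FS = 8000  # sample rate in Hz (assumed)
# Hypothetical centre frequencies for the ten bands, geometrically spaced 200-3200 Hz.
CENTERS = [200 * 16 ** (k / 9) for k in range(10)]


def bandpass(x, freq, q=5.0, fs=FS):
    """RBJ-cookbook band-pass biquad with 0 dB gain at the centre frequency."""
    w0 = 2 * math.pi * freq / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b0, b2 = alpha / a0, -alpha / a0
    a1, a2 = -2 * math.cos(w0) / a0, (1 - alpha) / a0
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = b0 * xn + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1, y2, y1 = x1, xn, y1, yn
        y.append(yn)
    return y


def voder(keys, hiss=False, pitch=125, n=FS, fs=FS, seed=0):
    """Render one held 'chord' of keys.

    keys -- dict mapping band index (0-9) to key pressure between 0 and 1
    hiss -- False selects the pitched buzz source, True the noise source
    """
    if hiss:
        rng = random.Random(seed)
        source = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    else:
        period = int(round(fs / pitch))
        source = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    out = [0.0] * n
    for band, pressure in keys.items():
        # Each pressed key lets its band of the source through, scaled by pressure.
        shaped = bandpass(source, CENTERS[band], fs=fs)
        for i in range(n):
            out[i] += pressure * shaped[i]
    return out


tone = voder({5: 1.0})                    # one key held over the buzz
blend = voder({5: 1.0, 8: 0.4})           # two keys, the second pressed lightly
hush = voder({9: 1.0}, hiss=True, n=800)  # highest band over the hiss
```

A real operator changed these key pressures continuously, many times a second, which is part of why the machine took months to master.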

At the fair, Mrs. Helen Harper, noted for her skill, led a group of twenty operators in demonstrations of the Voder, in which people from the crowd could come up and ask the operator to make the Voder say something.

Homer Dudley had great success in his aim of reducing bandwidth with the Vocoder. It could chop up voice frequencies into ten bands at 300 hertz, a significant reduction from what a phone call required back in the day. Yet it was never used for that purpose. Even if it created more channels on the phone lines, the equipment was too large to be practical to install in homes and offices across the country. For a time Dudley worked at marketing the Vocoder to Hollywood for audio special effects, but it never made much of an impact there, as other voice-changing devices such as the Sonovox started being used in radio jingles and cartoons. Before it could be discovered by musicians, Homer Dudley’s tool for voice compression had to be put into service during America’s efforts in WWII, where it was used as part of the SIGSALY encryption program. The details surrounding the coding of the voices of MacArthur and Churchill will be explored in next month’s column.


[This article originally appeared in the December 2016 issue of the Q-Fiver]

