When you’re a world-class researcher for the likes of Carnegie Mellon, a good day usually means a long time in the lab. That’s not always true for Alex Waibel.
On July 14, his research took him 13,000 feet beneath the North Atlantic to the wreck of the Titanic to test his text-to-video technology.
“Well, the stress and danger of it all was surprising — it’s still very experimental,” says Waibel, who has professorships at Karlsruhe Institute of Technology in Germany and at CMU. “The middle of the Atlantic — weather is always an issue. It’s 13,000 feet, dangerous in every way. The submarine has to withstand the pressure. Then you have to find it down there, right? I mean, these expeditions — it took years to find. It’s just debris and metal pieces scattered.
“And there’s no communication. There’s no radio (that works) underwater. And that was really what generated the idea.”
The bottom of the ocean, atop the world’s most legendary shipwreck, is a surreal environment.
“One thing that’s cool is the bioluminescence of the critters that follow the sub on the way down for a while, to see if we’re edible enough,” Waibel says.
“Then it’s pitch black from 300 meters all the way down to 4,000 meters. Then (at the bottom) it’s like having the pieces of the wreck come out of the darkness when you shine the light.”
Waibel’s family lives in Seattle, where the OceanGate expedition crew exploring the Titanic is based. He visited the crew, asked about their operations, and found that they had a particular problem: communicating with a submersible through thousands of meters of salt water is very hard. Essentially all that can get through is text messages sent via sonar to the surface.
“And so I said, ‘Well, we have technology that I have been spending my lifetime on building, speech translation systems where we take speech and have it translated into another language, and speak it out loud on the other side. So we should be able to overcome this problem.’”
Waibel is a world-renowned expert in translation systems. He developed a simultaneous lecture interpretation service and the first commercial speech translation system on a mobile phone, as well as dialogue translators for humanitarian missions and interpretation support for the European Parliament. In 2021, Zoom acquired Kites, the translation company he co-founded. Waibel is now a Zoom research fellow who advises the company on AI and language technology.
What Waibel did for the Titanic expedition involves only a few (very difficult) steps. Put simply, the process reconstructs video from text messages, using a synthesized voice adapted to sound like the speaker.
“We have a neural network that learns how to move the lips appropriately,” explains Waibel. Based on just a photo of the speaker, the AI creates the video.
“At the surface, we take the text, and we turn it back into audible speech by speech synthesis, and then combine it with one of our latest inventions, which then generates video of the same person reading lips synchronously.
“It is as if we can now carry out video conferences from the abyss,” notes Waibel.
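For readers who think in code, the flow Waibel describes can be sketched roughly as follows. This is only an illustrative outline, not his system: every function name here is hypothetical, and the speech synthesis and lip-movement stages are stand-in placeholders rather than real neural networks.

```python
from dataclasses import dataclass


@dataclass
class VideoFrame:
    """One synthesized frame: the speaker's photo plus the sound being mouthed."""
    speaker_photo: str
    mouth_shape: str


def send_over_sonar(text: str) -> bytes:
    # Only the text crosses the water column, as compact UTF-8 bytes.
    return text.encode("utf-8")


def text_to_speech(text: str) -> list[str]:
    # Placeholder for voice-adapted speech synthesis: one "sound" per word.
    return text.split()


def lip_sync(speaker_photo: str, sounds: list[str]) -> list[VideoFrame]:
    # Placeholder for the lip-movement model: one frame per sound.
    return [VideoFrame(speaker_photo, sound) for sound in sounds]


def reconstruct_at_surface(payload: bytes, speaker_photo: str) -> list[VideoFrame]:
    # At the surface: decode the text, synthesize speech, then generate video.
    text = payload.decode("utf-8")
    return lip_sync(speaker_photo, text_to_speech(text))


# Hypothetical message and photo file name, for illustration only.
payload = send_over_sonar("We have reached the bow")
frames = reconstruct_at_surface(payload, "speaker.jpg")
```

The point of the sketch is the division of labor: the submersible sends nothing but text, and everything that looks and sounds like the speaker is rebuilt on the receiving side from a single photo.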
This isn’t the only use for this technology. It’s helpful in scenarios where video conferencing is required over low-bandwidth connections, or when poor-quality video feeds keep cutting out. It could also help to synthesize videos in different languages.
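To get a feel for why sending text and rebuilding the video on the other end suits low-bandwidth links, a back-of-the-envelope comparison helps. The clip length and video bitrate below are assumptions chosen purely for illustration, not figures from Waibel or the expedition.

```python
def payload_bytes(text: str) -> int:
    # Size of a text message once encoded as UTF-8.
    return len(text.encode("utf-8"))


def video_bytes(seconds: float, kbit_per_s: float) -> int:
    # Size of a video clip at a constant bitrate (kilobits per second).
    return int(seconds * kbit_per_s * 1000 / 8)


# Assumed values, for illustration only.
ASSUMED_CLIP_SECONDS = 5
ASSUMED_VIDEO_KBIT_PER_S = 500

message = "We have reached the bow"
text_size = payload_bytes(message)
video_size = video_bytes(ASSUMED_CLIP_SECONDS, ASSUMED_VIDEO_KBIT_PER_S)

print(f"text: {text_size} bytes, video: {video_size} bytes")
print(f"video is roughly {video_size // text_size} times larger")
```

Under these assumed numbers, the text message is several orders of magnitude smaller than the equivalent video clip, which is the gap the approach exploits.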