Google today introduced ‘Translatotron’, a new speech-to-speech translation tool that translates speech in one language into another without involving text. The tool is the result of several years of research and development and is still in the development phase.
Unlike other translation tools, Translatotron skips the conventional steps of converting speech to text, translating that text, and then converting it back to speech for the final result. Instead, it translates the speaker’s voice directly into speech in another language.
According to Google Research’s GitHub page, it is an “attention-based sequence-to-sequence neural network which can directly translate speech from one language into speech in another language, without relying on an intermediate text representation.”
Google explains that the tool forms a spectrogram of the input speech (a visual representation of its frequencies over time) and then generates a new spectrogram in the output language. The translated speech is a bit robotic but still carries some elements of the speaker’s voice.
Because the intermediate steps of converting speech to text and text back to speech are not involved, the tool is fairly fast.
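To make the idea of a spectrogram concrete, here is a minimal sketch in Python that turns a short audio signal into one. It is only an illustration and not Google's code: the function name, frame sizes and test tone are all assumptions, and it uses a slow, naive Fourier transform from the standard library so that it runs without extra packages. Real systems use fast FFT routines and more elaborate features.

```python
import cmath
import math

def spectrogram(signal, frame_len=64, hop=32):
    """Return a magnitude spectrogram: one list of frequency-bin magnitudes per frame."""
    # A Hann window tapers the edges of each frame to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        # Naive DFT over the non-negative frequency bins (0 .. frame_len / 2).
        mags = [
            abs(sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n, x in enumerate(frame)))
            for k in range(frame_len // 2 + 1)
        ]
        frames.append(mags)
    return frames

# Illustrative input (an assumption): 0.1 s of a 1000 Hz tone sampled at 8000 Hz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(800)]

spec = spectrogram(tone)

# Average each frequency bin over time and find the strongest one.
avg = [sum(frame[k] for frame in spec) / len(spec) for k in range(len(spec[0]))]
peak_bin = max(range(len(avg)), key=avg.__getitem__)
peak_hz = peak_bin * sample_rate / 64  # each bin spans sample_rate / frame_len Hz
```

Each row of `spec` is one slice of time and each column is one frequency band, which is why a spectrogram can be drawn as an image. For the test tone, the strongest band lands at 1000 Hz, the pitch that was put in.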
Google tested Translatotron on a variety of languages. You can check out the samples here.
The tool is far from perfect, but it is a major breakthrough in translation techniques. We hope that the system will improve with time and sound less robotic soon.