It may be quite hard to trust the authenticity of any recorded speech in the not-too-distant future.
Adobe revealed what they’re calling Photoshop for the human voice. The project is currently in development as part of a collaboration between members of Adobe Research and Princeton University (published research). Like Photoshop, Project VoCo is designed to be a state-of-the-art audio editing application.
Its user interface belies an incredibly powerful speech manipulation engine. Not only can you edit dialogue by changing text, you can actually generate words that didn’t exist in the original recording.
In 2014, Andy Moorer shared his Visual Speech Editor project, which laid some of the groundwork for Project VoCo.
Adobe Audition has been featuring synthesized speech technology in the Generate Speech function since last year, which enables any TTS-compatible voice installed and licensed on the system to be used for generating speech directly in the waveform and multitrack environments.
Project VoCo builds on these concepts to provide what could be an incredible dialogue editing tool, and it has caught the attention of much of the industry for a variety of reasons.
The demo presented at the Adobe MAX conference generated new words using a speaker’s recorded voice alone. Essentially, the software can understand the makeup of a person’s voice and replicate it. Photoshop ushered in a new era of editing and image creation… this tool will transform how audio engineers work with sound, polish clips, and clean up recordings and podcasts.
Don’t Talk Too Long
Project VoCo can’t just generate convincing dialogue out of thin air – it needs around 20 minutes of the subject talking in order to form some kind of “voice print”. This all sounds very ’60s science fiction, I know, but it’s absolutely incredible, as this video from Adobe MAX (Sneak Peeks) shows –
Google’s DeepMind division showed off a rival voice-mimicking system called WaveNet via their website. While not nearly as amazing… it’s a close second as of now.
Obviously, technology of this power could be very dangerous in the wrong hands. As Jordan Peele says in the video, just as they are working very hard to make it sound perfect, they’re working equally hard to make it detectable through some form of watermarking.
This factor has caused Adobe to release the following statement –
Project Voco gave us a first forward looking look at technologies from Adobe’s research labs… and may or may not be released as a product or product feature. No ship date has been announced.
I’m blown away by this leap in research, but we’ll probably have to wait a while before I’m writing any hands-on reviews!