Deepfake voice technology has evolved dramatically over the past decade. These advances have fueled the technology's growing popularity across multiple industries, including entertainment, film, marketing, healthcare, and customer service.

Such rapid growth and demand are always accompanied by active discussion about the ethical use of new technologies, and the notoriety of deepfake videos has not helped the debate. Nevertheless, voice cloning software can be developed and used safely and ethically today. This article explains how and why.

The rise of voice cloning technology

The technology got its start not long ago with simple speech synthesizers: programs that convert text into human speech. Even today, this remains one of the most widespread applications. For example, Google Translate can read a text aloud in a foreign language after translating it.

Text-to-speech voice conversion reached new heights in products like Descript's Overdub, an ultra-realistic text-to-speech voice cloning tool widely used in podcasting and radio. Services like Overdub let producers create audio content without ever having to reach out to voice actors.

After realistic voice generators, AI deepfake voice technology made its way onto the market. Using machine learning and AI algorithms, Respeecher created a unique technology capable of converting one person's voice into the voice of someone else. We’ve examined in detail how this technology is changing content production for the better in a series of articles on our blog.

In short, you can convert the voice of any person into a target voice, regardless of the source speaker's gender. There is only one requirement: the algorithm needs about an hour of high-quality recordings of the target's voice so that the AI can build its model correctly. Once the model has been generated, you can convert unlimited speech into the target voice without sacrificing the source voice's intonation, cadence, or particular vocal emphasis.
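The workflow described above (record roughly an hour of the target's voice, train a model, then convert arbitrary source speech into the target voice) can be sketched as toy Python. All names here, such as TargetVoiceModel and convert, are hypothetical illustrations rather than Respeecher's actual API, and a real system operates on audio features rather than the text labels used below.

```python
# Toy sketch of the voice-cloning workflow. Hypothetical names; a real
# pipeline works on audio signals, not dictionaries of labels.

class TargetVoiceModel:
    """A voice model trained on recordings of the target speaker."""

    MIN_TRAINING_SECONDS = 3600  # roughly one hour of clean audio

    def __init__(self, recording_seconds):
        # Training fails without enough high-quality target audio.
        if recording_seconds < self.MIN_TRAINING_SECONDS:
            raise ValueError("need about an hour of high-quality target audio")
        self.trained = True


def convert(source_speech, model):
    """Swap the voice identity to the target while preserving the
    source speaker's delivery (intonation, cadence, emphasis)."""
    assert model.trained
    return {
        "voice": "target",                              # identity is replaced
        "text": source_speech["text"],                  # content is kept
        "intonation": source_speech["intonation"],      # delivery is kept
        "cadence": source_speech["cadence"],
    }


model = TargetVoiceModel(recording_seconds=3600)
clip = {"text": "Hello there", "intonation": "rising", "cadence": "fast"}
result = convert(clip, model)
```

The sketch only captures the contract: model creation fails without enough target audio, and conversion replaces the voice identity while leaving the source delivery untouched.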

Ethical doubts around the voice cloning process

As you can see, there is nothing unethical about voice cloning technology itself. Although it relies on the same AI techniques as video deepfakes, there are significantly fewer examples of defamatory deepfake voices.

However, it is becoming more common for deepfakes to combine audio and video with the goal of deceiving as many people as possible. Here are the most famous examples.

Voice scammers

Every human being’s voice is unique. This is why some government and financial institutions use voice authentication to access private assets. In everyday life, most people also rely on their natural ability to distinguish the voices of friends and family when they cannot see them.

All this creates ideal circumstances for those with bad intentions to gain access to people's personal information or money.

Law enforcement agencies in many countries are working to establish proper regulations for producing and using artificially synthesized voices. In the United States, lawmakers introduced the Defending Each and Every Person from False Appearances by Keeping Exploitation Subject to Accountability Act (the DEEP FAKES Accountability Act) in 2019.

Fake news

In 2020, fake news was estimated to cost the global economy up to $78 billion a year. In 2019, the cybersecurity company Deeptrace reported that the number of deepfake videos circulating online was approaching 15,000, a figure expected to keep doubling each year.

Deepfakes are widely used in the political arena to mislead voters and manipulate facts. Such manipulation creates financial risks and can damage the very fabric of our society.

Controversial media applications

Aside from outright malicious uses, some deepfake applications in media don’t quite meet ethical standards.