Choose a person’s photo. Upload the image to the Creative Reality Studio website, along with some text. Wait a moment. Done. Now the person in the photo appears in a video speaking the text you wrote. It sounds like witchcraft, but it is really a feat of artificial intelligence. The service comes from the Israeli company D-ID.
The company claims that the service is designed to generate videos for training, corporate education, marketing campaigns, internal communication, and so on.
This must be why D-ID avoids using the term deepfake. The word is strongly associated with harmful uses of the technology (political manipulation, for example).
You don’t need to be an expert in video editing to use Creative Reality Studio. The site is intuitive and lets anyone create a video of a person talking in seconds or minutes.
All you have to do is choose one of the presenters available on the site and type some text in the field to the side.
The default language is US English, but there are 119 languages available, including Brazilian Portuguese. You can choose from a range of male and female voice options. A voice style associated with an emotion can also be chosen: sad, friendly, hopeful, angry, among others.
It is also possible to upload an audio file with the person’s voice to make the result more realistic.
You already know what happens next. All this data is used by the artificial intelligence system to generate the video. Then, just download and publish the video on corporate pages, social networks and so on.
As already mentioned, it is also possible to generate a video by uploading a simple photo. Here is the result with an image of mine:
But does it work?
It works. The result is not always immediate, however. The wait time for the video to be generated depends on the duration of the speech, the language and even the chosen presenter. In any case, the process rarely takes more than a few minutes, and only a few seconds if the material is short.
Language is a critical factor here. There are numerous voices for American English, for example. On the other hand, there is only one female voice and one male voice for Brazilian Portuguese. At least the pronunciation is almost always correct.
Overall, the result is convincing, although it’s easy to see that it’s a deepfake. Note, for example, that the head moves in a repetitive pattern, as if it were choreographed. In addition, the movement of the lips does not always match the words being spoken.
In fact, these are the most obvious signs you can look for to find out whether a video is a deepfake.
Filters against malicious deepfakes
Gil Perry, CEO of D-ID, made it clear to TechCrunch that Creative Reality Studio is designed for legitimate use cases, that is, those without malicious purposes. As an example, the executive explained that a company’s CEO can use the technology to send a message to employees in multiple languages.
But D-ID knows that deepfakes have been used for political manipulation or to damage the image of public figures, for example. That’s why the company’s system has some filters.
Algorithms can block profanity and racist expressions, for example. In addition, the technology uses an API from Microsoft Azure that filters out sexual or offensive lines in videos.
There is also an image recognition system that prevents — or at least tries to prevent — the use of images of famous people. I tried to upload a picture of Bill Gates and it didn’t work.
On the other hand, in the tests I did, the profanity filter worked in English, but not in Brazilian Portuguese.
There is a policy against misuse, however. D-ID explains that, in case of violation of the rules, the user can be banned from the platform and have their content deleted.
Service is paid, but has a free trial
If you want to test it, just register on the Creative Reality Studio website. The service currently offers two paid plans. The first costs $49 per month and entitles you to 60 credits. Each credit corresponds to 15 seconds of video. The second is an unlimited plan whose price must be negotiated.
There is also a free trial plan, with 20 credits and a duration of 14 days. On this plan, the generated videos have watermarks over the entire image.
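To put those numbers in perspective, here is a quick back-of-the-envelope calculation of how much video each plan buys. It is only a sketch: the function and constant names are made up for illustration, and it assumes the free trial credits are worth the same 15 seconds each as the paid ones, which the pricing does not state explicitly.

```python
# Rough arithmetic on the plans described above.
# Assumption: trial credits convert at the same rate as paid credits.
SECONDS_PER_CREDIT = 15  # each credit corresponds to 15 seconds of video


def video_minutes(credits: int) -> float:
    """Total minutes of video that a given number of credits allows."""
    return credits * SECONDS_PER_CREDIT / 60


def cost_per_minute(monthly_price: float, credits: int) -> float:
    """Effective price per minute of video if every credit is used."""
    return monthly_price / video_minutes(credits)


paid_minutes = video_minutes(60)      # $49 plan: 60 credits
trial_minutes = video_minutes(20)     # free trial: 20 credits
paid_rate = cost_per_minute(49, 60)   # dollars per minute on the $49 plan

print(f"Paid plan: {paid_minutes:.0f} minutes per month")
print(f"Free trial: {trial_minutes:.0f} minutes in total")
print(f"Paid plan cost: ${paid_rate:.2f} per minute")
```

In other words, the $49 plan covers about 15 minutes of video per month, at a little over $3 per minute, while the trial is enough for roughly 5 minutes of watermarked video.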
For paid accounts, there are additional features such as a PowerPoint plugin, email support, and presenters with more realistic facial expressions.