
DW tried different ways to create a virtual presenter – here's what they've learned

DW's virtual presenter "Avatario"

What do you need to consider when creating an AI avatar? Which workflow is the best? Here's what the DW Lab "Avatario" team learned while creating a very special host for vertical video content.

Let's start this post with an important disclaimer: DW doesn't intend to replace human hosts with artificial ones. And they are fully aware that AI-driven applications carry a significant risk of manipulation, discrimination and malfunction. There's also the question of general journalistic credibility. That's why DW guidelines state that "artificial intelligence is only used where it contributes to the fulfillment of their mission: to deliver high-quality, independent, impartial, diverse and trustworthy journalism."

With that in mind, they wanted to know: Is there a way to develop a trustworthy virtual host for their tech magazine DW Shift? It would come in really handy for two reasons:

  1. Moderated reels fare much better on social media than unmoderated ones.
  2. Smaller DW newsrooms often lack the resources to provide a host for a translated/localized version of a DW Shift story – and thus have to post less successful standard clips.

So can an AI-based avatar fix that problem?


A non-human look

After an initial discussion, it quickly became clear that the AI avatar presenter would have to look non-human and non-photorealistic – more like a robot or digital entity. That's because a realistic avatar might accidentally resemble a real person, potentially violating that person's personal rights – a concern that has been discussed extensively in the video game industry and, more recently, in relation to generative AI. DW didn't want to take any risks. Apart from that, they also wanted their avatar to look "techy" enough, making sure it would be recognized as part of the tech world. After all, its focus was on tech reporting, right? So this was an editorial decision.


2D or 3D? May the best workflow win!

Since the final prototype would end up in a "flat" medium – i.e. a vertical video – DW wasn't sure whether to create a 2D or a 3D avatar. Consequently, they decided to try both, aiming to develop a simple workflow for editors with lots of daily tasks and limited time and resources.


Workflow 1: Creating a 2D avatar with Midjourney, Blender, and D-ID  

In this test, DW started out by creating two images of a robot. The first one was put together with Midjourney (using a prompt à la "portrait of a friendly humanoid robot"), then remixed with another robot picture, then altered many times to change specific parts of the robot's face. The second image was designed from scratch in Blender – without the use of any AI tools. The designs were then fed into D-ID, an image-to-video tool with integrated speech synthesis that creates an animated clip based on an image and a script.

Neither the look nor the animation of the 2D character convinced them: The avatar's "lip movements" seemed out of sync at times, resulting in an unwanted uncanny valley effect. Furthermore, using non-human faces as source material messed up the animation. Ultimately, the D-ID test failed for legal reasons: The platform's general terms and conditions didn't meet DW standards for copyright, usage, and data protection. They therefore decided not to pursue this workflow any further.
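For the technically curious: an image-to-video job on D-ID boils down to a REST call plus polling. Here's a minimal Python sketch (endpoint and payload shape follow D-ID's public API docs at the time of writing; the API key and image URL are placeholders):

```python
import time
import requests

API_KEY = "YOUR_D_ID_API_KEY"  # placeholder credential
HEADERS = {"Authorization": f"Basic {API_KEY}", "Content-Type": "application/json"}

# 1. Submit a "talk": a source image plus the text script it should speak.
resp = requests.post(
    "https://api.d-id.com/talks",
    headers=HEADERS,
    json={
        "source_url": "https://example.com/robot-portrait.png",  # hypothetical image
        "script": {"type": "text", "input": "Welcome to DW Shift!"},
    },
)
resp.raise_for_status()
talk_id = resp.json()["id"]

# 2. Poll until the animated clip has been rendered, then grab the result URL.
while True:
    talk = requests.get(f"https://api.d-id.com/talks/{talk_id}", headers=HEADERS).json()
    if talk.get("status") == "done":
        print("Video ready:", talk["result_url"])
        break
    time.sleep(2)
```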

A little creepy, technically flawed, and not compliant with DW regulations: The D-ID/Midjourney avatar.


Workflow 2: Creating a 3D avatar with Blender, After Effects, ElevenLabs, and Adobe Character Animator 

This time, DW initiated the process in Blender, where they designed, animated, and rendered the main avatar look. They then exported the 3D data (.FBX) to After Effects for camera and facial positioning. To give the avatar a synthetic voice, they also created a sound file with ElevenLabs. With the help of Adobe Character Animator, they analyzed the sound file and generated visemes (i.e. the specific facial expressions animators use to create the illusion of speech), which were subsequently exported to After Effects, where they replaced the avatar's mask face. DW also added eye animation there. Finally, they rendered the avatar clip with an alpha channel for further editing in Adobe Premiere.
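Since the .FBX export is the step this and the following workflows all share, here's what it looks like as a minimal script using Blender's Python API – run from Blender's scripting workspace, with a placeholder output path:

```python
# Export the selected, rigged avatar (mesh + armature + baked animation)
# as .FBX for downstream use in After Effects or Unreal.
import bpy

bpy.ops.export_scene.fbx(
    filepath="/tmp/avatario.fbx",        # placeholder output path
    use_selection=True,                  # export only the selected avatar objects
    object_types={'ARMATURE', 'MESH'},   # skip cameras, lights, etc.
    bake_anim=True,                      # bake the body/idle animation into the file
    add_leaf_bones=False,                # keep the skeleton clean for other tools
)
```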

So what about the results? The good news: They were flawless. The bad news: The process was very time-consuming.


Polished, popular, and rather difficult to create: The DW Blender/Adobe/ElevenLabs bot.


Workflow 3: Creating a 3D avatar with Blender and a virtual control room with Unreal

In this case, DW imported the .FBX file described above (see Workflow 2) into Unreal.

Then they set up a 3D level as a stage, placed virtual cameras and virtual lighting, and turned the avatar into a playable character – including a character blueprint, an animation blueprint, an idle animation, and a viseme face. Done.

In this workflow, they eventually had to deal with very different kinds of problems: First of all, video export was only possible via an external command-line encoder (FFmpeg), which didn't support sufficiently high video quality and later led to a very slow rendering process. Secondly, video export from Unreal didn't support visemes, leaving the avatar without facial expressions – and thus making it useless.
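To illustrate the export bottleneck: the command-line hand-off is essentially an FFmpeg invocation like the one below (a hypothetical example wrapped in Python, not DW's actual settings). Preserving the avatar's alpha channel requires a high-quality codec like ProRes 4444 – exactly the kind of option where the Unreal-to-FFmpeg hand-off fell short for DW:

```python
import subprocess

# Encode a rendered frame sequence to ProRes 4444, which keeps the alpha
# channel that a standard H.264 export would throw away. The frame naming
# pattern and file names are made up for this example.
subprocess.run([
    "ffmpeg",
    "-framerate", "25",
    "-i", "render/frame_%04d.png",   # hypothetical rendered frame sequence
    "-c:v", "prores_ks",
    "-profile:v", "4444",            # the ProRes profile with alpha support
    "-pix_fmt", "yuva444p10le",      # 10-bit 4:4:4:4 including alpha
    "avatar_with_alpha.mov",
], check=True)
```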


Professional, metaversy, and impossible to export in video form: The DW bot avatar in the Unreal Engine.


Workflow 4: Creating a 3D avatar and a video generator with Blender and Xcode

Once again, DW used the .FBX avatar put together in Blender (see Workflow 2), but then went in a new direction.

They created a prototype app for macOS and iOS:

Users type up or import a script, which the app then turns into synthetic audio. The software subsequently uses the generated audio to create face animations, which are applied to the 3D model's face texture. The renderer combines these face animations with predefined body animations. In the end, users get an AV clip (with a clean background) that can be exported to any external video editing app.
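To make that pipeline concrete, here's a condensed sketch in Python (the actual prototype is a native macOS/iOS app, so treat this purely as an illustration: all helper names are hypothetical, and the TTS call assumes a service like ElevenLabs, following its public REST API):

```python
import requests

def synthesize_audio(script_text: str, voice_id: str, api_key: str) -> bytes:
    """Step 1: turn the typed/imported script into synthetic speech."""
    resp = requests.post(
        f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}",
        headers={"xi-api-key": api_key},
        json={"text": script_text, "model_id": "eleven_multilingual_v2"},
    )
    resp.raise_for_status()
    return resp.content  # MP3 audio bytes

def audio_to_visemes(audio: bytes) -> list[tuple[float, str]]:
    """Step 2 (placeholder): analyze the audio and return timed visemes,
    e.g. [(0.0, "rest"), (0.12, "AA"), (0.25, "M")]."""
    return [(0.0, "rest")]  # the real analysis happens inside the app

def render_clip(visemes: list[tuple[float, str]], body_anim: str = "idle") -> str:
    """Step 3 (placeholder): apply the visemes to the 3D model's face texture,
    blend them with a predefined body animation, and render the clip on a
    clean background for export to an external editor."""
    return "avatar_clip.mov"  # path to the exported AV clip
```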

The macOS/iOS apps already work pretty well, so this can be considered a solid proof of concept.


Allows users to create an avatar-hosted clip in no time: The Avatario iOS app.


Can DW's AI avatars act as web video hosts?

DW is currently testing their 3D prototypes with selected users. The avatars fare relatively well, but people also tell them that there's room for improvement:

Some users find the avatar's movements too robotic and repetitive and suggest DW should work on its general appearance. Some say it should be more recognizable as a DW host ("Maybe it could wear a tie?", "This one looks like a character from a children's program"). Others express dissatisfaction with its voice ("too cold", "too impersonal", "not smooth enough").

So the answer to the question is: No, DW's AI avatars can't act as web video hosts – not yet. However, 3D tech and AI voice synthesis are getting better every day, and with a revamp of the avatar's look and animation, who knows what will happen? Maybe DW's avatars will really come to life in the future.


Special thanks to: Andy Giefer, Philip Kretschmer, Lars Jandel, Jens Röhr, Juan Gomez Lopez, Marie Kilg and everybody else who supported the Avatario project.
Written by Daniela Späth & DW Innovation