First Order Motion Model in action

Seeing used to be believing. Thanks to AI, we finally have to bid farewell to this cute, naive, but dangerous faith. In fact, it was never really justified: already in the 20th century, photos were retouched by repressive regimes. With deep learning, we get new ways to re-illustrate reality. It is not a danger; it's a chance.

Among various methods, the framework presented in the paper First Order Motion Model for Image Animation by Aliaksandr Siarohin et al. captivates with its brilliant idea:

Image animation consists of generating a video sequence so that an object in a source image is animated according to the motion of a driving video. Our framework addresses this problem without using any annotation or prior information about the specific object to animate. Once trained on a set of videos depicting objects of the same category (e.g. faces, human bodies), our method can be applied to any object of this class. (Source, my emphasis)

Keypoints are detected along with local transformations (similar to the puppet tool in Photoshop, or to the sensors on a motion capture suit), so the learned movement can be transferred to a target image.

The only requirement is that the source and the target belong to the same object category.

Put briefly, this unsupervised learning approach analyzes the motion in the driving footage, generalizes it, and applies it to the target image.
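To make the idea more concrete, here is a minimal NumPy sketch of the core mechanics as I understand them from the paper (my own illustration, not the authors' code): keypoint displacements observed in the driving video are re-applied to the keypoints of the target image, and a local affine (first-order) transformation approximates the motion around each keypoint.

```python
# Conceptual sketch only -- not the authors' implementation.
import numpy as np

def transfer_motion(kp_target, kp_driving, kp_driving_initial):
    """Shift the target's keypoints by the displacement observed
    in the driving video (relative motion transfer)."""
    displacement = kp_driving - kp_driving_initial   # how each keypoint moved
    return kp_target + displacement                  # apply the same motion to the target

def local_affine_warp(points, keypoint, jacobian):
    """First-order (affine) approximation of the motion field
    in the neighbourhood of a single keypoint."""
    return keypoint + (points - keypoint) @ jacobian.T

# toy example: 10 keypoints in 2D
kp_driving_initial = np.random.rand(10, 2)
kp_driving = kp_driving_initial + 0.05               # the driving face moved slightly
kp_target = np.random.rand(10, 2)                    # keypoints detected on the still image

kp_animated = transfer_motion(kp_target, kp_driving, kp_driving_initial)

# warp a few points around the first keypoint with a slight shear
jacobian = np.array([[1.0, 0.1], [0.0, 1.0]])
patch = np.random.rand(5, 2)
warped_patch = local_affine_warp(patch, kp_target[0], jacobian)
print(kp_animated.shape, warped_patch.shape)  # (10, 2) (5, 2)
```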

Movement sample from the paper (Source)

Face-Swap: Deepfakes?

It also allows face swapping in quite a different way than the face2face approach. While face2face uses a face detector and applies the facial features to the target image, the framework "First Order Motion Model" goes another way:

Motion is described as a set of keypoints displacements and local affine transformations. A generator network combines the appearance of the source image and the motion representation of the driving video. In addition, we proposed to explicitly model occlusions in order to indicate to the generator network which image parts should be inpainted (source).

And it works astonishingly well.
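Here is a rough PyTorch sketch of that generation step as I read it (again my own illustration, not the paper's implementation): the source features are warped by a dense motion field derived from the keypoint motion, and an occlusion mask tells the decoder which regions have become invisible and must be inpainted.

```python
# Rough illustration of occlusion-aware warping -- not the paper's code.
import torch
import torch.nn.functional as F

def warp_with_occlusion(source_features, motion_grid, occlusion_mask):
    """
    source_features: (B, C, H, W) features of the still source image
    motion_grid:     (B, H, W, 2) sampling grid predicted from the keypoint motion
    occlusion_mask:  (B, 1, H, W) values in [0, 1]; 0 = region must be inpainted
    """
    warped = F.grid_sample(source_features, motion_grid, align_corners=True)
    return warped * occlusion_mask   # the decoder later fills in the zeroed regions

# toy tensors just to show the shapes involved
B, C, H, W = 1, 64, 64, 64
features = torch.randn(B, C, H, W)
# identity grid = "no motion"; a real model predicts this grid from the keypoints
grid = F.affine_grid(torch.eye(2, 3).unsqueeze(0), (B, C, H, W), align_corners=True)
mask = torch.ones(B, 1, H, W)

out = warp_with_occlusion(features, grid, mask)
print(out.shape)  # torch.Size([1, 64, 64, 64])
```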

You can try it out using either the GitHub repository or the Colab notebook.

I tried my luck on Nefertiti, using footage of the AI pioneer Geoffrey Hinton. This footage comes bundled with the notebook. You can use other video material, but it has to meet specific requirements regarding format and size.
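Condensed, the demo boils down to something like the snippet below. The file names and the checkpoint path are placeholders, and the helpers load_checkpoints and make_animation come from the repository's demo module; check the repo or the notebook for the exact, current API.

```python
# Condensed sketch of the demo workflow; paths and checkpoint names are placeholders.
import imageio
import numpy as np
from skimage.transform import resize
from demo import load_checkpoints, make_animation   # helpers from the first-order-model repo

# the model works on 256x256 RGB frames, so both inputs are resized
source_image = resize(imageio.imread('source.png'), (256, 256))[..., :3]
reader = imageio.get_reader('driving.mp4')
fps = reader.get_meta_data()['fps']
driving_video = [resize(frame, (256, 256))[..., :3] for frame in reader]

generator, kp_detector = load_checkpoints(config_path='config/vox-256.yaml',
                                           checkpoint_path='vox-cpk.pth.tar')
predictions = make_animation(source_image, driving_video, generator, kp_detector,
                             relative=True)

imageio.mimsave('result.mp4', [np.uint8(255 * frame) for frame in predictions], fps=fps)
```

The resizing step is what the "specific requirements and sizes" above refers to: the pretrained checkpoints expect 256x256 input, so larger or non-square material is cropped and scaled first.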

The result was more than convincing: