Hey guys!! I am back with a new blog, this time on creating deepfakes with the First Order Motion Model. Creating deepfakes used to be a difficult task, but with recent advances it has become a five-minute job. In this blog, we will explore how deepfakes are created and then apply the First Order Motion Model, which lets us create a deepfake in a matter of minutes.

### What are Deepfakes?

Deepfakes are synthetic media in which a person in an existing image or video is replaced with someone else’s likeness. While the act of faking content is not new, deepfakes leverage powerful techniques from machine learning and artificial intelligence to manipulate or generate visual and audio content with a high potential to deceive. The main machine learning methods used to create deepfakes are based on deep learning and involve training generative neural network architectures, such as autoencoders or generative adversarial networks (GANs).

In short, deepfakes are realistic-looking fake videos in which someone appears to be doing or saying something they never did.

### How are Deepfakes Created?

The basis of deepfakes, or image animation in general, is to combine the appearance extracted from a source image with motion patterns derived from a driving video. For this purpose deepfakes use deep learning, which is where their name comes from (deep learning + fake). To be more precise, they are created using a combination of autoencoders and GANs.

An autoencoder is a simple neural network that uses unsupervised learning (or self-supervised learning, to be more accurate). Autoencoders get their name because they automatically encode information into a compact representation, and they are usually used for dimensionality reduction.
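To make the idea concrete, here is a minimal linear autoencoder sketched in plain NumPy. Everything here (data, sizes, variable names) is illustrative and separate from the deepfake pipeline: an encoder compresses 8-dimensional inputs down to a 2-dimensional code, a decoder reconstructs them, and both are trained by gradient descent on the reconstruction error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples that actually live on a 2-D subspace of R^8,
# so a 2-unit bottleneck can reconstruct them well.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 8))
X = latent @ mixing

# Encoder and decoder are single linear layers.
W_enc = rng.normal(scale=0.1, size=(8, 2))
W_dec = rng.normal(scale=0.1, size=(2, 8))

def loss(X, W_enc, W_dec):
    recon = X @ W_enc @ W_dec
    return np.mean((recon - X) ** 2)

lr = 0.01
initial_loss = loss(X, W_enc, W_dec)

for _ in range(500):
    Z = X @ W_enc          # encode: 8 -> 2 (the bottleneck)
    recon = Z @ W_dec      # decode: 2 -> 8
    err = recon - X        # reconstruction error
    # Gradients of the mean squared error w.r.t. both weight matrices.
    grad_dec = Z.T @ err / len(X)
    grad_enc = X.T @ (err @ W_dec.T) / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

final_loss = loss(X, W_enc, W_dec)
```

A purely linear autoencoder like this ends up learning the same subspace as PCA; real autoencoders stack nonlinear layers, but the encode-bottleneck-decode structure is the same.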

Generative Adversarial Networks, or GANs, are composed of two networks competing against each other. The first network, called the generator, tries to generate images similar to the training set. The second network, called the discriminator, tries to detect whether an image comes from the training set or from the generator.
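As a toy illustration of this competition (all names and numbers below are made up for the example and are not part of the deepfake pipeline), here is a one-dimensional GAN in NumPy: the generator is an affine map of noise, the discriminator is logistic regression, and both take alternating gradient steps on the standard adversarial objectives.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# "Real" data: 1-D samples centred on 3.0.
def sample_real(n):
    return rng.normal(loc=3.0, scale=0.5, size=n)

a, b = 0.5, 0.0        # generator G(z) = a*z + b, starts far from the data
w, c = 0.1, 0.0        # discriminator D(x) = sigmoid(w*x + c)
lr_d, lr_g, n = 0.1, 0.02, 64

fake_before = a * rng.normal(size=1000) + b

for _ in range(3000):
    z = rng.normal(size=n)
    real, fake = sample_real(n), a * z + b

    # Discriminator: gradient ascent on log D(real) + log(1 - D(fake)).
    d_real, d_fake = sigmoid(w * real + c), sigmoid(w * fake + c)
    w += lr_d * (np.mean((1 - d_real) * real) - np.mean(d_fake * fake))
    c += lr_d * (np.mean(1 - d_real) - np.mean(d_fake))

    # Generator: gradient ascent on log D(fake) (non-saturating loss).
    d_fake = sigmoid(w * fake + c)
    g = (1 - d_fake) * w          # d log D(fake) / d fake
    a += lr_g * np.mean(g * z)
    b += lr_g * np.mean(g)

fake_after = a * rng.normal(size=1000) + b
```

After training, the generator's samples sit much closer to the real data's mean than they did at the start; real image GANs replace these two linear maps with deep convolutional networks, but the adversarial loop is the same.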

## First Order Model for Image Animation

The whole First Order Model pipeline is separated into two parts: motion extraction and generation. A source image and a driving video are used as input. The motion extractor uses an autoencoder to detect keypoints and extracts a first-order motion representation consisting of sparse keypoints and local affine transformations. These, together with the driving video, are fed to a dense motion network that produces a dense optical flow and an occlusion map. Finally, the generator combines the outputs of the dense motion network with the source image to render the animated result.
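To get a feel for what the generator does with the dense motion network's outputs, here is a simplified NumPy sketch of flow-based warping with an occlusion mask. It is a nearest-neighbour stand-in for the actual network, and the function and variable names are mine, not the repository's.

```python
import numpy as np

def warp_with_flow(image, flow, occlusion):
    """image: (H, W) array; flow: (H, W, 2) pixel offsets (dy, dx);
    occlusion: (H, W) values in [0, 1], where 1 means visible."""
    H, W = image.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Each output pixel samples the source image at (y + dy, x + dx).
    src_y = np.clip((ys + flow[..., 0]).round().astype(int), 0, H - 1)
    src_x = np.clip((xs + flow[..., 1]).round().astype(int), 0, W - 1)
    warped = image[src_y, src_x]
    # Occluded regions are masked out; the real model inpaints them.
    return warped * occlusion

image = np.arange(16.0).reshape(4, 4)

# Identity flow with nothing occluded returns the image unchanged.
identity = np.zeros((4, 4, 2))
visible = np.ones((4, 4))
out = warp_with_flow(image, identity, visible)

# A flow with dx = 1 everywhere: each output pixel reads from one
# column to the right, shifting the image one pixel to the left.
shift = np.zeros((4, 4, 2))
shift[..., 1] = 1.0
shifted = warp_with_flow(image, shift, visible)
```

The real generator samples with a learned, sub-pixel flow and uses the occlusion map to decide which regions must be hallucinated rather than copied, but the warping intuition is the same.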

It also has features that other models just don't have. The really cool thing is that it works on different categories of images, meaning you can apply it to a face, a full body, a cartoon, and so on. This opens up a lot of possibilities. Another revolutionary thing about this approach is that you can now create good-quality deepfakes from a single image of the target, without needing a large dataset of that person.

We will be using a pre-trained model together with our own source image and driving video to generate the deepfake.

#### Importing necessary libraries

```python
import imageio
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from skimage.transform import resize
from IPython.display import HTML
import warnings
warnings.filterwarnings("ignore")
```

#### Cloning the repository and Mounting Google Drive

First, we need to clone the repository and mount Google Drive. Once that is done, upload your source image and driving video to the drive. For best results, make sure both the image and the video are cropped so they contain only the face. Then all you need to do is run the piece of code below.

```python
!git clone https://github.com/imvansh25/first-order-model.git
%cd first-order-model

from google.colab import drive
drive.mount('/content/gdrive')
```

#### Load driving video and source image

```python
# Crop the driving video so that it contains only the face
!ffmpeg -i /content/gdrive/My\ Drive/first-order-motion-model/p1.mp4 -ss 00:08:57.50 -t 00:00:08 -filter:v "crop=600:600:760:50" -async 1 p1.mp4

# Load the source image and the cropped driving video
# (the image path below is an example -- point it at your own file)
source_image = imageio.imread('/content/gdrive/My Drive/first-order-motion-model/source.png')
driving_video = [frame for frame in imageio.get_reader('p1.mp4')]

# Resize the image and every video frame to 256x256, dropping any alpha channel
source_image = resize(source_image, (256, 256))[..., :3]
driving_video = [resize(frame, (256, 256))[..., :3] for frame in driving_video]
```

```python
# Helper that shows the source image, driving frames, and (optionally)
# generated frames side by side as an animation.
def display(source, driving, generated=None):
    fig = plt.figure(figsize=(8 + 4 * (generated is not None), 6))

    ims = []
    for i in range(len(driving)):
        cols = [source]
        cols.append(driving[i])
        if generated is not None:
            cols.append(generated[i])
        im = plt.imshow(np.concatenate(cols, axis=1), animated=True)
        plt.axis('off')
        ims.append([im])

    ani = animation.ArtistAnimation(fig, ims, interval=50, repeat_delay=1000)
    plt.close()
    return ani

HTML(display(source_image, driving_video).to_html5_video())
```

Now, we’ll create a model and load the checkpoints.

```python
from demo import load_checkpoints

generator, kp_detector = load_checkpoints(
    config_path='config/vox-256.yaml',
    checkpoint_path='/content/gdrive/My Drive/first-order-motion-model/vox-cpk.pth.tar'
)
```

## Performing image animation

```python
from demo import make_animation
from skimage import img_as_ubyte

predictions = make_animation(source_image, driving_video, generator, kp_detector, relative=True)

# Save the resulting video
imageio.mimsave('../generated.mp4', [img_as_ubyte(frame) for frame in predictions])

HTML(display(source_image, driving_video, predictions).to_html5_video())
```

## Conclusion

Deepfakes have garnered widespread attention for their use in fake news, fraud, scams, and many other illegal activities. It is getting harder and harder to tell what is real and what is not; it seems that nowadays we cannot fully trust our own senses anymore. So, be careful and responsible when creating deepfakes.

You can try it out either using the GitHub repository or Colab Notebook.

-Vansh Gupta

Categories: Deep Learning, Python
