Deepfake Detection of Media using Deep Neural Networks

Three separate neural networks detect deformities or irregularities in media, one each for the person's face, audio, and body language.

Face Deepfake Detection

The face deepfake model uses a Maximum Margin Object Detector (MMOD) to extract the face, followed by a Temporal Neural Network for classification.

Voice Deepfake Detection

Input audio from the media is converted into a spectrogram using the librosa library and fed to a model comprising ResNet50V2 followed by a Temporal Convolutional Network (TCN), which predicts whether the audio is a deepfake.
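The audio-to-spectrogram step can be sketched as below. This is a minimal illustration, not the project's actual preprocessing: the log-scaled mel spectrogram, the sample rate, `n_mels`, `hop_length`, and the function names are all assumptions, since the description only states that librosa produces a spectrogram.

```python
def mel_frame_count(n_samples: int, hop_length: int = 512) -> int:
    """Number of spectrogram columns for a clip of n_samples samples.

    Matches librosa's default centered framing: 1 + n_samples // hop_length.
    """
    if n_samples < 0 or hop_length <= 0:
        raise ValueError("n_samples must be >= 0 and hop_length must be > 0")
    return 1 + n_samples // hop_length


def audio_to_log_mel(path, sr=22050, n_mels=128, hop_length=512):
    """Load an audio file and return a log-scaled mel spectrogram.

    Assumed parameters (not taken from the project): sr, n_mels, hop_length.
    The result has shape (n_mels, mel_frame_count(len(y), hop_length)).
    """
    # Imported lazily so this module still loads where the audio stack
    # (librosa, numpy) is not installed.
    import librosa
    import numpy as np

    # Resample to a fixed rate and mix down to mono.
    y, sr = librosa.load(path, sr=sr, mono=True)
    # Power mel spectrogram: rows are mel bands, columns are time frames.
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_mels=n_mels, hop_length=hop_length
    )
    # Convert power to decibels relative to the loudest bin.
    return librosa.power_to_db(mel, ref=np.max)
```

Calling `audio_to_log_mel("clip.wav")` (a hypothetical file name) would yield a 2-D array that could be treated as an image for ResNet50V2. How the project resizes or stacks that array into the network's input is not stated here.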

Body Language Deepfake Detection

Frames are extracted from the input video at 5 fps and passed to YOLOv3, which detects full-body persons in each frame. Each detected person is cropped out of the frame to a size of 300x300 pixels. These crops serve as input to the TCN model, which predicts whether the frame is a deepfake.
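The sampling and cropping arithmetic described above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the helper names are hypothetical, the detector box is assumed to be (top-left x, top-left y, width, height) in pixels, and the index-stepping rule is just one simple way to approximate 5 fps.

```python
from typing import List, Tuple


def frames_to_keep(total_frames: int, native_fps: float,
                   target_fps: float = 5.0) -> List[int]:
    """Indices of the frames to keep so the sampled rate is about target_fps."""
    if native_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    # Keep every frame if the video is already at or below the target rate.
    step = max(native_fps / target_fps, 1.0)
    indices = []
    k = 0
    while int(k * step) < total_frames:
        indices.append(int(k * step))
        k += 1
    return indices


def clamp_box(x: int, y: int, w: int, h: int,
              frame_w: int, frame_h: int) -> Tuple[int, int, int, int]:
    """Clip a detector box to the frame and return (x0, y0, x1, y1)."""
    x0, y0 = max(0, x), max(0, y)
    x1, y1 = min(frame_w, x + w), min(frame_h, y + h)
    if x1 <= x0 or y1 <= y0:
        raise ValueError("box lies entirely outside the frame")
    return x0, y0, x1, y1
```

For a 30 fps video, `frames_to_keep(30, 30.0)` keeps every sixth frame. The clamped coordinates can then slice the frame array, and one way to reach 300x300 is OpenCV's `cv2.resize(crop, (300, 300))`, though whether the project resizes or pads the crop is not stated here.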

Dheeraj Gharde
MS CS - USC

My interests include full-stack development and software infrastructure engineering.