Page 798 - Emerging Trends and Innovations in Web-Based Applications and Technologies
International Journal of Trend in Scientific Research and Development (IJTSRD) @ www.ijtsrd.com eISSN: 2456-6470
(HQ), and heavily compressed (LQ), with HQ being the default unless specified otherwise.

To assess the generalization capability of the proposed method, we evaluate it across four additional benchmark datasets:

DeepfakeDetection (DFD): Comprising 363 real videos and 3,068 deepfake videos generated using an enhanced DeepFake approach.

Deepfake Detection Challenge (DFDC): Includes over 1,000 real and more than 4,000 fake videos, manipulated through various deepfake, GAN-based, and traditional non-learned techniques.

DeeperForensics-1.0 (DF1.0): Consists of over 11,000 deepfake videos created using the DFVAE method.

CelebDF: Contains 408 real and 795 synthetic videos, produced with an improved DeepFake method.

Notably, the fake videos in the training and test sets share neither content nor generation techniques, ensuring a fair evaluation of model performance.

Implementation Details
DLIB is utilized for face extraction and alignment, with aligned faces resized to 256 × 256 pixels. The Xception network serves as the detection backbone, with model weights initialized from pretrained ImageNet parameters.

For training, we optimize the model using the Adam optimizer with β1 = 0.9 and β2 = 0.999, and a meta-batch size of 20. The learning rates for the inner update (γ) and meta update (λ) are set to 0.0005 and 0.0002, respectively, for both the offline and online training phases.

Further Analysis
During the forgery generation process, a training sample is randomly chosen and blended with the test sample. To evaluate the effectiveness of this selection method, we conduct ablation studies comparing it against two alternative strategies:

1. Nearest Neighbor (NN) Sampling – The training sample with the closest feature distance to the test sample is selected for blending.

2. Average (Avg) Sampling – Predictions obtained with multiple training samples are averaged for evaluation.

These configurations are tested on the four manipulation types within the FF++ dataset, and their performance is further validated on the DFDC, DFD, and DF1.0 datasets to ensure generalizability.

OST Adaptation
The proposed learning framework is based on Model-Agnostic Meta-Learning (MAML), which facilitates rapid adaptation to new tasks. To assess the role of One-Shot Test-Time Training (OST) within this framework, we compare its performance against a conventional training scheme while keeping the dataset and Xception backbone unchanged. Results, presented in Table 6, show that OST with standard end-to-end training performs comparably to MAML. However, when the OST process is removed, detection performance declines, indicating that the method enhances conventional training configurations and strengthens widely used network architectures. While both approaches are effective, the MAML framework accelerates adaptation during inference, reducing per-sample processing time by a factor of 10×. In practical applications, we therefore integrate OST within the MAML framework to maximize efficiency.

Impact of Multiple Gradient Descent Steps
To determine whether additional gradient descent steps enhance detection accuracy, we perform ablation studies by varying the number of updates during evaluation. The model, trained on FF++, is tested on the DFDC, DFD, and DF1.0 datasets. As shown in Table 7, increasing the number of gradient descent steps does not yield substantial improvements in accuracy, while computational cost scales proportionally. Additionally, using multiple updates significantly increases memory consumption in a MAML-based framework. To balance efficiency and accuracy, we opt for a single gradient descent step in our final method.

Conclusion and Discussions
In this research, we present a novel learning paradigm tailored specifically to the generalizable deepfake detection challenge. In summary, we recommend finetuning the pretrained detector, prior to the classification phase, on a pseudo-training sample created by blending the test sample with a randomly picked template picture. We empirically demonstrate that the proposed online training strategy allows the pretrained model to adjust to sample-specific statistics, hence improving generalizability. We implement our method in a MAML-based framework to allow rapid adaptation to varied test samples, and it outperforms state-of-the-art methods in generalizing to previously unseen forgeries and postprocessing operations.

Limitations and future work. Because the pseudo-training examples are synthesized using existing deepfake pipelines, our method cannot be applied to scenarios where the fake images are produced with different protocols, such as images synthesized entirely by GAN-based methods. Our future work will focus on developing approaches that handle both deepfakes and GAN-synthesized fake images. Meanwhile, DLIB is employed in our forgery synthesis workflow to identify and extract facial landmarks; in instances where DLIB fails, OST fails with it, so a more robust face detection system would further strengthen OST.

Ethical statement. This initiative aims to assist people in combating the exploitation of deepfake technology. It does not involve any human or animal subjects, and no personal privacy is infringed during the experiments. We do not expect any negative implications from our efforts. We believe that our research and the release of our code will increase scientific and societal awareness of generalizable deepfake detection.

Acknowledgements. Liang Chen is financed by the China Scholarship Council (CSC Student ID: 202008440331).
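The template-blending step used to build the pseudo-training sample can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' released code: the function name `make_pseudo_training_sample`, the soft Gaussian blending mask (a stand-in for the DLIB landmark-based mask described in the text), and the float-image format are all assumptions.

```python
import numpy as np

def make_pseudo_training_sample(test_img, templates, rng=None):
    """Blend a randomly picked template into the test image to form a
    pseudo fake sample (label 1); the untouched test image serves as
    the pseudo real sample (label 0). Images are float arrays in
    [0, 1] with identical shapes."""
    if rng is None:
        rng = np.random.default_rng()
    template = templates[rng.integers(len(templates))]
    h, w = test_img.shape[:2]
    # Soft mask concentrated on the central face region -- a toy
    # substitute for the landmark-derived mask produced with DLIB.
    yy, xx = np.mgrid[0:h, 0:w]
    mask = np.exp(-(((yy - h / 2) / (h / 3)) ** 2 +
                    ((xx - w / 2) / (w / 3)) ** 2))[..., None]
    pseudo_fake = mask * template + (1.0 - mask) * test_img
    return [(test_img, 0), (pseudo_fake, 1)]
```

Because the blend is a convex combination, the pseudo fake stays in the valid pixel range; the NN and Avg ablations above differ only in how the template (or templates) is chosen.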
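The single inner gradient-descent step adopted in the final method can be illustrated on a toy detector. This is a hedged sketch: the name `ost_single_step`, the logistic model, and the flattened-pixel features are assumptions for illustration only; in the paper the one-step update is applied to the Xception backbone with inner learning rate γ = 0.0005.

```python
import numpy as np

def ost_single_step(w, pseudo_batch, gamma=5e-4):
    """One gradient-descent step (the final method uses exactly one)
    on the pseudo-training pair before classifying the test sample.
    `w` holds weights of a toy logistic detector over flattened
    pixels; `pseudo_batch` is a list of (image, label) pairs."""
    grad = np.zeros_like(w)
    for x, y in pseudo_batch:
        feat = x.ravel()
        p = 1.0 / (1.0 + np.exp(-feat @ w))   # sigmoid fake-score
        grad += (p - y) * feat                # binary cross-entropy gradient
    # Inner update with learning rate gamma (γ = 0.0005 in the paper).
    return w - gamma * grad / len(pseudo_batch)
```

Running this once per test sample mirrors the ablation conclusion above: extra steps multiply compute and memory, while a single step already adapts the weights to sample-specific statistics.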