Page 798 - Emerging Trends and Innovations in Web-Based Applications and Technologies

International Journal of Trend in Scientific Research and Development (IJTSRD) @ www.ijtsrd.com eISSN: 2456-6470
(HQ), and heavily compressed (LQ), with HQ being the default unless specified otherwise.

To assess the generalization capability of the proposed method, we evaluate it on four additional benchmark datasets:

• DeepfakeDetection (DFD): Comprises 363 real videos and 3,068 deepfake videos generated using an enhanced DeepFake approach.
• Deepfake Detection Challenge (DFDC): Includes over 1,000 real and more than 4,000 fake videos, manipulated through various deepfake, GAN-based, and traditional non-learned techniques.
• DeeperForensics-1.0 (DF1.0): Consists of over 11,000 deepfake videos created using the DFVAE method.
• CelebDF: Contains 408 real and 795 synthetic videos, produced with an improved DeepFake method.

Notably, the fake videos in the training and test sets do not share the same content or generation techniques, ensuring a fair evaluation of model performance.

Implementation Details
The Xception network serves as the backbone, with its weights initialized from pretrained ImageNet parameters. DLIB is used for face extraction and alignment, and the aligned faces are resized to 256 × 256 pixels.

For training, we optimize the model using the Adam optimizer with β1 = 0.9 and β2 = 0.999, and a meta-batch size of 20. The learning rates for the inner update (γ) and the meta update (λ) are set to 0.0005 and 0.0002, respectively, for both the offline and online training phases.

Further Analysis
During the forgery generation process, a training sample is randomly chosen and blended with the test sample. To evaluate the effectiveness of this selection method, we conduct ablation studies comparing it against two alternative strategies:
1. Nearest Neighbor (NN) Sampling – The training sample with the closest feature distance to the test sample is selected for blending.
2. Average (Avg) Sampling – The results obtained with multiple training samples are averaged for evaluation.

These configurations are tested on four types of data within the FF++ dataset, and their performance is further validated on the DFDC, DFD, and DF1.0 datasets to ensure generalizability.

OST Adaptation
The proposed learning framework is based on Model-Agnostic Meta-Learning (MAML), which facilitates rapid adaptation to new tasks. To assess the role of One-Shot Test-Time Training (OST) within this framework, we compare its performance against a conventional training scheme while keeping the dataset and the Xception backbone unchanged. The results, presented in Table 6, show that OST with standard end-to-end training performs comparably to OST with MAML. However, when the OST process is removed, detection performance declines, indicating that the method enhances conventional training configurations and strengthens widely used network architectures. While both approaches are effective, the MAML framework accelerates adaptation during inference, reducing the processing time per test sample by a factor of 10×. In practical applications, we integrate OST within the MAML framework to maximize efficiency.

Impact of Multiple Gradient Descent Steps
To determine whether additional gradient descent steps enhance detection accuracy, we perform ablation studies by varying the number of updates during evaluation. The model, trained on FF++, is tested on the DFDC, DFD, and DF1.0 datasets. As shown in Table 7, increasing the number of gradient descent steps does not yield substantial improvements in accuracy, while the computational cost scales proportionally. Additionally, using multiple updates significantly increases memory consumption in a MAML-based framework. To balance efficiency and accuracy, we opt for a single gradient descent step in our final method.

Conclusion and Discussions
In this research, we present a novel learning paradigm tailored specifically to the generalizable deepfake detection challenge. In summary, we recommend finetuning the pretrained detector, prior to the classification phase, on a pseudo-training sample created by blending the test sample with a randomly picked template image. We empirically demonstrate that the proposed online training strategy allows the pretrained model to adjust to sample-specific statistics, thereby improving generalizability. We implement our method in a MAML-based framework to allow rapid adaptation to varied test samples, and it outperforms state-of-the-art methods in generalizing to previously unseen forgeries and postprocessing operations.

Limitations and future work. Because the pseudo-training examples are synthesized using existing deepfake pipelines, our method cannot be applied to scenarios where the fake images are produced by different protocols, such as when they are synthesized entirely by GAN-based methods. Our future work will focus on developing approaches that handle both deepfakes and GAN-synthesized fake images. In addition, DLIB is employed in our forgery synthesis workflow to identify and extract facial landmarks. In cases where DLIB fails, OST fails as well, so a more robust face detection system would benefit OST.

Ethical statement. This work aims to assist people in combating the misuse of deepfake technology. It does not involve any human or animal subjects, and there is no infringement of personal privacy during the experiments. We do not expect any negative implications from our work. We believe that our research and the release of our code will increase scientific and societal awareness of the subject of generalizable deepfake detection.

Acknowledgements. Liang Chen is financed by the China Scholarship Council (CSC Student ID: 202008440331).

References
[1] Isao Echizen, Junichi Yamagishi, Vincent Nozick, and Darius Afchar. Mesonet: a small network for detecting face video forgeries. In WIFS, 2018.
[2] Koki Nagano, Yuming Gu, Mingming He, Hany Farid, Hao Li, and Shruti Agarwal. Defending global leaders from deepfakes. In CVPR Workshops, 2019.
[3] Alberto Del Bimbo, Leonardo Galteri, Roberto Caldelli, and Irene Amerini. Deepfake video detection using a CNN based on optical flow. In ICCV Workshops, 2019.
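
The one-shot test-time training procedure described on this page (blend the test sample with a randomly chosen template, take a single gradient descent step on the resulting pseudo-training sample, then classify the test sample) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a tiny logistic model over feature vectors stands in for the Xception detector, `blend` is a hypothetical convex mix standing in for the face-blending pipeline, the pseudo-training sample is assumed to carry the label "fake", and each test sample is assumed to adapt from the unchanged pretrained parameters. Only the single update step and the inner learning rate of 0.0005 are taken from the text.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, features):
    """Probability that `features` belongs to a fake sample (toy logistic detector)."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def blend(test_features, template_features, alpha=0.5):
    """Hypothetical stand-in for face blending: a convex mix of two feature vectors."""
    return [alpha * t + (1.0 - alpha) * s
            for t, s in zip(test_features, template_features)]


def one_step_adapt(weights, bias, pseudo_features, pseudo_label, lr):
    """Single gradient descent step on binary cross-entropy for one pseudo sample.

    Returns new parameters; the pretrained ones are left untouched.
    """
    # For a sigmoid output with cross-entropy loss, dL/dz = prediction - label.
    error = predict(weights, bias, pseudo_features) - pseudo_label
    new_weights = [w - lr * error * x for w, x in zip(weights, pseudo_features)]
    new_bias = bias - lr * error
    return new_weights, new_bias


def classify_with_ost(weights, bias, test_features, templates, lr=0.0005, rng=random):
    """Adapt on a blended pseudo sample, then score the original test sample."""
    template = rng.choice(templates)          # random template selection
    pseudo = blend(test_features, template)   # pseudo-training sample
    # Assumption: the blended sample is a forgery, so its label is 1.0 (fake).
    adapted_w, adapted_b = one_step_adapt(weights, bias, pseudo, 1.0, lr)
    return predict(adapted_w, adapted_b, test_features)
```

Because `one_step_adapt` returns fresh parameters instead of modifying its inputs, every test sample starts from the same pretrained model in this sketch. That choice keeps one sample's statistics from leaking into the next prediction.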

