3.1. Online Test-Time Training
In the following discussion, we assume that a deepfake detection model θ has already been trained. The OSTG process updates the model parameters, resulting in θ̃, which adapts to each specific test image. This adaptation is performed by first generating a pseudo-training sample, which is then used to construct a mini-training set for a one-shot training update that refines the model parameters.
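To make the procedure concrete, the following is a minimal sketch of the one-shot update, assuming a PyTorch detector that outputs a single fake-vs-real logit; the blending helper make_pseudo_sample, the loss, the labels, and the learning rate are illustrative assumptions rather than the exact configuration used in our experiments.

import copy
import torch
import torch.nn.functional as F

def one_shot_adapt(model, x_test, x_template, make_pseudo_sample, lr=1e-4):
    """Adapt a copy of the trained detector theta to one test image, yielding theta~."""
    adapted = copy.deepcopy(model)              # the original parameters theta stay untouched
    adapted.train()
    optimizer = torch.optim.SGD(adapted.parameters(), lr=lr)

    # Mini-training set: the pseudo-forgery synthesized from the test image
    # (labeled fake) together with the real template it was blended with (labeled real).
    x_pseudo = make_pseudo_sample(x_test, x_template)   # hypothetical blending helper
    inputs = torch.stack([x_pseudo, x_template])        # shape (2, C, H, W)
    labels = torch.tensor([1.0, 0.0])                   # 1 = fake, 0 = real

    # A single gradient step: the one-shot training update.
    logits = adapted(inputs).squeeze(1)
    loss = F.binary_cross_entropy_with_logits(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Score the test image with the adapted parameters theta~.
    adapted.eval()
    with torch.no_grad():
        score = torch.sigmoid(adapted(x_test.unsqueeze(0))).item()
    return adapted, score

In this sketch the adapted copy is discarded after scoring, so each test image is handled independently.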
Figure 2: Pipeline for generating pseudo-training samples.
Generating pseudo-training samples: As shown in Figure 2, for every test sample x_e we first randomly select a template image x_r from the training dataset and geometrically align the two images based on their facial landmarks.
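As a minimal illustration of the alignment step, the sketch below assumes the facial landmarks of both images are already available as (K, 2) NumPy arrays (the landmark detector is not shown) and uses an OpenCV similarity transform to warp the template x_r into the geometry of the test image x_e; the subsequent blending that produces the pseudo-forgery is omitted.

import cv2
import numpy as np

def align_template_to_test(template_img, template_lmks, test_lmks, out_hw):
    """Warp the template image x_r so that its landmarks match those of the test image x_e."""
    # Estimate a similarity transform (rotation + uniform scale + translation)
    # mapping the template landmarks onto the test landmarks.
    M, _ = cv2.estimateAffinePartial2D(
        template_lmks.astype(np.float32),
        test_lmks.astype(np.float32),
        method=cv2.RANSAC,
    )
    h, w = out_hw
    # Resample the template into the test image's coordinate frame.
    return cv2.warpAffine(template_img, M, (w, h), flags=cv2.INTER_LINEAR)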
The proposed test-time training approach can be interpreted as a domain adaptation technique. In this framework, each test image is treated as its own domain, characterized by its content, which may differ from the training data due to a domain gap. The pseudo-training sample generated by this method is more closely related to the test image than the original training samples are, since it is synthesized from the test image itself. By performing rapid adaptation on this generated sample, the detector aligns more closely with the test image, improving its performance. Additional evidence supporting this analysis is provided.
4. Experiments
This section first presents the experimental setup and then reports extensive results demonstrating the superiority of our approach. Please refer to the supplementary material for additional experimental results.
4.1. Settings
Training and test datasets. Following the protocols of existing deepfake detection methods, we use the FaceForensics++ (FF++) dataset for training. This dataset contains 1,000 real videos, with 720 used for training, 140 reserved for validation, and the remaining 140 allocated for testing. Each real video is manipulated using four deepfake techniques: DeepFake (DF), Face2Face (F2F), FaceSwap (FS), and NeuralTexture (NT), resulting in four corresponding synthetic videos. Additionally, the dataset is available in three different quality levels: raw, lightly compressed