
International Journal of Trend in Scientific Research and Development (IJTSRD) @ www.ijtsrd.com eISSN: 2456-6470



























                            Fig 1. Training, Validation, and Test Set Workflow in Machine Learning Models
             This flowchart illustrates how a machine learning model is trained on a plagiarism detection dataset. The training set is
             used to fit the model, the validation set is used to tune hyperparameters and monitor performance during development, and
             the test set is held out to estimate how well the final model generalizes. Keeping the three sets separate helps the
             plagiarism detection tool avoid overfitting and gives a more reliable estimate of its performance on unseen data.
             Baheti, Pragati. "Training data / validation / test." V7 Labs, 13 Sep. 2021,
             https://www.v7labs.com/blog/train-validation-test-set. Accessed 17 Jan. 2025.
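The workflow in Fig 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the tool's actual implementation: the word n-gram Jaccard similarity, the function names (`similarity`, `fit_threshold`, `run_workflow`), and the toy sentence pairs are all hypothetical. The decision threshold is fitted on the training set, the n-gram size (treated here as the hyperparameter) is chosen on the validation set, and the test set is used once at the end.

```python
def ngrams(text, n):
    """Return the set of word n-grams in a text (hypothetical feature choice)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def similarity(a, b, n):
    """Jaccard similarity between the word n-gram sets of two texts."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)


def accuracy(pairs, n, threshold):
    """Fraction of (text_a, text_b, label) pairs classified correctly."""
    correct = sum((similarity(a, b, n) >= threshold) == label
                  for a, b, label in pairs)
    return correct / len(pairs)


def fit_threshold(train_pairs, n):
    """'Training': pick the threshold with the best accuracy on the training set."""
    candidates = [i / 20 for i in range(1, 20)]
    return max(candidates, key=lambda t: accuracy(train_pairs, n, t))


def run_workflow(train, val, test, ngram_sizes=(1, 2, 3)):
    """Fit on train, select the n-gram size on validation, report on test."""
    best = None
    for n in ngram_sizes:
        threshold = fit_threshold(train, n)
        val_acc = accuracy(val, n, threshold)
        if best is None or val_acc > best[0]:
            best = (val_acc, n, threshold)
    val_acc, n, threshold = best
    # The test set is touched only once, after all choices are fixed.
    return {"n": n, "threshold": threshold,
            "val_accuracy": val_acc,
            "test_accuracy": accuracy(test, n, threshold)}


# Toy, hand-written pairs for illustration only (True = plagiarized).
train = [
    ("the cat sat on the mat", "the cat sat on the mat", True),
    ("the cat sat on the mat", "a dog ran in the park", False),
    ("machine learning models need data",
     "machine learning models need clean data", True),
    ("plagiarism harms academic integrity", "the weather is sunny today", False),
]
val = [
    ("deep networks learn features", "deep networks learn useful features", True),
    ("students submit essays online", "rivers flow toward the sea", False),
]
test = [
    ("the test set measures generalization",
     "the test set measures final generalization", True),
    ("birds migrate in winter", "compilers translate source code", False),
]

result = run_workflow(train, val, test)
print(result)
```

On a real dataset the three sets would be far larger, and the test accuracy would normally be lower than on these hand-picked pairs.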
             Validation set – The tool's approach is grounded in existing research on plagiarism detection, which supports its content
             validity.





























                                    Fig 2. Dataset Split Proportions for Machine Learning Models
             This figure shows several common ways of splitting a dataset into training, validation, and test sets in machine learning.
             For plagiarism detection tools, how the dataset is divided matters for robust performance. The training data teaches the
             model to identify patterns (e.g., common plagiarism cases), while the validation and test data help measure its accuracy
             and limit bias.
             Dividing a dataset into training, validation, and test sets is fundamental in machine learning. The training set is used to
             teach the model patterns and relationships within the data, such as common instances of plagiarism in the context of a
             detection tool. This stage gives the model a representation of the kind of data it will encounter, so that it is better
             equipped to analyze new cases.
             Validation and test sets play a critical role in evaluating the model's performance. The validation set is used during
             development to tune hyperparameters and to detect overfitting, so that the model generalizes better to unseen data. The
             test set, on the other hand, is used only at the end to measure the model's final accuracy and robustness. In plagiarism
             detection, this division helps the tool perform reliably across diverse scenarios, reducing bias and improving real-world
             applicability.
             Source: Adapted from https://www.v7labs.com/blog/train-validation-test-set
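As a rough sketch of how the proportions in Fig 2 could be applied, the following Python function shuffles a list of documents and cuts it into three parts. The function name `split_dataset`, the 80/10/10 default, and the placeholder document identifiers are assumptions made for illustration; the source does not prescribe a particular split or implementation.

```python
import random


def split_dataset(items, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle items and split them into train, validation, and test lists.

    Whatever remains after the train and validation portions becomes the test set.
    """
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must be positive and leave room for a test set")
    shuffled = list(items)
    # A fixed seed makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Placeholder identifiers standing in for real documents.
documents = [f"doc_{i}" for i in range(100)]
train_set, val_set, test_set = split_dataset(documents, train_frac=0.8, val_frac=0.1)
print(len(train_set), len(val_set), len(test_set))
```

Shuffling before cutting keeps each subset representative of the whole; for plagiarism data with few positive cases, a stratified split that preserves the class ratio in each subset may be a better choice.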


             IJTSRD | Special Issue on Emerging Trends and Innovations in Web-Based Applications and Technologies   Page 495