Page 504 - Emerging Trends and Innovations in Web-Based Applications and Technologies
International Journal of Trend in Scientific Research and Development (IJTSRD) @ www.ijtsrd.com eISSN: 2456-6470
meaningfully land a decisive blow against academic cheating. This research is leveraged by Originality Guard to offer a robust means of plagiarism detection.

Most currently available plagiarism detection tools produce false-positive results and cause unnecessary penalties for students. By using advanced algorithms, Originality Guard aims to minimize false positives and ensure that genuine cases of plagiarism are identified, allowing students to focus on learning. Originality Guard is an automated tool that educators and students alike can rely on for this purpose. Plagiarism detection plays an important role in promoting academic integrity, as several studies have suggested [3], [4]. One such study, published in the Journal of Academic Ethics, demonstrated that plagiarism detection tools can substantially reduce instances of academic dishonesty. Originality Guard builds on this research and offers an efficient plagiarism detection solution.

Students intending to gain academic qualifications are expected to demonstrate appropriate levels of attainment and ability through coursework and examinations. This requires students to produce submissions that meet a given assignment specification, which are then marked by a tutor to confirm that the work reaches the required standard. In many, if not the majority, of institutions, students are also required to confirm that the submission is the result of their own, unaided work. Students who falsely give this declaration reduce the value of the qualifications awarded by the academic institution. Knowing that other students are cheating but are not being punished for it can be infuriating to other students, who may themselves be discouraged from putting appropriate effort into their own submissions.

III. PROPOSED WORK
The proposed plagiarism detection tool, Originality Guard, employs a machine learning-based approach to detect plagiarism in academic documents. It is a new plagiarism detection tool designed specifically for academic and professional use, and it builds on the existing plagiarism detection tools to help prevent plagiarism in academic writing.

The Originality Guard architecture is made up of three parts: a document analysis module, a source database, and a comparison engine. The document analysis module processes uploaded documents and extracts their important features and metadata. The source database contains vast repositories of academic papers, articles, and websites.

The comparison engine is the core of the tool's plagiarism detection capabilities. Applying advanced algorithms and natural language processing techniques, it compares the uploaded document with the database of sources to identify potential instances of plagiarism. These algorithms include sentence-level comparison, semantic analysis, and citation detection.

To ensure accuracy, Originality Guard employs a multi-stage verification process. First, the comparison engine identifies potential instances of plagiarism; these are then verified through a secondary analysis, which assesses the context and relevance of the matched text to reduce false positives and improve overall accuracy.

Its user-friendly interface and detailed reporting features make it well suited to educators, researchers, and students. By providing a comprehensive and accurate plagiarism detection tool, Originality Guard aims to promote academic integrity and originality while also facilitating the learning process.

Data Collection
For training and testing Originality Guard, a comprehensive dataset of academic papers, articles, and websites was compiled. The dataset, termed "Academic Database," consists of approximately 500,000 documents, including:
200,000 academic papers from reputable journals and conferences (IEEE Xplore, ACM, Springer)
150,000 articles from online sources (Wikipedia, news websites)
150,000 web pages from educational institutions and research organizations
The dataset was sourced from various online repositories, including:
Digital libraries such as IEEE Xplore and the ACM Digital Library
Online academic databases such as Google Scholar and Microsoft Academic
Web crawlers including Apache Nutch and Scrapy
The Academic Database dataset is diverse in terms of:
Document types (research papers, articles)
Subjects (computer science, engineering)
Languages (English)
Formats (PDF, HTML, DOCX)
The size and diversity of the dataset provide a comprehensive foundation for training and testing Originality Guard, ensuring its effectiveness in detecting plagiarism across various academic disciplines and document types.
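The paper does not specify the algorithms or thresholds behind the comparison engine and its secondary analysis, so the following is only a minimal sketch of the two-stage idea under stated assumptions. Stage 1 uses word-overlap (Jaccard) similarity between sentences as a stand-in for sentence-level comparison; semantic analysis is not modelled. Stage 2 approximates the context check with two assumed rules: very short matches are discarded, and a matched sentence that is quoted and carries a bracketed citation such as [3] is treated as properly attributed. The names check, candidate_matches and verify, and the values SIMILARITY_THRESHOLD and MIN_WORDS, are hypothetical.

```python
import re

# Hypothetical values for illustration; the paper does not report thresholds.
SIMILARITY_THRESHOLD = 0.5  # minimum Jaccard score for a candidate match
MIN_WORDS = 6               # minimum distinct words for a match to count

# Bracketed numeric citations such as "[3]" or "[3, 4]".
CITATION = re.compile(r"\[\d+(?:\s*,\s*\d+)*\]")


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def word_set(sentence: str) -> set[str]:
    """Distinct lower-cased words, ignoring most punctuation and citation markers."""
    return set(re.findall(r"[a-z0-9']+", CITATION.sub("", sentence.lower())))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B| of two word sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_matches(document: str, sources: dict[str, str]):
    """Stage 1: yield (sentence, source_id, score) for every document sentence
    whose similarity to some source sentence reaches the threshold."""
    for doc_sent in split_sentences(document):
        doc_words = word_set(doc_sent)
        for source_id, source_text in sources.items():
            for src_sent in split_sentences(source_text):
                score = jaccard(doc_words, word_set(src_sent))
                if score >= SIMILARITY_THRESHOLD:
                    yield doc_sent, source_id, score


def verify(doc_sent: str) -> bool:
    """Stage 2 (assumed rules): reject matches that are too short, or that are
    quoted and carry a bracketed citation, i.e. look properly attributed."""
    if len(word_set(doc_sent)) < MIN_WORDS:
        return False
    quoted = '"' in doc_sent or "\u201c" in doc_sent
    cited = CITATION.search(doc_sent) is not None
    return not (quoted and cited)


def check(document: str, sources: dict[str, str]) -> list[tuple[str, str, float]]:
    """Run both stages and return the matches that survive verification."""
    return [
        (sent, source_id, round(score, 2))
        for sent, source_id, score in candidate_matches(document, sources)
        if verify(sent)
    ]


if __name__ == "__main__":
    sources = {"src-1": "Plagiarism detection tools can help reduce instances of academic dishonesty."}
    document = (
        "Plagiarism detection tools can help reduce instances of academic dishonesty. "
        '"Plagiarism detection tools can help reduce instances of academic dishonesty" [3]. '
        "Students should focus on learning."
    )
    # Only the first, unattributed sentence is reported; the quoted and cited
    # copy is dropped in stage 2, and the last sentence never matches.
    for match in check(document, sources):
        print(match)
```

Separating a cheap, high-recall first pass from a stricter second pass mirrors the multi-stage verification described above: the first stage may over-report, and the second stage trims candidates that are likely to be false positives.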
Table 1. Composition of the "Academic Database" dataset used for training and testing the Originality Guard plagiarism detection tool
| Sr. No. | Source type | No. of documents | Examples | Formats | Subjects | Languages |
|---|---|---|---|---|---|---|
| 1. | Academic papers | 200,000 | IEEE Xplore, ACM, Springer journals and conference papers | PDF, DOCX | Computer science, Engineering | English |
| 2. | Online articles | 150,000 | Wikipedia, news websites | HTML, PDF, DOCX | General, Technical | English |
| 3. | Web pages | 150,000 | Educational institutions, research organizations | HTML, TXT | Academic, Research topics | English |
| 4. | Total | 500,000 | - | - | - | - |
The dataset comprises research papers, articles, and web pages sourced from digital libraries (e.g., IEEE Xplore, ACM Digital
Library), academic databases (e.g., Google Scholar, Microsoft Academic), and web crawlers (e.g., Apache Nutch, Scrapy). It
features diverse formats, subjects, and reputable sources, ensuring robust plagiarism detection capabilities.
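As a quick consistency check on Table 1, the per-source counts sum to the stated total of 500,000 and correspond to shares of 40%, 30% and 30%. The sketch below does that arithmetic and, purely as an illustration, shows how a stratified train/test split could preserve those shares. The 80/20 ratio and the function names composition and stratified_split are assumptions; the paper does not say how the dataset was divided for training and testing.

```python
# Document counts per source type, copied from Table 1.
TABLE_1 = {
    "Academic papers": 200_000,
    "Online articles": 150_000,
    "Web pages": 150_000,
}

# Hypothetical 80/20 train/test ratio for illustration only; the paper does
# not state how the Academic Database was split.
TRAIN_NUM, TRAIN_DEN = 4, 5


def composition(counts: dict[str, int]) -> dict[str, float]:
    """Share of the whole dataset contributed by each source type, in percent."""
    total = sum(counts.values())
    return {name: 100 * n / total for name, n in counts.items()}


def stratified_split(counts: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Per-source (train, test) sizes that keep each source's proportion."""
    split = {}
    for name, n in counts.items():
        train = n * TRAIN_NUM // TRAIN_DEN  # integer arithmetic avoids rounding drift
        split[name] = (train, n - train)
    return split


if __name__ == "__main__":
    print("Total documents:", sum(TABLE_1.values()))  # 500000, matching Table 1
    for name, share in composition(TABLE_1).items():
        print(f"{name}: {share:.0f}%")
    for name, (train, test) in stratified_split(TABLE_1).items():
        print(f"{name}: train={train}, test={test}")
```

Splitting each source type separately, rather than the pooled 500,000 documents, keeps the 40/30/30 mix identical in both partitions, so test results are not skewed toward one kind of source.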
IJTSRD | Special Issue on Emerging Trends and Innovations in Web-Based Applications and Technologies Page 494