
International Journal of Trend in Scientific Research and Development (IJTSRD) @ www.ijtsrd.com eISSN: 2456-6470
meaningfully land a decisive blow against academic cheating. Originality Guard leverages this research to offer a robust means of plagiarism detection.

Most currently available plagiarism detection tools produce false-positive results, which lead to unnecessary penalties for students. By using advanced algorithms, Originality Guard can minimize false positives and ensure that real cases of plagiarism are identified, allowing students to focus on learning. Originality Guard is an automated tool that educators, as well as students, can rely on for this purpose. Plagiarism detection plays an important role in promoting academic integrity, as several studies have suggested [3] [4]. One such study, published in the Journal of Academic Ethics, demonstrated that plagiarism detection tools can substantially reduce instances of academic dishonesty. Originality Guard builds on this research and offers an efficient plagiarism detection solution.

Students intending to gain academic qualifications are expected to demonstrate appropriate levels of attainment and ability through coursework and examinations. This requires students to produce submissions that meet a given assignment specification, which are then marked by a tutor to confirm that the work reaches the required standard. In many, if not the majority, of institutions, students are also required to confirm that the submission is the result of their own, unaided work. Students who falsely give this declaration play a part in reducing the value of the qualifications awarded by the academic institution. Knowing that other students are cheating, but are not being punished for it, can be infuriating to other students, who may themselves be discouraged from putting appropriate effort into their own submissions.

III. PROPOSED WORK
The proposed plagiarism detection tool, Originality Guard, employs a machine learning-based approach to detect plagiarism in academic documents. It is a new tool designed specifically for academic and professional use, intended to help prevent plagiarism in academic writing while building on existing plagiarism detection tools.

The Originality Guard architecture is made up of three parts: a document analysis module, a source database, and a comparison engine. The document analysis module takes uploaded documents and extracts important features and metadata from them. The source database contains a vast repository of academic papers, articles, and websites.

The comparison engine is the core of the plagiarism detection capability. Applying advanced algorithms and natural language processing techniques, it compares the uploaded document with the database of sources, identifying potential instances of plagiarism. These algorithms include sentence-level comparison, semantic analysis, and citation detection.

To ensure accuracy, Originality Guard employs a multi-stage verification process. Initially, the comparison engine identifies potential instances of plagiarism, which are then verified through a secondary analysis. This secondary analysis assesses the context and relevance of the matched text, reducing false positives and improving overall accuracy.

Its user-friendly interface and detailed reporting features make it an ideal solution for educators, researchers, and students. By providing a comprehensive and accurate plagiarism detection tool, Originality Guard aims to promote academic integrity and originality, while also facilitating the learning process.

Data Collection
For training and testing Originality Guard, a comprehensive dataset of academic papers, articles, and websites was compiled. The dataset, termed "Academic Database," consists of approximately 500,000 documents, including:
• 200,000 academic papers from reputable journals and conferences (IEEE Xplore, ACM, Springer)
• 150,000 articles from online sources (Wikipedia, news websites)
• 150,000 web pages from educational institutions and research organizations

The dataset was sourced from various online repositories and tools, including:
• Digital libraries like IEEE Xplore and ACM Digital Library
• Online academic databases such as Google Scholar and Microsoft Academic
• Web crawlers including Apache Nutch and Scrapy

The Academic Database dataset is diverse in terms of:
• Document types (research papers, articles)
• Subjects (computer science, engineering)
• Languages (English)
• Formats (PDF, HTML, DOCX)

The size and diversity of the dataset provide a comprehensive foundation for training and testing Originality Guard, supporting its effectiveness in detecting plagiarism across various academic disciplines and document types.
Table 1. Composition of the "Academic Database" dataset used for training and testing the Originality Guard plagiarism detection tool

Sr no.  Source type      No. of documents  Example                                                    Formats          Subjects                       Languages
------  ---------------  ----------------  ---------------------------------------------------------  ---------------  -----------------------------  ---------
1.      Academic papers  200,000           IEEE Xplore, ACM, Springer journals and conference papers  PDF, DOCX        Computer science, Engineering  English
2.      Online articles  150,000           Wikipedia, news websites                                   HTML, PDF, DOCX  General, Technical             English
3.      Web pages        150,000           Educational institutions, research organizations           HTML, TXT        Academic, Research topics      English
4.      Total            500,000           -                                                          -                -                              -
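
As a quick consistency check on the figures in Table 1, the composition can be written down as data and the total and per-source shares recomputed. This is only an illustrative sketch: the names `SourceGroup`, `ACADEMIC_DATABASE`, `total_documents`, and `shares` are hypothetical and do not come from the Originality Guard implementation; only the counts and formats are taken from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceGroup:
    """One row of Table 1 (hypothetical container, for illustration only)."""
    source_type: str
    documents: int
    formats: tuple


# Counts and formats copied from Table 1.
ACADEMIC_DATABASE = (
    SourceGroup("Academic papers", 200_000, ("PDF", "DOCX")),
    SourceGroup("Online articles", 150_000, ("HTML", "PDF", "DOCX")),
    SourceGroup("Web pages", 150_000, ("HTML", "TXT")),
)


def total_documents(groups):
    """Sum the document counts over all source groups."""
    return sum(g.documents for g in groups)


def shares(groups):
    """Return each source type's fraction of the whole dataset."""
    total = total_documents(groups)
    return {g.source_type: g.documents / total for g in groups}


if __name__ == "__main__":
    print("Total documents:", total_documents(ACADEMIC_DATABASE))
    for name, share in shares(ACADEMIC_DATABASE).items():
        print(f"{name}: {share:.0%}")
```

The three source groups sum to the stated 500,000 documents, with academic papers making up 40% of the dataset and the other two groups 30% each.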

             The dataset comprises research papers, articles, and web pages sourced from digital libraries (e.g., IEEE Xplore, ACM Digital
             Library), academic databases (e.g., Google Scholar, Microsoft Academic), and web crawlers (e.g., Apache Nutch, Scrapy). It
             features diverse formats, subjects, and reputable sources, ensuring robust plagiarism detection capabilities.
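
The two-stage flow described in the Proposed Work section (sentence-level comparison followed by a secondary, context-aware check) can be illustrated with a minimal sketch. The paper does not specify the algorithms used, so everything here is an assumption: Jaccard similarity over word trigrams stands in for sentence-level comparison, a simple check for numeric bracket citations such as [3] or quotation marks stands in for the secondary analysis, and the 0.5 threshold is arbitrary. The function names are hypothetical, and semantic analysis is not covered.

```python
import re

# Assumed citation style: numeric brackets such as [3], as used in this paper.
CITATION = re.compile(r"\[\d+\]")


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [p for p in re.split(r"(?<=[.!?])\s+", text.strip()) if p]


def word_ngrams(sentence, n=3):
    """Return the set of lowercase word n-grams in a sentence."""
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    if len(words) < n:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_candidates(document, source, threshold=0.5):
    """Stage 1: flag document sentences whose best match in the source
    reaches the (arbitrary) similarity threshold."""
    source_sents = [(s, word_ngrams(s)) for s in split_sentences(source)]
    candidates = []
    for sent in split_sentences(document):
        grams = word_ngrams(sent)
        score, match = max(
            ((jaccard(grams, g), s) for s, g in source_sents),
            default=(0.0, ""),
        )
        if score >= threshold:
            candidates.append((sent, match, score))
    return candidates


def is_attributed(sentence):
    """Assumed context check: the sentence carries a citation or is quoted."""
    return bool(CITATION.search(sentence)) or sentence.count('"') >= 2


def verify(candidates):
    """Stage 2: keep only matches that are not attributed to a source."""
    return [c for c in candidates if not is_attributed(c[0])]


if __name__ == "__main__":
    source = ("Plagiarism detection tools compare documents against a database "
              "of sources. They report overlapping passages.")
    document = ("Plagiarism detection tools compare documents against a database "
                "of sources. They report overlapping passages [3]. "
                "Our method is entirely different.")
    flagged = find_candidates(document, source)
    for sent, match, score in verify(flagged):
        print(f"{score:.2f}  {sent}")
```

In this toy input, the first stage flags two sentences, and the second stage discards the one that carries a citation, which is the kind of false positive the secondary analysis is meant to remove. A production system would need a far more careful notion of context than a citation marker.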

             IJTSRD | Special Issue on Emerging Trends and Innovations in Web-Based Applications and Technologies   Page 494