technical contributions, this research seeks to enhance our comprehension of the fake news landscape and offer valuable insights for policymakers, digital platform developers, and researchers dedicated to addressing misinformation. The overarching objective is to build a well-informed and resilient society that can accurately differentiate between genuine and misleading news content in today's digital era.

2. Literature Review
The detection of fake news has garnered significant attention in recent years, leading to a growing body of research exploring various techniques and approaches. This section reviews the existing literature, focusing on three main areas: traditional machine learning methods, deep learning-based approaches, and challenges in fake news detection.
Early research in fake news detection predominantly relied on traditional machine learning techniques, leveraging textual and metadata features. Techniques such as Naïve Bayes, Support Vector Machines (SVM), Logistic Regression, and Decision Trees were widely applied due to their simplicity and interpretability. Rubin et al. (2015) [7] explored linguistic cues such as writing style, syntax, and readability to classify news articles, showing the effectiveness of feature engineering in distinguishing fake news from legitimate content. Similarly, Potthast et al. (2017) [8] utilized content-based features, including word frequency and sentiment analysis, combined with SVM for fake news detection, achieving promising results. While these methods demonstrated moderate success, their reliance on manual feature extraction posed limitations in handling the complex and evolving nature of fake news. Moreover, traditional approaches often struggled with generalization across datasets, as fake news tactics and narratives varied widely across different contexts.
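As an illustration of this style of feature engineering, the sketch below derives a few simple stylistic cues of the kind Rubin et al. [7] describe (average sentence length, exclamation density, type-token ratio) and feeds them to a linear SVM. The specific features, toy data, and settings are illustrative assumptions, not those used in the cited studies.

```python
import re
import numpy as np
from sklearn.svm import LinearSVC

def stylistic_features(text):
    """Hand-crafted stylistic cues (illustrative, not the exact features of [7] or [8])."""
    words = text.split()
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [
        len(words) / max(len(sentences), 1),                       # avg sentence length
        text.count("!") / max(len(words), 1),                      # exclamation density
        len({w.lower() for w in words}) / max(len(words), 1),      # type-token ratio
    ]

# Toy stand-ins; real studies trained on thousands of labeled articles.
texts = [
    "SHOCKING!!! You won't believe what they found!",
    "The city council approved the transit budget on Tuesday.",
    "Miracle cure doctors HATE! Click now!",
    "Researchers published the survey results in a peer-reviewed journal.",
]
labels = [1, 0, 1, 0]  # 1 = fake, 0 = real

X = np.array([stylistic_features(t) for t in texts])
clf = LinearSVC().fit(X, labels)
print(clf.predict(X))
```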
The advent of deep learning has revolutionized fake news detection by enabling models to automatically learn features from data. Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks have been extensively used to capture contextual and sequential information in news text. Wang et al. (2022) [9] introduced a hybrid model combining convolutional neural networks (CNNs) and LSTMs to extract spatial and temporal features, significantly improving classification accuracy. Transformer-based models, such as BERT (Bidirectional Encoder Representations from Transformers), have further advanced the field by capturing deeper contextual relationships in text. Devlin et al. (2019) [10] demonstrated the superior performance of BERT in text classification tasks, including fake news detection. Researchers such as Zhou et al. (2021) [11] have fine-tuned transformer models on fake news datasets, achieving state-of-the-art results. Moreover, multi-modal approaches that incorporate textual, visual, and social network data have gained traction. Qi et al. (2021) [12] proposed a model that combines textual analysis with image recognition and user engagement patterns to detect fake news on social media platforms, highlighting the importance of integrating diverse data sources for robust detection.
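The following minimal Keras sketch shows the general shape of such a CNN-LSTM hybrid: a convolutional layer captures local n-gram ("spatial") patterns and an LSTM models their order. The vocabulary size, sequence length, and layer widths are assumptions for illustration, not the architecture published by Wang et al.

```python
from tensorflow import keras
from tensorflow.keras import layers

VOCAB_SIZE = 20000   # assumed vocabulary size
MAX_LEN = 300        # assumed padded article length (in tokens)

# CNN extracts local n-gram patterns; LSTM models their sequence.
model = keras.Sequential([
    layers.Input(shape=(MAX_LEN,)),
    layers.Embedding(VOCAB_SIZE, 128),
    layers.Conv1D(64, kernel_size=5, activation="relu"),
    layers.MaxPooling1D(pool_size=2),
    layers.LSTM(64),
    layers.Dense(1, activation="sigmoid"),   # binary output: fake vs. real
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```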
Despite significant advancements, several challenges persist in the domain of fake news detection. One of the primary issues is the lack of standardized and balanced datasets. Horne and Adali (2017) [13] noted that many publicly available datasets are biased toward specific topics or languages, limiting the generalizability of machine learning models. Additionally, fake news creators continuously evolve their strategies, making it difficult for static models to adapt to new patterns and narratives. Another critical challenge is addressing the propagation of fake news through social networks. Vosoughi et al. (2018) [14] highlighted that fake news spreads faster and more widely than true news due to its sensational nature, necessitating the development of real-time detection systems. Furthermore, ethical considerations, such as ensuring user privacy and avoiding censorship, must be carefully addressed to maintain public trust.

The existing body of work demonstrates that both traditional and deep learning methods have significantly contributed to fake news detection. However, traditional methods often require extensive manual effort for feature engineering, while deep learning approaches demand large datasets and substantial computational resources. The integration of multi-modal data and the use of advanced models such as transformers hold promise for improving detection performance. Nonetheless, addressing the challenges of dataset bias, evolving fake news tactics, and ethical considerations remains crucial for developing effective and trustworthy solutions.
This study builds upon the existing literature by exploring a range of machine learning algorithms, including traditional, deep learning, and hybrid methods, to identify the most effective approaches for fake news detection. Additionally, we aim to address some of the challenges highlighted in the literature by employing diverse datasets and evaluating model performance across different scenarios.

3. Methodology
This study introduces a structured approach to detecting fake news through machine learning techniques and natural language processing. The methodology consists of multiple interrelated phases, starting with data collection and preprocessing, followed by feature extraction, model implementation, and performance evaluation. The dataset used comprises both authentic and fabricated news articles, maintaining an almost equal distribution (50.4% real, 49.6% fake) to ensure balanced binary classification. The data preprocessing workflow includes several essential steps to enhance text quality and uniformity. Initially, special characters and numerical values are eliminated using regular expressions, followed by converting the text to lowercase and removing common English stopwords. These preprocessing steps help minimize noise while preserving the core meaning of the content, leading to a 21.96% reduction in average text length and a 32.6% decrease in word count (from 423.04 to 285.13 words on average).
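A minimal sketch of such a preprocessing step is shown below, assuming NLTK's English stopword list; the exact regular expression and tokenization are illustrative rather than taken from the study's code.

```python
import re
import nltk
from nltk.corpus import stopwords

nltk.download("stopwords", quiet=True)
STOPWORDS = set(stopwords.words("english"))

def preprocess(text):
    """Clean one article: drop special characters and digits,
    lowercase, and remove common English stopwords."""
    text = re.sub(r"[^A-Za-z\s]", " ", text)   # strip punctuation and numbers
    text = text.lower()
    tokens = [w for w in text.split() if w not in STOPWORDS]
    return " ".join(tokens)

print(preprocess("BREAKING: 5 Shocking Facts They Don't Want You to Know!"))
# -> "breaking shocking facts want know"
```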
Figure 1. Text Analysis Visualization: Length Distribution and Word Clouds of Fake vs. Real News

To extract features, this study utilizes Term Frequency-Inverse Document Frequency (TF-IDF) vectorization, selecting a maximum of 5000 features to effectively capture both word significance within individual documents and their relevance across the entire dataset. The dataset is then divided into training (80%) and testing (20%) sets, with stratified sampling applied to maintain an even class distribution.
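In scikit-learn terms, the vectorization and split described above might look like the following sketch. The `max_features=5000` cap and the stratified 80/20 split come from the text; the toy corpus, variable names, and `random_state` are placeholders.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# Toy stand-ins for the preprocessed corpus; 0 = real, 1 = fake
texts = [
    "markets rose steadily today", "council approved new budget",
    "study confirms earlier findings", "rainfall totals matched forecasts",
    "committee reviewed annual report", "shocking secret cure revealed",
    "celebrity endorses miracle pill", "hidden truth finally exposed",
    "insiders leak unbelievable claim", "banned video proves conspiracy",
]
labels = [0] * 5 + [1] * 5

vectorizer = TfidfVectorizer(max_features=5000)   # cap vocabulary at 5000 terms
X = vectorizer.fit_transform(texts)               # sparse TF-IDF matrix

# 80/20 split; stratify preserves the ~50/50 class balance in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.2, stratify=labels, random_state=42
)
```

Note that fitting the vectorizer on the full corpus before splitting, as described, lets document frequencies reflect the test articles as well; a stricter protocol would fit the vectorizer on the training split only.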
Five different machine learning models are applied: Logistic Regression (configured for a maximum of 1000 iterations), Random Forest Classifier, Support Vector Machine (SVM) with probability estimates enabled, Multinomial Naïve Bayes, and a Neural Network (MLP Classifier) set to 300 iterations.
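Continuing from the previous sketch, the five classifiers with the stated settings could be instantiated as follows; only the iteration limits and the SVM's probability flag are specified in the text, so all other hyperparameters are left at scikit-learn defaults as an assumption.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(),
    "SVM": SVC(probability=True),              # probability estimates enabled
    "Multinomial Naive Bayes": MultinomialNB(),
    "Neural Network (MLP)": MLPClassifier(max_iter=300),
}

# Train each model on the TF-IDF features and report test accuracy
for name, model in models.items():
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    print(f"{name}: {accuracy_score(y_test, preds):.3f}")
```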