Page 268 - Emerging Trends and Innovations in Web-Based Applications and Technologies
International Journal of Trend in Scientific Research and Development (IJTSRD) @ eISSN: 2456-6470
to decode the meaning communicated by the signer. The advent of deep learning techniques, particularly Convolutional Neural Networks (CNNs), has significantly enhanced the ability to recognize hand gestures, even in complex scenarios.

➢ Convolutional Neural Networks (CNNs):
CNNs are powerful tools for processing visual data and are extensively used to identify both static hand postures and dynamic gestures in 2D and 3D formats. These networks automatically extract meaningful features from input images, such as hand contours, spatial positioning, and movement patterns, enabling accurate gesture recognition.

➢ 3D Pose Estimation:
While traditional 2D gesture recognition methods face challenges in distinguishing gestures involving intricate hand movements or spatial configurations, 3D pose estimation addresses these limitations. By capturing the depth and spatial details of hand movements, 3D pose estimation improves recognition accuracy, making it suitable for real-world applications where depth perception plays a critical role.

The use of CNNs for sign language detection shows significant potential in facilitating communication between hearing and non-hearing individuals. With continued advancements in model robustness and dataset expansion, sign language detection systems could become more accurate, reliable, and scalable, ultimately supporting greater inclusivity across different social, educational, and professional environments.

B. Data Collection
Data collection is a significant challenge in building sign language recognition systems. High-quality, large-scale annotated datasets are required to train machine learning models effectively. However, creating these datasets is a labor-intensive task due to the complexity and diversity of sign languages. Moreover, variations in regional dialects and individual signing styles add to the challenge of standardizing sign language datasets.

C. Translation Models
Once gestures are recognized, the next step is translating them into natural language, either spoken or written. Natural language processing (NLP) plays a key role in this step, as it involves converting gestures into meaningful sentences. NLP models are trained to understand context, word order, and grammar to ensure that the translation is accurate and makes sense within the given context.

Sequence-to-Sequence Models (Seq2Seq): These models are widely used for translating sequences of gestures into text. They are trained on large datasets of paired sign language gestures and their corresponding written language.

Transformer Models: Recently, transformer-based models like BERT and GPT have shown promise in translating gestures into sentences by capturing long-range dependencies between signs, improving the system's ability to generate fluent, context-aware translations.

III. PERFORMANCE EVALUATION
The evaluation metrics are defined as follows. Accuracy is the frequency with which the classifier makes a correct prediction. It is computed by dividing the number of correctly classified instances by the total number of instances:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

Precision is a measure of how often the classifier is correct when it predicts a positive instance:

$$\text{Precision} = \frac{TP}{TP + FP}$$

Here TP is the number of true positives, TN the number of true negatives, FP the number of false positives, and FN the number of false negatives. Precision is computed by dividing the number of true positives by the sum of TP and FP.

Recall is a measure of how often the classifier correctly predicts a positive instance out of all actual positive instances. It is computed by dividing the number of true positives by the sum of TP and FN:

$$\text{Recall} = \frac{TP}{TP + FN}$$

For the detection of ISL gestures, the precision and recall for the most common gestures are expected to be:

➢ Precision: 85% to 90% for commonly used gestures (e.g., "thank you," "sorry," "hello").
➢ Recall: 80% to 85% for gestures in challenging categories (e.g., complex multi-hand gestures).

These values are anticipated based on the CNN's ability to learn from the large dataset with diverse examples of each gesture.

We expect the model to classify each gesture in less than 100 milliseconds, allowing for real-time processing of gestures, which is crucial for applications such as sign language translation for live communication. We anticipate that the model will perform well under normal indoor lighting conditions. However, a slight drop in accuracy (about 5% to 7%) is expected when tested under low-light conditions. The model might also show a decrease in performance when parts of the hands or face are occluded during gesture execution. The expected accuracy for occluded gestures is around 70% to 75%.
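
The accuracy, precision, and recall metrics used in the performance evaluation can be sketched in Python as follows. This is a minimal illustration, not the authors' evaluation code: the `ConfusionCounts` class, the helper names, and the example counts are all assumptions introduced here, and the numbers are made up for demonstration rather than taken from any experiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Confusion-matrix counts for one gesture class (hypothetical container)."""
    tp: int  # true positives
    tn: int  # true negatives
    fp: int  # false positives
    fn: int  # false negatives


def _safe_divide(numerator: float, denominator: float) -> float:
    # Return 0.0 instead of raising when the denominator is zero,
    # e.g. when a gesture class was never predicted.
    return numerator / denominator if denominator else 0.0


def accuracy(c: ConfusionCounts) -> float:
    # (TP + TN) / (TP + TN + FP + FN)
    return _safe_divide(c.tp + c.tn, c.tp + c.tn + c.fp + c.fn)


def precision(c: ConfusionCounts) -> float:
    # TP / (TP + FP)
    return _safe_divide(c.tp, c.tp + c.fp)


def recall(c: ConfusionCounts) -> float:
    # TP / (TP + FN)
    return _safe_divide(c.tp, c.tp + c.fn)


# Illustrative counts only; these are NOT results reported in the paper.
example = ConfusionCounts(tp=45, tn=40, fp=5, fn=10)
print(f"accuracy={accuracy(example):.2f}")
print(f"precision={precision(example):.2f}")
print(f"recall={recall(example):.2f}")
```

For a multi-class gesture recognizer, one plausible approach is to compute these counts per gesture in a one-vs-rest fashion and then average the per-class precision and recall.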