DEEP SEMANTIC INTELLIGENCE FOR TWITTER SPAM DETECTION USING LATENT SEMANTIC ANALYSIS

Muhammad Haroon; Shakeeb A. Khan; Muhammad Umair; Muhammad Abrar; Shoaib Ali Qureshi

doi:10.71146/kjmr766

Authors

Muhammad Haroon, School of Computer Science and Technology, Xi’an University of Technology, Xi’an, 710048, China.
Shakeeb A. Khan, Department of Computer Science & IT, University of Southern Punjab, Multan, Pakistan.
Muhammad Umair, Department of Computer Science, National College of Business Administration & Economics (NCBA&E), Sub-Campus Multan, Pakistan.
Muhammad Abrar, Department of Computer Science & IT, University of Southern Punjab, Multan, Pakistan.
Shoaib Ali Qureshi, Department of Computer Science, Hameeda Rasheed Institute of Science and Technology, Multan, Pakistan.

Keywords:

Spam Detection, Twitter, Social Media Security, Machine Learning, Latent Semantic Analysis, Text Classification, Cybersecurity

Abstract

Social media platforms, particularly Twitter, have become integral to global communication, enabling users to share information instantly with large audiences. However, Twitter’s growing popularity has attracted malicious actors who spread misinformation, phishing attempts, and other spam content. This paper introduces a novel hybrid approach that combines Latent Semantic Analysis (LSA) with traditional machine learning classifiers to effectively distinguish between legitimate and spam tweets. We collected and processed over 5.5 million tweets using Twitter’s API, extracted key features using a statistically validated LSA technique, and implemented four supervised learning algorithms: Naïve Bayes, Support Vector Machine, Decision Tree, and Logistic Regression. The experiments were conducted using rigorous 10-fold cross-validation, and models were evaluated based on accuracy, precision, recall, and F1-score. Our LSA-enhanced approach demonstrated significant performance improvements over traditional methods, with the Naïve Bayes classifier achieving 96.82% accuracy, representing a 5.49% improvement over baseline techniques. Additional error analysis revealed that our approach is particularly effective at identifying evolving spam patterns involving promotional content and malicious URLs.
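The four evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes that spam is treated as the positive class, which the abstract does not state, and it uses a hypothetical helper `evaluate` with made-up labels for eight tweets; none of the values below come from the paper's experiments.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def evaluate(y_true, y_pred, positive="spam"):
    """Compute accuracy, precision, recall, and F1 for one positive class.

    Assumption: "spam" is the positive class. Undefined ratios
    (zero denominators) are reported as 0.0.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # spam caught
    fp = sum(t != positive and p == positive for t, p in pairs)  # ham flagged as spam
    fn = sum(t == positive and p != positive for t, p in pairs)  # spam missed
    tn = len(pairs) - tp - fp - fn                               # ham passed through

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return BinaryMetrics(accuracy, precision, recall, f1)


# Hypothetical labels for eight tweets (illustrative only, not the paper's data).
y_true = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "spam"]
y_pred = ["spam", "spam", "ham", "ham", "ham", "spam", "ham", "spam"]
metrics = evaluate(y_true, y_pred)
print(metrics)
```

Under 10-fold cross-validation, a computation of this kind would be repeated on each held-out fold and the per-fold scores averaged; how the authors aggregated their folds is not specified in the abstract.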

Published

2025-12-03

Issue

Vol. 2 No. 12 (2025): Dec 2025

Section

Engineering and Technology

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

KJMR publishes all articles as open access under CC BY 4.0, allowing anyone to share and adapt the work, even commercially, with proper credit, a license link, and clear notice of changes, without implying endorsement. Authors retain copyright while granting the journal non-exclusive publishing and archiving rights and may self-archive without embargo. Third-party material requires proper permission, and the journal ensures long-term free access through its website and archiving partners.

How to Cite

DEEP SEMANTIC INTELLIGENCE FOR TWITTER SPAM DETECTION USING LATENT SEMANTIC ANALYSIS. (2025). Kashf Journal of Multidisciplinary Research, 2(12), 1-23. https://doi.org/10.71146/kjmr766