Improving spam detection in Online Social Networks

Online Social Networks (OSNs) are widely regarded as the most sought-after societal tool for communicating and transmitting information, used by the masses the world over. Our dependence on these platforms for opinions, news, and updates is increasing. While OSNs have become a new medium for disseminating information, they are also fast becoming a playground for the spread of misinformation, propaganda, fake news, rumors, and unsolicited messages. Consequently, the users of an OSN platform can be divided into two kinds: Spammers and Non-Spammers. Spammers, acting with malicious intent, either post unwanted (or irrelevant) information or spread misinformation on OSN platforms. In this work, we propose mechanisms to detect such users (Spammers) on Twitter, a popular OSN.

Our work is based on a number of tweet-level and user-level features, such as Followers/Followees, URLs, Spam Words, Replies, and Hashtags. We apply three learning algorithms: Naive Bayes, Clustering, and Decision Trees. Furthermore, to improve the detection of Spammers, we propose a novel integrated approach that combines the advantages of these three algorithms. Improvement in spam detection is measured in terms of Total Accuracy, Spammers Detection Accuracy, and Non-Spammers Detection Accuracy. The results show that the integrated approach outperforms the individual classical approaches in terms of overall accuracy: it detects Non-Spammers with 99% accuracy and achieves an overall accuracy of 87.9%.
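
To make the three evaluation measures concrete, the following minimal Python sketch computes Total Accuracy, Spammers Detection Accuracy, and Non-Spammers Detection Accuracy for a set of combined predictions. The abstract does not state how the three algorithms are combined, so the simple majority vote used here is an assumption made for illustration only, as are the function names (majority_vote, detection_accuracies) and the toy labels; none of this reproduces the reported results.

```python
from collections import Counter

SPAMMER = "spammer"
NON_SPAMMER = "non-spammer"


def majority_vote(*predictions):
    """Combine per-user labels from several classifiers by simple majority.

    ASSUMPTION: the source does not specify the combination rule; majority
    voting is used here only as one plausible way to integrate classifiers.
    With three classifiers and two labels a tie cannot occur.
    """
    combined = []
    for labels in zip(*predictions):
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined


def detection_accuracies(true_labels, predicted_labels):
    """Return total, spammer-only, and non-spammer-only accuracy."""
    pairs = list(zip(true_labels, predicted_labels))

    def accuracy(subset):
        # Fraction of users in the subset whose predicted label is correct.
        if not subset:
            return float("nan")
        return sum(t == p for t, p in subset) / len(subset)

    return {
        "total": accuracy(pairs),
        "spammers": accuracy([(t, p) for t, p in pairs if t == SPAMMER]),
        "non_spammers": accuracy([(t, p) for t, p in pairs if t == NON_SPAMMER]),
    }


# Toy data (hypothetical): ground truth and the output of three classifiers.
truth = [SPAMMER, SPAMMER, NON_SPAMMER, NON_SPAMMER, NON_SPAMMER]
naive_bayes = [SPAMMER, NON_SPAMMER, NON_SPAMMER, NON_SPAMMER, SPAMMER]
clustering = [SPAMMER, NON_SPAMMER, NON_SPAMMER, SPAMMER, NON_SPAMMER]
decision_tree = [NON_SPAMMER, SPAMMER, NON_SPAMMER, NON_SPAMMER, NON_SPAMMER]

combined = majority_vote(naive_bayes, clustering, decision_tree)
scores = detection_accuracies(truth, combined)
print(scores)
```

On this toy data every Non-Spammer is classified correctly while one of the two Spammers is missed, which illustrates why the per-class accuracies are reported alongside the total: a high Non-Spammers Detection Accuracy can coexist with a noticeably lower overall figure.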