Language Automata

2/26/2023

Spam content detection is different from spammer detection and thus requires a different approach. This paper aimed to conduct a comprehensive literature review of "spam content detection" to identify the various approaches taken and to surface up-to-date issues, especially in social media case studies. Literature data were collected from 2015 to 2021 from seven journal repository databases and filtered down to 69 main articles. This research compared the latest approaches and methods to identify the gaps between these studies. The discussion covered the approach, research media, dataset, feature extraction and selection, language, whether detection is context-based, the algorithm, performance, future research directions, and challenges. Additionally, this paper discussed spam content on Indonesian social media and provided comprehensive suggestions for possible implementation, further research directions, and a possible new approach. The spam content detection problem remains challenging due to its complexity, the feature extraction process, language, context-aware detection capabilities, performance, and evaluation methods. This article can be used to develop new approaches, methods, and models for detecting spam content on social media.

These days, most documents are written on a computer, and any writing mistake in a document can cause its information to be conveyed incorrectly. For that reason, spelling correction is needed to fix writing mistakes. This design process discusses building a spelling corrector for Indonesian-language document text, with the document's text as its input and a … For the realization, 5,000 news articles were used as training data. The methods used include Finite State Automata (FSA), Levenshtein distance, and N-gram. The results of this design process are evaluated by perplexity, correction hit rate, and false positive rate. The unigram model has the smallest perplexity, with a value of 1.14. On the other hand, the highest correction hit rate, 71.20%, is achieved by both the bigram and trigram models, but the bigram is superior in average processing time, at 01:21.23 min. The unigram, bigram, and trigram models share the same false positive rate of 4.15%. Because of the disadvantages of the FSA method, a modification was made that raised the bigram's correction hit rate to 85.44%.
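The abstract lists Levenshtein distance among its methods without showing how it is computed. Below is a minimal sketch of the standard dynamic-programming edit distance, plus a hypothetical `suggest` helper that ranks entries of a small word list by their distance to a misspelled token. The helper name, the word list, and the `max_distance=2` cutoff are illustrative assumptions, not details taken from the paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `a` into `b` (row-by-row dynamic programming)."""
    # prev[j] holds the distance between the previous prefix of `a` and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr[j] = min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca -> cb (free if equal)
            )
        prev = curr
    return prev[-1]


def suggest(word, dictionary, max_distance=2):
    """Hypothetical helper: dictionary words within `max_distance` edits, closest first."""
    scored = sorted((levenshtein(word, w), w) for w in dictionary)
    return [w for d, w in scored if d <= max_distance]
```

For example, with the made-up list `["makan", "makam", "minum", "rumah", "tanam"]`, `suggest("makn", ...)` returns `["makan", "makam"]`: "makan" is one insertion away and "makam" two edits away. Edit distance alone cannot tell which of these the writer meant, which is where an N-gram model can help.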

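The abstract does not say how its Finite State Automaton is built. One common way to use an FSA for spelling is as a dictionary recognizer: the automaton accepts exactly the words in the lexicon, and any token it rejects is flagged as a possible error. The sketch below assumes that role and builds the automaton as a trie-shaped deterministic FSA (not minimised). The class name, the whitespace tokenisation, and the tiny lexicon are hypothetical.

```python
class DictionaryFSA:
    """Deterministic finite-state automaton accepting exactly the words of a
    lexicon. States are integers; state 0 is the start state, and the
    transition graph is a trie."""

    def __init__(self, words):
        self.transitions = [{}]  # transitions[state][char] -> next state
        self.accepting = set()   # states where a complete word ends
        for word in words:
            self.add(word)

    def add(self, word):
        state = 0
        for ch in word:
            nxt = self.transitions[state].get(ch)
            if nxt is None:  # create a new state for an unseen prefix
                nxt = len(self.transitions)
                self.transitions.append({})
                self.transitions[state][ch] = nxt
            state = nxt
        self.accepting.add(state)

    def accepts(self, word):
        state = 0
        for ch in word:
            state = self.transitions[state].get(ch)
            if state is None:  # no transition defined: reject
                return False
        return state in self.accepting


def flag_misspellings(text, fsa):
    """Return the (lower-cased, whitespace-split) tokens the automaton rejects."""
    return [t for t in text.lower().split() if not fsa.accepts(t)]
```

With `fsa = DictionaryFSA(["saya", "makan", "nasi", "goreng"])`, `flag_misspellings("Saya makn nasi goreng", fsa)` returns `["makn"]`. A prefix such as "mak" is also rejected, because its state is reachable but not accepting.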

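The N-gram component and the perplexity evaluation can be illustrated with a bigram model. The sketch below assumes add-one (Laplace) smoothing, whitespace tokenisation, and `<s>`/`</s>` sentence markers. The abstract does not state which smoothing the authors used. Perplexity is computed as $PP = \exp\left(-\frac{1}{N}\sum_{i=1}^{N}\log P(w_i \mid w_{i-1})\right)$ over the $N$ predicted tokens, so lower values mean the model fits the text better. The `best_candidate` method is a hypothetical way of using the model to choose between spelling candidates in context.

```python
import math
from collections import Counter


class BigramModel:
    """Bigram language model with add-one (Laplace) smoothing.
    "<s>" and "</s>" are added as sentence-boundary markers."""

    def __init__(self, sentences):
        self.unigrams = Counter()
        self.bigrams = Counter()
        for sentence in sentences:
            tokens = self._tokens(sentence)
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.vocab_size = len(self.unigrams)

    @staticmethod
    def _tokens(sentence):
        # Whitespace tokenisation with boundary markers (an assumption).
        return ["<s>"] + sentence.lower().split() + ["</s>"]

    def prob(self, prev, word):
        # P(word | prev) = (c(prev, word) + 1) / (c(prev) + V)
        return (self.bigrams[(prev, word)] + 1) / (self.unigrams[prev] + self.vocab_size)

    def perplexity(self, sentence):
        # exp of the negative mean log-probability of the predicted tokens
        tokens = self._tokens(sentence)
        pairs = list(zip(tokens, tokens[1:]))
        log_prob = sum(math.log(self.prob(p, w)) for p, w in pairs)
        return math.exp(-log_prob / len(pairs))

    def best_candidate(self, prev, candidates, nxt):
        """Pick the candidate w that maximises P(w | prev) * P(nxt | w)."""
        return max(candidates, key=lambda w: self.prob(prev, w) * self.prob(w, nxt))
```

Trained on the made-up corpus `["saya makan nasi goreng", "dia makan nasi", "makam itu tua"]`, the model prefers "makan" over "makam" between "saya" and "nasi". It also assigns "saya makan nasi" a lower perplexity than "saya makam nasi". Words never seen in training still receive a small smoothed probability, but the distribution is not renormalised for them. That simplification is acceptable for a sketch.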

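The abstract reports correction hit rate and false positive rate but does not define them. The sketch below assumes the usual reading. Correction hit rate is the share of actually misspelled tokens that the system corrects to the intended word. False positive rate is the share of already-correct tokens that the system changes. It also assumes that the typed, corrected, and intended token lists are aligned one-to-one. The function name and the sample tokens are hypothetical.

```python
def evaluate(original, corrected, gold):
    """Return (correction_hit_rate, false_positive_rate) for aligned token lists.

    original:  tokens as typed
    corrected: tokens output by the spelling corrector
    gold:      tokens the writer intended
    """
    if not (len(original) == len(corrected) == len(gold)):
        raise ValueError("token lists must be aligned")
    errors = hits = clean = false_pos = 0
    for typed, out, truth in zip(original, corrected, gold):
        if typed != truth:  # the token really was misspelled
            errors += 1
            hits += out == truth  # fixed to the intended word
        else:  # the token was already correct
            clean += 1
            false_pos += out != truth  # the corrector changed it anyway
    hit_rate = hits / errors if errors else 0.0
    fp_rate = false_pos / clean if clean else 0.0
    return hit_rate, fp_rate
```

For instance, if "saya makn nasi gorng" (intended: "saya makan nasi goreng") is corrected to "sayap makan nasi gorang", one of two errors is fixed and one of two correct words is wrongly changed, giving `(0.5, 0.5)`.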