Penggunaan Algoritma K-Nearest Neighbors(KNN) dalam Klasifikasi Artikel Clickbait Berbahasa Indonesia [Using the K-Nearest Neighbors (KNN) Algorithm to Classify Indonesian-Language Clickbait Articles]

  • Laila Isyriyah STIKI Malang
  • Adi Bayu Permadi STIKI Malang
  • Rakhmad Maulidi STIKI Malang
Keywords: clickbait detection, article classification, online content analysis, natural language processing, machine learning algorithm, K-Nearest Neighbors

Abstract

Clickbait is a strategy commonly used to attract readers' attention by promising sensational or intriguing content in a headline. Such headlines often do not match the actual content of the news, which leaves readers disappointed. This study evaluates how well the K-Nearest Neighbors (K-NN) method classifies clickbait news headlines in the Indonesian language, with the aim of better understanding the method's effectiveness at identifying them. The data consist of 800 training items and 200 test items. Training and testing were carried out while varying the number of neighbors (k) and using various supporting features. The best performance was achieved with k=11, yielding an accuracy of 80.5%, precision of 85%, recall of 81%, and F-measure of 80%. Testing on 20 new data items gave an accuracy of 90%. The study also identified several words that appear frequently in clickbait headlines, such as "apa" (what), "kenapa" (why), "nih" (here), "alasan" (reason), and "wow". The findings can serve as a reference for further research and offer insight into how the K-NN method can be applied to classifying clickbait headlines.
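
The procedure summarized in the abstract (label a headline by a vote among its k nearest training headlines, then score the predictions) can be illustrated with a minimal sketch. It is not the authors' implementation: the bag-of-words representation, the cosine similarity measure, the toy headlines, and all function names are assumptions made for illustration, and the abstract does not say which features or distance metric the study used. The metrics are computed for the clickbait class only, which may differ from the averaging used in the paper.

```python
import math
import re
from collections import Counter


def tokenize(headline):
    """Lowercase a headline and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", headline.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(train, headline, k):
    """Label a headline by majority vote among its k most similar training headlines."""
    query = Counter(tokenize(headline))
    scored = sorted(
        ((cosine_similarity(query, vec), label) for vec, label in train),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


def evaluate(y_true, y_pred, positive="clickbait"):
    """Accuracy plus precision, recall and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical toy headlines, invented for this sketch (not taken from the study's data).
TOY_HEADLINES = [
    ("Wow, ini alasan kenapa kucing takut air", "clickbait"),
    ("Kenapa harga cabai naik? Ini alasannya nih", "clickbait"),
    ("Apa rahasia kulit cerah artis ini? Wow banget", "clickbait"),
    ("Nih, alasan kenapa kamu susah tidur", "clickbait"),
    ("Pemerintah tetapkan anggaran pendidikan 2024", "non-clickbait"),
    ("Bank Indonesia pertahankan suku bunga acuan", "non-clickbait"),
    ("Gempa magnitudo 5,2 guncang Maluku", "non-clickbait"),
    ("Harga beras di pasar tradisional stabil", "non-clickbait"),
]
TRAIN = [(Counter(tokenize(text)), label) for text, label in TOY_HEADLINES]

TEST = [
    ("Wow, kenapa artis ini pindah rumah? Ini alasannya", "clickbait"),
    ("Pemerintah pertahankan anggaran pendidikan", "non-clickbait"),
]

# Vary the number of neighbors, as the study does, and score each setting.
for k in (1, 3):
    predictions = [knn_predict(TRAIN, text, k) for text, _ in TEST]
    print(k, evaluate([label for _, label in TEST], predictions))
```

An odd k avoids tied votes when there are only two classes. With a real dataset, the loop over k would run on the held-out test split (200 items in the study) rather than on two toy headlines.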

Published
2024-06-19
How to Cite
Isyriyah, L., Permadi, A. B., & Maulidi, R. (2024). Penggunaan Algoritma K-Nearest Neighbors(KNN) dalam Klasifikasi Artikel Clickbait Berbahasa Indonesia. TEMATIK, 11(1), 7-15. https://doi.org/10.38204/tematik.v11i1.1872