Implementation of Tunneling in the Design of an Automatic News Summarization and Classification System Using TextRank and KNN
Abstract
News summarization is an important step in news analysis, but it is often hampered by the large volume of news stories and by the need to classify them. This research aims to build a simple web-based system that summarizes and classifies news to support the analysis process. The proposed summarization method is TextRank, and the classification method is K-Nearest Neighbors (KNN). The system is expected to provide an automatic summarization function that makes news content easier to analyze. The data used to build the classification model is sports news collected over three months, and the classifier assigns each news item to one of three sports categories: football, racket sports, or basketball. The TextRank summarization model was evaluated with ROUGE-1 and ROUGE-2, giving scores of 0.79 and 0.67, respectively. The KNN classification model was tested with k=3 and k=5, giving scores of 0.9866 and 0.9666, respectively, so k=3 is used. The system is built using a web scraping library, TextRank, the stopword list from PySastrawi, scikit-learn for the KNN-based classification module, and ngrok for publishing the web-based application. With ngrok, the application can be exposed to the internet through a temporary public URL, with no hosting required.
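To make the ROUGE-1 and ROUGE-2 scores reported above more concrete, the following is a minimal sketch of how an n-gram overlap score of this kind can be computed. It assumes the recall variant of ROUGE-N and plain lowercase whitespace tokenization; the abstract does not state which variant or preprocessing was used, and the function names and example sentences are hypothetical, not taken from the study.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """ROUGE-N recall: clipped n-gram overlap divided by reference n-gram count.

    Assumption: lowercase whitespace tokenization, no stemming or
    stopword removal. A real evaluation may preprocess differently.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Each reference n-gram is matched at most as often as it occurs
    # in the candidate summary (clipped counts).
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


# Hypothetical example sentences, not taken from the paper's dataset.
reference = "the team won the final match"
candidate = "the team won the match"

rouge1 = rouge_n_recall(candidate, reference, n=1)
rouge2 = rouge_n_recall(candidate, reference, n=2)
print(f"ROUGE-1 recall: {rouge1:.2f}")
print(f"ROUGE-2 recall: {rouge2:.2f}")
```

In this toy case the candidate covers five of the six reference unigrams and three of the five reference bigrams, which illustrates why ROUGE-2 is typically lower than ROUGE-1 for the same summary.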