Classification of User Activities Potentially Leading to Sensitive Information Leakage Using the Random Forest Algorithm

Authors

  • Alda Amorita Azza, Universitas Jenderal Achmad Yani
  • Asep Id Hadiana, Universitas Jenderal Achmad Yani
  • Agus Komarudin, Universitas Jenderal Achmad Yani

DOI:

https://doi.org/10.38204/tematik.v12i1.2325

Keywords:

Random Forest, SMOTE-ENN, Insider Threat, Sensitive Information Leakage, Machine Learning

Abstract

Sensitive information leaks are a growing concern in cybersecurity and are often caused by insider threats. To address this, the study developed a Random Forest classification model to detect user activities that may lead to data leaks, applying SMOTE-ENN for class balancing and optimizing the model parameters. The model achieved an average F1-score of 0.9167 in cross-validation and 0.9231 on the test data, reflecting a balance between precision and recall. It detected abnormal activities with a recall of 94.28%, meaning it identified most of the risky activities and missed few of them. The AUC-ROC score of 0.9721 indicates that the model separates normal from abnormal behavior well. The results indicate that Random Forest, combined with SMOTE-ENN and parameter optimization, is effective for detecting data leakage risks and insider threats, with potential for use in information security systems that monitor suspicious activities.
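
The metrics reported in the abstract measure different kinds of error, which a small sketch can make concrete. Recall depends only on true positives and false negatives, so it describes how many risky activities were caught; false positives affect precision and the false positive rate instead, and F1 combines precision and recall. The confusion-matrix counts below are hypothetical and are not taken from the study; the class and function names are likewise illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts, with 'abnormal' as the positive class."""
    tp: int  # abnormal activities flagged as abnormal
    fp: int  # normal activities wrongly flagged as abnormal
    fn: int  # abnormal activities that were missed
    tn: int  # normal activities correctly left alone


def precision(c: ConfusionCounts) -> float:
    """Share of flagged activities that were truly abnormal."""
    flagged = c.tp + c.fp
    return c.tp / flagged if flagged else 0.0


def recall(c: ConfusionCounts) -> float:
    """Share of truly abnormal activities that were flagged."""
    abnormal = c.tp + c.fn
    return c.tp / abnormal if abnormal else 0.0


def f1_score(c: ConfusionCounts) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


def false_positive_rate(c: ConfusionCounts) -> float:
    """Share of normal activities that were wrongly flagged."""
    normal = c.fp + c.tn
    return c.fp / normal if normal else 0.0


if __name__ == "__main__":
    # Hypothetical counts chosen for illustration only (not the paper's data).
    counts = ConfusionCounts(tp=90, fp=30, fn=10, tn=870)
    print(f"precision           = {precision(counts):.4f}")
    print(f"recall              = {recall(counts):.4f}")
    print(f"F1                  = {f1_score(counts):.4f}")
    print(f"false positive rate = {false_positive_rate(counts):.4f}")
```

With these illustrative counts, recall is high while precision is noticeably lower, which is why a recall figure is usually read together with F1 or precision when judging how many alerts would be false alarms.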

Published

2025-06-25

How to Cite

Alda Amorita Azza, Asep Id Hadiana, & Agus Komarudin. (2025). Klasifikasi Aktivitas Pengguna yang Berpotensi Menyebabkan Kebocoran Informasi Sensitif Menggunakan Algoritma Random Forest. TEMATIK, 12(1), 41–49. https://doi.org/10.38204/tematik.v12i1.2325