Classification of User Activities That May Lead to Sensitive Information Leaks Using the Random Forest Algorithm
DOI:
https://doi.org/10.38204/tematik.v12i1.2325
Keywords:
Random Forest, SMOTE-ENN, Insider Threat, Sensitive Information Leakage, Machine Learning
Abstract
Sensitive information leaks are a growing concern in cybersecurity and are often caused by insider threats. To address this, this study developed a Random Forest classification model to detect user activities that may lead to data leaks. SMOTE-ENN was applied for class balancing, and the model parameters were optimized. The model achieved an average F1-score of 0.9167 in cross-validation and 0.9231 on the test data, reflecting a balance between precision and recall when identifying abnormal activities. Specifically, the model detected abnormal activities with a recall of 94.28%, meaning it identified most of the risky activities and missed few of them (few false negatives). The AUC-ROC score of 0.9721 indicates that the model separates normal from abnormal behavior well. These results indicate that Random Forest, paired with SMOTE-ENN and parameter optimization, is an effective tool for detecting data leakage risks and insider threats, with potential for use in information security systems that monitor suspicious activities.
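The abstract reports F1-score and recall for the abnormal class. As a minimal sketch of how these metrics relate, the following Python snippet computes precision, recall, and F1 from confusion-matrix counts. The counts are hypothetical and chosen only for illustration; they are not the study's data, and the class and function names are assumptions rather than anything taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts with 'abnormal' as the positive class."""
    tp: int  # abnormal activities correctly flagged
    fp: int  # normal activities wrongly flagged
    fn: int  # abnormal activities that were missed
    tn: int  # normal activities correctly left alone


def precision(c: ConfusionCounts) -> float:
    """Share of flagged activities that were truly abnormal."""
    flagged = c.tp + c.fp
    return c.tp / flagged if flagged else 0.0


def recall(c: ConfusionCounts) -> float:
    """Share of truly abnormal activities that were flagged."""
    actual = c.tp + c.fn
    return c.tp / actual if actual else 0.0


def f1_score(c: ConfusionCounts) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts for illustration only (not results from the study).
baseline = ConfusionCounts(tp=90, fp=10, fn=10, tn=890)
# Same detections, but 50 more normal activities are wrongly flagged.
noisy = ConfusionCounts(tp=90, fp=60, fn=10, tn=840)

for name, c in (("baseline", baseline), ("noisy", noisy)):
    print(f"{name}: precision={precision(c):.2f} "
          f"recall={recall(c):.2f} f1={f1_score(c):.2f}")
```

In this sketch the extra false positives leave recall unchanged while precision and F1 drop, which is why a high recall is usually read together with F1 or precision when judging how many normal activities are flagged by mistake.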