Web-Based Real-Time Phishing Detection System Integrated with Machine Learning
Abstract
Phishing prevention that combines URL blacklist/whitelist analysis with machine learning techniques has proven highly effective. However, machine learning modules remain difficult to apply in real-world environments, particularly because of the lack of real-time deployment. This research therefore developed a web-based phishing detection system integrated with a machine learning module that analyzes and verifies requested websites and decides whether to allow or deny access to them. The system is deployed on a proxy server developed in Python and analyzes (1) domain features, (2) URL features, and (3) path and filename characteristics. Its user interface (UI) supports on-demand model training and evaluation and accommodates multiple algorithms: Random Forest (RF), Support Vector Machine (SVM), Multilayer Perceptron (MLP), Deep Neural Network (DNN), and Extreme Gradient Boosting (XGBoost). In the evaluation, RF achieved the highest performance, with an accuracy of 96.70% and an F1-score of 96.68%. A user satisfaction survey of 10 participants indicated a high overall level of satisfaction (mean = 4.73), particularly for system reliability and stability (mean = 4.91), while the UI's model uploading and training capability received a comparatively lower rating (mean = 4.36).
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.