Web-Based Real-Time Phishing Detection System Integrated with Machine Learning


Pawina Chapanya
Chartaporn Jullasri
Satit Kravenkit
Chakchai So-In

Abstract

Phishing prevention that combines URL blacklist/whitelist analysis with machine learning techniques has proven highly effective. In practice, however, machine learning modules remain hard to apply in real-world environments, mainly because of the lack of real-time deployment. This research therefore developed a web-based phishing detection system integrated with a machine learning module that analyzes and verifies websites and determines whether to allow or deny access to them. The system is deployed on a proxy server developed in Python and analyzes (1) domain features, (2) URL features, and (3) path and filename characteristics. Its user interface (UI) supports model training and evaluation as needed and accommodates multiple algorithms: Random Forest (RF), Support Vector Machine (SVM), Multilayer Perceptron (MLP), Deep Neural Network (DNN), and Extreme Gradient Boosting (XGBoost). In the evaluation, RF achieved the highest performance, with an accuracy of 96.70% and an F1-score of 96.68%. A user satisfaction survey of 10 participants indicated a high overall level of satisfaction (mean = 4.73), with system reliability and stability rated particularly high (4.91), while the UI capability for model uploading and training received a relatively lower score (4.36).
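The three feature groups and the allow/deny decision described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The specific features, the list contents, the order of the list and classifier checks, and the names `extract_features`, `decide`, `WHITELIST`, and `BLACKLIST` are all assumptions made for illustration, since the abstract names only the feature groups and the algorithms.

```python
from urllib.parse import urlsplit

# Hypothetical lists for illustration; a real deployment would load
# maintained blacklist/whitelist feeds instead.
WHITELIST = {"example.org"}
BLACKLIST = {"login-verify.example.net"}


def extract_features(url: str) -> dict:
    """Compute simple lexical features in the three groups named in the
    abstract. The individual features are assumed, not taken from the paper."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    path = parts.path
    filename = path.rsplit("/", 1)[-1]
    return {
        # (1) domain features
        "domain_length": len(host),
        "domain_dots": host.count("."),
        "domain_hyphens": host.count("-"),
        # True for dotted IPv4 hosts such as 192.168.0.1 (IPv6 is not handled)
        "domain_is_ip": host.replace(".", "").isdigit(),
        # (2) URL features
        "url_length": len(url),
        "url_digits": sum(c.isdigit() for c in url),
        "url_has_at": "@" in url,
        "uses_https": parts.scheme == "https",
        # (3) path and filename characteristics
        "path_depth": path.count("/"),
        "filename_length": len(filename),
        "filename_has_ext": "." in filename,
    }


def decide(url: str, classify) -> str:
    """Return "allow" or "deny" for a requested URL.

    Assumed order: whitelist, then blacklist, then the classifier.
    `classify` is any callable that takes the feature dict and returns
    True when the URL looks like phishing.
    """
    host = urlsplit(url).hostname or ""
    if host in WHITELIST:
        return "allow"
    if host in BLACKLIST:
        return "deny"
    return "deny" if classify(extract_features(url)) else "allow"


if __name__ == "__main__":
    # Toy stand-in for a trained model: flag IP-address hosts and "@" in URLs.
    def toy_rule(features: dict) -> bool:
        return features["domain_is_ip"] or features["url_has_at"]

    for u in ("https://example.org/docs/index.html",
              "http://192.168.0.1/secure/login.php"):
        print(u, "->", decide(u, toy_rule))
```

In a setup like the one the abstract describes, a trained model such as a Random Forest would presumably take the place of `toy_rule`, and the proxy would call something like `decide` for each request before forwarding it.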

Article Details

How to Cite
Chapanya, P., Jullasri, C., Kravenkit, S., & So-In, C. (2025). Web-Based Real-Time Phishing Detection System Integrated with Machine Learning. Journal of Science Engineering and Technology Rajabhat Maha Sarakham University, 4(2), 37–57. Retrieved from https://ph03.tci-thaijo.org/index.php/jsetRMU/article/view/4436
Section
Research Articles
