Comparing the Performance of Probabilistic Weighting Classification Techniques for Water Quality Assessment
Keywords:
Water Quality, Classification, Probabilistic Weighting Ensemble, Machine Learning, Model Performance
Abstract
This study compares the performance of probabilistic weighting classification techniques for water quality assessment. The dataset, obtained from Kaggle, contains 7,999 water quality records. Each record is described by 21 attributes giving the quantities of chemical components, together with a binary quality label. By combining ensemble methods with pairwise comparison techniques, the study achieves gains of at least 6.68% in precision, recall, and F-measure over single classifiers, at the cost of an accuracy reduction of at most 5.16%. These findings show how ensembles can improve on single-classifier techniques and lay the groundwork for more robust and reliable models. The results are relevant to environmental monitoring practice, can inform policy decisions, and can guide interventions aimed at safeguarding water quality. By providing a foundation for robust modeling, the study supports proactive measures for sustaining and preserving the quality of water resources.
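As an illustration only, the following Python sketch shows one common form of probabilistic weighting, a weighted soft-voting ensemble. Each base classifier's estimated probability of the positive class (label 1) is combined by a normalised weighted average, and the averaged probability is thresholded at 0.5. The sketch also computes positive-class precision, recall, and F-measure, the metrics reported in the abstract. The function names `weighted_soft_vote` and `precision_recall_f1`, the weights, the threshold, and the toy probabilities are hypothetical. The abstract does not state how the study derives its weights or carries out its pairwise comparisons.

```python
# Sketch only: weighted soft voting for a binary label plus positive-class
# precision, recall and F-measure. Weights, threshold and probabilities are
# hypothetical placeholders, not values from the study.
from collections.abc import Sequence


def weighted_soft_vote(
    probas: Sequence[Sequence[float]],
    weights: Sequence[float],
    threshold: float = 0.5,
) -> list[int]:
    """Return 0/1 labels from a weighted average of per-classifier P(label = 1).

    probas[m][i] is classifier m's estimated probability that sample i has
    label 1; weights[m] is the (hypothetical) weight given to classifier m.
    """
    if len(probas) != len(weights):
        raise ValueError("need exactly one weight per classifier")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_samples = len(probas[0])
    labels = []
    for i in range(n_samples):
        # Normalised weighted average of the classifiers' probabilities.
        p = sum(w * pm[i] for w, pm in zip(weights, probas)) / total
        labels.append(1 if p >= threshold else 0)
    return labels


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> tuple[float, float, float]:
    """Precision, recall and F-measure (F1) for label 1; 0.0 when undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy probabilities from three hypothetical base classifiers, four samples.
    probas = [
        [0.9, 0.4, 0.2, 0.7],
        [0.6, 0.7, 0.1, 0.4],
        [0.8, 0.3, 0.4, 0.6],
    ]
    weights = [0.5, 0.3, 0.2]
    y_true = [1, 1, 0, 0]
    y_pred = weighted_soft_vote(probas, weights)
    print(y_pred, precision_recall_f1(y_true, y_pred))
    # prints: [1, 0, 0, 1] (0.5, 0.5, 0.5)
```

With equal weights the scheme reduces to plain soft voting. Raising or lowering `threshold` shifts the balance between precision and recall, which is why these metrics are reported alongside accuracy rather than in place of it.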
References
World Health Organization. (2017). Guidelines for drinking-water quality (4th ed.). Geneva: WHO.
Zhou, Z.-H., Wu, J., & Tang, W. (2002). Ensembling neural networks: Many could be better than all. Artificial Intelligence, 137(1–2), 239–263.
Dutta, S., Mishra, S., & Roy, R. (2020). Water quality prediction using machine learning and IoT. Procedia Computer Science, 167, 2241–2248.
Jia, X., Wang, M., & Du, P. (2019). A new class-imbalance learning method for environmental data. Environmental Modelling & Software, 118, 231–245.
He, H., & Garcia, E. A. (2009). Learning from imbalanced data. IEEE Transactions on Knowledge and Data Engineering, 21(9), 1263–1284.
Dietterich, T. G. (2000). Ensemble methods in machine learning. In J. Kittler & F. Roli (Eds.), Multiple classifier systems (pp. 1–15). Springer.
Rokach, L. (2010). Ensemble-based classifiers. Artificial Intelligence Review, 33(1–2), 1–39.
Opitz, D. W., & Maclin, R. (1999). Popular ensemble methods: An empirical study. Journal of Artificial Intelligence Research, 11, 169–198.
Zhou, Z.-H. (2012). Ensemble methods: Foundations and algorithms. Boca Raton: CRC Press.
Khan, S., Shishir, M. A., Tazin, M. M., & Hoque, M. A. (2022). Water quality monitoring system using IoT and machine learning: A review. Sensors, 22(21), 8265.
World Health Organization. (2019). Water pollution and global health. Geneva: WHO.
Domingos, P., & Pazzani, M. (1997). On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning, 29(2–3), 103–130.
Vapnik, V. (1998). Statistical learning theory. New York: Wiley.
Zhang, J., Wang, T., & Wang, Y. (2021). Feature engineering strategies for water quality classification. Environmental Science & Technology, 55(3), 1750–1760.
Das, A. A., & Mishra, N. (2020). Integrating domain knowledge into ML-based water quality models. Applied Water Science, 10(3), 67–78.
Farhangfar, A., Kurgan, L., & Dy, J. (2008). Impact of imputation of missing values on classification error for discrete data. Pattern Recognition, 41(12), 3692–3705.
Kaggle. (n.d.). Water Quality Dataset. Retrieved June 5, 2025,
Leys, C., Ley, C., Klein, O., Bernard, P., & Licata, L. (2013). Detecting outliers: Do not use standard deviation around the mean, use absolute deviation around the median. Journal of Experimental Social Psychology, 49(4), 764–766.
Jolliffe, I. T., & Cadima, J. (2016). Principal component analysis: A review and recent developments. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 374(2065), 20150202.
Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1), 81–106.
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297.
Kuncheva, L. I. (2004). Combining pattern classifiers: Methods and algorithms. Wiley-Interscience.
Zajac, M., et al. (2022). Precision vs. recall: Metrics definitions, tradeoffs & use cases. V7 Labs Blog.
Google Developers. (n.d.). Classification: Accuracy, precision, recall, and related metrics [Crash Course module]. Retrieved June 20, 2025, from https://developers.google.com/machine-learning/crash-course/classification/accuracy-precision-recall
Author, B., Smith, J., & Doe, A. (2024, May). A hybrid machine learning approach for imbalanced irrigation water quality classification. ScienceDirect. Retrieved from https://www.sciencedirect.com/science/article/pii/S1944398624204203
License
Copyright (c) 2025 Journal of Engineering and Innovative Research

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
