Water Quality Analysis and Consumption Feasibility Using Support Vector Machine and CatBoosting with Hyperparameter Tuning

Authors

  • Christa Putri Rahayu, Universitas Amikom Yogyakarta, Indonesia
  • Kusnawi, Universitas Amikom Yogyakarta, Indonesia

DOI:

https://doi.org/10.5281/zenodo.17342085

Keywords:

Water Quality, Support Vector Machine, CatBoosting, SMOTE, Hyperparameter Tuning

Abstract

Water quality analysis plays an important role in determining whether water is suitable for human consumption. This study builds a machine learning model that classifies water quality based on parameters such as pH, hardness, solids content, chloramines, sulfate, conductivity, organic carbon, trihalomethanes, and turbidity. The dataset comes from Kaggle and contains 3,276 samples. Two algorithms are applied: Support Vector Machine (SVM) and CatBoost. The research process comprises data preprocessing, class balancing with SMOTE, modeling, and model performance evaluation. Hyperparameter tuning is applied to both algorithms to improve performance. The results show that CatBoost performs best, reaching an accuracy of 95.8% after hyperparameter tuning, compared with 77.9% for SVM. CatBoost also leads on all other evaluation metrics, including precision, recall, and F1-score.
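The class-balancing step named in the abstract can be illustrated with a minimal sketch of the SMOTE idea: each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its nearest minority-class neighbours. This is not the authors' code. The function names `smote_oversample` and `balance`, the default `k=5`, the fixed seed, and the choice of label `1` as the minority class are all assumptions made for illustration, and the sketch uses only the Python standard library rather than whatever SMOTE implementation the study relied on.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points from minority-class feature vectors.

    Hypothetical helper: interpolates between a randomly chosen minority
    sample and one of its k nearest minority neighbours (Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours, excluding the base itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def balance(X, y, minority_label=1, k=5, seed=0):
    """Return copies of X and y with the minority class oversampled to parity.

    Assumes a binary problem in which minority_label marks the smaller class.
    """
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(X) - len(minority)
    n_new = max(0, n_majority - len(minority))
    new_points = smote_oversample(minority, n_new, k=k, seed=seed) if n_new else []
    return X + new_points, y + [minority_label] * len(new_points)


if __name__ == "__main__":
    # Toy feature vectors (made-up values, two features per sample).
    X = [[7.0, 3.1], [6.5, 4.0], [8.1, 2.7], [7.4, 3.9], [6.9, 3.3], [7.2, 4.4]]
    y = [0, 0, 0, 0, 1, 1]
    X_bal, y_bal = balance(X, y)
    print(len(X_bal), y_bal.count(0), y_bal.count(1))
```

In a typical workflow, oversampling of this kind is applied to the training split only, so that synthetic points do not leak into the data used for evaluation; whether the study followed that practice is not stated in the abstract.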

Published

13-10-2025

Section

Articles

How to Cite

Water Quality Analysis and Consumption Feasibility Using Support Vector Machine and CatBoosting with Hyperparameter Tuning. (2025). SITEKNIK: Sistem Informasi, Teknik Dan Teknologi Terapan, 2(4), 317-323. https://doi.org/10.5281/zenodo.17342085
