Water Quality Analysis and Consumption Feasibility Using Support Vector Machine and CatBoosting with Hyperparameter Tuning
DOI:
https://doi.org/10.5281/zenodo.17342085
Keywords:
Water Quality, Support Vector Machine, CatBoosting, SMOTE, Hyperparameter Tuning
Abstract
Water quality analysis plays an important role in determining whether water is suitable for human consumption. This study builds machine learning models that classify water quality from parameters such as pH, hardness, solids content, chloramines, sulfate, conductivity, organic carbon, trihalomethanes, and turbidity. The dataset comes from Kaggle and contains 3,276 samples. Two algorithms are applied: Support Vector Machine (SVM) and CatBoost. The research process comprises data preprocessing, class balancing with SMOTE, modeling, and model performance evaluation. Hyperparameter tuning is applied to both algorithms to improve performance. The results show that CatBoost performs best, reaching an accuracy of 95.8% after hyperparameter tuning, compared with 77.9% for SVM. CatBoost also outperforms SVM on all evaluation metrics, including precision, recall, and F1-score.
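The abstract does not show the study's code, so the following is only a minimal sketch of two steps it names: class balancing with SMOTE and evaluation with precision, recall, and F1-score. It uses plain Python so that the mechanics are visible. The function names `smote_oversample` and `classification_metrics`, the two-feature toy samples, and the labels are all hypothetical and are not taken from the paper or from the Kaggle dataset.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (the core SMOTE idea)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy data (made up for illustration): two features per sample,
# standing in for pH and turbidity.
potable = [[7.0, 3.1], [7.4, 2.8], [6.9, 3.5]]
not_potable = [[5.1, 6.0], [9.2, 5.5], [4.8, 6.4], [8.9, 5.9], [5.5, 6.8], [9.5, 6.1]]

# Add enough synthetic minority samples to match the majority class size.
extra = smote_oversample(potable, n_new=len(not_potable) - len(potable), k=2)
balanced_potable = potable + extra

# Hypothetical true labels and predictions, only to exercise the metric helper.
metrics = classification_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1])
```

Each synthetic point lies on the line segment between two existing minority samples, which is why SMOTE adds variety without leaving the region the minority class already occupies. In practice a library implementation such as `SMOTE` from imbalanced-learn would normally be used, applied to the training split only so that synthetic points do not leak into the evaluation data.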
License
Copyright (c) 2025 Christa Putri Rahayu, Kusnawi S.Kom, M.Eng (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.