Optimizing Text Preprocessing for Accurate Sentiment Analysis on E-Wallet Reviews
Abstract
This research aims to optimize preprocessing techniques for sentiment analysis of reviews of the DANA E-Wallet application on the Google Play Store. Text preprocessing is a crucial step in Natural Language Processing (NLP) that affects the accuracy and efficiency of sentiment analysis. This study applies several preprocessing methods, including stopword removal, stemming, and lemmatization, to clean and prepare the review data before analysis. The results show that lemmatization significantly improves accuracy compared to basic preprocessing techniques such as stopword removal and stemming. With properly optimized preprocessing, sentiment analysis can provide more accurate and informative results, which can be used to enhance the application's quality and user experience. The preprocessing variants were tested with SVM classifiers using four kernels. The highest results were achieved with the combination of cleaning, case folding, tokenization, and lemmatization: 100% for Linear, 100% for RBF, 99% for Polynomial, and 99.50% for Sigmoid, for an average accuracy of 99.63%.
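As a rough illustration of the basic steps named in the abstract (cleaning, case folding, tokenization, and stopword removal), the following is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the cleaning rules, and the tiny `STOPWORDS` set are assumptions made for this example. Stemming and lemmatization are left out because they depend on a language-specific dictionary or tool.

```python
import re

# Hypothetical, deliberately tiny Indonesian stopword list (illustration only).
STOPWORDS = {"yang", "dan", "di", "ini", "itu", "saya"}


def clean(text):
    """Cleaning: drop URLs, then anything that is not an ASCII letter or space."""
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"[^A-Za-z\s]", " ", text)
    # Collapse the runs of whitespace left behind by the substitutions.
    return re.sub(r"\s+", " ", text).strip()


def case_fold(text):
    """Case folding: map every character to lowercase."""
    return text.lower()


def tokenize(text):
    """Tokenization: split the cleaned text on whitespace."""
    return text.split()


def remove_stopwords(tokens):
    """Stopword removal: keep only tokens outside the stopword set."""
    return [token for token in tokens if token not in STOPWORDS]


def preprocess(text, drop_stopwords=False):
    """Run cleaning, case folding, and tokenization; optionally drop stopwords."""
    tokens = tokenize(case_fold(clean(text)))
    return remove_stopwords(tokens) if drop_stopwords else tokens


if __name__ == "__main__":
    review = "Aplikasi DANA ini BAGUS!!! https://example.com"
    print(preprocess(review))
    print(preprocess(review, drop_stopwords=True))
```

Keeping each step as its own function makes it straightforward to switch individual steps on or off, which is the kind of comparison between preprocessing combinations the study describes.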
Highlights:
- Preprocessing Optimization: Lemmatization significantly improves sentiment analysis accuracy compared to basic techniques like stemming or stopword removal.
- High SVM Accuracy: The best preprocessing combination achieved an average accuracy of 99.63% across multiple SVM kernels, with linear and RBF kernels reaching 100%.
- Real-World Dataset: The analysis used 1,000 authentic reviews from the Google Play Store, offering practical insights for improving E-Wallet services such as DANA.
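The reported average can be checked from the four per-kernel figures. The short sketch below only reproduces that arithmetic; the variable names are illustrative, and half-up rounding is an assumption about how the mean of 99.625% was rounded to two decimals.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-kernel results as reported in the abstract (percent).
accuracies = {
    "linear": Decimal("100"),
    "rbf": Decimal("100"),
    "polynomial": Decimal("99"),
    "sigmoid": Decimal("99.50"),
}

# Arithmetic mean over the four kernels.
mean_accuracy = sum(accuracies.values()) / len(accuracies)

# Round to two decimals with half-up rounding (assumed rounding rule).
rounded_accuracy = mean_accuracy.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

if __name__ == "__main__":
    print(mean_accuracy, rounded_accuracy)
```

Decimal arithmetic is used here because Python's built-in `round` applies round-half-to-even, which would turn the exact tie 99.625 into 99.62 rather than the reported 99.63.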
Keywords: DANA, Google Play Store, Preprocessing, Sentiment Analysis
Copyright (c) 2023 Arrizqi Fauzy Aufar, Mochamad Alfan Rosid, Ade Eviyanti, Ika Ratna Indra Astutik
This work is licensed under a Creative Commons Attribution 4.0 International License.