Optimizing Text Preprocessing for Accurate Sentiment Analysis on E-Wallet Reviews


Mengoptimalkan Preprocessing Teks untuk Analisis Sentimen yang Akurat pada Ulasan E-Wallet


  • (1) Arrizqi Fauzy Aufar, Informatics Study Program, Universitas Muhammadiyah Sidoarjo, Indonesia

  • (2) * Mochamad Alfan Rosid, Informatics Study Program, Universitas Muhammadiyah Sidoarjo, Indonesia

  • (3) Ade Eviyanti, Informatics Study Program, Universitas Muhammadiyah Sidoarjo, Indonesia

  • (4) Ika Ratna Indra Astutik, Informatics Study Program, Universitas Muhammadiyah Sidoarjo, Indonesia

    (*) Corresponding Author

Abstract

This research aims to optimize preprocessing techniques for sentiment analysis of reviews of the DANA e-wallet application on the Google Play Store. Text preprocessing is a crucial step in Natural Language Processing (NLP) that affects both the accuracy and the efficiency of sentiment analysis. The study applies several preprocessing methods, including stopword removal, stemming, and lemmatization, to clean and prepare the review data before analysis, and tests them with Support Vector Machine (SVM) classification models using four kernels. The results show that lemmatization improves accuracy significantly over basic preprocessing techniques such as stopword removal and stemming. The highest results were achieved with the combination of cleaning, case folding, tokenization, and lemmatization: 100% for the Linear kernel, 100% for RBF, 99% for Polynomial, and 99.50% for Sigmoid, for an average accuracy of 99.63%. With properly optimized preprocessing, sentiment analysis can yield more accurate and informative results, which can be used to improve the application's quality and user experience.
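The preprocessing steps named in the abstract (cleaning, case folding, tokenization, optional stopword removal, and lemmatization) could be chained roughly as in the sketch below. This is only an illustration under stated assumptions: the stopword set, the lemma lookup table, the sample review, and every function name are hypothetical, and the abstract does not say which tools or word lists the study actually used.

```python
import re

# Hypothetical, tiny resources for illustration only; the study's actual
# stopword list and lemmatizer are not specified in the abstract.
STOPWORDS = {"yang", "dan", "di", "ini", "itu"}
LEMMAS = {
    "aplikasinya": "aplikasi",
    "membantu": "bantu",
    "transaksinya": "transaksi",
}


def clean(text):
    """Cleaning: drop URLs, then replace every non-letter character with a space."""
    text = re.sub(r"https?://\S+", " ", text)
    return re.sub(r"[^A-Za-z\s]", " ", text)


def case_fold(text):
    """Case folding: lowercase the whole review."""
    return text.lower()


def tokenize(text):
    """Tokenization: split on runs of whitespace."""
    return text.split()


def remove_stopwords(tokens):
    """Stopword removal: drop tokens found in the (assumed) stopword set."""
    return [t for t in tokens if t not in STOPWORDS]


def lemmatize(tokens):
    """Lemmatization via dictionary lookup; unknown tokens pass through unchanged."""
    return [LEMMAS.get(t, t) for t in tokens]


def preprocess(text, use_stopwords=False):
    """Run cleaning, case folding, tokenization, and lemmatization in order."""
    tokens = tokenize(case_fold(clean(text)))
    if use_stopwords:
        tokens = remove_stopwords(tokens)
    return lemmatize(tokens)


# Hypothetical review text, not taken from the study's dataset.
review = "Aplikasinya SANGAT membantu!!! Transaksinya cepat dan mudah https://example.com"
print(preprocess(review))
print(preprocess(review, use_stopwords=True))
```

Keeping stopword removal behind a flag makes it easy to compare preprocessing combinations, which is the kind of comparison the abstract describes.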

Highlights:

  • Preprocessing Optimization: Lemmatization significantly improves sentiment analysis accuracy compared to basic techniques like stemming or stopword removal.
  • High SVM Accuracy: The best preprocessing combination achieved an average accuracy of 99.63% across multiple SVM kernels, with linear and RBF kernels reaching 100%.
  • Real-World Dataset: Analysis used 1000 authentic reviews from Google Play Store, highlighting practical insights for improving E-Wallet services like DANA.
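The reported average accuracy can be checked directly from the four per-kernel figures. The short sketch below uses `Decimal` with explicit half-up rounding, because the exact mean is 99.625 and Python's default float rounding would round that tie to 99.62 rather than the 99.63 given in the abstract. The variable names are illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-kernel accuracies (percent) as stated in the abstract.
accuracies = {
    "linear": Decimal("100"),
    "rbf": Decimal("100"),
    "polynomial": Decimal("99"),
    "sigmoid": Decimal("99.50"),
}

# Arithmetic mean over the four kernels.
mean = sum(accuracies.values()) / len(accuracies)

# Round to two decimal places, with ties rounded up.
reported = mean.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

print(mean, reported)
```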

Keywords: DANA, Google Play Store, Preprocessing, Sentiment Analysis

Published
2023-10-02
 
Section
Articles