Data Collection Methods in the Big Data Era: A Systematic Literature Review on Technical and Ethical Challenges
DOI:
https://doi.org/10.58485/jie.v4i3.398
Keywords:
Big data, systematic literature review, technical challenges, ethical issues, Islamic education
Abstract
The Big Data era, characterized by the massive Volume, Velocity, and Variety (3V) of information, has revolutionized decision-making across many sectors. This paradigm shift has also created significant methodological gaps, particularly population bias and the absence of standardized frameworks for validating non-probabilistic data representations. This study aims to bridge these gaps through a Systematic Literature Review that employs academic documentation and theoretical triangulation to synthesize both the challenges and the solutions in the data acquisition phase. The findings identify three dominant data collection methods, namely Web Scraping, Application Programming Interfaces (APIs), and the Internet of Things (IoT), as direct responses to the 3V characteristics of Big Data. The review reveals a persistent tension between massive data volume and data validity, further complicated by technical risks (such as API rate limiting) and by legal and ethical concerns (including compliance with Terms of Service and data privacy regulations). Research in this era must therefore adopt a strategic framework that emphasizes essential practices such as de-identification of personally identifiable information (PII) to protect privacy rights and the application of Exponential Backoff techniques to cope with API quota limitations. This review presents a comprehensive synthesis of the pre-analysis phase of Big Data research and underscores that the integrity and reliability of scientific findings in this era depend heavily on the adoption of rigorous methodological and ethical frameworks.
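As an illustration only, the two practices named in the abstract can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the procedure used in the reviewed studies: `RateLimitError`, `call_with_backoff`, `deidentify`, the field names, and all default values are hypothetical, and keyed hashing is only one possible form of de-identification (strictly, pseudonymization).

```python
import hashlib
import hmac
import random
import time


class RateLimitError(Exception):
    """Hypothetical error raised by a client when the API reports a quota limit."""


def backoff_delays(max_retries, base=1.0, cap=60.0):
    """Upper bound of the wait before each retry: base * 2**attempt, capped at `cap`."""
    return [min(cap, base * 2 ** attempt) for attempt in range(max_retries)]


def call_with_backoff(request, max_retries=5, base=1.0, cap=60.0, sleep=time.sleep):
    """Call request(); after each RateLimitError wait, then retry.

    The wait is drawn uniformly between 0 and the exponential bound
    (random jitter), so many clients do not retry at the same instant.
    """
    for bound in backoff_delays(max_retries, base, cap):
        try:
            return request()
        except RateLimitError:
            sleep(random.uniform(0, bound))
    return request()  # final attempt; any error now propagates to the caller


def pseudonymize(value, secret_key):
    """Replace an identifier with a keyed SHA-256 digest (HMAC)."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def deidentify(record, pii_fields, secret_key):
    """Return a copy of `record` with the listed PII fields pseudonymized."""
    cleaned = dict(record)
    for field in pii_fields:
        if field in cleaned:
            cleaned[field] = pseudonymize(str(cleaned[field]), secret_key)
    return cleaned
```

In use, a collector would wrap each API request in `call_with_backoff` and pass every collected record through `deidentify` before storage. The same key maps the same identifier to the same digest, so records can still be linked without keeping the raw value, provided the key itself is stored securely.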
License
Creative Commons Attribution 4.0 (CC BY)
